Fine-tuning Large Language Model for Automated Algorithm Design

Paper · arXiv 2507.10614 · Published July 13, 2025

The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained, and how well can they generalize across different algorithm design tasks? In this paper, we take a first step toward answering these questions by exploring fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, and then leverage direct preference optimization (DPO) to efficiently align LLM outputs with task objectives. Our experiments, conducted on Llama-3.2-1B-Instruct and Llama-3.1-8B-Instruct, span three distinct algorithm design tasks. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts, and that the smaller fine-tuned Llama-3.2-1B-Instruct can match the larger Llama-3.1-8B-Instruct on the admissible set problem.
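To make the two ingredients named in the abstract more concrete, here is a minimal sketch in Python. The function `dpo_loss` is the standard per-pair DPO objective, computed from summed token log-probabilities under the policy and a frozen reference model. The function `sample_preference_pairs` is only an illustrative assumption of what rank-based pair selection could look like: the excerpt above does not specify how DAR works, so the bucketing scheme, the parameter names (`n_buckets`, `n_pairs`, `beta`), and the convention that a higher score is better are all hypothetical and should not be read as the authors' procedure.

```python
import math
import random


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities; beta scales the implicit reward.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sample_preference_pairs(candidates, n_pairs, n_buckets=4, seed=0):
    """Hypothetical rank-based pair sampler (an assumption, not the paper's DAR).

    candidates: list of (algorithm_text, score) tuples, higher score = better.
    Candidates are ranked, split into contiguous rank buckets, and each pair
    takes its chosen item from a better bucket than its rejected item, so the
    pairs cover the whole quality range instead of only the top candidates.
    """
    rng = random.Random(seed)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    size = math.ceil(len(ranked) / n_buckets)
    buckets = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    if len(buckets) < 2:
        raise ValueError("need at least two rank buckets to form a pair")
    pairs = []
    for _ in range(n_pairs):
        # Bucket 0 holds the best-ranked candidates, so the smaller index wins.
        better, worse = sorted(rng.sample(range(len(buckets)), 2))
        pairs.append((rng.choice(buckets[better]), rng.choice(buckets[worse])))
    return pairs
```

In this sketch, the pairs returned by `sample_preference_pairs` would supply the chosen and rejected completions whose log-probabilities are passed to `dpo_loss`. In practice an existing DPO trainer would normally replace the hand-written loss.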

Introduction. The emerging field of automated algorithm design (AAD) with large language models (LLMs) has attracted growing attention for its potential to automate the synthesis of expert-level algorithms (Liu et al., 2024c;b; Romera-Paredes et al., 2024). A prevailing paradigm in this space embeds LLMs within search strategies, where the LLM generates candidate algorithms and the search procedure iteratively controls their quality and refinement (Zhang et al., 2024). This framework has led to notable advances across a spectrum of algorithm design tasks, including combinatorial optimization (Romera-Paredes et al., 2024; Liu et al., 2024b; Ye et al., 2024), Bayesian optimization (Yao et al., 2024), and black-box optimization (van Stein & Bäck, 2024), to name a few.
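The LLM-in-the-loop paradigm described above can be sketched as a small elitist search loop. This is a generic illustration under stated assumptions, not the procedure of any cited framework: `propose` stands in for an LLM call that returns a new candidate given the current best candidates, `evaluate` stands in for a task-specific scorer, and the toy functions below replace both with simple numeric stand-ins so the loop can run without a model.

```python
import random


def search(propose, evaluate, n_iters=20, pop_size=5, seed=0):
    """Iteratively propose candidates and keep the pop_size best ones.

    propose(parents, rng) -> candidate   (hypothetical stand-in for an LLM call)
    evaluate(candidate)   -> float score (higher is better)
    """
    rng = random.Random(seed)
    population = []  # list of (candidate, score), best first
    for _ in range(n_iters):
        parents = [cand for cand, _ in population]
        candidate = propose(parents, rng)
        population.append((candidate, evaluate(candidate)))
        # Elitist selection: sort by score and drop everything past pop_size.
        population.sort(key=lambda p: p[1], reverse=True)
        del population[pop_size:]
    return population[0]


def toy_propose(parents, rng):
    """Toy generator: perturb the best parent, or start from a random point."""
    if not parents:
        return rng.uniform(-10.0, 10.0)
    return parents[0] + rng.gauss(0.0, 1.0)


def toy_evaluate(x):
    """Toy objective with its maximum (score 0) at x = 3."""
    return -(x - 3.0) ** 2


best_candidate, best_score = search(toy_propose, toy_evaluate, n_iters=50)
```

In a real setting, `propose` would prompt the LLM with the parent algorithms and parse the returned code, and `evaluate` would run that code on benchmark instances. The fine-tuning studied in the paper targets the model behind `propose`.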

Discussion / Conclusion. This paper presents a preliminary study on the necessity and effectiveness of fine-tuning an LLM tailored to algorithm design. We adopt DPO and introduce a diversity-aware rank-based sampling strategy, which balances training data diversity and quality for effective fine-tuning on algorithm design tasks. Our experiments on three tasks demonstrate the effectiveness of the fine-tuned LLM across different algorithm design scenarios: algorithm design with LLM-based random sampling, algorithm design with LLM-based iterative search, and generalization to related algorithm design tasks. Notably, Llama-3.2-1B-Instruct trained with our method matches the performance of Llama-3.1-8B-Instruct. Moreover,
