PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PEER (Plan, Execute, Express, Review) multi-agent framework. This systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment. Given the concerns of cost and data privacy, enterprises are shifting from proprietary models like GPT-4 to custom models, striking a balance between cost, security, and performance. We developed industrial practices leveraging online data and user feedback for efficient model tuning. This study provides best practice guidelines for applying multi-agent systems in domain-specific problem-solving and implementing effective agent tuning strategies. Our empirical studies, particularly in the financial question-answering domain, demonstrate that our approach achieves 95.0% of GPT-4’s performance, while effectively managing costs and ensuring data privacy.
Introduction. Advanced LLMs like GPT-4, enhanced with engineered prompts or Retrieval-Augmented Generation (RAG), show great potential in handling complex tasks across various domains (Wang et al., 2023; Nori et al., 2023; Zhang et al., 2024). However, deploying these models involves a critical tri-lemma of performance, cost, and data privacy. While domain-specific applications benefit from meticulously fine-tuned models (Ling et al., 2024), this approach incurs high costs due to the extensive resources needed for training and data acquisition. Alternatively, multi-agent systems have proven effective (Talebirad and Nadiri, 2023; Hong et al., 2023; Li et al., 2023; Wu et al., 2023; Wang et al., 2024b), especially in complex tasks with distinct and conflicting role requirements that challenge even advanced models. However, current implementations often involve dynamic and complex workflows, increasing costs and complicating reproducibility. Consequently, enterprises are shifting from proprietary models like GPT-4 to custom models that better balance cost, security, and performance.
Discussion / Conclusion. In this work, we introduced the PEER framework to address the tri-lemma of performance, cost, and data privacy in domain-specific applications. The framework balances flexibility and controllability through effective pattern design, meeting industrial demands for efficiency and cost-effectiveness. We also developed industrial practices that use online data and user feedback for effective model tuning, promoting continuous model evolution. Our empirical studies, particularly in the financial question-answering domain, demonstrate that this approach achieves 95.0% of GPT-4’s performance while managing costs and safeguarding data privacy.
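As an illustration only, the four PEER stages (Plan, Execute, Express, Review) can be sketched as a fixed pipeline of interchangeable callables. This is a minimal sketch under stated assumptions, not the implementation evaluated in this work: the names `run_peer` and `PeerResult`, the callable signatures, and the bounded retry on a rejected review (`max_rounds`) are all hypothetical. In practice each callable would wrap an LLM or retrieval call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures; the source does not define an API.
PlanFn = Callable[[str], List[str]]          # question -> sub-questions
ExecuteFn = Callable[[str], str]             # sub-question -> retrieved evidence
ExpressFn = Callable[[str, List[str]], str]  # question + evidence -> draft answer
ReviewFn = Callable[[str, str], bool]        # question + draft -> accepted?


@dataclass
class PeerResult:
    answer: str     # last draft produced by the Express stage
    rounds: int     # number of Plan/Execute/Express/Review passes run
    accepted: bool  # whether the Review stage accepted the final draft


def run_peer(
    question: str,
    plan: PlanFn,
    execute: ExecuteFn,
    express: ExpressFn,
    review: ReviewFn,
    max_rounds: int = 2,  # assumption: retry a bounded number of times
) -> PeerResult:
    """Run Plan -> Execute -> Express -> Review until accepted or exhausted."""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        sub_questions = plan(question)                    # Plan: decompose
        evidence = [execute(sq) for sq in sub_questions]  # Execute: retrieve
        draft = express(question, evidence)               # Express: summarize
        if review(question, draft):                       # Review: self-assess
            return PeerResult(draft, round_no, True)
    return PeerResult(draft, max_rounds, False)


if __name__ == "__main__":
    # Stub agents standing in for model calls.
    result = run_peer(
        "How do rate changes affect bank margins?",
        plan=lambda q: [q + " (background)", q + " (recent data)"],
        execute=lambda sq: "evidence for: " + sq,
        express=lambda q, ev: " | ".join(ev),
        review=lambda q, draft: len(draft) > 0,
    )
    print(result)
```

Keeping the stage order fixed and passing each agent in as a plain function is one way to reflect the controllability the framework aims for: any stage can be swapped between a proprietary and a custom model without touching the workflow itself.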