INQUIRING LINE

Can other RAG hyperparameters like chunk size be learned through generator feedback?

This explores whether the trick behind learnable RAG — using the generator's success as a training signal — extends from the parameters researchers have already made adaptive (document count, ordering, retrieval timing) to ones still usually hand-set, like chunk size.


This reads the question as asking whether chunk size belongs to the growing family of RAG settings that no longer have to be fixed by hand, but can instead be tuned by feedback from whether the final answer came out right. The corpus doesn't contain a paper that learns chunk size directly — but it contains the exact machinery that would make it possible, applied to almost every neighboring knob, which is the more useful thing to know.

The clearest precedent is DynamicRAG, which throws out the fixed top-k assumption entirely and trains a reranker as a reinforcement-learning agent whose reward is the quality of the generator's output, letting it learn both how many documents to pass along and in what order, per query (see "Can document count be learned instead of fixed in RAG?"). That's the template: take a setting normally frozen at config time, wire the generator's success back as a reward, and let the system calibrate it to query complexity. Chunk size is structurally the same kind of knob, so there's no in-principle reason the same loop couldn't tune it.
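As a rough illustration of that template applied to chunk size, the sketch below uses a much simpler learner than DynamicRAG's RL-trained reranker: an epsilon-greedy bandit that keeps one set of chunk-size arms per query bucket. Everything named here is hypothetical (`CHUNK_SIZES`, `ChunkSizeBandit`, the "simple"/"complex" buckets), and `simulated_generator_reward` is a stand-in for a real check on whether the generator's answer was correct. In a real system each arm would presumably need its own pre-chunked index, which this sketch ignores.

```python
import random
from collections import defaultdict

CHUNK_SIZES = [128, 256, 512, 1024]  # hypothetical candidate sizes, in tokens


class ChunkSizeBandit:
    """Epsilon-greedy bandit with a separate set of arms per query bucket."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running reward sum and pull count per (bucket, arm).
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def mean(self, bucket, arm):
        n = self.count[(bucket, arm)]
        return self.total[(bucket, arm)] / n if n else 0.0

    def best(self, bucket):
        return max(self.arms, key=lambda a: self.mean(bucket, a))

    def choose(self, bucket):
        # Try every arm once, then explore with probability epsilon.
        untried = [a for a in self.arms if self.count[(bucket, a)] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.best(bucket)

    def update(self, bucket, arm, reward):
        self.total[(bucket, arm)] += reward
        self.count[(bucket, arm)] += 1


def simulated_generator_reward(bucket, chunk_size, rng):
    """Stand-in for 'did the generator answer correctly?' (1.0 or 0.0).

    Assumption for illustration only: simple queries do best with
    256-token chunks and complex queries with 1024-token chunks.
    """
    ideal = 256 if bucket == "simple" else 1024
    p_correct = 0.9 if chunk_size == ideal else 0.4
    return 1.0 if rng.random() < p_correct else 0.0


def train(steps=4000, seed=0):
    rng = random.Random(seed)
    bandit = ChunkSizeBandit(CHUNK_SIZES, epsilon=0.1, seed=seed)
    for _ in range(steps):
        bucket = rng.choice(["simple", "complex"])
        size = bandit.choose(bucket)
        reward = simulated_generator_reward(bucket, size, rng)
        bandit.update(bucket, size, reward)
    return bandit
```

Calling `train()` and then `bandit.best("simple")` or `bandit.best("complex")` returns the chunk size with the highest observed answer rate for that bucket. The point of the sketch is only the loop shape: choose a setting, observe generator success, update. A production version would swap in a real reward and probably a richer notion of query context.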

The deeper version of this idea is CLaRa, which doesn't just adjust a discrete count but propagates the generator's loss back through continuous document representations, so retrieval learns to favor documents that *actually help answer the question* rather than ones that merely look similar (see "Can retrieval learn what actually helps answer questions?"). This matters for chunking specifically: chunk size is really a proxy for "how much context is the useful unit?", and CLaRa's whole point is that usefulness and surface relevance diverge. A system optimizing chunk boundaries on generator feedback would be closing that same gap from a different angle. StructRAG pushes the idea even further out, learning to pick the *form* of knowledge (table, graph, algorithm, catalogue, or chunk) based on query demands via a trained router (see "Can routing queries to task-matched structures improve RAG reasoning?"). Once you can learn the structure type, learning the granularity within "chunk" is a smaller step.
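To make "granularity within chunk" concrete, here is a minimal sketch of the knob itself: a fixed-size token chunker with optional overlap. The function name `chunk_tokens` and its parameters are illustrative assumptions, not taken from any of the cited systems. A learned approach would be choosing `size` (and possibly `overlap`) rather than leaving them hard-coded.

```python
def chunk_tokens(tokens, size, overlap=0):
    """Split a token list into windows of `size` tokens.

    Consecutive windows share `overlap` tokens. The last window may be
    shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end of the input, so that
        # overlap does not produce a trailing chunk that is only a suffix.
        if start + size >= len(tokens):
            break
    return chunks
```

For example, `chunk_tokens(list(range(10)), size=4)` yields three chunks, the last one holding only two tokens, while adding `overlap=2` yields four full chunks that each share two tokens with their neighbor.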

Two caveats the corpus surfaces. First, *what* feedback signal you use matters: process-level supervision on intermediate retrieval steps substantially beats rewarding only the final answer, because a single end-of-pipeline reward is a noisy teacher for a multi-part decision (see "Does supervising retrieval steps outperform final answer rewards?"). Chunk size acts early in the pipeline, well before any answer is generated, so it might learn better from step-level than from outcome-only feedback. Second, there's a counter-current worth respecting: simple calibrated uncertainty estimates often beat elaborate learned adaptive-retrieval schemes at a fraction of the cost (see "Can simple uncertainty estimates beat complex adaptive retrieval?"). The lesson isn't "don't learn chunk size"; it's that a learned knob has to clear the bar of a good cheap heuristic before it's worth the training.
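For a sense of how cheap that kind of baseline can be, the sketch below gates retrieval on the token probabilities of a draft answer. It is an assumption-laden illustration, not the cited paper's method: the geometric-mean confidence score, the `0.7` threshold, and the function names are all hypothetical, and no calibration step is shown.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability of a draft answer, in (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_retrieve(token_logprobs, threshold=0.7):
    """Retrieve only when the model's own draft looks uncertain.

    `threshold` is a hypothetical value; in practice it would be tuned
    (and the confidence score calibrated) on held-out data.
    """
    return sequence_confidence(token_logprobs) < threshold


# Example inputs: log-probabilities a model might assign to its own tokens.
confident_draft = [math.log(0.95)] * 5
uncertain_draft = [math.log(0.9), math.log(0.3), math.log(0.5)]
```

Here `should_retrieve(confident_draft)` is false and `should_retrieve(uncertain_draft)` is true. A learned chunk-size policy would need to beat something of roughly this cost, one forward pass and a threshold, to justify its training.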

The thing you might not have known you wanted: across this corpus, "hyperparameter" is quietly becoming the wrong word. Document count, ordering, retrieval triggering, knowledge structure, and the retriever-generator boundary itself (see "How should systems retrieve and reason with external knowledge?") are all migrating from things-you-set to things-the-system-learns from how well it answered. Chunk size is simply the next obvious resident of that list, and the techniques to move it there already exist in pieces.


Sources (6 notes)

Can document count be learned instead of fixed in RAG?

DynamicRAG trains a reranker as an RL agent using LLM output quality as reward, learning to adjust both document ordering and count for each query. Two-phase training with behavior cloning followed by RL with generator feedback enables the agent to calibrate document selection to query complexity.

Can retrieval learn what actually helps answer questions?

CLaRa propagates generator loss back through continuous document representations, allowing retrievers to optimize for documents that actually improve answers rather than surface similarity. The gap between relevance and usefulness closes when retrieval receives direct feedback from generation success.

Can routing queries to task-matched structures improve RAG reasoning?

StructRAG demonstrates that selecting the knowledge structure type based on query demands, via a DPO-trained router that chooses among tables, graphs, algorithms, catalogues, and chunks, improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load and cognitive fit theory from cognitive science.

Does supervising retrieval steps outperform final answer rewards?

Fine-grained feedback on intermediate retrieval steps significantly boosts agentic RAG performance compared to final-answer-only rewards. DPO trained with both positive and negative step feedback outperforms PPO and single-direction training by directly contrasting good and bad retrieval chains.

Can simple uncertainty estimates beat complex adaptive retrieval?

Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches performance on multi-hop, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.

How should systems retrieve and reason with external knowledge?

Research shows retrieval should adapt dynamically rather than follow fixed patterns, reasoning and retrieval must integrate closely, and embedding-based retrieval has fundamental limits requiring architectural alternatives.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a RAG systems researcher evaluating whether chunk size can be learned via generator feedback. The question remains open: no published work directly optimizes chunk size this way, but the machinery exists elsewhere in the RAG stack.

What a curated library found — and when (dated claims, not current truth):

Findings span May 2024–November 2025. Key precedents:
• DynamicRAG (2025-05) trains a reranker as an RL agent using generator output quality as reward, learning both document count *and* order per query — establishing the template for feedback-driven hyperparameter tuning.
• CLaRa (2025-11) propagates generator loss through continuous document representations, optimizing for *usefulness* rather than surface relevance — directly addressing why chunk size matters (defining the useful context unit).
• StructRAG (2024-10) learns to route queries to task-appropriate knowledge *forms* (table, graph, algorithm, catalogue, chunk) via trained routing, showing that the *structure type itself* can be learned, not just retrieval count; granularity within a form is the untested next step.
• Process-level supervision (implicit in RAG-Gym 2025-02, UR2 2025-08) substantially outperforms end-of-pipeline outcome reward for multi-step decisions like retrieval.
• Uncertainty-based heuristics (2025-01) often match or beat learned adaptive retrieval at lower cost, raising the bar for when learned tuning is worth the training overhead.

Anchor papers (verify; mind their dates):
• DynamicRAG, arXiv:2505.07233 (2025-05)
• CLaRa, arXiv:2511.18659 (2025-11)
• StructRAG, arXiv:2410.08815 (2024-10)
• RAG-Gym, arXiv:2502.13957 (2025-02)

Your task:
(1) RE-TEST EACH CONSTRAINT. For DynamicRAG's template, has feedback-driven knob tuning been extended to *continuous* hyperparameters (chunk size, overlap) in the last 6 months, or does it remain confined to discrete choices (count, order)? Probe whether CLaRa's continuous optimization path or StructRAG's routing have since spawned work that directly tunes chunking. Separately, confirm whether process-level supervision has become standard in learned-RAG systems, or whether outcome-only reward still dominates published work. Finally, verify the cost/benefit claim: do newer systems measure training cost against uncertainty-heuristic baselines?
(2) Surface the strongest work from the last 6 months that *contradicts* the idea that learned hyperparameters outperform fixed ones, or shows chunk-size learning backfiring (e.g., overfitting, instability across domains).
(3) Propose two research questions that assume the regime may have moved: (a) Can chunk-size learning transfer across corpora and query distributions, or is it query-collection-specific? (b) Does end-to-end chunk tuning via generator feedback interact adversely with other learned RAG components (reranker, retriever), or do they compose cleanly?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
