SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation

Can schema-free graphs objectively evaluate open-ended search?

Can a directed graph with no preset structure capture the complexity of real search outputs while still enabling objective, fine-grained evaluation? The question matters because existing evaluation methods buy objectivity at the cost of rigidity, or richness at the cost of subjectivity.

Synthesis note · 2026-05-28 · sourced from Deep Research

Open-ended search evaluation faces a dilemma. Fixed-schema scoring (against items, sets, or tables) is objective and stable but cannot represent the complex, irregular knowledge structures that real search produces. Free-text evaluation captures that richness but depends on rubric design, which is subjective and unstable. VibeSearchBench's resolution is a schema-free ground-truth knowledge graph: a directed graph carries no preset structure, so it can model arbitrary relationships relevant to the search intent, yet because it is a graph it still supports fine-grained, objectively verifiable matching. Each task pairs a user persona with such a graph and is scored through a graph-matching framework, which escapes both horns of the dilemma.
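To make the idea of objectively verifiable matching concrete, here is a minimal sketch that scores a predicted graph against a ground-truth graph by exact overlap of normalized edge triples. It is an illustration under stated assumptions, not VibeSearchBench's actual matching framework, whose details this note does not give. The names `Edge`, `normalize`, and `edge_f1`, and the sample graphs, are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Edge(NamedTuple):
    """One directed, labelled edge: source --relation--> target."""
    source: str
    relation: str
    target: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


def edge_f1(predicted: Iterable[Tuple[str, str, str]],
            gold: Iterable[Tuple[str, str, str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching edge triples.

    Assumption: an edge counts as matched only if all three normalized
    fields are equal. A real framework would likely match more leniently.
    """
    pred = {Edge(*(normalize(part) for part in e)) for e in predicted}
    ref = {Edge(*(normalize(part) for part in e)) for e in gold}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical ground-truth graph: no fixed schema, any relation label allowed.
gold_edges = [
    ("Paper A", "cites", "Paper B"),
    ("Paper B", "uses", "Dataset C"),
    ("Paper A", "authored by", "Dr. X"),
    ("Dataset C", "hosted by", "Lab Y"),
]

# Hypothetical model output: two correct edges and one wrong one.
predicted_edges = [
    ("paper a", "cites", "paper b"),
    ("Paper A", "authored by", "Dr. X"),
    ("Paper A", "uses", "Dataset C"),
]

precision, recall, f1 = edge_f1(predicted_edges, gold_edges)
```

Because the score is a set operation over edges, two evaluators given the same graphs will always agree, and no relation type had to be declared in advance. Exact matching is the strictest possible choice; any softer matching rule is one of the scoring decisions a graph-matching evaluation has to make.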

The pattern generalizes beyond search: whenever the target output is structured but its structure cannot be fixed in advance, a graph ground truth plus graph-matching evaluation offers objectivity without rigidity. The cost is twofold. Constructing high-quality ground-truth graphs is labor-intensive (VibeSearchBench's 200 tasks were manually curated), and graph matching introduces scoring choices of its own. Even under this evaluation, the best model reaches only 30.30 F1, partly because models produce structurally flat graphs; the evaluation is demanding precisely because it is faithful. The result is a reusable template for evaluating any open-ended generation task whose correct answer is a web of relations rather than a list.
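One way to picture what "structurally flat" could mean is to measure how far a graph extends from its root. The sketch below assumes flatness means that most nodes hang directly off a single root node; the note itself does not define the term, so this reading, the function `max_depth`, and the sample graphs are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Tuple


def max_depth(edges: Iterable[Tuple[str, str]], root: str) -> int:
    """Return the largest shortest-path distance from root to any reachable node.

    Uses breadth-first search, so cycles are handled and each node is
    assigned the depth at which it is first reached.
    """
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return max(depth.values())


# Hypothetical flat output: every fact is attached straight to the query.
flat_graph = [("query", "a"), ("query", "b"), ("query", "c")]

# Hypothetical nested output: facts are chained through intermediate nodes.
nested_graph = [("query", "a"), ("a", "b"), ("b", "c")]

flat_depth = max_depth(flat_graph, "query")
nested_depth = max_depth(nested_graph, "query")
```

Under this reading, the flat graph has depth 1 and the nested graph has depth 3. A model that emits only depth-1 structure cannot match ground-truth edges between intermediate nodes, which is one plausible way flatness would lower recall under edge-level matching.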

Original note title

a schema-free ground-truth knowledge graph enables objective evaluation of open-ended search