Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions


We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previously predicted query to improve the generation quality. Our editing mechanism views SQL as sequences and reuses generation results at the token level in a simple manner. It is flexible to change individual tokens and robust to error propagation. Furthermore, to deal with complex table structures in different domains, we employ an utterance-table encoder and a table-aware decoder to incorporate the context of the user utterance and the table schema. We evaluate our approach on the SParC dataset and demonstrate the benefit of editing compared with the state-of-the-art baselines that generate SQL from scratch. Our code is available at https://github.com/ryanzhumich/sparc_atis_pytorch.

Introduction. Generating SQL queries from user utterances is an important task to help end users acquire information from databases. In a real-world application, users often access information in a multi-turn interaction with the system by asking a sequence of related questions. As the interaction proceeds, the user often refers to relevant mentions in the history or omits previously conveyed information, assuming it is known to the system. Therefore, in the context-dependent scenario, the contextual history is crucial for understanding follow-up questions from users, and the system often needs to reproduce partial sequences generated in previous turns. Recently, Suhr et al. (2018) propose a context-dependent text-to-SQL model that includes an interaction-level encoder and an attention mechanism over previous utterances. To reuse what has been generated, they propose to copy complete segments from the previous query.
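
The token-level reuse described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `mix_copy_and_generate`, the gate `p_copy`, and all of the probabilities below are hypothetical, chosen only to show how a decoder step might weigh copying a token from the previous query against generating a fresh one.

```python
from collections import defaultdict


def mix_copy_and_generate(prev_query, copy_attention, gen_probs, p_copy):
    """Mix a copy distribution with a generation distribution for one decoding step.

    prev_query     -- tokens of the previously predicted SQL query
    copy_attention -- one weight per previous-query token (assumed to sum to 1)
    gen_probs      -- token -> probability of generating it from scratch (assumed to sum to 1)
    p_copy         -- hypothetical gate: probability of copying rather than generating
    """
    if len(prev_query) != len(copy_attention):
        raise ValueError("need exactly one attention weight per previous-query token")

    mixed = defaultdict(float)
    # Copy path: each token of the previous query receives its attention mass.
    for token, weight in zip(prev_query, copy_attention):
        mixed[token] += p_copy * weight
    # Generation path: tokens from the SQL vocabulary and schema, produced from scratch.
    for token, prob in gen_probs.items():
        mixed[token] += (1.0 - p_copy) * prob
    return dict(mixed)


# Toy values, invented for illustration only.
prev_query = ["SELECT", "name", "FROM", "student"]
copy_attention = [0.1, 0.1, 0.1, 0.7]
gen_probs = {"WHERE": 0.6, "student": 0.2, "age": 0.2}

dist = mix_copy_and_generate(prev_query, copy_attention, gen_probs, p_copy=0.5)
best = max(dist, key=dist.get)
print(best, round(dist[best], 2))
```

Because the decision is made per token, a token that appears in both paths (here `student`) accumulates probability from each, and a single wrong token in the previous query can be replaced without discarding the rest of it. Copying whole segments, by contrast, reuses or drops a span as a unit.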

Lines of inquiry this paper opens 24

Research framings built by reading the notes related to this paper — the questions it feeds into.

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Why do conversational queries drift away from what triggered them?

How should conversational agents balance goal-driven initiative with user control?

How should dialogue recommender systems manage conversation history and state?

How does AI assistance affect human cognitive development and reasoning autonomy?

How does anomalous state of knowledge affect user self-assessment?

Which computational strategies best support reasoning in language models?

Do language models learn genuine linguistic structure or just surface patterns?

Why do generative and discriminative language model procedures disagree?

Why does verification consistently lag behind AI generation?

Why can generative verifiers scale verification compute more effectively than fixed-output discriminative models?

How do prompt structure and constraints affect model instruction reliability?

How do neural networks separate factual knowledge from reasoning abilities?

What is the difference between procedural knowledge and factual retrieval in reasoning?

What makes specific clarifying questions more effective than generic ones?

Why do question types determine retrieval and decomposition strategy in QA?

What memory architectures best support persistent reasoning across extended interactions?

What makes structured memory schemas more stable than freeform text summaries?

What dimensions of recommendation quality do standard metrics miss?

What makes a standardized artifact unit measurable across different research domains?

How should memory consolidation strategies shape agent performance over time?

What drives the choice between storing raw episodes versus abstracted rules?

How should retrieval systems optimize for multi-step reasoning during inference?

Why do fixed-size document chunks break complex procedural question answering?

What role does compression play in language model capability and generalization?

Why does keeping full key-value blocks matter more than compressing them?

Why do multi-turn conversations degrade AI intent and coherence?

How does single-turn training undermine multi-turn strategic dialogue?

Can single-axis benchmarks accurately predict agent deployment success?

What specific metrics distinguish single-turn versus multi-turn collaboration success?

How does AI-generated content transformation affect public discourse quality?

How does AI lose correct information under conversational persuasive pressure?
