LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings

Paper · arXiv 2510.08338 · Published October 9, 2025

Consumer research costs companies billions annually yet suffers from panel biases and limited scale. Large language models (LLMs) offer an alternative by simulating synthetic consumers, but they produce unrealistic response distributions when asked directly for numerical ratings. We present semantic similarity rating (SSR), a method that elicits textual responses from LLMs and maps these to Likert distributions using embedding similarity to reference statements. Tested on an extensive dataset of 57 personal care product surveys conducted by a leading corporation in that market (9,300 human responses), SSR achieves 90% of human test–retest reliability while maintaining realistic response distributions (Kolmogorov–Smirnov (KS) similarity > 0.85). Additionally, these synthetic respondents provide rich qualitative feedback explaining their ratings. This framework enables scalable consumer research simulations while preserving traditional survey metrics and interpretability.
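
The SSR mapping from a free-text answer to a Likert distribution can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the two-dimensional vectors stand in for a real sentence-embedding model, the five anchor vectors stand in for embedded reference statements running from "definitely not" (1) to "definitely yes" (5), and the normalization rule (subtract the lowest cosine similarity, add an optional offset `epsilon`, renormalize) is one simple choice, not necessarily the exact formula used in the paper. The names `ssr_distribution`, `expected_rating`, and `survey_distribution` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ssr_distribution(response_emb, anchor_embs, epsilon=0.0):
    """Map one response embedding to a probability mass function over Likert points.

    Assumed rule (illustrative only): shift the cosine similarities so the
    least similar anchor receives `epsilon`, then normalize to sum to 1.
    """
    sims = [cosine(response_emb, a) for a in anchor_embs]
    lowest = min(sims)
    shifted = [s - lowest + epsilon for s in sims]
    total = sum(shifted)
    if total == 0.0:
        # Response equally similar to every anchor: fall back to uniform.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in shifted]

def expected_rating(pmf):
    """Mean Likert score (points 1..len(pmf)) implied by a pmf."""
    return sum((i + 1) * p for i, p in enumerate(pmf))

def survey_distribution(pmfs):
    """Average per-respondent pmfs into one survey-level distribution."""
    n = len(pmfs)
    return [sum(col) / n for col in zip(*pmfs)]

# Hypothetical 2-D stand-ins for the embedded reference statements,
# Likert 1..5, spaced 22.5 degrees apart.
ANCHORS = [(math.cos(math.radians(22.5 * k)), math.sin(math.radians(22.5 * k)))
           for k in range(5)]

# Hypothetical stand-in for the embedding of a fairly positive answer.
response = (math.cos(math.radians(80)), math.sin(math.radians(80)))

pmf = ssr_distribution(response, ANCHORS)
print([round(p, 3) for p in pmf])
print(round(expected_rating(pmf), 2))
```

Returning a full distribution rather than a single number lets each synthetic answer carry its uncertainty across neighboring scale points; averaging these per-response distributions over respondents yields a survey-level Likert distribution that can be compared directly with the human one.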

Introduction. Established consumer research plays a central role in guiding corporations’ product development decisions [1–3], costing them billions of U.S. dollars globally every year [3]. Before investing in costly production and launch activities, companies routinely evaluate product concepts by surveying representative consumer panels. The most consequential question in such studies typically concerns purchase intent (PI), i.e., the likelihood that a respondent would buy the product if available [4–6]. Standard practice is to elicit purchase intent on a Likert scale, usually ranging from 1 (e.g., “definitely not”) to 5 (e.g., “definitely yes”) [7]. While widely used, this method faces well-known limitations: responses may be distorted by satisficing, acquiescence, and positivity biases, among other factors [8, 9]. Thus, traditional consumer panels often provide noisy measurements of demand, despite the considerable resources invested. Recent advances in LLMs raise the possibility of augmenting or partially replacing human survey panels with synthetic consumers.

Discussion / Conclusion. Our results show that LLM-based synthetic consumers can reproduce core outcomes of traditional consumer concept testing with surprising fidelity. In particular, the semantic similarity rating (SSR) approach yields both realistic distributions of Likert responses and robust product rankings whose correlation with human data exceeds 90% of the maximum attainable correlation, a ceiling set by human test–retest reliability. These findings suggest that many of the shortcomings of prior attempts at using LLMs as survey respondents (such as skewed distributions, over-positivity, or regression to the mean) are not intrinsic limitations of LLMs but rather artifacts of how responses were elicited. By shifting from direct elicitation of Likert responses to textual elicitation and SSR, we resolve many of these artifacts and unlock richer, more interpretable data. Importantly, no training data or fine-tuning on consumer responses was required, which makes the method widely applicable and inexpensive compared with training- or calibration-heavy alternatives.
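
The two headline metrics can be made concrete with a short sketch. We assume here that correlation attainment is the ratio of the synthetic-vs-human Pearson correlation of per-product mean purchase intent to the correlation between two independent human measurements (test and retest) of the same products, and that KS similarity is one minus the Kolmogorov–Smirnov distance between two Likert distributions. The paper's exact estimators (for example, how human splits are drawn and averaged) may differ, and all data and function names below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_attainment(synthetic, human_test, human_retest):
    """Synthetic-vs-human correlation as a fraction of the human test-retest correlation (assumed definition)."""
    return pearson(synthetic, human_test) / pearson(human_retest, human_test)

def ks_similarity(p, q):
    """1 minus the largest gap between the cumulative distributions of two pmfs (assumed definition)."""
    cdf_p = cdf_q = 0.0
    max_gap = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        max_gap = max(max_gap, abs(cdf_p - cdf_q))
    return 1.0 - max_gap

# Hypothetical mean purchase-intent scores for four products.
human_test = [3.1, 3.6, 2.8, 4.0]
human_retest = [3.0, 3.7, 2.9, 3.9]
synthetic = [3.3, 3.4, 3.0, 3.8]
print(round(correlation_attainment(synthetic, human_test, human_retest), 3))

# Hypothetical Likert distributions (points 1..5) for one product.
human_pmf = [0.05, 0.10, 0.25, 0.35, 0.25]
synthetic_pmf = [0.04, 0.12, 0.28, 0.34, 0.22]
print(round(ks_similarity(human_pmf, synthetic_pmf), 3))
```

Normalizing by the human test–retest correlation is the key design choice: because a human panel does not perfectly agree with a repeat of itself, a synthetic panel should not be expected to systematically exceed that ceiling, so an attainment near 1 means synthetic rankings track human rankings about as well as a repeated human survey would.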
