Pricing decisions often rely on demand forecasts that fail to capture uncertainty, especially for products with limited historical sales data but rich product information like text descriptions and images. A new research paper introduces a virtual population model powered by large language models (LLMs) to simulate aggregate demand for counterfactual prices, producing both mean predictions and uncertainty estimates.
The Business Problem
Traditional demand simulation models struggle when products are described by unstructured data such as product descriptions and images. Decision makers need not only point forecasts but also uncertainty estimates for alternative price points. According to the paper "LLM-Powered Virtual Population for Demand Simulation and Pricing" by Huang and Wang on arXiv, the model represents exposed customers as draws from a finite mixture of customer personas. For each persona, product, and candidate price, an LLM elicits a persona-level purchase probability using both structured persona information and unstructured product information. These probabilities are aggregated through calibrated mixture weights to form a predictive distribution of aggregate demand.
How the LLM-Powered Virtual Population Works
The simulator works in two stages. First, the LLM produces a purchase probability for each persona at each candidate price. Second, calibrated mixture weights combine those persona-level probabilities into a full predictive demand distribution rather than a single point forecast. The resulting simulator can evaluate counterfactual prices under various pricing objectives, including expected revenue and risk-aware criteria such as conditional value at risk (CVaR).
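The aggregation step can be illustrated with a short Python sketch. It is not the paper's implementation. The function name `simulate_demand`, the example probabilities and weights, and the modeling choice that each exposed customer independently draws a persona and then makes a yes/no purchase decision are all illustrative assumptions. In practice the persona-level probabilities would come from the LLM; here they are hard-coded.

```python
import numpy as np

def simulate_demand(persona_probs, weights, n_customers, n_draws=10_000, seed=0):
    """Sample aggregate demand for one product at one candidate price.

    persona_probs: purchase probability per persona (hypothetical LLM outputs)
    weights:       mixture weight per persona, summing to 1
    n_customers:   number of exposed customers
    """
    rng = np.random.default_rng(seed)
    persona_probs = np.asarray(persona_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # How many of the exposed customers fall into each persona, per draw.
    counts = rng.multinomial(n_customers, weights, size=n_draws)  # (n_draws, K)

    # Purchases within each persona group, then summed to aggregate demand.
    purchases = rng.binomial(counts, persona_probs)               # (n_draws, K)
    return purchases.sum(axis=1)

# Hypothetical example: three personas, 1,000 exposed customers.
samples = simulate_demand([0.10, 0.02, 0.30], [0.5, 0.3, 0.2], n_customers=1000)
mean_demand = samples.mean()
lower, upper = np.percentile(samples, [5, 95])  # a 90% predictive interval
```

The returned array is a set of draws from the predictive demand distribution, so the mean and any interval or tail statistic can be read directly from it.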
Technical Architecture
The model uses an LLM to elicit purchase probabilities for each persona-product-price combination. The personas are drawn from a finite mixture, and the LLM processes both structured persona attributes and unstructured product data (text and images). The mixture weights are calibrated to historical data. The paper tested the framework on an online H&M fashion dataset with product descriptions and images. The calibrated LLM-based simulator achieved the best overall predictive performance among the models considered and supported sample-efficient pricing decisions.
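The article does not describe how the mixture weights are calibrated, so the following is only one plausible approach, not the authors' procedure. It assumes historical observations with known exposure and sales counts, treats the LLM-elicited persona probabilities as fixed, and fits the weights by maximum likelihood using a standard expectation-maximization (EM) update for mixture proportions. The function name `calibrate_weights` and all argument names are hypothetical.

```python
import numpy as np

def calibrate_weights(P, sales, exposures, n_iter=1000):
    """Fit persona mixture weights by EM (illustrative assumption).

    P:         (n_obs, K) persona purchase probabilities per observation
    sales:     (n_obs,)   units sold in each observation
    exposures: (n_obs,)   customers exposed in each observation
    """
    P = np.asarray(P, dtype=float)
    y = np.asarray(sales, dtype=float)
    n = np.asarray(exposures, dtype=float)
    K = P.shape[1]
    w = np.full(K, 1.0 / K)  # start from uniform weights
    eps = 1e-12

    for _ in range(n_iter):
        # Marginal purchase probability per observation under current weights.
        q = np.clip(P @ w, eps, 1 - eps)

        # E-step: expected number of buyers and non-buyers from each persona.
        buyers = (y / q)[:, None] * (w * P)
        non_buyers = ((n - y) / (1 - q))[:, None] * (w * (1 - P))

        # M-step: new weight = expected share of all exposed customers.
        w = (buyers + non_buyers).sum(axis=0) / n.sum()
        w = w / w.sum()  # guard against drift from clipping

    return w
```

Because each update is a ratio of expected counts, the weights stay non-negative and sum to one without an explicit constraint.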
Business Outcomes
The key advantage is that the model enables managers to compare candidate prices, quantify demand uncertainty, and choose prices targeting either average-case revenue or risk-aware objectives like CVaR. The table below summarizes the different pricing objectives the simulator can handle:
| Pricing Objective | Description | Business Use Case |
|---|---|---|
| Expected Revenue | Maximizes average revenue across demand scenarios | Standard pricing for established products |
| Conditional Value at Risk (CVaR) | Maximizes average revenue across the worst-performing tail of demand scenarios | Risk-averse pricing for new product launches |
Because the simulator produces a full predictive demand distribution rather than only a point forecast, managers can weigh both the expected outcome and the downside risk of each price. This matters most for products with limited historical demand data but rich product information.
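The two objectives can be compared on simulated demand draws with a small Python sketch. The function names, the tail level `alpha`, and the toy demand numbers are illustrative assumptions rather than values from the paper. CVaR is computed here as the mean revenue over the worst `alpha` fraction of scenarios.

```python
import numpy as np

def expected_revenue(price, demand_samples):
    """Average revenue across all simulated demand scenarios."""
    return float(price * np.mean(demand_samples))

def cvar_revenue(price, demand_samples, alpha=0.2):
    """Mean revenue over the worst alpha fraction of scenarios."""
    revenue = np.sort(price * np.asarray(demand_samples, dtype=float))
    k = max(1, int(np.ceil(alpha * revenue.size)))
    return float(revenue[:k].mean())

def best_price(samples_by_price, objective):
    """Return the candidate price that maximizes the given objective."""
    return max(samples_by_price, key=lambda p: objective(p, samples_by_price[p]))

# Toy demand draws for two candidate prices (hypothetical numbers).
samples_by_price = {
    10.0: np.array([10, 12, 8, 11, 9, 10, 10, 13, 7, 10]),   # stable demand
    12.0: np.array([9, 14, 2, 12, 3, 13, 11, 12, 4, 12]),    # higher mean, fat lower tail
}

price_for_mean = best_price(samples_by_price, expected_revenue)
price_for_cvar = best_price(samples_by_price, cvar_revenue)
```

In this toy data the two objectives disagree: the higher price wins on expected revenue, while the lower price wins on CVaR because its worst scenarios are far less severe.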
Competitive Context and Future Applications
The framework provides a practical way to use LLMs as demand simulators. Unlike traditional models that require extensive historical transaction data, this approach leverages product descriptions and images, making it suitable for new product introductions or fashion items with short life cycles. The paper notes that the model supports sample-efficient pricing decisions, meaning it can generate useful insights from relatively few data points. This could be valuable for industries like fashion retail, consumer electronics, and any sector where product attributes change rapidly.
The research, available on arXiv, demonstrates a novel application of LLMs in pricing and demand simulation. While the study focuses on fashion, the authors suggest the methodology could extend to other domains. The abstract does not mention an institutional affiliation for the authors, Huang and Wang.
As enterprises increasingly seek AI-driven pricing tools, this LLM-powered virtual population model offers a data-efficient alternative to traditional conjoint analysis or demand forecasting methods. It addresses the gap between rich product information and limited demand data, a common challenge in omnichannel retail.