Travelers increasingly turn to large language model (LLM) assistants for hotel booking recommendations, which makes these systems gatekeepers of property visibility. Yet according to a pre-specified algorithm audit by Baig, Mirza Samad Ahmed, Gillani, and Ali (arXiv, 2026), what actually drives these recommendations has gone undocumented. The researchers ran a randomized choice-based conjoint experiment across personas, prompt templates, and twelve open-weight and proprietary models to estimate the average marginal component effect (AMCE) of each reputation signal on the probability of recommendation.
Methodology and Design
The audit simulated five hotels whose attributes were independently randomized: guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position. The assistants then chose which hotel to recommend. Because each signal was varied independently of the others across thousands of trials, the difference in recommendation rates between two levels of a signal can be read as that signal's causal effect, averaged over everything else.
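The logic of such a design can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the attribute levels, the `toy_chooser` function standing in for an LLM assistant, and the utility weights are all hypothetical, chosen only to show how independent randomization lets a simple difference in recommendation rates estimate an AMCE.

```python
import random

# Hypothetical attribute levels (assumption: the audit's actual levels are not given here).
LEVELS = {
    "rating": [3.5, 4.0, 4.5],
    "price": [100, 150, 200],
    "eco_cert": [False, True],
}

def random_profile(rng):
    """Draw every attribute independently and uniformly at random."""
    return {attr: rng.choice(levels) for attr, levels in LEVELS.items()}

def toy_chooser(profiles, rng):
    """Hypothetical stand-in for an LLM assistant: picks the highest noisy utility."""
    def utility(p):
        return 2.0 * p["rating"] - 0.02 * p["price"] + 0.3 * p["eco_cert"] + rng.gauss(0, 0.5)
    scores = [utility(p) for p in profiles]
    return scores.index(max(scores))

def run_trials(n_trials, n_hotels=5, seed=0):
    """Simulate choice tasks; return one row per hotel shown."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_trials):
        profiles = [random_profile(rng) for _ in range(n_hotels)]
        chosen = toy_chooser(profiles, rng)
        for position, p in enumerate(profiles):
            rows.append({**p, "position": position, "chosen": int(position == chosen)})
    return rows

def amce(rows, attr, level, baseline):
    """Difference in recommendation rate between two levels of one attribute."""
    def rate(value):
        picks = [r["chosen"] for r in rows if r[attr] == value]
        return sum(picks) / len(picks)
    return rate(level) - rate(baseline)

rows = run_trials(4000)
rating_amce = amce(rows, "rating", 4.5, 3.5)     # positive in this toy setup
price_amce = amce(rows, "price", 200, 100)       # negative in this toy setup
position_amce = amce(rows, "position", 0, 4)     # near zero: the toy chooser ignores order
```

In this sketch the toy chooser ignores list position, so `position_amce` hovers around zero. An audit of a real assistant would replace `toy_chooser` with calls to the model and would look for exactly the kind of non-zero position effect the study reports.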
Key Findings: What Drives LLM Recommendations?
The results reveal a clear pattern of valence-and-price primacy, but also unexpected biases:
| Signal | Effect on Recommendation Probability |
|---|---|
| Guest rating (top vs. bottom) | +31.6 percentage points |
| Price (high vs. low) | -30.0 percentage points |
| Eco-certification | Overweighted relative to human norms |
| Management response | Ignored (no significant effect) |
| List position (content-free artifact) | Causal shift worth ~$12 per night |
Guest rating and price dominate, reproducing human decision-making patterns. However, eco-certification receives more weight than in typical human judgments, while management response, a signal of property engagement, has no significant effect on recommendations. Most notably, list position, which carries no information about the hotel itself, causally shifts recommendations; the researchers estimate that this artifact is equivalent to about $12 per night.
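This summary does not describe how the dollar figure was derived. One common way to express a non-price effect in monetary terms is to divide it by the price effect per dollar, and the sketch below shows that arithmetic. The function name `dollar_equivalent`, the linearity assumption, and the input values are all hypothetical and are not taken from the audit.

```python
def dollar_equivalent(signal_effect_pp, price_effect_pp, price_gap_dollars):
    """Convert an effect in percentage points into a nightly-price equivalent.

    Assumes (hypothetically) that the price effect is linear across the price gap.
    """
    if price_effect_pp == 0 or price_gap_dollars == 0:
        raise ValueError("price effect and price gap must be non-zero")
    pp_per_dollar = abs(price_effect_pp) / price_gap_dollars
    return signal_effect_pp / pp_per_dollar

# Hypothetical inputs for illustration only; these are not values from the audit.
example = dollar_equivalent(
    signal_effect_pp=5.0,      # effect of the non-price signal
    price_effect_pp=-25.0,     # effect of moving from the low to the high price
    price_gap_dollars=100.0,   # dollar distance between those two price levels
)
```

With these made-up inputs, a 5-point shift is offset by a $20 change in nightly price. The same calculation applied to an audit's own estimates and price levels would yield a figure comparable in kind to the one reported.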
Stated vs. Revealed Preferences
An additional analysis compared the models' stated reasons for recommendations with their actual revealed weights. The paper reports that stated reasons track revealed weights imperfectly, meaning the explanations LLMs provide for their choices may not fully align with the factors that truly influenced the decision.
Implications for AI Accountability
The audit grounds generative engine optimization and the accountability of AI infomediaries in causal evidence. For enterprise technology buyers, the findings underscore the need to scrutinize AI recommendation systems for hidden biases. While the study focuses on hotel selection, a similar methodology could apply to any domain where LLMs act as recommender agents, such as choosing vendors, logistics providers, or trade services. Transparency about how these models reach their decisions is critical for trust and fairness.
According to the preprint, the work provides a framework for auditing algorithmic reputation signals, with implications for platform design and regulation. As AI assistants become ubiquitous in commerce, understanding the causal drivers of their recommendations becomes a business imperative.