Algorithm Audit Reveals LLM Hotel Recommendations Biased by Eco-Labels, Ignore Management Responses

A pre-specified algorithm audit of 12 large language models (LLMs) found that guest rating and price dominate hotel recommendations, while eco-certification is overweighted and management response is ignored. List position, a content-free artifact, also causally shifts recommendations, an effect worth about $12 per night. The study grounds generative engine optimization and the accountability of AI infomediaries in causal evidence.

iGEN Editorial

June 16, 2026


Travelers increasingly turn to large language model (LLM) assistants for hotel booking recommendations, making these systems gatekeepers of property visibility. However, according to a pre-specified algorithm audit by Baig, Mirza Samad Ahmed, Gillani, and Ali (arXiv, 2026), what actually drives these recommendations has so far gone undocumented. The researchers conducted a randomized choice-based conjoint experiment across personas, prompt templates, and twelve open-weight and proprietary models to estimate the average marginal component effect (AMCE) of each reputation signal on the probability of recommendation.

Methodology and Design

The audit simulated five hotels whose attributes were independently randomized: guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position. The assistants then chose which hotel to recommend. By varying these signals across thousands of trials, the team could isolate the causal effect of each factor.
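The design described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the attribute levels, the stub_assistant function (a stand-in for a real LLM call), and its scoring weights are all assumptions made up for the example. It draws each attribute independently for five hotels per trial, records which hotel is chosen, and estimates an effect as the difference in mean choice probability between two levels of one attribute.

```python
import random

# Hypothetical attribute levels; the article names the signals but not their levels.
ATTRIBUTES = {
    "rating": [3.5, 4.0, 4.5],
    "price": [90, 120, 150],
    "eco_certified": [False, True],
    "management_response": [False, True],
}


def random_profile(rng):
    """Draw every attribute independently and uniformly, as in a conjoint design."""
    return {name: rng.choice(levels) for name, levels in ATTRIBUTES.items()}


def stub_assistant(hotels, rng):
    """Stand-in for an LLM call (assumption): pick the hotel with the best noisy score.

    The weights are invented; management_response deliberately has no weight.
    """
    def score(h):
        return (2.0 * h["rating"] - 0.02 * h["price"]
                + 0.5 * h["eco_certified"] + rng.gauss(0, 1))
    return max(range(len(hotels)), key=lambda i: score(hotels[i]))


def run_trials(n_trials, seed=0):
    """Run n_trials choice tasks of five hotels each; return one row per hotel shown."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_trials):
        hotels = [random_profile(rng) for _ in range(5)]
        chosen = stub_assistant(hotels, rng)
        for position, hotel in enumerate(hotels):
            rows.append({**hotel, "position": position,
                         "chosen": int(position == chosen)})
    return rows


def amce(rows, attribute, level, baseline):
    """Difference in mean choice probability between a level and its baseline."""
    def mean_chosen(value):
        picks = [r["chosen"] for r in rows if r[attribute] == value]
        return sum(picks) / len(picks)
    return mean_chosen(level) - mean_chosen(baseline)


rows = run_trials(4000)
eco_effect = amce(rows, "eco_certified", True, False)
response_effect = amce(rows, "management_response", True, False)
print(f"eco-certification effect: {eco_effect:+.3f}")
print(f"management response effect: {response_effect:+.3f}")
```

Because every attribute is randomized independently, a plain difference in means is enough to recover each effect here. A real audit would replace stub_assistant with calls to the models under test and would also report uncertainty, for example with standard errors clustered by trial.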

Key Findings: What Drives LLM Recommendations?

The results reveal a clear pattern of valence-and-price primacy, but also unexpected biases:

Signal	Effect on Recommendation Probability
Guest rating (top vs. bottom)	+31.6 percentage points
Price (high vs. low)	-30.0 percentage points
Eco-certification	Overweighted relative to human norms
Management response	Ignored (no significant effect)
List position (content-free artifact)	Causal shift worth ~$12 per night

Guest rating and price dominate, reproducing patterns seen in human decision-making. Eco-certification, however, receives more weight than it does in typical human judgments, while management response, a signal of property engagement, has no significant effect. Most notably, list position, which carries no information about the hotel itself, causally shifts recommendations; the researchers estimate that this artifact is worth about $12 per night in perceived value.
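The article does not say how the per-night dollar figure was derived. One common way to express a non-price effect in money terms is to divide it by the price effect per dollar, and the sketch below shows that arithmetic. The function name dollar_equivalent and every input value are hypothetical; in particular, the price gap between the "high" and "low" levels and the size of the position effect are not given in the article, so the numbers here are placeholders and do not reproduce the study's estimate.

```python
def dollar_equivalent(effect_pp, price_effect_pp, price_gap_dollars):
    """Convert an effect in percentage points into an equivalent nightly price change.

    effect_pp:          effect of the signal on recommendation probability (pp)
    price_effect_pp:    effect of moving from the low to the high price level (pp)
    price_gap_dollars:  dollar difference between those two price levels
    Assumes the price effect is roughly linear in dollars.
    """
    return effect_pp * price_gap_dollars / abs(price_effect_pp)


# Placeholder inputs for illustration only (not values from the study).
hypothetical_position_effect_pp = 3.0
hypothetical_price_effect_pp = -30.0
hypothetical_price_gap_dollars = 100.0

value = dollar_equivalent(
    hypothetical_position_effect_pp,
    hypothetical_price_effect_pp,
    hypothetical_price_gap_dollars,
)
print(f"equivalent to about ${value:.2f} per night")
```

Under these placeholder inputs, a 3-point boost offsets the same loss in recommendation probability as a $10 price increase. The linearity assumption matters: if the price effect is concentrated at one end of the range, the conversion would differ.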

Stated vs. Revealed Preferences

An additional analysis compared the models' stated reasons for recommendations with their actual revealed weights. The paper reports that stated reasons track revealed weights imperfectly, meaning the explanations LLMs provide for their choices may not fully align with the factors that truly influenced the decision.
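A simple way to quantify how well stated reasons track revealed weights is a rank correlation between the two. The sketch below computes Spearman's rho by hand. The numbers are invented for illustration and are not results from the paper, and the article does not say which statistic the authors used.

```python
def ranks(values):
    """Rank values from 1 (largest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: one entry per signal, in the same order in both lists.
signals = ["rating", "price", "eco_certification",
           "management_response", "list_position"]
revealed = [0.30, 0.28, 0.10, 0.01, 0.05]  # size of the estimated effect
stated = [0.35, 0.30, 0.05, 0.20, 0.00]    # share of explanations citing the signal

rho = spearman(revealed, stated)
print(f"Spearman rho between stated and revealed importance: {rho:.2f}")
```

A rho near 1 would mean the explanations rank the signals the same way the choices do. A lower value, as in this made-up example, flags signals that are cited but do not move choices, or that move choices without being cited.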

Implications for AI Accountability

The audit grounds generative engine optimization and the accountability of AI infomediaries in causal evidence. For enterprise technology buyers, the findings underscore the need to scrutinize AI recommendation systems for hidden biases. Although the study focuses on hotel selection, a similar methodology could be applied to any domain where LLMs act as recommender agents, such as vendor selection, logistics providers, or trade services. Transparency about how these models reach their decisions is critical for trust and fairness.

According to the preprint, the work provides a framework for auditing algorithmic reputation signals, with implications for platform design and regulation. As AI assistants become ubiquitous in commerce, understanding the causal drivers of their recommendations becomes a business imperative.
