RAG and LLMs Combined to Generate Personalized Reading Content at Desired Complexity

A research paper proposes a four-module system that uses Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) to generate reading content tailored to user queries and complexity preferences. Experiments with Meta LLaMA 4 Scout, LLaMA 3.1 8B Instant, and Google Gemma2 9B show that RAG improves relevance and groundedness by 26–35 percentage points across all models and prompting strategies.

iGEN Editorial

June 16, 2026

Generating personalized reading material that matches both a user's topic of interest and their desired complexity level remains a challenge for content recommendation systems. A new research paper from Sooyeon Kim and Piotr S. Maciąg, posted on arXiv, presents a system that combines Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) to address this problem. The paper reports that grounding LLM output in information retrieved from the Internet improves the relevance and groundedness of the generated reading passages.

System Architecture: Four Modules for Personalized Content

The proposed system is built around four modules: Input, RAG, Generation, and Judging. Users provide a question and specify a target reading complexity level. The RAG module retrieves relevant information from the Internet to enrich and ground the content produced by three modern LLMs: Meta LLaMA 4 Scout, LLaMA 3.1 8B Instant, and Google Gemma2 9B. The Generation module employs three prompting strategies (Chain-of-Thought, zero-shot, and few-shot) to create reading materials. Finally, an LLM-as-a-Judge module automatically evaluates answer quality and alignment with the desired readability level.

Module	Function
Input	Accepts user query and target complexity
RAG	Retrieves relevant information from the Internet
Generation	Uses LLMs with prompting strategies to produce content
Judging	LLM-as-a-Judge evaluates quality and readability alignment
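
To make the data flow between the four modules concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `ReadingRequest`, `JudgedPassage`, and `run_pipeline`, the prompt wording, and the complexity labels are all assumptions made for illustration, and the retrieval, generation, and judging steps are replaced by stand-in functions so the example runs without network or model access.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReadingRequest:
    """Input module: the user's question plus a target complexity level."""
    question: str
    complexity: str  # hypothetical labels such as "beginner" or "advanced"


@dataclass
class JudgedPassage:
    passage: str
    sources: List[str]
    verdict: dict


def run_pipeline(
    request: ReadingRequest,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    judge: Callable[[str, ReadingRequest, List[str]], dict],
) -> JudgedPassage:
    # RAG module: fetch supporting snippets for the question.
    snippets = retrieve(request.question)
    # Generation module: build a grounded prompt and call the LLM.
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {request.question}\n"
        f"Write a reading passage at a {request.complexity} level, "
        "using only the context above."
    )
    passage = generate(prompt)
    # Judging module: score the passage for quality and readability fit.
    verdict = judge(passage, request, snippets)
    return JudgedPassage(passage=passage, sources=snippets, verdict=verdict)


# Stand-ins (assumptions) so the sketch runs offline; a real system would
# call a web search API and hosted LLMs here.
def fake_retrieve(question: str) -> List[str]:
    return [f"Placeholder snippet about: {question}"]


def fake_generate(prompt: str) -> str:
    # Echo the first context line so the fake judge has something to check.
    return "Placeholder passage. " + prompt.splitlines()[1]


def fake_judge(passage: str, request: ReadingRequest, snippets: List[str]) -> dict:
    return {
        "grounded": any(s in passage for s in snippets),
        "complexity": request.complexity,
    }


result = run_pipeline(
    ReadingRequest("How do vaccines work?", "beginner"),
    fake_retrieve,
    fake_generate,
    fake_judge,
)
```

Passing the three stages in as callables keeps the orchestration independent of any particular search provider or model, which is one plausible way to swap among the three LLMs the paper evaluates.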

Experimental Results: RAG Drives Measurable Improvements

According to the arXiv paper, RAG improved system performance across all models and prompting techniques. It raised relevance and, in particular, groundedness by 26 to 35 percentage points. The paper concludes that the RAG-augmented architecture effectively produces reading content tailored to user queries and desired textual complexity.

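Model	Prompting Strategies	RAG Improvement (range reported across all configurations)
Meta LLaMA 4 Scout	Chain-of-Thought, zero-shot, few-shot	+26–35 pp
LLaMA 3.1 8B Instant	Chain-of-Thought, zero-shot, few-shot	+26–35 pp
Google Gemma2 9B	Chain-of-Thought, zero-shot, few-shot	+26–35 pp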
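
A note on units: a percentage-point gain is an absolute difference between two scores, not a relative change. The short sketch below illustrates the distinction. The baseline and RAG scores in it are hypothetical values chosen for round arithmetic; they are not figures from the paper, which this article cites only for the 26 to 35 point range.

```python
def percentage_point_gain(baseline_pct: float, with_rag_pct: float) -> float:
    """Absolute difference between two percentage scores."""
    return with_rag_pct - baseline_pct


def relative_gain_pct(baseline_pct: float, with_rag_pct: float) -> float:
    """Relative change, expressed as a percent of the baseline."""
    return (with_rag_pct - baseline_pct) / baseline_pct * 100


# Hypothetical scores for illustration only; not taken from the paper.
baseline, with_rag = 50.0, 80.0
pp = percentage_point_gain(baseline, with_rag)
rel = relative_gain_pct(baseline, with_rag)
print(f"{pp:.0f} percentage points, {rel:.0f}% relative")
```

With these made-up inputs, a 30-point gain corresponds to a 60% relative improvement, so the same result can sound quite different depending on which unit is quoted.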

Implications for Enterprise Content Systems

For technology leaders evaluating AI-driven personalization, this research describes a practical architecture that combines retrieval and generation. The use of a Judging module to automatically verify content alignment with readability targets offers a path toward automated quality assurance in content generation. While the study focuses on reading recommendations, the same architecture (RAG with LLMs and a quality checker) could be adapted for other domains such as technical documentation, training materials, or compliance communications.

As LLMs like Meta LLaMA and Google Gemma become more accessible, the ability to ground their output in real-time retrieved data becomes critical for enterprise adoption where accuracy and relevance are paramount. The 26–35 percentage point improvement in groundedness reported in the paper underscores the value of integrating retrieval mechanisms before generation.

Technical Stack and Open Questions

The system uses openly available LLMs and standard RAG techniques. The paper does not specify a particular retrieval database or vector store; it states only that the RAG module retrieves information from the Internet. All three prompting strategies (Chain-of-Thought, zero-shot, and few-shot) are widely used in the LLM community. The LLM-as-a-Judge module scores outputs automatically, reducing the need for human evaluation in development cycles.
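
For readers less familiar with the three prompting strategies, the following sketch shows how they typically differ in prompt construction. The paper's actual prompts are not reproduced in this article, so the function `build_prompt`, its parameters, and all template wording are assumptions for illustration only.

```python
from typing import Sequence, Tuple

# A demonstration for few-shot prompting: (question, complexity, passage).
Example = Tuple[str, str, str]


def build_prompt(
    strategy: str,
    question: str,
    complexity: str,
    context: str,
    examples: Sequence[Example] = (),
) -> str:
    task = (
        "Using the context below, write a reading passage that answers the "
        f"question at a {complexity} reading level.\n"
        f"Context: {context}\nQuestion: {question}\n"
    )
    if strategy == "zero-shot":
        # No demonstrations: the model sees only the task description.
        return task + "Passage:"
    if strategy == "few-shot":
        # Worked demonstrations are placed before the real task.
        if not examples:
            raise ValueError("few-shot prompting needs at least one example")
        demos = "".join(
            f"Question: {q}\nLevel: {lvl}\nPassage: {p}\n\n"
            for q, lvl, p in examples
        )
        return demos + task + "Passage:"
    if strategy == "chain-of-thought":
        # Ask for intermediate reasoning before the final passage.
        return task + (
            "First reason step by step about which facts from the context "
            "matter and how to phrase them for this level, then write the "
            "passage.\nReasoning:"
        )
    raise ValueError(f"unknown strategy: {strategy}")


zero = build_prompt("zero-shot", "What is RAG?", "beginner", "RAG adds retrieval.")
few = build_prompt(
    "few-shot",
    "What is RAG?",
    "beginner",
    "RAG adds retrieval.",
    examples=[("What is an LLM?", "beginner", "An LLM is a text model.")],
)
cot = build_prompt("chain-of-thought", "What is RAG?", "beginner", "RAG adds retrieval.")
```

The only structural difference among the three is what surrounds the shared task description: nothing for zero-shot, demonstrations before it for few-shot, and a reasoning instruction after it for Chain-of-Thought.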

For enterprise buyers, key considerations include the latency of web retrieval, the cost of running multiple LLMs, and the accuracy of the Judge module. The paper does not provide latency or cost figures. Nonetheless, the architecture offers a template for building content recommendation systems that adapt to both topic and complexity preferences, a capability sought after in e-learning, knowledge management, and customer-facing support portals.
