
AI-Enabled Progress in Public Goods: LLMs Slightly Less Effective Than First-Year PhD Students, Study Finds

A new arXiv study by Sara Fish uses the EC 2025 'Stable Menus of Public Goods' problem to test AI research workflows. It finds that providing human intuition in prompts and using multi-turn interactions can improve LLM performance, but the LLM is still slightly less effective than a first-year PhD student.

iGEN Editorial
June 17, 2026

A recent study posted on arXiv (arxiv.org/abs/2606.16989) examines the effectiveness of large language models (LLMs) in research workflows within Economics and Computer Science (EconCS). Using an open problem from the EC 2025 paper "Stable Menus of Public Goods" as a testbed, the study investigates three specific questions: whether providing human intuition in the prompt helps, whether automated multi-turn interaction helps, and whether an LLM outperforms a first-year PhD student.


From the paper's abstract:

Using an open problem from the EC 2025 paper "Stable Menus of Public Goods" as a testbed, we conduct experiments to understand the effectiveness of different AI-for-EconCS research workflows. Specifically, we study three questions: Does providing human intuition in the prompt help? Does automated multi-turn interaction help? And, does an LLM outperform a first-year PhD student? Regarding the first two questions, we provide evidence for the following workflow suggestions: (1) prompting with human intuition can encourage the LLM to have better "taste", (2) multi-turn workflows help when the pipeline encourages "ambitious" steps. Regarding the third question, using an unpublished manuscript written by the paper's senior authors prior to collaborating with the first-year PhD student, we compare the effectiveness of the LLM with that of the first-year PhD student, and find that the LLM is slightly less effective.

Key Findings on AI Research Workflows

The study provides two main workflow suggestions. First, prompting with human intuition can encourage the LLM to have better "taste", according to the paper. Second, multi-turn workflows help when the pipeline encourages "ambitious" steps, the study reports.

LLM vs. First-Year PhD Student

When comparing the LLM's performance to that of a first-year PhD student, the study finds that the LLM is slightly less effective. The comparison used an unpublished manuscript written by the paper's senior authors prior to collaborating with the first-year PhD student, the study notes.

Implications for AI-Facilitated Research

The findings suggest that while LLMs can assist in research, on this testbed the LLM did not match a human researcher, even one at the first-year PhD level. The study highlights two ways to improve LLM performance in complex problem-solving tasks: incorporating human intuition in the prompt, and using multi-turn workflows that encourage ambitious steps.

Research Question | Finding
Does providing human intuition in the prompt help? | Yes; promotes better "taste" in LLM outputs.
Does automated multi-turn interaction help? | Yes; effective when the pipeline encourages ambitious steps.
Does an LLM outperform a first-year PhD student? | No; LLM is slightly less effective.
