AI-Enabled Progress in Public Goods: LLMs Slightly Less Effective Than First-Year PhD Students, Study Finds

A new arXiv study by Sara Fish uses the EC 2025 'Stable Menus of Public Goods' problem to test AI research workflows. It finds that providing human intuition in prompts and using multi-turn interactions can improve LLM performance, but the LLM is still slightly less effective than a first-year PhD student.

iGEN Editorial

June 17, 2026

A recent study posted on arXiv (arxiv.org/abs/2606.16989) examines the effectiveness of large language models (LLMs) in research workflows within Economics and Computer Science (EconCS). Using an open problem from the EC 2025 paper "Stable Menus of Public Goods" as a testbed, the study investigates three specific questions: whether providing human intuition in the prompt helps, whether automated multi-turn interaction helps, and whether an LLM outperforms a first-year PhD student.

The paper's abstract reads: "Using an open problem from the EC 2025 paper 'Stable Menus of Public Goods' as a testbed, we conduct experiments to understand the effectiveness of different AI-for-EconCS research workflows. Specifically, we study three questions: Does providing human intuition in the prompt help? Does automated multi-turn interaction help? And, does an LLM outperform a first-year PhD student? Regarding the first two questions, we provide evidence for the following workflow suggestions: (1) prompting with human intuition can encourage the LLM to have better 'taste', (2) multi-turn workflows help when the pipeline encourages 'ambitious' steps. Regarding the third question, using an unpublished manuscript written by the paper's senior authors prior to collaborating with the first-year PhD student, we compare the effectiveness of the LLM with that of the first-year PhD student, and find that the LLM is slightly less effective."

Key Findings on AI Research Workflows

The study offers two workflow suggestions. First, prompting with human intuition can encourage the LLM to have better "taste". Second, multi-turn workflows help when the pipeline encourages "ambitious" steps.

LLM vs. First-Year PhD Student

When comparing the LLM's performance to that of a first-year PhD student, the study finds that the LLM is slightly less effective. The comparison used an unpublished manuscript written by the paper's senior authors prior to collaborating with the first-year PhD student, the study notes.

Implications for AI-Facilitated Research

The findings suggest that while LLMs can assist in research, they are not yet a replacement for human researchers, even at the first-year PhD level. The study highlights the importance of incorporating human intuition and structured multi-turn interactions to improve LLM performance in complex problem-solving tasks.

Research Question	Finding
Does providing human intuition in the prompt help?	Yes; promotes better "taste" in LLM outputs.
Does automated multi-turn interaction help?	Yes; effective when the pipeline encourages ambitious steps.
Does an LLM outperform a first-year PhD student?	No; LLM is slightly less effective.
