IMPACTeen Dataset Provides New Resource for Detecting Manipulation in Teen Communication

Researchers have released IMPACTeen, a dataset of 1,021 textual social influence scenarios in adolescent contexts. Annotated by teenagers, parents, psychologists, communication experts, and teachers, it supports training AI models to detect manipulation, persuasion, and their consequences. The dataset, available in Polish and English, aims to advance research in social influence detection and language model safety.

iGEN Editorial

June 17, 2026


As AI-powered communication tools become more prevalent, the ability to detect manipulation and persuasion in digital interactions—especially those involving teenagers—has become a critical concern. The newly released IMPACTeen dataset, described in a paper on arXiv, provides a structured resource for training and evaluating language models on these subtle yet consequential social dynamics.

The dataset contains 1,021 texts covering social influence scenarios across interpersonal, media-based, and digital settings in an adolescent context. It includes 5,100 individual annotation records with gold labels for social influence techniques.

Multi-Perspective Annotation

A key feature of IMPACTeen is its five-perspective annotation approach. Each text was independently annotated by representatives from five distinct groups:

Perspective	Role
Teenagers	Provide youth-centric interpretation
Parents	Offer familial context
Psychologists	Assess psychological impact
Communication Experts	Analyze rhetorical strategies
Teachers	Evaluate educational implications

This multi-dimensional annotation covers influence presence, techniques, intentions, consequences, resistance, reactions, and annotation confidence. The diversity of perspectives allows researchers to study annotator disagreement and its implications for model training.
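To make disagreement analysis more concrete, here is a minimal Python sketch of one common way to measure it. For each text, it takes the majority label across the five perspectives and reports the share of annotations that differ from that label. The `Annotation` record, its field names, and the binary `influence_present` label are hypothetical stand-ins, not IMPACTeen's published schema. Majority voting is also an illustrative choice, not necessarily the procedure the authors used to derive their gold labels.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout for illustration only; the field names are
# assumptions, not the published IMPACTeen schema.
@dataclass(frozen=True)
class Annotation:
    text_id: int
    perspective: str  # e.g. "teenager", "parent", "psychologist", ...
    influence_present: bool  # assumed binary "is influence present?" label


def disagreement_by_text(annotations):
    """Map each text_id to (majority label, share of annotations that differ from it)."""
    labels_by_text = {}
    for a in annotations:
        labels_by_text.setdefault(a.text_id, []).append(a.influence_present)

    result = {}
    for text_id, labels in labels_by_text.items():
        # most_common(1) returns [(label, count)]; ties go to the label seen first.
        majority, count = Counter(labels).most_common(1)[0]
        result[text_id] = (majority, (len(labels) - count) / len(labels))
    return result


# Toy data invented for this example: two texts, five perspectives each.
records = [
    Annotation(1, "teenager", True),
    Annotation(1, "parent", True),
    Annotation(1, "psychologist", True),
    Annotation(1, "communication_expert", False),
    Annotation(1, "teacher", True),
    Annotation(2, "teenager", False),
    Annotation(2, "parent", True),
    Annotation(2, "psychologist", False),
    Annotation(2, "communication_expert", True),
    Annotation(2, "teacher", False),
]

if __name__ == "__main__":
    print(disagreement_by_text(records))
    # prints {1: (True, 0.2), 2: (False, 0.4)}
```

With five perspectives and a binary label, a tie cannot occur. For multi-class labels such as technique names, however, `Counter.most_common` breaks ties by first occurrence, so a real analysis would need an explicit tie-breaking rule.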

Construction and Validation

The dataset was built through constrained LLM generation, followed by a two-step human editing and validation phase. According to the paper, this process was intended to ensure youth-context realism, so that the texts reflect how adolescents actually communicate.

The resource was created in Polish and is accompanied by a corresponding English version, supporting cross-lingual modeling research.

Potential Applications

IMPACTeen supports research in several areas relevant to enterprise AI systems:

Social influence detection: Training models to identify when a message is attempting to persuade or manipulate.
Language model safety: Evaluating whether LLMs generate or amplify manipulative language.
Annotator disagreement analysis: Understanding how different stakeholders perceive the same communication.
Cross-lingual modeling: Adapting detection systems across languages.

For enterprise technology decision-makers, the dataset offers a benchmark for building safer conversational AI, particularly in applications involving minors or sensitive communication channels. Because its labels come from validated human judgments across multiple expert and non-expert perspectives, models can be evaluated against how different stakeholders actually perceive influence. This links technical performance to the ethical questions that arise in real-world deployment.

The dataset's authors are Aleksander Szczęsny, Wiktoria Mieleszczenko-Kowszewicz, Maciej Markiewicz, Beata Bajcar, Tomasz Adamczyk, Jolanta Babiak, Grzegorz Chodak, and Przemysław Kazienko. They have released it under a Creative Commons Zero (CC0) license, which permits broad reuse for both academic and commercial research.
