Topic

text-to-speech

2 stories

How Do Instructions Shape Speech? New Cross-Attribution Method Reveals Style Control in TTS

Artificial Intelligence #text-to-speech #style-captioned

A research paper introduces cross-attention attribution for style-captioned text-to-speech, adapting the DAAM framework to speech diffusion models. The method extracts per-token heatmaps across layers and steps, analyzing 3,600 combinations to reveal how caption tokens influence waveforms. Key findings include lower temporal variance for style tokens, correlation with F0 and energy, and peak style conditioning in early ODE steps and deep layers.

Jun 20, 2026 · 2 sources

Pixel-TTS: Image-Based Text Rendering Improves Robustness in Speech Synthesis

Artificial Intelligence #text-to-speech #artificial intelligence

Researchers propose Pixel-TTS, the first visually grounded text-to-speech framework that renders text as images and processes them with 2D convolutions. This eliminates embedding matrix expansion during fine-tuning and improves robustness to unseen characters and orthographic variations. Experiments show competitive performance with faster convergence and zero-shot generalization.

Jun 16, 2026 · 1 source