scalable

6 stories

Artificial Intelligence #technology #artificial intelligence

QueryGaussian: Training-Free 3D Instance Retrieval Cuts GPU Memory by 70%, Speeds Inference 180x

QueryGaussian, a new training-free framework for open-vocabulary 3D instance retrieval, reduces GPU memory usage by more than 70% and accelerates inference by 180x compared to existing methods, enabling city-scale scenes on consumer-grade hardware.

Jun 20, 2026 1 source

SMEPilot Boosts LLM Inference Up to 3.94x on CPUs with Scalable Matrix Extensions

Technology

Artificial Intelligence #llm #inference

Researchers have developed SMEPilot, an LLM inference engine that leverages Arm Scalable Matrix Extension (SME) to optimize execution on CPUs. By selecting CPU-only, SME-only, or cooperative SME+CPU execution per operator shape, SMEPilot improves end-to-end inference by up to 3.94x across multiple models and platforms.

Jun 16, 2026 1 source

Mojo Language Shows 20x–180x Speedups for Financial AI Workloads on Apple Silicon

Technology

Artificial Intelligence #mojo #financial ai

A new survey introduces Mojo, Modular's 2026 Python-like systems language, as a solution to the decades-old two-language problem in quantitative finance. Benchmarks on Apple Silicon show 20x to 180x speedups over pure Python for core financial AI workloads. The work also includes an open-source library for deterministic kernels.

Jun 16, 2026 1 source

MatchLM2Lite: Scalable MLLM-Lite Framework Cuts Reproduced Video Views by 2.5%

Technology

Artificial Intelligence #ai #llms

The paper presents MatchLM2Lite, a production-grade reproduced content identification system that distills a multimodal large language model into a compact student model. Deployed at scale, it reduced reproduced video views by 2.5% without hurting engagement, with 35x lower computational cost and latency under 30 seconds.

Jun 16, 2026 1 source

AnonShield: Scalable On-Premise Pseudonymization Cuts Vulnerability Data Processing from 92 Hours to Under 10 Minutes

Technology

Cybersecurity #cybersecurity #pseudonymization

AnonShield, a new pseudonymization system for CSIRT vulnerability data, achieves up to 738x speedup using GPU-accelerated NER and streaming processing. It enables compliant data sharing without sacrificing analytical utility, reducing processing time from over 92 hours to under 10 minutes on datasets up to 550 MB.

Jun 16, 2026 1 source

CHILLGuard: Fine-Grained Chinese LLM Safety Guardrail with Scalable Data and Preference Alignment

Technology

Artificial Intelligence #ai safety #llm

Researchers introduce CHILLGuard, a dedicated Chinese LLM content safety guardrail featuring a 5-macro, 31-micro category risk taxonomy. The system uses a scalable multi-stage data construction pipeline to create the CHILLGuardTrain dataset (405,007 samples) and achieves a 15.92% F1 score improvement over Qwen3Guard-8B-Strict via Model-aware Direct Preference Optimization.

Jun 16, 2026 1 source