#llm-scaling

#ai
Silicon Valley
fromTechCrunch
1 week ago

Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way | TechCrunch

Gimlet Labs raised $80 million to enhance AI inference efficiency across diverse hardware types.

Tech industry
fromComputerworld
1 week ago

HP will cram a 20-billion-parameter AI model into new AI PCs

HP is launching AI features in its Workforce Experience Platform to enhance remote device management and automate tasks on enterprise PCs.
Data science
fromInfoWorld
2 days ago

How to halve Claude output costs with a markdown tweak

A markdown file can reduce Claude's token output by over 50%, aiding enterprises in managing AI costs during production.
Data science
fromTheregister
25 minutes ago

PrismML debuts 1-bit LLM in bid to free AI from the cloud

PrismML's Bonsai 8B is a 1-bit language model that outperforms larger models, enhancing AI efficiency for mobile applications.
Typography
fromMedium
2 days ago

AI is rewriting the rules. Language is following.

The word 'delve' has surged in usage due to AI's influence on language and communication patterns.
Data science
fromTheregister
2 days ago

TurboQuant is a big deal, but it won't end the memory crunch

TurboQuant is an AI data compression technology that reduces memory usage for KV caches but may not significantly alleviate memory shortages.
#ai-models
Artificial intelligence
fromTNW | Apps
13 hours ago

Microsoft launches three in-house AI models in direct challenge to OpenAI

Microsoft has launched three in-house AI models that compete directly with OpenAI, marking a significant shift in its AI strategy.
Software development
fromMedium
14 hours ago

The Open-Source AI Agent Frameworks That Deserve More Stars on GitHub

Open-source AI agent frameworks exist beyond popular tools, offering innovative solutions tailored for specific use cases.
Tech industry
fromTheregister
1 day ago

Google battles Chinese open weights models with Gemma 4

Google launched new open-weights Gemma models optimized for agentic AI and coding, offering enterprises a domestic alternative to Chinese LLMs.
Scala
fromInfoQ
1 day ago

Beyond RAG: Architecting Context-Aware AI Systems with Spring Boot

Context-Augmented Generation (CAG) enhances Retrieval-Augmented Generation (RAG) by managing runtime context for enterprise applications without requiring model retraining.
Privacy professionals
fromInfoQ
2 days ago

GitHub Will Use Copilot Interaction Data from Free, Pro, and Pro+ Users to Train AI Models

GitHub will use interaction data from Copilot users to improve AI models starting April 24, with users opted in by default.
#ai-development
Online learning
fromwww.businessinsider.com
3 days ago

Inside the OpenAI project where freelancers train ChatGPT on everything from farming to commercial flying

Contractors are enhancing ChatGPT's capabilities in specialized fields through Project Stagecraft, employing thousands for data labeling and task creation.
Artificial intelligence
fromInfoWorld
1 week ago

Final training of AI models is a fraction of their total cost

Developing AI models incurs significant costs, with most expenditures on scaling and research rather than final training runs.
Business intelligence
fromeLearning Industry
2 days ago

How Many AI Tools Are There? A Data-Backed Look At The Expanding AI Landscape

The AI tools ecosystem is rapidly expanding, with thousands of tools available across various categories, creating both opportunities and complexities for businesses.
#ai-agents
Python
fromTalkpython
2 days ago

Deep Agents: LangChain's SDK for Agents That Plan and Delegate

Deep Agents framework enables building advanced AI agents using Python functions and middleware, enhancing capabilities beyond standard LLMs.
Business intelligence
fromZDNET
1 week ago

4 tips for building better AI agents that your business can trust

AI agents are transforming professional roles, requiring companies to adopt and integrate these technologies effectively.

#meta
Marketing tech
fromForbes
3 days ago

Why AI Models Are Recommending Your Competitors Instead Of You

Generative engine optimization (GEO) is essential for brands to be recommended by AI systems, shifting focus from traditional SEO metrics.
European startups
fromTheregister
4 days ago

Rebellions eyes global expansion with rack-scale AI platform

Rebellions raised $400 million to expand globally with AI accelerators and a new compute platform for enterprises and sovereign clouds.
fromArs Technica
1 week ago

Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

PolarQuant is doing most of the compression, but the second step cleans up the rough spots. Google proposes smoothing that out with a technique called Quantized Johnson-Lindenstrauss (QJL).
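The article doesn't spell out QJL's construction, but the sign-sketch idea behind Johnson-Lindenstrauss-style quantization can be illustrated in a few lines: project a vector through a random Gaussian matrix, keep only the sign of each coordinate (one bit per hash), and recover an approximate cosine similarity from how often two sketches agree. The dimensions and names below are illustrative, not Google's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 4096          # original dimension, number of 1-bit hashes

# Random Gaussian projection (the Johnson-Lindenstrauss step); we
# then keep only the sign of each projected coordinate: 1 bit each.
S = rng.standard_normal((m, d))

def sketch(v):
    return np.sign(S @ v)          # m-dimensional +-1 vector

def est_cosine(bits_a, bits_b):
    # SimHash identity: P[signs agree] = 1 - angle / pi
    agree = np.mean(bits_a == bits_b)
    return np.cos(np.pi * (1.0 - agree))

k = rng.standard_normal(d)                     # a "key" vector
q = 0.8 * k + 0.6 * rng.standard_normal(d)     # a correlated "query"

true_cos = q @ k / (np.linalg.norm(q) * np.linalg.norm(k))
approx = est_cosine(sketch(q), sketch(k))
print(true_cos, approx)
```

The appeal for KV caches is that the sketch is a fixed, data-independent transform, so keys can be compressed to bits at write time and still support approximate attention-score estimation at read time.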
Data science
fromInfoWorld
1 day ago

Why 'curate first, annotate smarter' is reshaping computer vision development

Strategic data selection and curation reduce annotation costs and enhance development productivity in computer vision teams.
Software development
fromTechzine Global
1 day ago

Cursor updates its platform with a focus on autonomous AI agents

Cursor 3 enhances software development by integrating AI agents for collaborative coding, reducing manual programming and streamlining workflows.
Gadgets
fromTheregister
1 week ago

HP stuffs OpenAI LLM into new laptops in bid for small biz

HP IQ is a new AI collaboration tool from HP designed to enhance productivity in business laptops.
DevOps
fromInfoWorld
1 week ago

An architecture for engineering AI context

AI systems must intelligently manage context to ensure accuracy and reliability in real applications.
Business intelligence
fromComputerworld
3 days ago

Microsoft adds multi-model AI to Copilot Researcher, raising accuracy stakes

Enterprises must enhance governance frameworks for AI deployment to manage complexity, accountability, and ensure effective decision-making.
Python
fromPyImageSearch
4 days ago

Autoregressive Model Limits and Multi-Token Prediction in DeepSeek-V3 - PyImageSearch

Multi-Token Prediction (MTP) in DeepSeek-V3 allows simultaneous token forecasting, enhancing training speed and contextual understanding.
Artificial intelligence
fromTheregister
1 day ago

Microsoft shivs OpenAI with new AI models for speech, images

Microsoft launched public preview versions of machine learning models for speech recognition, speech synthesis, and image generation, competing directly with OpenAI.
Software development
fromWIRED
1 day ago

Cursor Launches a New AI Agent Experience to Take on Claude Code and Codex

Cursor 3 enables users to deploy AI coding agents for task completion, marking a shift in developer workflows.
Artificial intelligence
fromTechCrunch
1 day ago

Microsoft takes on AI rivals with three new foundational models | TechCrunch

Microsoft AI released three foundational AI models for text, voice, and image generation, emphasizing human-centered design and competitive pricing.
Software development
fromInfoWorld
2 days ago

Meta shows structured prompts can make LLMs more reliable for code review

Code review is evolving towards machine-led verification, improving accuracy but introducing tradeoffs like increased latency and workflow overhead.
#openai
Artificial intelligence
fromwww.businessinsider.com
1 day ago

OpenAI's CFO says the company is passing on opportunities because it does not have enough compute

OpenAI is limiting opportunities due to insufficient computing power, impacting product decisions and prioritization of core AI initiatives.
Artificial intelligence
fromFuturism
5 days ago

OpenAI's Obsession With Data Centers Is Running Into Trouble

OpenAI has significantly reduced its AI infrastructure spending plans from $1.4 trillion to $600 billion amid financial pressures and market expectations.
#ai-efficiency
Data science
fromInfoWorld
3 days ago

A GitHub tinkerer teaches Claude to talk less, and that may matter more than it seems

A markdown file can significantly reduce AI output token usage, enhancing efficiency without code changes.
Artificial intelligence
fromInfoWorld
1 week ago

Google targets AI inference bottlenecks with TurboQuant

TurboQuant improves AI model efficiency by compressing key-value caches, reducing memory usage and runtime without accuracy loss.
Software development
fromZDNET
3 days ago

How AI has suddenly become much more useful to open-source developers

AI tools are becoming increasingly useful for open-source maintainers, but legal and quality issues remain.
Artificial intelligence
fromTechCrunch
3 days ago

Anthropic is having a month | TechCrunch

Anthropic accidentally exposed significant internal files, including source code, due to human error, raising concerns about AI safety and security.
Data science
fromTechzine Global
1 week ago

As AI hits scaling limits, Google smashes the context barrier

TurboQuant significantly reduces KV cache size, enhancing AI model performance and expanding context windows for complex workloads.
Artificial intelligence
fromFortune
3 days ago

Is AI's visual understanding mostly a 'mirage'? New research suggests so. | Fortune

Anthropic faces significant cybersecurity risks following multiple sensitive data leaks related to its new AI model, Mythos.
Data science
fromInfoWorld
2 weeks ago

The 'toggle-away' efficiencies: Cutting AI costs inside the training loop

Simple optimizations can significantly reduce AI training costs and carbon emissions without needing the latest GPUs.
#anthropic
Software development
fromMedium
2 weeks ago

Inside Dify AI: How RAG, Agents, and LLMOps Work Together in Production

Dify AI provides a unified platform for deploying production language model systems with built-in solutions for data freshness, observability, versioning, and safe deployment across multiple cloud environments.
Software development
fromInfoWorld
2 weeks ago

How to build an AI agent that actually works

Successful agents embed intelligence within structured workflows at specific decision points rather than operating autonomously, combining deterministic processes with reasoning models where judgment is needed.
#ai-agent-evaluation
Software development
fromInfoQ
2 weeks ago

Evaluating AI Agents in Practice: Benchmarks, Frameworks, and Lessons Learned

AI agents require system-level evaluation across multiple turns measuring task success, tool reliability, and real-world behavior rather than single-turn NLP benchmarks like BLEU and ROUGE scores.
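As a minimal illustration of what system-level scoring looks like in practice, a harness can aggregate per-episode outcomes into the two metrics the summary highlights. The episode records below are invented for illustration:

```python
# Score whole multi-turn episodes for task success and tool
# reliability, rather than single-turn text metrics like BLEU/ROUGE.
episodes = [
    {"task_done": True,  "tool_calls": 4, "tool_errors": 0},
    {"task_done": False, "tool_calls": 6, "tool_errors": 2},
    {"task_done": True,  "tool_calls": 3, "tool_errors": 1},
    {"task_done": True,  "tool_calls": 5, "tool_errors": 0},
]

task_success = sum(e["task_done"] for e in episodes) / len(episodes)
calls = sum(e["tool_calls"] for e in episodes)
errors = sum(e["tool_errors"] for e in episodes)
tool_reliability = 1 - errors / calls

print(f"task success: {task_success:.0%}")          # 75%
print(f"tool reliability: {tool_reliability:.0%}")  # 83%
```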
Artificial intelligence
fromInfoWorld
2 weeks ago

Why AI evals are the new necessity for building effective AI agents

User trust in AI agents depends on interaction-layer evaluation measuring reliability and predictability, not just model performance benchmarks.
Artificial intelligence
fromMedium
1 week ago

Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Model quantization and architectural optimization can outperform larger models, challenging the belief that more GPUs equal greater intelligence.
Software development
fromInfoQ
3 weeks ago

The Oil and Water Moment in AI Architecture

Software architecture is transitioning to AI architecture, requiring architects to manage the coexistence of deterministic systems with non-deterministic AI behavior while shifting from tool-centric to intent-centric thinking.
Artificial intelligence
fromFast Company
2 weeks ago

OpenAI's new frontier models mark a huge change in how AI will be built

OpenAI released two frontier models in early March: GPT-5.3 optimized for fast responses and GPT-5.4 optimized for deep analytical work, representing a shift toward specialized AI models.
Artificial intelligence
fromTechCrunch
2 weeks ago

Niv-AI exits stealth to wring more power performance out of GPUs | TechCrunch

AI data centers waste significant power due to GPU demand surges, forcing operators to throttle performance by up to 30%, prompting startups like Niv-AI to develop precision power management solutions.
Artificial intelligence
fromInfoWorld
3 weeks ago

Neoclouds run AI cheaper and better

By neoclouds, I'm referring to GPU-centric, purpose-built cloud services that focus primarily on AI training and inference rather than on the sprawling catalog of general-purpose services that hyperscalers offer. In many cases, these platforms deliver better price-performance for AI workloads because they're engineered for specific goals: keeping expensive accelerators highly utilized, minimizing platform overhead, and providing a clean path from model development to deployment.
#llm-safety
Artificial intelligence
fromNature
2 months ago

Training large language models on narrow tasks can lead to broad misalignment - Nature

Artificial intelligence
fromTheregister
1 month ago

OpenAI GPT-5.3 Instant less likely to beat around the bush

GPT-5.3 Instant reduces unnecessary refusals and moralizing preambles while decreasing hallucination rates by up to 26.8 percent compared to prior models.
Artificial intelligence
fromTheregister
1 month ago

AI models get better at math but still get low marks

Current LLMs struggle with mathematical accuracy, with even top performers scoring C-grade equivalent on practical math benchmarks, though recent versions show modest improvements.
Artificial intelligence
fromInfoWorld
1 month ago

Inception's Mercury 2 speeds around LLM latency bottleneck

Inception's Mercury 2 is the world's fastest reasoning LLM, using parallel refinement instead of sequential decoding to generate multiple tokens simultaneously for faster production AI responses.
Software development
fromInfoQ
1 month ago

OpenAI Scales Single Primary Postgresql to Millions of Queries per Second for ChatGPT

OpenAI scaled a single-primary PostgreSQL to millions of queries per second by optimizing instance size, query patterns, read replicas, and offloading write-heavy workloads.
Artificial intelligence
fromTechCrunch
1 month ago

Running AI models is turning into a memory game | TechCrunch

Rising DRAM prices and sophisticated prompt-caching orchestration make memory management a critical cost and performance factor for large-scale AI deployments.
Artificial intelligence
fromInfoQ
1 month ago

Building Embedding Models for Large-Scale Real-World Applications

What happens under the hood? How is the search engine able to take that simple query, look for images in the billions, trillions of images that are available online? How is it able to find this one or similar photos from all that? Usually, there is an embedding model that is doing this work under the hood.
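The retrieval step the speaker describes can be sketched with toy vectors: embed every item once, then answer a query by nearest-neighbor search over the embeddings. Here the "encoder" is random data standing in for a learned model, and at billion scale the exact top-k scan would be replaced by an approximate index (FAISS-style) rather than a full matrix product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a learned image/text encoder: in production each item
# would be mapped to a dense vector by a neural network.
corpus = rng.standard_normal((10_000, 128)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit vectors

def top_k(query, k=5):
    query = query / np.linalg.norm(query)
    scores = corpus @ query                 # cosine similarity
    idx = np.argpartition(-scores, k)[:k]   # k best, unordered
    return idx[np.argsort(-scores[idx])]    # best first

# A query embedded near item 42 should retrieve item 42 first.
query = corpus[42] + 0.1 * rng.standard_normal(128).astype(np.float32)
results = top_k(query)
print(results)
```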
Artificial intelligence
fromInfoQ
1 month ago

Building LLMs in Resource-Constrained Environments: A Hands-On Perspective

Prioritize small, resource-efficient models and iterative, human-in-the-loop data creation to build practical, improvable AI under infrastructure and data constraints.
Artificial intelligence
fromFast Company
1 month ago

Are LTMs the next LLMs? This new type of AI can do what large-language models can't

A major difference between LLMs and LTMs is the type of data they're able to synthesize and use. LLMs use unstructured data: think text, social media posts, emails, etc. LTMs, on the other hand, can extract information or insights from structured data, which could be contained in tables, for instance. Since many enterprises rely on structured data, often contained in spreadsheets, to run their operations, LTMs could have an immediate use case for many organizations.
Artificial intelligence
fromInfoQ
2 months ago

Intel DeepMath Introduces a Smart Architecture to Make LLMs Better at Math

DeepMath uses a Qwen3-4B Thinking agent that emits small Python executors for intermediate math steps, improving accuracy and significantly reducing output length.
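The pattern the summary describes, where the model emits a small Python program for the arithmetic instead of computing it in tokens, can be sketched as follows. `model_step` is a hypothetical stand-in for the LLM, not Intel's actual interface; only the executor is real code, and production systems sandbox far more strictly.

```python
# "Emit code for the arithmetic, don't do it in tokens."

def model_step(question: str) -> str:
    # A real system would prompt the LLM to answer with a Python
    # expression; here one is hard-coded for illustration.
    canned = {
        "What is 17% of 3400, plus 12 squared?": "0.17 * 3400 + 12**2",
    }
    return canned[question]

def execute(expression: str) -> float:
    # Evaluate the emitted expression with no builtins available, so
    # only arithmetic can run. Real deployments sandbox more strictly.
    return eval(expression, {"__builtins__": {}}, {})

answer = execute(model_step("What is 17% of 3400, plus 12 squared?"))
print(answer)
```

The win claimed in the article follows from this division of labor: the model only has to name the computation, and the interpreter performs it exactly, which also shortens the generated output.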
Artificial intelligence
fromInfoQ
2 months ago

Foundation Models for Ranking: Challenges, Successes, and Lessons Learned

Large-scale search and recommendation systems use two-stage retrieval and ranking pipelines to efficiently serve personalized results for hundreds of millions of users and items.
Artificial intelligence
fromComputerworld
2 months ago

OpenAI's GPT is getting better at mathematics

OpenAI's GPT-5.2 Pro does better at solving sophisticated math problems than older versions of the company's top large language model, according to a new study by Epoch AI, a non-profit research institute.
Artificial intelligence
fromInfoWorld
1 month ago

First look: Run LLMs locally with LM Studio

LM Studio provides integrated model discovery, in-app download and management, memory-aware filtering, and configurable inference settings for CPU threads and GPU layer offload.
Artificial intelligence
fromHackernoon
1 month ago

This "Flash" AI Model Is Fast and Dangerous at Math-Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
fromInfoQ
2 months ago

MIT's Recursive Language Models Improve Performance on Long-Context Tasks

Recursive Language Models enable LLMs to handle inputs up to 100x longer by using a programming environment and recursive code to decompose and preprocess prompts.
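A toy version of the recursive-decomposition idea, with a stub in place of the LLM call: split an over-long input into chunks, process each, and recurse on the concatenated results until they fit the context budget. MIT's actual setup drives this from a programming environment; the chunking and "summarizer" here are invented for illustration.

```python
LIMIT = 100  # pretend context window, in characters

def summarize(text: str) -> str:
    # Stub standing in for an LLM call: keep the first half.
    return text[: max(1, len(text) // 2)]

def recursive_process(text: str) -> str:
    if len(text) <= LIMIT:
        return summarize(text)           # fits: one direct call
    # Too long: split into window-sized chunks, process each,
    # then recurse on the concatenated partial results.
    chunks = [text[i:i + LIMIT] for i in range(0, len(text), LIMIT)]
    partials = "".join(recursive_process(c) for c in chunks)
    return recursive_process(partials)

result = recursive_process("x" * 10_000)
print(len(result))   # always <= LIMIT, however long the input
```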
Artificial intelligence
fromTechzine Global
1 month ago

OpenAI seeks faster alternatives to Nvidia chips

OpenAI seeks alternative inference chips with larger on-chip SRAM to improve response speed for coding and AI-to-AI communication, aiming for about 10% of future inference capacity.
Artificial intelligence
fromInfoQ
2 months ago

NVIDIA Dynamo Planner Brings SLO-Driven Automation to Multi-Node LLM Inference

The new capabilities center on two integrated components: the Dynamo Planner Profiler and the SLO-based Dynamo Planner. Together they address the "rate matching" challenge in disaggregated serving, where inference workloads are split into prefill operations, which process the input context, and decode operations, which generate output tokens, each running on its own GPU pool. Without the right tools, teams spend a lot of time determining the optimal GPU allocation for these phases.
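A back-of-the-envelope version of the rate-matching calculation, with made-up throughput numbers (a real planner such as Dynamo's profiles these per model and GPU): size each pool so its sustained token throughput covers the expected demand, so that neither phase starves the other.

```python
import math

# Measured (here: invented) per-GPU throughputs for each phase.
prefill_tok_per_gpu = 40_000   # input tokens/s one GPU can prefill
decode_tok_per_gpu = 6_000     # output tokens/s one GPU can decode

# Expected workload.
requests_per_s = 50
avg_prompt_tokens = 2_000      # handled by the prefill pool
avg_output_tokens = 300        # handled by the decode pool

prefill_demand = requests_per_s * avg_prompt_tokens   # tokens/s
decode_demand = requests_per_s * avg_output_tokens    # tokens/s

# Smallest pool sizes that keep up with demand in each phase.
prefill_gpus = math.ceil(prefill_demand / prefill_tok_per_gpu)
decode_gpus = math.ceil(decode_demand / decode_tok_per_gpu)
print(prefill_gpus, decode_gpus)
```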
Artificial intelligence
fromTheregister
1 month ago

How AI could eat itself: Using LLMs to distill rivals

Competitors are probing commercial AI models to extract underlying reasoning via distillation attacks to replicate capabilities and lower development costs.