#1-bit-weights

#ai

Data science
from The Register
2 days ago

TurboQuant is a big deal, but it won't end the memory crunch

TurboQuant is an AI data compression technology that reduces memory usage for KV caches but may not significantly alleviate memory shortages.
Artificial intelligence
from ZDNET
4 days ago

What Google's TurboQuant can and can't do for AI's spiraling cost

Google's TurboQuant significantly reduces AI memory usage, making AI more efficient and accessible by lowering inference costs.
Data science
from InfoWorld
2 days ago

How to halve Claude output costs with a markdown tweak

A markdown file can reduce Claude's token output by over 50%, aiding enterprises in managing AI costs during production.
Tech industry
from Computerworld
1 week ago

HP will cram a 20-billion-parameter AI model into new AI PCs

HP is launching AI features in its Workforce Experience Platform to enhance remote device management and automate tasks on enterprise PCs.
Silicon Valley
from TechCrunch
1 week ago

Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way | TechCrunch

Gimlet Labs raised $80 million to enhance AI inference efficiency across diverse hardware types.
Data science
from TechCrunch
1 week ago

Google unveils TurboQuant, a lossless AI memory compression algorithm - and yes, the internet is calling it 'Pied Piper' | TechCrunch

Google's TurboQuant is an ultra-efficient AI memory compression algorithm that significantly reduces memory usage without quality loss.
Software development
from Ars Technica
3 days ago

Running local models on Macs gets faster with Ollama's MLX support

Ollama enhances local language model performance on Apple Silicon with MLX support and improved caching, catering to growing interest in local models.
from Ars Technica
1 week ago

Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

PolarQuant is doing most of the compression, but the second step cleans up the rough spots. Google proposes smoothing that out with a technique called Quantized Johnson-Lindenstrauss (QJL).
DevOps
from InfoWorld
1 week ago

An architecture for engineering AI context

AI systems must intelligently manage context to ensure accuracy and reliability in real applications.
Artificial intelligence
from InfoWorld
1 week ago

Final training of AI models is a fraction of their total cost

Developing AI models incurs significant costs, with most expenditures on scaling and research rather than final training runs.
#ai-efficiency
Digital life
from InfoWorld
2 weeks ago

AI optimization: How we cut energy costs in social media recommendation systems

Optimizing data processing in AI can significantly reduce energy consumption and operational costs.
Artificial intelligence
from InfoWorld
1 week ago

Google targets AI inference bottlenecks with TurboQuant

TurboQuant improves AI model efficiency by compressing key-value caches, reducing memory usage and runtime without accuracy loss.
Data science
from Fast Company
1 week ago

A top AI researcher explains the limitations of current models

Francois Chollet's ARC-AGI-3 benchmark reveals AI's limitations in navigating novel situations compared to human intelligence.
Tech industry
from The Register
2 weeks ago

Nvidia slaps Groq into new LPX racks for faster AI response

Nvidia integrates Groq's language processing units into Vera Rubin systems to dramatically accelerate LLM inference, enabling hundreds to thousands of tokens per second per user.
from Fortune
3 weeks ago

AI can double output. Human biology can't | Fortune

The danger emerges when higher measured output is mistaken for sustainable performance. When organizations equate productivity gains with permanent increases in expectation, they effectively borrow against biological reserves. The debt is paid later in disengagement, turnover, and diminished adaptability.
Business intelligence
Artificial intelligence
from Medium
1 week ago

Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Model quantization and architectural optimization can outperform larger models, challenging the belief that more GPUs equal greater intelligence.
Software development
from InfoWorld
2 weeks ago

I ran Qwen3.5 locally instead of Claude Code. Here's what happened.

Smaller, efficient LLMs like Qwen3.5 can run on consumer-grade PCs for local development, but setup complexity and IDE integration remain challenging barriers to widespread adoption.
Software development
from Medium
2 weeks ago

Inside Dify AI: How RAG, Agents, and LLMOps Work Together in Production

Dify AI provides a unified platform for deploying production language model systems with built-in solutions for data freshness, observability, versioning, and safe deployment across multiple cloud environments.
Data science
from InfoWorld
2 weeks ago

The 'toggle-away' efficiencies: Cutting AI costs inside the training loop

Simple optimizations can significantly reduce AI training costs and carbon emissions without needing the latest GPUs.
Miscellaneous
from InfoQ
1 month ago

OpenAI Codex-Spark Achieves Ultra-Fast Coding Speeds on Cerebras Hardware

OpenAI deployed GPT-5.3-Codex-Spark on Cerebras wafer-scale chips, achieving 1,000 tokens per second for real-time interactive coding with 15× faster performance than earlier versions.
Software development
from InfoWorld
2 weeks ago

How to build an AI agent that actually works

Successful agents embed intelligence within structured workflows at specific decision points rather than operating autonomously, combining deterministic processes with reasoning models where judgment is needed.
#on-device-ai
Artificial intelligence
from TechCrunch
2 weeks ago

Multiverse Computing pushes its compressed AI models into the mainstream | TechCrunch

Multiverse Computing offers on-device AI models that eliminate counterparty risk by running locally without requiring external compute infrastructure or cloud providers.
from TechCrunch
2 months ago
Artificial intelligence

Quadric rides the shift from cloud AI to on-device inference - and it's paying off | TechCrunch

Quadric licenses programmable AI processor IP for on-device inference, expanding beyond automotive into laptops and industrial devices while rapidly increasing revenue and valuation.
Software development
from InfoQ
3 weeks ago

The Oil and Water Moment in AI Architecture

Software architecture is transitioning to AI architecture, requiring architects to manage the coexistence of deterministic systems with non-deterministic AI behavior while shifting from tool-centric to intent-centric thinking.
Artificial intelligence
from InfoWorld
2 weeks ago

Why AI evals are the new necessity for building effective AI agents

User trust in AI agents depends on interaction-layer evaluation measuring reliability and predictability, not just model performance benchmarks.
Artificial intelligence
from Psychology Today
2 weeks ago

What QuantumAI Is, and Why We May Miss Its Importance

Quantum AI combines quantum computing with artificial intelligence to solve complex problems involving massive combinations of possibilities, particularly useful for drug discovery, materials design, logistics, and financial analysis.
Artificial intelligence
from Fast Company
2 weeks ago

OpenAI's new frontier models mark a huge change in how AI will be built

OpenAI released two frontier models in early March: GPT-5.3 optimized for fast responses and GPT-5.4 optimized for deep analytical work, representing a shift toward specialized AI models.
Artificial intelligence
from TechCrunch
2 weeks ago

Niv-AI exits stealth to wring more power performance out of GPUs | TechCrunch

AI data centers waste significant power due to GPU demand surges, forcing operators to throttle performance by up to 30%, prompting startups like Niv-AI to develop precision power management solutions.
from InfoWorld
3 weeks ago

Neoclouds run AI cheaper and better

By neoclouds, I'm referring to GPU-centric, purpose-built cloud services that focus primarily on AI training and inference rather than on the sprawling catalog of general-purpose services that hyperscalers offer. In many cases, these platforms deliver better price-performance for AI workloads because they're engineered for specific goals: keeping expensive accelerators highly utilized, minimizing platform overhead, and providing a clean path from model development to deployment.
Artificial intelligence
Silicon Valley
from The Register
1 month ago

Meta already deploying Nvidia's standalone CPUs at scale

Meta has deployed Nvidia's standalone Grace CPUs at scale and will deploy Vera CPUs and millions of Superchips to power general-purpose and agentic AI workloads.
#ai-agents
Artificial intelligence
from Engadget
3 weeks ago

NVIDIA is reportedly working on its own open-source AI agent platform

NVIDIA is developing NemoClaw, an enterprise-focused open-source AI agent platform designed to work across non-NVIDIA hardware with enhanced security features.
Artificial intelligence
from WIRED
3 weeks ago

Nvidia Is Planning to Launch an Open-Source AI Agent Platform

Nvidia is launching NemoClaw, an open-source AI agent platform enabling enterprise software companies to deploy AI agents for workforce task automation, accessible regardless of chip dependency.
from TechCrunch
1 month ago
Artificial intelligence

Perplexity's new Computer is another bet that users need many AI models | TechCrunch

#neuromorphic-computing
Environment
from Fast Company
2 months ago

These invisible factors are limiting the future of AI

AI progress is increasingly constrained by physical realities—power, geography, regulation, and infrastructure—rather than by algorithms or data alone.
from TechCrunch
2 months ago

Humans& thinks coordination is the next frontier for AI, and they're building a model to prove it | TechCrunch

Humans&, a new startup founded by alumni of Anthropic, Meta, OpenAI, xAI, and Google DeepMind, thinks closing that gap is the next major frontier for foundation models. The company this week raised a $480 million seed round to build a "central nervous system" for the human-plus-AI economy. The startup's "AI for empowering humans" framing has dominated early coverage, but the company's actual ambition is more novel: building a new foundation model architecture designed for social intelligence, not just information retrieval or code generation.
Startup companies
Artificial intelligence
from The Register
1 month ago

AI models get better at math but still get low marks

Current LLMs struggle with mathematical accuracy, with even top performers scoring C-grade equivalent on practical math benchmarks, though recent versions show modest improvements.
from Techzine Global
1 month ago

Anthropic acquires Vercept to optimize Claude's computer use

Computer use enables Claude to perform multi-step tasks in live applications, just as a person would at a keyboard. This means that the AI can solve problems that are impossible with code alone. Recent progress speaks for itself: on the OSWorld benchmark for computer use, the Sonnet models went from below 15 percent at the end of 2024 to 72.5 percent today.
Artificial intelligence
#large-language-models
from Futurism
2 months ago
Artificial intelligence

AI Agents Are Mathematically Incapable of Doing Functional Work, Paper Finds

Artificial intelligence
from 24/7 Wall St.
1 month ago

NVIDIA Cements Its Role as the Backbone of AI Infrastructure

NVIDIA's networking revenue grew 162% year-over-year to $8.2 billion, nearly tripling GPU growth, signaling a shift from chip seller to integrated infrastructure provider selling complete AI data center systems.
Artificial intelligence
from TechCrunch
1 month ago

Running AI models is turning into a memory game | TechCrunch

Rising DRAM prices and sophisticated prompt-caching orchestration make memory management a critical cost and performance factor for large-scale AI deployments.
Artificial intelligence
from Fast Company
1 month ago

AI's biggest problem isn't intelligence. It's implementation

AI adoption is uneven, yielding clear efficiency gains in some functions yet producing limited measurable profit impacts across most large companies.
Artificial intelligence
from InfoQ
1 month ago

Hugging Face Introduces Community Evals for Transparent Model Benchmarking

Community Evals enables benchmark datasets on the Hugging Face Hub to host leaderboards, collect reproducible evaluation results via Git-based .eval_results YAML submissions, and display scores.
Artificial intelligence
from InfoQ
2 months ago

Intel DeepMath Introduces a Smart Architecture to Make LLMs Better at Math

DeepMath uses a Qwen3-4B Thinking agent that emits small Python executors for intermediate math steps, improving accuracy and significantly reducing output length.
Artificial intelligence
from Techzine Global
1 month ago

OpenAI seeks faster alternatives to Nvidia chips

OpenAI seeks alternative inference chips with larger on-chip SRAM to improve response speed for coding and AI-to-AI communication, aiming for about 10% of future inference capacity.
Artificial intelligence
from HackerNoon
1 month ago

This "Flash" AI Model Is Fast and Dangerous at Math-Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
from InfoQ
1 month ago

Building LLMs in Resource-Constrained Environments: A Hands-On Perspective

Prioritize small, resource-efficient models and iterative, human-in-the-loop data creation to build practical, improvable AI under infrastructure and data constraints.
Artificial intelligence
from InfoQ
2 months ago

Foundation Models for Ranking: Challenges, Successes, and Lessons Learned

Large-scale search and recommendation systems use two-stage retrieval and ranking pipelines to efficiently serve personalized results for hundreds of millions of users and items.
Artificial intelligence
from ZDNET
2 months ago

AI is quietly poisoning itself and pushing models toward collapse - but there's a cure

Unverified AI-generated data causes model collapse and unreliable AI outputs unless organizations enforce data provenance, verification, and governance.
Artificial intelligence
from ZDNET
1 month ago

AI isn't getting smarter, it's getting more power hungry - and expensive

Total computing power explains more model performance gains than proprietary algorithmic 'secret sauce' across 809 large language models.
Artificial intelligence
from The Register
2 months ago

China's Z.ai trained a model using only Huawei hardware

Zhipu AI trained GLM-Image entirely on Huawei Ascend Atlas 800T A2 servers and Ascend 910 AI processors, claiming a fully China-based advanced model.
Artificial intelligence
from Ars Technica
1 month ago

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

Cerebras' Wafer Scale Engine enables high token throughput while OpenAI diversifies hardware beyond Nvidia amid fast-paced coding model competition.
Artificial intelligence
from LogRocket Blog
2 months ago

How poor chunking increases AI costs and weakens accuracy - LogRocket Blog

Chunking determines AI feature cost, accuracy, and scalability; deliberate chunking reduces costs, improves retrieval accuracy, and enables reliable production systems.
from Cointelegraph
2 months ago

What Role Is Left for Decentralized GPU Networks in AI?

What we are beginning to see is that many open-source and other models are becoming compact enough and sufficiently optimized to run very efficiently on consumer GPUs.
Artificial intelligence
Artificial intelligence
from ZDNET
1 month ago

OpenAI's new Spark model codes 15x faster than GPT-5.3-Codex - but there's a catch

Codex-Spark enables conversational, real-time coding with major latency improvements (15x faster code generation; 80% roundtrip, 50% time-to-first-token) using Cerebras WSE-3.
Artificial intelligence
from InfoWorld
1 month ago

First look: Run LLMs locally with LM Studio

LM Studio provides integrated model discovery, in-app download and management, memory-aware filtering, and configurable inference settings for CPU threads and GPU layer offload.
Artificial intelligence
from ZDNET
2 months ago

AMD's new Ryzen chipset promises faster performance, better gaming, and smarter AI

AMD launched new Ryzen AI mobile and workstation processors plus high-performance gaming CPUs with upgraded NPUs and AI-powered FSR Redstone to boost performance and visuals.
#gpt-53-codex-spark
from InfoQ
1 month ago

Building Embedding Models for Large-Scale Real-World Applications

What happens under the hood? How is the search engine able to take that simple query and look through the billions, even trillions, of images available online? How is it able to find this one photo, or similar ones, from all of that? Usually, there is an embedding model doing this work under the hood.
Artificial intelligence
from Techzine Global
2 months ago

AMD presents AI strategy for PCs and smaller data centers

AMD is introducing the Ryzen AI 400 series and the accompanying Ryzen AI PRO 400 line. These processors combine CPU, GPU, and NPU components and are designed for local execution of AI tasks on Windows systems. AMD cites AI computing power of up to 60 TOPS, enabling applications such as image processing, generative AI, and voice functions to run without a cloud connection.
Artificial intelligence
Artificial intelligence
from 24/7 Wall St.
2 months ago

Is AMD About to Surpass Nvidia In the AI Chip Race?

Nvidia dominates AI chips with roughly 92% of data-center GPUs, while AMD has rapidly improved with MI300X and may challenge on cost and open-standard appeal.
from Medium
1 month ago

When to Use Agentic AI Workflows-and When Simpler Is Better

Agentic AI workflows sit at the intersection of automation and decision-making. Unlike a standard workflow, where data flows through pre-defined steps, an agentic workflow gives a language model discretion. The model can decide when to act, when to pause, and when to invoke tools like web search, databases, or internal APIs. That flexibility is powerful - but also costly, fragile, and easy to misuse.
Artificial intelligence