Artificial intelligence
from InfoWorld
10 hours ago: Google gives enterprises new controls to manage AI inference costs and reliability
Gemini API introduces Flex and Priority tiers for managing AI inference workloads based on criticality and cost.
AI models break down words and other inputs into numerical tokens to make them easier to process and understand. One token is about ¾ of a word. OpenRouter, which helps developers access different AI models, has seen activity roughly double in the first weeks of 2026, as measured by the number of AI tokens it processes: 13 trillion in the week that ended February 9, up from 6.4 trillion during the first week of January.
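The "one token is about ¾ of a word" figure is only a rule of thumb; real tokenizers split text by learned subword rules, not by whole words. As a minimal sketch of that rule of thumb (the function name and sample text are illustrative, not from any real tokenizer API):

```python
# Back-of-the-envelope token estimator based on the article's
# "1 token ≈ 3/4 of a word" rule of thumb. Real tokenizers (e.g. BPE)
# split text into learned subword pieces and will give different counts.

def estimate_tokens(text: str) -> int:
    words = len(text.split())
    # one word ≈ 1 / 0.75 ≈ 1.33 tokens
    return round(words / 0.75)

prompt = "AI models break down words and other inputs into numerical tokens"
print(estimate_tokens(prompt))  # 11 words → ~15 tokens
```

Providers bill and rate-limit by token counts like these, which is why OpenRouter reports its growth in tokens processed rather than in requests.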
AI inference is the process by which a trained large language model (LLM) applies what it has learned to new data to make predictions, decisions, or classifications. In practical terms, it works like this: after a model is trained (say, the new GPT-5.1), we use it during the inference phase, where it analyzes data (like a new image) and produces an output (identifying what's in the image) without being explicitly programmed for each fresh image. These inference workloads bridge the gap between LLMs and the AI chatbots and agents built on them.
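The train-then-infer split described above can be sketched with a toy classifier in place of an LLM. This is a hypothetical nearest-centroid model, not anything from a real AI framework: training learns parameters from labeled examples, and inference applies those frozen parameters to unseen input without any per-input programming.

```python
# Toy illustration of training vs. inference. A "model" here is just a
# dict of learned centroids; inference classifies new data against them.

def train(examples):
    # examples: list of (feature_value, label); learn one centroid per label
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def infer(model, x):
    # inference phase: no new learning, just apply the trained parameters
    return min(model, key=lambda label: abs(model[label] - x))

model = train([(1.0, "cat"), (2.0, "cat"), (8.0, "dog"), (9.0, "dog")])
print(infer(model, 8.5))  # unseen input classified as "dog"
```

An LLM works at vastly larger scale, but the shape is the same: training is done once and is expensive, while inference runs on every user request, which is why inference capacity is the recurring cost the tiers and startups below are competing over.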
Baseten just pulled in a massive $150 million Series D, vaulting the AI infrastructure startup to a $2.15 billion valuation and cementing its place as one of the most important players in the race to scale inference: the behind-the-scenes compute that makes AI apps actually run. If the last generation of great tech companies was built on the cloud, the next wave is being built on inference. Every time you ask a chatbot a question, generate an image, or tap into an AI-powered workflow, inference is happening under the hood.