Artificial intelligence
from InfoWorld
10 hours ago: Google gives enterprises new controls to manage AI inference costs and reliability
Gemini API introduces Flex and Priority tiers for managing AI inference workloads based on criticality and cost.
AI models break down words and other inputs into numerical tokens to make them easier to process and understand. One token is about ¾ of a word. OpenRouter, which helps developers access different AI models, has seen activity roughly double in the first weeks of 2026, as measured by the number of AI tokens it processes: 13 trillion in the week that ended February 9, up from 6.4 trillion during the first week of January.
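The "one token is about ¾ of a word" figure is only a rule of thumb; real tokenizers split text by learned subword rules, not by whole words. As a minimal sketch of that rule of thumb (the function name and sample text are illustrative, not from any real tokenizer API):

```python
# Back-of-the-envelope token estimator based on the article's
# "1 token ≈ 3/4 of a word" rule of thumb. Real tokenizers (e.g. BPE)
# split text into learned subword pieces and will give different counts.

def estimate_tokens(text: str) -> int:
    words = len(text.split())
    # one word ≈ 1 / 0.75 ≈ 1.33 tokens
    return round(words / 0.75)

prompt = "AI models break down words and other inputs into numerical tokens"
print(estimate_tokens(prompt))  # 11 words → ~15 tokens
```

Providers bill and rate-limit by token counts like these, which is why OpenRouter reports its growth in tokens processed rather than in requests.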
AI inference is the process by which a trained large language model (LLM) applies what it has learned to new data to make predictions, decisions, or classifications. In practical terms, it works like this: after a model is trained (say, the new GPT-5.1), we use it during the inference phase, where it analyzes data (like a new image) and produces an output (identifying what's in the image) without being explicitly programmed for each fresh image. These inference workloads bridge the gap between LLMs and the AI chatbots and agents built on them.
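The train-then-infer split described above can be sketched with a toy classifier in place of an LLM. This is a hypothetical nearest-centroid model, not anything from a real AI framework: training learns parameters from labeled examples, and inference applies those frozen parameters to unseen input without any per-input programming.

```python
# Toy illustration of training vs. inference. A "model" here is just a
# dict of learned centroids; inference classifies new data against them.

def train(examples):
    # examples: list of (feature_value, label); learn one centroid per label
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def infer(model, x):
    # inference phase: no new learning, just apply the trained parameters
    return min(model, key=lambda label: abs(model[label] - x))

model = train([(1.0, "cat"), (2.0, "cat"), (8.0, "dog"), (9.0, "dog")])
print(infer(model, 8.5))  # unseen input classified as "dog"
```

An LLM works at vastly larger scale, but the shape is the same: training is done once and is expensive, while inference runs on every user request, which is why inference capacity is the recurring cost the tiers and startups below are competing over.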
Baseten just pulled in a massive $150 million Series D, vaulting the AI infrastructure startup to a $2.15 billion valuation and cementing its place as one of the most important players in the race to scale inference: the behind-the-scenes compute that makes AI apps actually run. If the last generation of great tech companies was built on the cloud, the next wave is being built on inference. Every time you ask a chatbot a question, generate an image, or tap into an AI-powered workflow, inference is happening under the hood.