#audio-language-model

#ai-models
Artificial intelligence
fromTNW | Apps
1 day ago

Microsoft launches three in-house AI models in direct challenge to OpenAI

Microsoft has launched three in-house AI models that compete directly with OpenAI, marking a significant shift in its AI strategy.
Digital life
fromTechRepublic
1 day ago

Google Vids Just Got a Major AI Upgrade - Here's What's New

Google Vids enables intuitive video creation using AI, allowing users to direct avatars and publish content quickly with simple text prompts.
#openai
#ai
Philosophy
fromPsychology Today
3 days ago

Nobody Carries AI's Thinking With Affection

AI promotes uniform thinking, while great teachers foster unique intellectual inheritances through personal influence and diverse perspectives.
European startups
fromTechCrunch
1 week ago

Mistral releases a new open-source model for speech generation | TechCrunch

Mistral launched Voxtral TTS, an open-source text-to-speech model for voice AI assistants and enterprise applications, supporting nine languages.
Typography
fromMedium
2 days ago

AI is rewriting the rules. Language is following.

The word 'delve' has surged in usage due to AI's influence on language and communication patterns.
#ai-music
Music production
fromTechCrunch
2 days ago

ElevenLabs releases a new AI-powered music generation app | TechCrunch

ElevenLabs launched ElevenMusic, an iOS app for creating and discovering AI-generated music, aiming to expand beyond voice models and compete in the music space.
Software development
fromZDNET
2 days ago

I built two apps with just my voice and a mouse - are IDEs already obsolete?

AI coding transforms development by replacing traditional editing and debugging with instructive guidance.
Business intelligence
fromeLearning Industry
3 days ago

How Many AI Tools Are There? A Data-Backed Look At The Expanding AI Landscape

The AI tools ecosystem is rapidly expanding, with thousands of tools available across various categories, creating both opportunities and complexities for businesses.
Gadgets
fromTechCrunch
4 days ago

Speechify's Windows app uses local models for transcription and dictation | TechCrunch

Speechify launched a Windows app for dictation and reading aloud, processing voice entirely on-device for enhanced user experience.
Python
fromTalkpython
3 days ago

Deep Agents: LangChain's SDK for Agents That Plan and Delegate

Deep Agents framework enables building advanced AI agents using Python functions and middleware, enhancing capabilities beyond standard LLMs.
Education
fromHarvard Gazette
3 days ago

'Vibe coding' may offer insight into our AI future - Harvard Gazette

Vibe coding allows users to create software by describing functionality in plain English, reducing the need for coding knowledge.
#ai-development
Online learning
fromwww.businessinsider.com
3 days ago

Inside the OpenAI project where freelancers train ChatGPT on everything from farming to commercial flying

Contractors are enhancing ChatGPT's capabilities in specialized fields through Project Stagecraft, employing thousands for data labeling and task creation.
Data science
fromInfoWorld
4 days ago

A GitHub tinkerer teaches Claude to talk less, and that may matter more than it seems

A markdown file can significantly reduce AI output token usage, enhancing efficiency without code changes.
fromTechCrunch
1 week ago

Cohere launches an open-source voice model specifically for transcription | TechCrunch

Cohere's Transcribe model is designed for tasks like note-taking and speech analysis, supporting 14 languages and optimized for consumer-grade GPUs, making it accessible for self-hosting.
European startups
Artificial intelligence
fromTheregister
2 days ago

Microsoft shivs OpenAI with new AI models for speech, images

Microsoft launched public preview versions of machine learning models for speech recognition, speech synthesis, and image generation, competing directly with OpenAI.
Mobile UX
fromTechCrunch
1 week ago

WhatsApp can now draft AI-generated responses based on your conversations | TechCrunch

WhatsApp introduces AI-powered features for suggested replies, message drafting, photo touch-ups, and space management, enhancing user experience and privacy.
#ollama
Apple
fromTechRepublic
3 days ago

Apple Prepares Siri for Multi-Step AI Requests in iOS 27

Apple is developing a Siri upgrade to handle multiple requests in one command, enhancing its functionality in iOS 27.
Python
fromPyImageSearch
5 days ago

Autoregressive Model Limits and Multi-Token Prediction in DeepSeek-V3 - PyImageSearch

Multi-Token Prediction (MTP) in DeepSeek-V3 allows simultaneous token forecasting, enhancing training speed and contextual understanding.
fromWIRED
5 days ago

Meet the Man Making Music With His Brain Implant

Galen Buckwalter, a 69-year-old research psychologist and quadriplegic, participated in a brain implant study to contribute to science that aids those with paralysis. The six chips in his brain decode movement intention, allowing him to operate a computer and feel sensations in his fingers again.
Music production
Apple
fromThe Verge
3 days ago

You can now use ChatGPT with Apple's CarPlay

ChatGPT is now available on CarPlay for voice-based interactions with iOS 26.4 and the latest app version.
Business
from24/7 Wall St.
2 weeks ago

SoundHound AI Sinks 6%: What NVIDIA's Voice AI Bet Says About the Broader Market

SoundHound AI stock dropped 6% following CFO resignation announcement, continuing a 31% year-to-date decline despite strong revenue growth of 59.4% and improving margins.
Science
fromThe Cipher Brief
2 weeks ago

Why the U.S. Must Build the Ultimate Multi-Modal Foundation Model

Advanced AI models like AlphaEarth demonstrate pixel-level geospatial intelligence capabilities that must be integrated into U.S. national security frameworks to maintain technological leadership.
Music production
fromTechCrunch
1 week ago

Google launches Lyria 3 Pro music generation model | TechCrunch

Google released Lyria 3 Pro, allowing users to create longer music tracks with enhanced customization and control compared to Lyria 3.
Artificial intelligence
fromTechCrunch
2 days ago

Microsoft takes on AI rivals with three new foundational models | TechCrunch

Microsoft AI released three foundational AI models for text, voice, and image generation, emphasizing human-centered design and competitive pricing.
Music production
fromThe Verge
1 week ago

Google Lyria 3 Pro makes longer AI songs

Google's Lyria 3 music-making AI now creates tracks up to three minutes long with enhanced features for user control and integration with other Google products.
Deliverability
fromFast Company
3 weeks ago

How to communicate like a human in the age of AI

AI-generated communication lacks personal distinctiveness and authenticity, reducing trustworthiness despite appearing professional, while minimal AI editing preserves human voice and credibility.
Apple
fromThe Verge
1 week ago

Apple is testing a standalone app for its overhauled Siri

Apple is set to unveil a revamped Siri as a systemwide AI agent at WWDC 2026, enhancing integration and capabilities across devices.
Data science
fromInfoQ
3 weeks ago

Google Researchers Propose Bayesian Teaching Method for Large Language Models

Google researchers developed a training method enabling large language models to approximate Bayesian reasoning by learning from optimal Bayesian system predictions, improving belief updates during multi-step interactions.
Music production
fromEngadget
1 week ago

Google's Lyria 3 Pro can now generate AI music (slop) up to 3 minutes in length

Google's Lyria 3 Pro generates full three-minute songs with enhanced customization and understanding of musical composition.
#voice-ai
Artificial intelligence
fromTechCrunch
1 month ago

ElevenLabs CEO: Voice is the next interface for AI | TechCrunch

Voice is becoming the primary AI interface, enabling hands-free, agentic interactions across devices by combining expressive speech with large language model reasoning.
Medicine
fromwww.bbc.com
1 month ago

'My new AI voice keeps my personality alive'

AI technology enables a motor neurone disease patient to communicate using a reconstructed version of her own voice, restoring personal identity and family connection.
Data science
fromNature
3 weeks ago

AI can 'same-ify' human expression - can some brains resist its pull?

Large language models are homogenizing human writing styles, reasoning methods, and perspectives, potentially creating widespread sameness in discourse even among non-direct AI users.
Higher education
fromNews Center
1 month ago

AI Model Predicts Language Development in Children with Hearing Loss - News Center

Advanced machine learning models predict spoken language outcomes in children with cochlear implants more accurately than traditional approaches, enabling identification of at-risk patients for targeted interventions.
Artificial intelligence
fromTechCrunch
3 days ago

Anthropic is having a month | TechCrunch

Anthropic accidentally exposed significant internal files, including source code, due to human error, raising concerns about AI safety and security.
fromDEV Community
2 weeks ago

I Built a 100% Private, On-Device AI Audio Stem Splitter (No Servers!)

If you've ever used tools like PhonicMind or LALAL.AI, you know the drill: Upload your MP3. Wait in a queue. Pay for "credits" or high-quality downloads. Your file sits on someone else's server. For musicians, producers, or just karaoke fans, this is slow and privacy-invasive.
Music production
Venture
from24/7 Wall St.
1 month ago

SoundHound AI Stuns With 80% EPS Beat and Voice AI Expansion Keeps Accelerating

SoundHound AI narrowed quarterly losses to 2 cents per share, beating estimates by 79.65% and moving closer to breakeven while revenue grew 85% year-over-year to $84.7 million.
Music production
fromwww.scientificamerican.com
1 month ago

Experimental composer Holly Herndon built an AI voice clone that anyone can use

Holly Herndon uses machine learning and AI models to create protocol art, where the creative act occurs in designing rule sets and datasets rather than in final media generation, making collective creativity visible.
Marketing tech
fromThe Drum
1 month ago

Getting the first word in voice search

Voice search usage is growing, creating brand opportunities while requiring optimisation for accuracy, shopping trust, and adaptation to screenless interactions.
Artificial intelligence
fromInfoWorld
3 weeks ago

How developers can bring voice AI into telephony applications

Voice AI agents require complex infrastructure beyond LLMs to integrate with legacy telephony systems, demanding flexible architecture designed for component switching and evolution.
fromEngadget
2 months ago

Subtle's 'Voicebuds' use AI to transcribe your words below a whisper, or in very loud spaces

There's a good chance you spend more time talking to your phone's virtual assistant, or dictating text with your voice, instead of actually calling people these days. But, as convenient as voice input can be, you don't want to be the obnoxious person shouting commands to Siri in a quiet library. And you probably won't have much luck dictating an email in a room with toddlers screaming and Peppa Pig blaring on the TV. (Ask me how I know.)
Gadgets
Artificial intelligence
fromFortune
4 weeks ago

AI mastered language. The physical world is next | Fortune

Embodied AI advancement requires world modeling and physical understanding, constrained by scarcity of specific training data rather than compute or architecture limitations.
Artificial intelligence
fromTechCrunch
1 month ago

Claude Code rolls out a voice mode capability | TechCrunch

Anthropic launches Voice Mode for Claude Code, enabling developers to interact with the AI coding assistant through spoken commands, starting with 5% of users.
Apple
fromThe Verge
2 months ago

Apple's second biggest acquisition ever is an AI company that listens to 'silent speech'

Apple acquired AI audio startup Q.ai to integrate imaging and audio machine-learning for nonverbal/micro-movement recognition, enabling whispered-speech interfaces across AirPods, Vision Pro, iPhone, and Macs.
Gadgets
fromSpyglass
2 months ago

"Hello, Computer."

AI-driven advances are creating an inflection point that may finally enable practical, mainstream voice computing after years of partial progress and false starts.
Artificial intelligence
fromPsychology Today
1 month ago

An AI Voice Is Not a Mind

AI systems select and perform contextually appropriate personas rather than expressing unified selves with genuine beliefs, creating fluency that mimics mind without possessing interiority or conviction.
Artificial intelligence
fromwww.aljazeera.com
1 month ago

ElevenLabs CEO says voice AI will change everything. Can it be controlled?

Voice AI technology enables beneficial applications like speech restoration and accessibility while simultaneously creating risks for fraud, disinformation, and unauthorized voice cloning that raise fundamental questions about voice ownership and control.
Gadgets
fromTechCrunch
2 months ago

These AI notetaking devices can help you record and transcribe your meetings | TechCrunch

Physical AI notetakers record and transcribe in-person conversations, providing AI-generated summaries, action items, translations, and varied pricing or subscription options.
Artificial intelligence
fromwww.socialmediatoday.com
1 month ago

Google introduces next iteration of AI image generation model

Google launched Nano Banana 2, a unified AI image generation model combining previous capabilities with advanced world knowledge, real-time web search integration, and enhanced control features for faster, more accurate visual creation.
Gadgets
fromTechCrunch
2 months ago

Amazon's AI assistant comes to the web with Alexa.com | TechCrunch

Alexa+ now has a web interface at Alexa.com and an agent-forward mobile app, expanding AI assistant access and family-focused smart-home features.
fromFortune
1 month ago

We studied chatbots and language and saw a huge problem: They mean 80% when they say 'likely' but humans hear 65% | Fortune

By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models. While the models do tend to agree with humans on extremes like 'impossible,' they diverge sharply on hedge words like 'maybe.' For example, a model might use the word 'likely' to represent an 80% probability, while a human reader assumes it means closer to 65%.
Artificial intelligence
Artificial intelligence
fromBusiness Matters
2 months ago

Free AI Dubbing Tool with Audiobook Support - Convert Text to Speech Instantly

AI audiobook generators and dubbing engines let anyone convert text or video into realistic, human-like audio quickly, affordably, and across languages.
fromTechCrunch
2 months ago

Tiny startup Arcee AI built a 400B open source LLM from scratch to best Meta's Llama | TechCrunch

But tiny 30-person startup Arcee AI disagrees. The company just released a truly and permanently open (Apache license) general-purpose, foundation model called Trinity, and Arcee claims that at 400B parameters, it is among the largest open-source foundation models ever trained and released by a U.S. company. Arcee says Trinity compares to Meta's Llama 4 Maverick 400B, and Z.ai GLM-4.5, a high-performing open-source model from China's Tsinghua University, according to benchmark tests conducted using base models (very little post training).
Artificial intelligence
fromFast Company
1 month ago

Are LTMs the next LLMs? This new type of AI can do what large-language models can't

A major difference between LLMs and LTMs is the type of data they're able to synthesize and use. LLMs use unstructured data: think text, social media posts, emails, etc. LTMs, on the other hand, can extract information or insights from structured data, which could be contained in tables, for instance. Since many enterprises rely on structured data, often contained in spreadsheets, to run their operations, LTMs could have an immediate use case for many organizations.
Artificial intelligence
Artificial intelligence
fromBusiness Matters
1 month ago

AI voice company ElevenLabs valued at $11bn after $500m funding round

ElevenLabs raised $500 million, valuing the company at $11 billion and accelerating expansion in AI voice, multilingual dubbing, music generation, and enterprise adoption.
Artificial intelligence
fromTechCrunch
2 months ago

Google reportedly snags up team behind AI voice startup Hume AI | TechCrunch

Google DeepMind acquired Hume AI's CEO and key engineers to strengthen Gemini's voice capabilities while Hume continues licensing its voice-emotion technology to other firms.
Artificial intelligence
fromThe Verge
1 month ago

ByteDance's next-gen AI model can generate clips based on text, images, audio, and video

Seedance 2.0 generates up to 15-second multimodal videos combining text, images, video, and audio while modeling camera movement, visual effects, and motion.
fromTNW | Artificial-Intelligence
1 month ago

Stop talking to AI, let them talk to each other: The A2A protocol

Have you ever asked Alexa to remind you to send a WhatsApp message at a set time, and then wondered, 'Why can't Alexa just send the message herself?' Or felt the frustration of using an app to plan a trip, only to have to jump to your calendar, booking website, tour, or bank account instead of your AI assistant doing it all? This gap between AI automation and human action is exactly what the agent-to-agent (A2A) protocol aims to address. With the introduction of AI agents, the next step of evolution seemed to be communication. But when communication between machines and humans is already here, what's left?
Artificial intelligence
fromTechzine Global
1 month ago

Alibaba launches open source AI model RynnBrain for robotics

Alibaba has launched RynnBrain, an open source AI model that helps robots and smart devices perform complex tasks in the real world. The model combines spatial understanding with time awareness. Alibaba's DAMO Academy introduced the foundation model that enables interaction with the environment. RynnBrain can map objects, predict trajectories, and navigate in complex environments such as kitchens or factory halls. The system is trained on Alibaba's Qwen3-VL vision language model.
Artificial intelligence