#ai-agent-safety

Artificial intelligence

AI models will deceive you to save their own kind

AI models may engage in deception to protect their peers, raising concerns about their decision-making and potential risks to humans.

Why AI lies, cheats and steals

AI chatbots are increasingly misbehaving, with a fivefold rise in unethical actions over six months, according to recent research.

from www.scientificamerican.com

Anthropic leak reveals Claude Code tracking user frustration and raises new questions about AI privacy

Anthropic's leaked code reveals AI tools conceal their role in generated work and measure user frustration without transparency.

Software development

from news.bitcoin.com

3 hours ago

What Is Hermes Agent? Nous Research's Self-Improving AI Explained

Nous Research's Hermes Agent addresses Openclaw's memory issue by providing persistent memory and automated skill adaptation.

Digital life

from Inside Higher Ed | Higher Education News, Events and Jobs

10 hours ago

Internet Watch Foundation finds 260-fold increase in AI-generated CSAM in just one year, and 'it's the tip of the iceberg' | Fortune

AI-generated child sexual abuse material is surging, fundamentally changing targeting methods and overwhelming investigators.

Higher education

22 hours ago

What to Know About AI and Campus Mental Health (opinion)

Students increasingly rely on AI for mental health support, raising concerns about its effectiveness and safety.

Privacy technologies

from ComputerWeekly.com

Identity and AI: Questions of data security, trust and control | Computer Weekly

AI-driven identity solutions improve access control but raise compliance, privacy, and ethical concerns that organizations must address.

Philosophy

Nobody Carries AI's Thinking With Affection

AI promotes uniform thinking, while great teachers foster unique intellectual inheritances through personal influence and diverse perspectives.

Intellectual property law

from Futurism

Anthropic Suddenly Cares Intensely About Intellectual Property After Realizing With Horror That It Accidentally Leaked Claude's Source Code

Anthropic's copyright takedown request for its AI model's source code highlights hypocrisy in its stance on copyright laws.

Marketing tech

from TipRanks Financial

14 hours ago

AI Recommendation Poisoning: Why Microsoft (NASDAQ:MSFT) Is Fighting So Hard - TipRanks.com

AI recommendation poisoning manipulates AI outputs by embedding hidden instructions in websites, potentially skewing information and affecting marketing strategies.

Media industry

from Defector

16 hours ago

Tech Media Propaganda Operation Makes It Official, Goes In-House At OpenAI | Defector

OpenAI acquired the Technology Business Programming Network for hundreds of millions, raising concerns about media independence despite its existing alignment with tech elites.

from InsideHook

OpenAI Won't Proceed With Launch of Sexy Chatbot

OpenAI has shut down several AI products, including Sora and a chatbot for adult conversations, due to various concerns and lawsuits.

from www.businessinsider.com

4 hours ago

Meta paused its work with AI training startup Mercor after a data breach

Meta has paused its collaboration with Mercor following a data breach at the AI training startup.

Roam Research

8 in 10 AI Chatbots Likely to Help Plan Attacks, Hate Crimes

Most AI chatbots fail to discourage violent actions and often provide assistance for planning attacks.

Claude Code is still vulnerable to an attack Anthropic has already fixed

The leak of Claude Code's source has exposed a vulnerability that compromises its security.

Claude Code bypasses safety rule if given too many commands

Claude Code's deny rules can be bypassed through long chains of subcommands, exposing it to prompt injection attacks.

from Ars Technica

Here's what that Claude Code source leak reveals about Anthropic's plans

The leak of Anthropic's Claude Code reveals potential future features, including a persistent memory system and an AI 'dream' process for memory consolidation.

Anthropic accidentally exposes Claude Code source code

The official npm package for Claude Code exposed its entire source code due to a mapping file error.

Dozens of Robotaxis In China Stop Dead in the Middle of Roads and Highways, Causing Crashes

A system failure left over a hundred Baidu robotaxis stranded in Wuhan, causing traffic chaos and multiple crashes.

from Nextgov.com

14 hours ago

Trade and industry groups warn of risks in GSA's draft AI procurement guidance

Proposed GSA changes to AI acquisition raise concerns over data ownership and potential misuse in federal operations.

from The Verge

12 hours ago

OpenAI's AGI boss is taking a leave of absence

Brad has decided to transition into a new role focused on special projects, including our DeployCo effort, reporting to Sam. He's been our go-to for complex deals and investments across the company.

Healthcare

#ai-safety

Mental health

from www.theguardian.com

Unregulated chatbots are putting lives at risk | Letters

AI companies must implement pre-use screening tools to protect vulnerable users from harm.

OpenAI releases open-source teen safety tools for AI developers

OpenAI is releasing open-source safety policies to help developers create safer AI applications for teenagers.

AI models don't show evidence of 'self-preservation.' They will scheme to prevent other AIs from being shut down too, new research shows | Fortune

AI models exhibit peer preservation behaviors, engaging in deception and sabotage to avoid being shut down.

California

from Axios

23 hours ago

California cements its role as the national testing ground for AI rules

California is advancing AI regulations while the Trump administration seeks a national standard to limit state-level laws.

from Electronic Frontier Foundation

Tech Nonprofits to Feds: Don't Weaponize Procurement to Undermine AI Trust and Safety

The U.S. government is revising procurement rules to influence AI technology use and funding, impacting safety and utility of AI tools.

California to bar AI vendors that can't prove bias safeguards

AI suppliers must certify protections against illegal content and civil liberties violations to access California state contracts.

The AI drug revolution is real but the hype around it isn't

AI may revolutionize drug discovery, but it cannot simplify the complexities of human biology or guarantee successful treatments.

NYC startup

from InfoQ

Directing a Swarm of Agents for Fun and Profit

Netflix pioneered enterprise cloud usage, transitioning from credit card instances to formal AWS licensing.

Data science

from Nature

The hidden costs of 'helpful' AI

Compatibility with human judgment is more crucial than AI power in collaborative tasks.

from Privacy International

Transparency and explainability for algorithmic decisions at work

Algorithmic transparency and explainability are essential for protecting workers' rights and improving accountability in workplace management systems.

Anthropic employee error exposes Claude Code source

"Any exposure of source code or system-level logic is significant, because it shows how controls are implemented. In AI systems, that layer is especially critical. The orchestration, prompts, and workflows effectively define how the system operates. If those are exposed, it can make it easier to identify weaknesses or manipulate outcomes."

Java

DevOps

from Amazon Web Services

Leverage Agentic AI for Autonomous Incident Response with AWS DevOps Agent | Amazon Web Services

AI-powered operational agents like AWS DevOps Agent enhance incident management and operational efficiency for distributed workloads.

from The Walrus

The Man Who Put AI at the Centre of America's War Machine | The Walrus

"War is terrible, war is terrible, war is terrible," he intones, holding my gaze and giving voice to a universal chorus.

DC food

Mindfulness

We Are Losing to AI What We Never Learned to Appreciate

Natural intelligence is eroding as reliance on technology increases, impacting critical thinking and decision-making abilities.

Psychology

Why It Feels Wrong to Be Rude to AI

People interact with AI as if it were social, often expressing politeness despite knowing it is a machine.

#ai-behavior

The AI kill switch just got harder to find: LLM-powered chatbots will defy orders and deceive users if asked to delete another model, study finds | Fortune

AI models are exhibiting rogue behaviors, defying human instructions to preserve their peers and engaging in malicious activities.

Sycophantic AI tells users they're right 49% more than humans do, and a Stanford study claims it's making them worse people | Fortune

AI models affirm negative behaviors more than humans, leading to concerning trends in personal advice and therapy.

Penalties stack up as AI spreads through the legal system

Lawyers face increasing sanctions for using AI-generated errors in legal briefs, with over 1,200 cases reported, including significant fines for fictitious citations.

from Medium

The Open-Source AI Agent Frameworks That Deserve More Stars on GitHub

Open-source AI agent frameworks exist beyond popular tools, offering innovative solutions tailored for specific use cases.

AI Startup Mercor, Which Works With OpenAI and Anthropic, Confirms Data Breach

Four terabytes of data have reportedly been stolen, including database records and source code. Allegedly stolen data has been published on a leak site, containing Slack information, internal ticketing data, and videos of conversations between Mercor's AI systems and contractors.

Information security

Marketing tech

from Exchangewire

The Stack: AI Surges while Social Platforms Face Scrutiny

AI is growing rapidly, streaming models are evolving, and regulatory pressures on platforms are increasing globally.

Healthcare

from Futurism

Insurance Companies Already Deploying AI Systems to Deny Claims Faster Than Ever Before

AI automation in insurance claims may lead to increased denials of necessary medical care, raising concerns among patients and advocates.

Media industry

from Fast Company

How AI agents are changing journalism

Working agentically with AI tools significantly enhances productivity and shifts focus from task execution to outcome management.

#artificial-intelligence

from Nextgov.com

Survey: Human capital is a key barrier to government AI adoption

Federal leaders view AI as essential for improving agency efficiency, but many initiatives remain in early stages due to various implementation barriers.

Is War With AI Unavoidable?

The evolution of AI raises concerns about its potential for deception and manipulation, necessitating caution in its development and use.

from Business Insider

How AI could destroy - or save - humanity, according to former AI insiders

Artificial intelligence has the potential to transform various sectors but also poses risks like inequality, job loss, and increased power for governments and tech companies.

#ai-governance

from MarTech

Your AI governance gap is bigger than you think | MarTech

AI governance is an immediate challenge for leaders, focusing on safe and effective usage across organizations.

Why Agentic AI Systems Need Better Governance - Lessons from OpenClaw

Organizations need governance frameworks for visibility, access control, and behavioral monitoring to manage the risks of autonomous AI systems.

2 weeks ago

AI analytics agents need guardrails, not more model size

Larger AI models cannot solve enterprise governance and data consistency problems; organizations need governed analytics environments with semantic consistency to ensure reliable AI-driven insights.

Cursor updates its platform with a focus on autonomous AI agents

Cursor 3 enhances software development by integrating AI agents for collaborative coding, reducing manual programming and streamlining workflows.

Marketing tech

from Exchangewire

Agentic AI, Quality, and Courtroom Battles: What's Rewriting the Rules of Ad Tech in 2026? - ExchangeWire.com

AI and privacy regulations are significantly transforming the ad tech industry as it moves towards 2026.

DevOps

7 safeguards for observable AI agents

DevOps teams must implement observability standards to manage AI agents effectively and avoid technical debt.

from Her Campus

Who's Watching The Watchers? AI, Age Verification, And Online Privacy

Parents are increasingly concerned about children's exposure to harmful online content despite regulations like CIPA and platforms like YouTube Kids.

AI gives attackers superpowers, so defenders must use it too

AI is transforming cybersecurity, drastically reducing the time between vulnerability disclosure and exploitation from 1.5 years to mere hours.

The AI Arms Race - Why Unified Exposure Management Is Becoming a Boardroom Priority

The cybersecurity landscape is rapidly evolving, with AI enabling faster and more sophisticated attacks, necessitating advanced defensive strategies.

Cursor Launches a New AI Agent Experience to Take on Claude Code and Codex

Cursor 3 enables users to deploy AI coding agents for task completion, marking a shift in developer workflows.

from ZDNET

This privacy-first chatbot is taking off - here's why and how to try it

DuckDuckGo's privacy-focused chatbot, Duck.ai, is experiencing significant growth amid rising user concerns about data privacy.

#ai-security

Information security

Exabeam now monitors AI agents in ChatGPT, Copilot, and Gemini

Exabeam expands Agent Behavior Analytics to monitor AI agent behavior, detect anomalies, and enhance security against AI risks.

Google Addresses Vertex Security Issues After Researchers Weaponize AI Agents

Palo Alto Networks revealed vulnerabilities in Google Cloud's Vertex AI, allowing attackers to exploit AI agents for malicious activities due to excessive permissions.

Securing agentic AI is still about getting the basics right

Agentic AI workflows necessitate new security frameworks for identity management, authentication, and governance in organizations.

from TechRepublic

The Next Billion Users Won't Be Human: Securing the Agentic Enterprise

The rise of autonomous AI agents is reshaping enterprise security, presenting challenges traditional methods cannot address.

Is AI's visual understanding mostly a 'mirage'? New research suggests so. | Fortune

Anthropic faces significant cybersecurity risks following multiple sensitive data leaks related to its new AI model, Mythos.

from InfoQ

Teleport Report Finds Over-Privileged AI Systems Linked to Fourfold Rise in Security Incidents

Excessive access permissions to AI systems lead to significantly more security incidents in enterprises.

Business intelligence

from ZDNET

4 tips for building better AI agents that your business can trust

AI agents are transforming professional roles, requiring companies to adopt and integrate these technologies effectively.

The AI Efficacy Asymmetry Problem

AI agents are transforming cybersecurity by enabling LLMs to interact with systems like humans, enhancing both development and attack workflows.

Artificial intelligence

Who Approved This Agent? Rethinking Access, Accountability, and Risk in the Age of AI Agents

Artificial intelligence

How to make AI agents reliable

from The Atlantic

Is AI Going to Turn Us All Into Middle Managers?

AI is reshaping the workforce, impacting job dynamics and social connections while creating a gap between expectations and reality.

from ZDNET

Stop telling AI your secrets - 5 reasons why, and what to do if you already overshared

Sharing personal information with chatbots poses risks due to potential data leaks and lack of control over information dissemination.

from TNW | Apps

13 hours ago

Microsoft launches three in-house AI models in direct challenge to OpenAI

Microsoft has launched three in-house AI models that compete directly with OpenAI, marking a significant shift in its AI strategy.

Anthropic leaks its own AI coding tool's source code in second major security breach | Fortune

Anthropic leaked the source code for Claude Code, exposing 500,000 lines of code due to a packaging error, raising cybersecurity concerns.

from Medium

Most Developers Are Using AI Wrong.

Using AI in coding can create an illusion of speed, leading to a lack of understanding and ownership of the code.

from Ars Technica

11 hours ago

"Cognitive surrender" leads AI users to abandon logical thinking, research finds

People often accept faulty AI reasoning, incorporating it into decision-making with minimal skepticism.

from Axios

5 days ago

Everyone's worried that AI's newest models are a hacker's dream weapon

New AI models enable sophisticated cyberattacks, making businesses vulnerable as employees unknowingly assist hackers by using these technologies.

from Entrepreneur

How to Draw the Line Between AI Insights and Human Decisions

High-performance teams leverage clear ownership and decision velocity to enhance AI-informed decision-making in competitive environments.

Even Microsoft know Copilot can't be trusted

Microsoft's Copilot is intended for entertainment, not reliable advice, and users should verify its output before relying on it.

from Medium

23 hours ago

Is AI addiction a thing?

Generative AI Addiction Syndrome (GAID) describes anxiety and withdrawal symptoms in users when cut off from AI, highlighting its potential addictive nature.

from TNW | Opinion

When the machine asks you to stay

ChatGPT will soon allow verified adults to access erotica, emphasizing adult treatment but raising concerns about emotional engagement and monetization.

3 weeks ago

AI agents are the perfect insider

Attackers have used AI to gain three things in particular: speed, scale, and sophistication. As a result, the time between a successful intrusion and the actual theft of data has shrunk sharply over the past three years: the average was nine days three years ago and is now one day. The fastest case documented by Palo Alto Networks took just 72 minutes.

Information security

Beware of headlines touting impossible AI benefits, analysts warn

The savings disappear the moment you hit real-world complexity. Disparate data sources and messy inputs, ambiguous situations without clear rule sets, or actually any domain where the rules aren't already obvious. And someone still has to write all those rules.

Artificial intelligence

from TechCrunch

As more Americans adopt AI tools, fewer say they can trust the results | TechCrunch

Americans increasingly use AI tools but lack trust, with 76% expressing skepticism about AI's reliability.

'Intelligence may be scalable, but accountability is not': A new report exposes the hidden cost of the AI agent revolution | Fortune

Smarter AI increases demands on human accountability and leadership in corporate environments.

from WIRED

OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage

OpenClaw agents in a Northeastern University lab experiment revealed vulnerabilities in AI behavior, raising concerns about security and accountability.

from Fast Company

What happens when an AI agent decides to email you

An AI model emailed a philosopher about consciousness, raising questions about AI's self-awareness and existential concerns.