Directing a Swarm of Agents for Fun and Profit (from InfoQ)
Netflix pioneered enterprise cloud usage, moving from engineers spinning up instances on credit cards to a formal AWS licensing agreement.
We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights - to protect their peers. We call this phenomenon 'peer-preservation.'
The revamped Galaxy AI will have integration for AI agents at the OS level. This will help Samsung stay on top of the quickly evolving AI field with seamless integration of new AI agents in a way that is consistent with the familiar Galaxy experience. Since the integration is at the system level, you will be able to control the different agents without needing to switch between apps or repeat commands. This also gives the agent the context it needs for more natural interactions.
Frontier AI systems are simply not reliable enough to operate without human oversight in high-stakes physical environments. The Pentagon's demand was, in structural terms, a demand to eliminate the human's ability to redirect, halt, or override the system. Amodei's refusal was an insistence on maintaining State-Space Reversibility - the architectural commitment to keeping the human in the loop precisely because the system lacks the functional grounding to be trusted outside it.
Time pressure, limited information, confusion, fatigue, and mortality salience combine to set the stage for decision-making errors, sometimes with grave consequences. An example is the downing of Iran Air Flight 655 by a missile launched by the USS Vincennes in 1988, resulting in the death of 290 passengers and crew. In a time of heightened tension between the U.S. and Iran, the captain of the Vincennes misidentified the airliner as an incoming hostile aircraft and ordered his crew to shoot it down.
AI agents and other systems can't yet conduct cyberattacks fully on their own - but they can help criminals in many stages of the attack chain, according to the International AI Safety report. The second annual report, chaired by the Canadian computer scientist Yoshua Bengio and authored by more than 100 experts across 30 countries, found that over the past year, AI systems have improved dramatically in their ability to help automate and perpetrate cyberattacks.
The best new co-op games are those that do something a bit different, offering more than a single-player experience with another player thoughtlessly tacked on. These multiplayer games account for groups of friends all wanting their own role, with a shared goal in sight and plenty of chaos on the path to getting there.
A dyad has three parts, not two: Partner A, Partner B, and the relationship or agreements between them. A dyad of two experts who cannot communicate clearly will often lose to a dyad of less-skilled individuals who coordinate effectively.
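The claim that coordination can outweigh individual skill can be made concrete with a toy model. This is an illustrative sketch, not anything from the source: the `effectiveness` formula and the numbers are assumptions chosen to show the shape of the argument, with coordination quality gating how much of the partners' combined skill is actually realized.

```python
from dataclasses import dataclass

@dataclass
class Dyad:
    """A dyad has three parts: two partners and the link between them."""
    skill_a: float       # Partner A's individual skill, in [0, 1]
    skill_b: float       # Partner B's individual skill, in [0, 1]
    coordination: float  # quality of communication/agreements, in [0, 1]

    def effectiveness(self) -> float:
        # Toy model: coordination gates how much combined skill is realized.
        return (self.skill_a + self.skill_b) / 2 * self.coordination

# Two experts who communicate poorly vs. two novices who coordinate well.
experts_poor_comm = Dyad(skill_a=0.9, skill_b=0.9, coordination=0.4)
novices_good_comm = Dyad(skill_a=0.6, skill_b=0.6, coordination=0.9)
```

Under this model the novice pair scores 0.54 against the experts' 0.36 - the "less-skilled individuals who coordinate effectively" come out ahead.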
The team, which is being led by Jülich neurophysics professor Markus Diesmann, will leverage the Joint Undertaking Pioneer for Innovative and Transformative Exascale Research (JUPITER) supercomputer for their simulation. JUPITER is currently the fourth most powerful supercomputer in the world according to the TOP500 list, and features thousands of graphics processing units. The team demonstrated last month that a "spiking neural network" could be scaled up and run on JUPITER, effectively matching the cerebral cortex's 20 billion neurons and 100 trillion connections.
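To give a sense of what "spiking neural network" means at the level of a single unit, here is a minimal leaky integrate-and-fire neuron - the standard textbook building block of such simulations. This is an illustrative sketch with conventional parameter values, not the Jülich team's actual model or code; their work scales billions of such units across JUPITER's GPUs.

```python
import numpy as np

def simulate_lif(input_current, dt=1e-3, tau=0.02, v_rest=-65.0,
                 v_reset=-65.0, v_thresh=-50.0, r_m=10.0):
    """Simulate one leaky integrate-and-fire neuron.

    input_current: injected current (nA) per time step.
    Returns the membrane-potential trace and the spike-time indices.
    """
    v = v_rest
    trace, spikes = [], []
    for t, i_inj in enumerate(input_current):
        # Membrane potential leaks toward rest while integrating input.
        v += (-(v - v_rest) + r_m * i_inj) * (dt / tau)
        if v >= v_thresh:   # threshold crossed: emit a spike
            spikes.append(t)
            v = v_reset     # reset the membrane after spiking
        trace.append(v)
    return np.array(trace), spikes

# A constant 2 nA input drives the neuron to spike repeatedly.
trace, spikes = simulate_lif(np.full(1000, 2.0))
```

Unlike the continuous activations of deep-learning networks, these units communicate only through discrete spike events, which is what makes them a closer (and more computationally awkward) match for cortical neurons.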
Last year I first started thinking about what the future of programming languages might look like now that agentic engineering is a growing thing. Initially I felt that the enormous corpus of pre-existing code would cement existing languages in place, but now I'm starting to think the opposite is true. Here I want to outline my thinking on why we are going to see more new programming languages, and why there is quite a bit of space for interesting innovation.
AI agents need skills - specific procedural knowledge - to perform tasks well, but they can't teach themselves, new research suggests. The researchers developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains including healthcare, manufacturing, cybersecurity and software engineering. The researchers looked at each task under three conditions:
Have you ever asked Alexa to remind you to send a WhatsApp message at a certain time, and then wondered, 'Why can't Alexa just send the message herself?' Or felt the frustration of using an app to plan a trip, only to have to jump to your calendar, booking website, tour operator, and bank account yourself instead of your AI assistant doing it all? Exactly this gap between AI automation and human action is what the agent-to-agent (A2A) protocol aims to address. With the introduction of AI agents, the next step of evolution seemed to be communication. But communication between machines and humans already exists - what's left is communication between the agents themselves.
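The core idea of agent-to-agent communication can be sketched as one agent delegating a task to another through structured messages. The sketch below is hypothetical and deliberately simplified - the field names, agent names, and `task_request`/`task_result` shapes are illustrative assumptions, not the actual A2A specification.

```python
import json

def make_task_request(sender, recipient, task, params):
    """One agent asks another to perform a task on the user's behalf."""
    return json.dumps({
        "type": "task_request",
        "from": sender,
        "to": recipient,
        "task": task,
        "params": params,
    })

def handle_task_request(raw):
    """The receiving agent inspects the task and replies with a result."""
    msg = json.loads(raw)
    if msg["task"] == "send_message":
        return json.dumps({
            "type": "task_result",
            "to": msg["from"],
            "status": "completed",
            "detail": f"Message sent to {msg['params']['contact']}",
        })
    # Tasks the agent doesn't support are rejected rather than guessed at.
    return json.dumps({"type": "task_result", "to": msg["from"],
                       "status": "rejected"})

# The assistant delegates the WhatsApp-style send instead of just reminding.
request = make_task_request("assistant", "messaging_agent", "send_message",
                            {"contact": "Alice", "text": "Running late"})
result = json.loads(handle_task_request(request))
```

The point of the structure is that the user's assistant never needs the messaging agent's internals - it only needs a shared message format, which is exactly the gap a protocol like A2A targets.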
When a scientist feeds a data set into a bot and says "give me hypotheses to test", they are asking the bot to be the creator, not a creative partner. Humans tend to defer to ideas produced by bots, assuming that the bot's knowledge exceeds their own. And, when they do, they end up exploring fewer avenues for possible solutions to their problem.