#fault-tolerant-architecture

Psychology
from Silicon Canals
1 day ago

The people who always have a backup plan aren't pessimists. They grew up in environments where promises were unreliable, and redundancy became the only architecture that didn't collapse when someone changed their mind without warning. - Silicon Canals

Obsessive planners are often generous, driven by past experiences that teach them to prepare for uncertainties.
Information security
from TNW | Insights
1 day ago

KeeperDB brings zero-trust database access to privileged access management

Database credentials are a major attack vector, and KeeperDB integrates access controls into its PAM platform to enhance security.
Design
from InfoQ
2 days ago

Panel: Taking Architecture Out of the Echo Chamber

Architecture's importance is growing, necessitating a shift in practice to avoid past mistakes and engage with broader conversations.
Scala
from InfoQ
3 days ago

Beyond RAG: Architecting Context-Aware AI Systems with Spring Boot

Context-Augmented Generation (CAG) enhances Retrieval-Augmented Generation (RAG) by managing runtime context for enterprise applications without requiring model retraining.
#agentic-ai
Artificial intelligence
from ComputerWeekly.com
4 days ago

AI-driven operating model key to cloud-native, autonomous networks

Agentic AI can transform telecom networks if operators establish cloud-native maturity and integrate autonomy while maintaining reliability.
#kubernetes
from Medium
2 days ago
DevOps

Kubernetes Scared Me Too - Until I Actually Understood It A no-fluff intro for devs who keep

Kubernetes simplifies container orchestration, managing deployment, scaling, and traffic routing for applications across multiple servers.
DevOps
from InfoQ
5 days ago

Kubernetes Autoscaling Demands New Observability Focus Beyond Vendor Tooling

Kubernetes autoscalers like Karpenter require new observability practices focusing on provisioning behavior, scheduling latency, and cost efficiency.
from App Developer Magazine
4 days ago
DevOps

Lens Launches MCP Server to Connect AI Coding Assistants with Kubernetes

Lens by Mirantis integrates a Model Context Protocol server, simplifying AI coding assistants' access to Kubernetes clusters.
from TNW | Business
1 week ago
DevOps

Traefik becomes the de facto standard for Kubernetes Networking

Ingress NGINX has been retired, leading to a significant migration to Traefik Proxy as the primary replacement.
DevOps
from Medium
2 days ago

Understanding Kubernetes Architecture is a MUST

Understanding Kubernetes architecture is essential for effective cloud-native deployment and troubleshooting.
#cybersecurity
Information security
from SecurityWeek
5 days ago

TeamPCP Moves From OSS to AWS Environments

TeamPCP has exploited compromised credentials to target open source software, leading to significant data exfiltration and supply chain attacks.
Information security
from Security Magazine
2 weeks ago

Document Protection: Why Hybrid Storage Is the Future of Security

A hybrid approach combining digital storage for frequently accessed documents and physical storage for sensitive historical information provides optimal security and efficiency.
DevOps
from InfoQ
2 days ago

Replacing Database Sequences at Scale Without Breaking 100+ Services

Validating requirements can simplify complex problems, and embedding sequence generation reduces network calls, enhancing performance and reliability.
from InfoWorld
5 days ago

How Apache Kafka flexed to support queues

Apache Kafka has cemented itself as the de facto platform for event streaming, often referred to as the 'universal data substrate' due to its extensive ecosystem that enables connectivity and processing capabilities.
Scala
Online Community Development
from InfoQ
2 weeks ago

Platform Engineering as a Practice of Sociotechnical Excellence

Platform engineering drives sociotechnical change by integrating social and technical systems within organizations for improved collaboration and reliability.
DevOps
from Medium
2 days ago

Fair Multitenancy - Beyond Simple Rate Limiting

Fair multitenancy ensures equitable infrastructure access for customers, balancing simplicity, performance, and safety in shared environments.
Information security
from InfoQ
5 days ago

Cloudflare Adds Active API Vulnerability Scanning to Its Edge

Cloudflare's Web and API Vulnerability Scanner focuses on detecting Broken Object Level Authorization vulnerabilities in APIs.
Web frameworks
from Medium
2 weeks ago

Why Most Spring Boot Apps Fail in Production (7 Critical Mistakes)

Spring Boot production failures stem from seven critical mistakes including improper dependency injection, configuration errors, and resource management issues that developers can systematically avoid.
Tech industry
from InfoQ
3 weeks ago

Netflix Uncovers Kernel-Level Bottlenecks While Scaling Containers on Modern CPUs

Netflix discovered that container scaling bottlenecks stem from CPU architecture and Linux kernel mount lock contention, not container runtimes, with performance varying significantly across different hardware topologies.
Business intelligence
from Entrepreneur
3 weeks ago

The Game-Changing Tech Saving Companies From Data Disasters

Combining Continuous Data Protection with AI capabilities enables businesses to achieve near-zero Recovery Point Objectives and minimal Recovery Time Objectives, preventing data loss and minimizing downtime.
DevOps
from Techzine Global
2 days ago

OpenStack Gazpacho simplifies operations and VMware migrations

OpenStack 2026.1 emphasizes operational simplicity, live migration for VMware workloads, and hardware flexibility, positioning itself as a sovereign alternative to major cloud providers.
DevOps
from Techzine Global
3 days ago

Observability warehouses, the next structural evolution for telemetry

Observability is essential for real-time insights in cloud systems, helping to reduce downtime and improve performance.
Tech industry
from Techzine Global
3 weeks ago

The Zero-Drift Frontier: Modern Edge Demands on Kubernetes

Edge computing has evolved from an optional add-on into critical enterprise infrastructure, requiring robust offline capabilities and autonomous operation to prevent costly business disruptions.
from InfoQ
1 month ago

Hybrid Cloud Data at Uber: How Engineers Solved Extreme-Scale Replication Challenges

Uber's engineering team has transformed its data replication platform to move petabytes of data daily across hybrid cloud and on-premises data lakes, addressing scaling challenges caused by rapidly growing workloads. Built on Hadoop's open-source DistCp framework, the platform now handles over one petabyte of daily replication and hundreds of thousands of jobs with improved speed, reliability, and observability.
Miscellaneous
from The Register
1 month ago

Server crashes traced to one very literal knee-jerk reaction

It was the time of Novell networks, RG58 cables, and bulky tower PCs. It was also a time before the telemarketer's IT department employed specialists. Carter and his two colleagues - boss Mike and part-time student Stefan - therefore handled tasks ranging from programming to support, and everything in between.
Software development
DevOps
from Amazon Web Services
4 days ago

Securely connect AWS DevOps Agent to private services in your VPCs

AWS DevOps Agent enhances operational efficiency by securely connecting to private resources in VPCs, optimizing performance and incident management.
Business intelligence
from InfoWorld
1 month ago

Why enterprises are still bad at multicloud

Most enterprises operate multicloud environments across AWS, Microsoft, and Google, but lack coherent operational models, treating each cloud as a separate silo rather than an integrated business capability.
DevOps
from InfoQ
5 days ago

Failure As a Means to Build Resilient Software Systems: A Conversation with Lorin Hochstein

Studying software failures can strengthen software architecture and reliability engineering practices.
#event-driven-architecture
DevOps
from InfoQ
5 days ago

Event-Driven Patterns for Cloud-Native Banking: Lessons from What Works and What Hurts

Event-driven architecture introduces complexity and requires careful implementation, especially in regulated environments, to ensure reliability and system evolution.
from InfoQ
1 month ago
Software development

[Video Podcast] Building Resilient Event-Driven Microservices in Financial Systems with Muzeeb Mohammad

Event-driven architectures using Kafka enable decoupling backend workflows, improving scalability and SLAs for complex multi-system processes like account opening.
Information security
from Computerworld
3 weeks ago

Storage vendor offers a real guarantee - but check out those fine-print exceptions

Tech vendors frequently offer performance guarantees with substantial financial penalties, but hidden exceptions in EULAs often make claims difficult or impossible to collect.
DevOps
from InfoWorld
6 days ago

How to build an enterprise-grade MCP registry

MCP registries are essential for integrating AI agents with enterprise systems, requiring semantic discovery, governance, and developer-friendly controls.
#distributed-systems
from InfoQ
1 month ago
Software development

How a Small Enablement Team Supported Adopting a Single Environment for Distributed Testing

from InfoQ
2 months ago
Software development

Somtochi Onyekwere on Distributed Data Systems, Eventual Consistency and Conflict-free Replicated Data Types

from InfoQ
2 months ago
DevOps

Fast Eventual Consistency: Inside Corrosion, the Distributed System Powering Fly.io

from InfoWorld
1 month ago

Red Hat ships AI platform for hybrid cloud deployments

Red Hat AI Enterprise provides a foundation for modern AI workloads, including AI life-cycle management, high-performance inference at scale, agentic AI innovation, integrated observability and performance modeling, and trustworthy AI and continuous evaluation. Tools are provided for dynamic resource scaling, monitoring, and security.
Artificial intelligence
DevOps
from Techzine Global
5 days ago

Harness adds four capabilities to close AI delivery gap

Harness is launching four new capabilities to enhance its Continuous Delivery platform, addressing the gap between code writing speed and release reliability.
Software development
from InfoQ
1 month ago

Kubernetes Introduces Node Readiness Controller to Improve Pod Scheduling Reliability

Kubernetes introduces the Node Readiness Controller to improve scheduling accuracy by synchronizing the API server's node readiness view with actual kubelet health signals, reducing pod scheduling onto unavailable nodes.
#kubevirt
DevOps
from InfoQ
5 days ago

KubeVirt v1.8 Brings Multi-Hypervisor Support and Confidential Computing to Kubernetes

KubeVirt v1.8 introduces a Hypervisor Abstraction Layer, enabling support for multiple backends beyond KVM, enhancing its functionality for VM workloads.
DevOps
from InfoWorld
1 week ago

Rethinking VM data protection in cloud-native environments

KubeVirt enables Kubernetes to manage both VMs and containers, requiring new strategies for VM lifecycle management and data protection.
DevOps
from Techzine Global
1 week ago

KubeVirt focuses on multi-hypervisor support

KubeVirt 1.8 enhances Kubernetes compatibility, introduces hypervisor abstraction, improves security, and optimizes performance for AI workloads.
DevOps
from InfoWorld
5 days ago

Azure's new AI modernization tools

Microsoft's Azure Copilot aids in application migration and modernization, addressing technical debt and improving cloud infrastructure management.
#cloud-computing
DevOps
from InfoWorld
5 days ago

Enterprises demand cloud value

Businesses are shifting from cost-cutting to establishing centers of excellence and FinOps practices to improve ROI on their cloud investments.
DevOps
from InfoWorld
1 week ago

Edge clouds and local data centers reshape IT

Cloud computing is evolving towards a selectively distributed model to address latency, sovereignty, and resilience in smart cities and AI applications.
Artificial intelligence
from Techzine Global
1 month ago

Red Hat launches AI Enterprise for hybrid AI deployments

Red Hat AI Enterprise provides an integrated platform combining GPU-accelerated hardware, models, and agents to help organizations transition from experimental AI pilots to operational deployments in hybrid cloud environments.
Tech industry
from United States Edition
1 month ago

Spotlight report: Accelerating Data Center Modernization

Data center modernization is critical for AI deployment, requiring integrated infrastructure solutions across servers, storage, networking, and security.
Software development
from InfoQ
1 month ago

Cilium at Ten Years: Stronger Encryption, Safer Policies, and Clearer Visibility for Large Clusters

Cilium 1.19 celebrates ten years of development with focus on security hardening, encryption, network policy refinement, and scalability for large Kubernetes clusters, establishing itself as the dominant CNI in production environments.
DevOps
from InfoWorld
1 week ago

Designing self-healing microservices with recovery-aware redrive frameworks

A recovery-aware redrive framework prevents retry storms while ensuring all failed requests are eventually processed in complex service systems.
DevOps
from Techzine Global
1 week ago

Red Hat and Google Cloud expand OpenShift partnership

Red Hat and Google Cloud expand partnership to modernize applications and migrate VM workloads with OpenShift integration.
DevOps
from Techzine Global
1 week ago

DataCore Introduces Swarm Appliance for Edge Data Protection

DataCore's Swarm Appliance offers a comprehensive data protection solution for edge and ROBO environments, combining immutability, encryption, and malware detection.
from DBmaestro
4 years ago

5 Pillars of Database Compliance Automation

There is a growing emphasis on database compliance today due to the stricter enforcement of rules and regulations that safeguard user privacy. For example, GDPR fines can reach €20 million (£17.5 million under the UK GDPR) or 4% of annual global turnover, whichever is higher. Besides the direct monetary implications, companies also need to prioritize compliance to protect their brand reputation and achieve growth.
EU data protection
DevOps
from Techzine Global
1 week ago

Istio gets AI support with ambient multicluster and agent gateway

New Istio features enhance AI workload management on Kubernetes, focusing on reducing complexity and enabling daily deployments.
from DevOps.com
1 month ago

What to do About AI's Forced Rethink of Reliability in Modern DevOps

For years, reliability discussions have focused on uptime and whether a service met its internal SLO. However, as systems become more distributed, reliant on complex internet stacks, and integrated with AI, this binary perspective is no longer sufficient. Reliability now encompasses digital experience, speed, and business impact. For the second year in a row, The SRE Report highlights this shift.
Software development
DevOps
from InfoQ
2 weeks ago

Configuration as a Control Plane: Designing for Safety and Reliability at Scale

Configuration in cloud-native systems is a dynamic control plane that directly influences system behavior and reliability at runtime.
Tech industry
from InfoWorld
1 month ago

Why cloud outages are becoming normal

Recurrent cloud outages disrupt enterprise operations worldwide, driven by misconfigurations, neglected resilience, rising complexity, and staffing challenges.
Artificial intelligence
from InfoWorld
1 month ago

Five MCP servers to rule the cloud

Major cloud providers now offer official MCP servers that let AI agents automate cloud operations using existing cloud credentials and natural language commands.
#devops
from InfoQ
1 month ago
Software development

DevOps Modernization: AI Agents, Intelligent Observability and Automation

from DevOps.com
3 weeks ago

Zero Downtime Multicloud Migrations for Observability Control Planes

An observability control plane isn't just a dashboard. It's the operational authority system. It defines alert rules, routing, ownership, escalation policy, and notification endpoints. When that layer is wrong, the impact is immediate. The wrong team gets paged. The right team never hears about the incident. Your service level indicators look clean while production burns.
DevOps
from Techzine Global
2 months ago

Developers struggle with container security

Almost a quarter of those surveyed said they had experienced a container-related security incident in the past year. The bottleneck is rarely in detecting vulnerabilities, but mainly in what happens next. Weeks or months can pass between the discovery of a problem and the actual implementation of a fix. During that period, applications continue to run with known risks, leaving organizations vulnerable, reports The Register.
Information security
DevOps
from InfoQ
3 weeks ago

Running Ray at Scale on AKS

Microsoft and Anyscale provide guidance for running managed Ray service on Azure Kubernetes Service, addressing GPU capacity limits, ML storage challenges, and credential expiry issues through multi-cluster, multi-region deployment strategies.
from InfoWorld
2 months ago

The private cloud returns, for AI workloads

A North American manufacturer spent most of 2024 and early 2025 doing what many innovative enterprises did: aggressively standardizing on the public cloud by using data lakes, analytics, CI/CD, and even a good chunk of ERP integration. The board liked the narrative because it sounded like simplification, and simplification sounded like savings. Then generative AI arrived, not as a lab toy but as a mandate. "Put copilots everywhere," leadership said. "Start with maintenance, then procurement, then the call center, then engineering change orders."
Artificial intelligence
DevOps
from InfoQ
3 weeks ago

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned MySQL infrastructure using Group Replication to reduce failover time from minutes to seconds while maintaining strong consistency across thousands of clusters.
from Techzine Global
2 months ago

4 steps to create a future-proof data infrastructure

A future-proof IT infrastructure is often positioned as a universal solution that can withstand any change. However, such a solution does not exist. Nevertheless, future-proofing is an important concept for IT leaders navigating continuous technological developments and security risks, all while ensuring that daily business operations continue. The challenge is finding a balance between reactive problem solving and proactive planning, because overlooking a change can cost your organization. So, how do you successfully prepare for the future without that one-size-fits-all solution?
Tech industry
DevOps
from Developer Tech News
3 weeks ago

BMC: Integrating mainframe systems into modern CI/CD pipelines

Mainframe systems must integrate into modern CI/CD pipelines to accelerate delivery while maintaining reliability, replacing legacy Waterfall approaches that prioritize stability over speed.
Artificial intelligence
from Medium
2 months ago

Beyond the Monolith: The Rise of the AI Microservices Architecture

LangGraph models AI interactions as a state-machine graph with persistent state, semantic routing, and microservice agents for robust orchestration.
Software development
from InfoWorld
1 month ago

Cloud Cloning: A new approach to infrastructure portability

Cloud Cloning captures complete cloud infrastructure snapshots and maps them onto target cloud services and configurations to enable accurate cloud portability.
Information security
from The Register
2 months ago

AI framework flaws put enterprise clouds at risk of takeover

Two Chainlit vulnerabilities enable arbitrary file reads and SSRF attacks, risking exposure of environment variables, credentials, and potential cloud takeover if not patched.
Information security
from The Hacker News
2 months ago

When Cloud Outages Ripple Across the Internet

Cloud infrastructure outages can disable identity authentication and authorization, creating hidden single points of failure that cause broad operational and security impacts.
Artificial intelligence
from InfoQ
1 month ago

[Video Podcast] The Craft of Software Architecture in the Age of AI Tools

Software architecture must be rethought for the age of AI tools, integrating design, platforms, APIs, delivery, and practical experiential guidance for real-world practitioners.
Information security
from The Hacker News
2 months ago

DevOps & SaaS Downtime: The High (and Hidden) Costs for Cloud-First Businesses

Relying solely on public cloud and DevOps SaaS platforms increases operational risk as outages, attacks, and Shared Responsibility gaps drive rising downtime and service degradation.
Software development
from InfoWorld
2 months ago

Why your next microservices should be streaming SQL-driven

Streaming SQL with UDFs, materialized results, and ML/AI integrations enables continuous, stateful processing of event streams for microservices.
from DevOps.com
1 month ago

Harness Readies Resilience Testing Platform to Make Applications More Robust

The Harness Resilience Testing platform extends the scope of the tests provided to include application load and disaster recovery (DR) testing tools that will enable DevOps teams to further streamline workflows.
DevOps
from thenewstack.io
2 months ago

Why Most APIs Fail in AI Systems and How To Fix It

Over the past few years, I've reviewed thousands of APIs across startups, enterprises and global platforms. Almost all shipped OpenAPI documents. On paper, they should be well-defined and interoperable. In practice, most cannot be consumed predictably by AI systems. They were designed for human readers, not for machines that need to reason, plan and safely execute actions. When APIs are ambiguous, inconsistent or structurally unreliable, AI systems struggle or fail outright.
Software development
Software development
from InfoQ
2 months ago

Thinking Like a Detective: Solving Cloud Infrastructure Mysteries

Intermittent, user-visible cloud errors can occur despite green health checks and normal logs; solving them requires methodical tracing across network, client, and infrastructure.
Software development
from DBmaestro
4 years ago

If You Don't Have Database Delivery Automation, Brace Yourself for These 10 Problems

Manual database processes break DevOps pipelines; only 12% deploy database changes daily, causing configuration drift, frequent errors, slower time-to-market, and reduced productivity.
Software development
from InfoQ
2 months ago

Engineering Speed at Scale - Architectural Lessons from Sub-100-ms APIs

Treat latency as a first-class product concern with enforceable latency budgets, fast-path architecture, and broad ownership through measurement and accountability.
from InfoWorld
1 month ago

The 'Super Bowl' standard: Architecting distributed systems for massive concurrency

When I manage infrastructure for major events (whether it is the Olympics, a Premier League match or a season finale) I am dealing with a "thundering herd" problem that few systems ever face. Millions of users log in, browse and hit "play" within the same three-minute window. But this challenge isn't unique to media. It is the same nightmare that keeps e-commerce CTOs awake before Black Friday or financial systems architects up during a market crash. The fundamental problem is always the same: How do you survive when demand exceeds capacity by an order of magnitude?
DevOps
DevOps
from InfoWorld
2 months ago

From distributed monolith to composable architecture on AWS: A modern approach to scalable software

Migrating distributed monoliths to a composable AWS architecture yields loosely coupled, autonomous services that improve scalability, resilience, deployment velocity, and team autonomy.
from DBmaestro
5 years ago

Database Delivery Automation in the Multi-Cloud World

The main advantage of going the Multi-Cloud way is that organizations can "put their eggs in different baskets" and be more versatile in their approach to how they do things. For example, they can mix it up and opt for a cloud-based Platform-as-a-Service (PaaS) solution when it comes to the database, while going the Software-as-a-Service (SaaS) route for their application endeavors.
DevOps
from New Relic
2 months ago

Preventing network outages: How we use New Relic to monitor our multi-cloud infrastructure

Running a global observability platform means one thing above all: your infrastructure must never go down. When you're responsible for monitoring thousands of customers' applications 24/7, network failures aren't just inconvenient, they're existential threats. At New Relic, hundreds of clusters run across multiple clouds and regions. These clusters depend on a complex web of network connections: regional transit gateways, inter-regional hubs, and cross-cloud links.
DevOps