#rdd-api

[ follow ]
#observability
DevOps
fromTechzine Global
3 days ago

Observability warehouses, the next structural evolution for telemetry

Observability is essential for real-time insights in cloud systems, helping to reduce downtime and improve performance.
fromInfoWorld
5 days ago

How Apache Kafka flexed to support queues

Apache Kafka has cemented itself as the de facto platform for event streaming, often referred to as the 'universal data substrate' due to its extensive ecosystem that enables connectivity and processing capabilities.
Scala
#snowflake
Django
fromMedium
3 days ago

Snowflake Supports Directory Imports

Easier package imports into Snowflake functions and procedures from stage directories and SnowGit directories streamline development and deployment.
Artificial intelligence
fromTheregister
1 week ago

Snowflake's ongoing pitch: bring AI to data, not vice versa

Snowflake is enhancing its platform for AI integration through strategic partnerships and acquisitions, focusing on customer ROI and data management efficiency.
JavaScript
fromPythonSpeed
4 days ago

Timesliced reservoir sampling: a new(?) algorithm for profilers

Random sampling from an unknown-length event stream can effectively identify relevant information without storing all data.
#databricks
Information security
fromInfoWorld
1 week ago

Databricks pitches Lakewatch as a cheaper SIEM - but is it really?

Translating benefits into buy-in from CIOs and CISOs may be challenging for Databricks despite its intent and acquisitions.
Information security
fromInfoWorld
1 week ago

Databricks pitches Lakewatch as a cheaper SIEM - but is it really?

Translating benefits into buy-in from CIOs and CISOs may be challenging for Databricks despite its intent and acquisitions.
#scala
Scala
fromMedium
6 days ago

Data Extraction and Classification Using Structural Pattern Matching in Scala

Scala pattern matching enhances code readability and extensibility in real-world data engineering use cases.
fromMedium
2 months ago
Scala

I Thought Scala Was Vibe Coding

Scala emphasizes immutability, expression-oriented programming, powerful pattern matching, and Option-based null safety for more concise, safer, and more composable JVM code.
Scala
fromMedium
6 days ago

Data Extraction and Classification Using Structural Pattern Matching in Scala

Scala pattern matching enhances code readability and extensibility in real-world data engineering use cases.
Node JS
fromInfoQ
1 week ago

Inside Netflix's Graph Abstraction: Handling 650TB of Graph Data in Milliseconds Globally

Netflix engineers developed Graph Abstraction to manage large-scale graph data in real time, enabling fast queries and supporting various internal services.
Science
fromNature
1 week ago

Drowning in data sets? Here's how to cut them down to size

The Square Kilometre Array Observatory will generate massive data, but storage and retention pose significant challenges for researchers.
#apache-spark
Java
fromMedium
2 weeks ago

Spark Internals: Understanding Tungsten (Part 1)

Apache Spark revolutionized big data processing but faces challenges due to JVM memory management and garbage collection issues.
Java
fromMedium
2 weeks ago

Spark Internals: Understanding Tungsten (Part 2)

Catalyst Optimizer and Tungsten work together in Apache Spark to optimize data execution and manage raw binary data.
Java
fromMedium
2 weeks ago

Spark Internals: Understanding Tungsten (Part 1)

Apache Spark revolutionized big data processing but faces challenges due to JVM memory management and garbage collection issues.
Java
fromMedium
2 weeks ago

Spark Internals: Understanding Tungsten (Part 2)

Catalyst Optimizer and Tungsten work together in Apache Spark to optimize data execution and manage raw binary data.
Data science
fromMedium
4 weeks ago

Migrating to the Lakehouse Without the Big Bang: An Incremental Approach

Query federation enables safe, incremental lakehouse migration by allowing simultaneous queries across legacy warehouses and new lakehouse systems without risky big bang cutover approaches.
Information security
fromTechzine Global
1 week ago

Databricks launches Lakewatch: agentic SIEM on the Lakehouse

Lakewatch is an open SIEM platform that consolidates security, IT, and business data, enabling rapid threat detection and response using AI agents.
DevOps
fromInfoQ
1 week ago

Uber Launches IngestionNext: Streaming-First Data Lake Cuts Latency and Compute by 25%

Uber's IngestionNext platform shifts to a streaming-first system, reducing data ingestion latency from hours to minutes for analytics and machine learning.
Business intelligence
fromInfoWorld
2 weeks ago

Snowflake's new 'autonomous' AI layer aims to do the work, not just answer questions

Project SnowWork is Snowflake's autonomous AI layer that automates data analysis tasks like forecasting, churn analysis, and report generation without requiring data team intervention.
fromInfoWorld
2 weeks ago

Migrating from Apache Airflow v2 to v3

Airflow 3 represents a clear architectural direction for the project: API-driven execution, better isolation, data-aware scheduling and a platform designed for modern scale. While Airflow 2.x is still widely used, it is clearly moving toward long-term maintenance (end-of-life April 2026) with most innovation and architectural investment happening in the 3.x line.
Software development
Data science
fromMedium
2 weeks ago

Building Consistent Data Foundations at Scale

Building consistent data foundations through intentional architecture, engineering, and governance is essential to prevent fragmentation, support AI adoption, ensure regulatory compliance, and enable reliable organizational decisions at scale.
DevOps
fromInfoQ
2 weeks ago

AWS Expands Aurora DSQL with Playground, New Tool Integrations, and Driver Connectors

Amazon Aurora DSQL introduces usability enhancements, including a browser-based playground and integrations with popular SQL tools for improved developer experience.
Artificial intelligence
fromInfoWorld
3 weeks ago

Databricks launches Genie Code to automate data science and engineering tasks

Databricks launched Genie Code, an AI agent that automates data science and engineering tasks within its lakehouse platform to accelerate ML workflows and enterprise data operations.
Software development
fromMedium
1 month ago

Unified Databricks Repository for Scala and Python Data Pipelines

Databricks repositories require structured setup with Gradle for multi-language support, dependency management, and version control to scale beyond manual notebook maintenance.
#ai-agent-evaluation
Business intelligence
fromInfoWorld
3 weeks ago

Databricks buys Quotient AI to boost enterprisegrade AI agent performance

Databricks acquired Quotient AI to enable enterprises to deploy AI agents reliably in production with continuous evaluation, monitoring, and performance improvement capabilities.
Artificial intelligence
fromTechzine Global
3 weeks ago

Databricks acquires Quotient AI in push for agent reliability

Databricks acquired Quotient AI to embed agent evaluation and reinforcement learning capabilities into its platform, addressing the critical challenge of maintaining reliable AI agents in production environments.
DevOps
fromInfoWorld
2 weeks ago

Update your databases now to avoid data debt

Multiple major open source databases reach end-of-life in 2026, requiring teams to plan upgrades and migrations to avoid security risks and higher costs.
#scala-interview-preparation
Python
fromTreehouse Blog
1 month ago

Python for Data: A SQL + Pandas Mini-Project That Actually Prepares You for Real Work

Effective data analysis requires combining SQL and Python skills in integrated projects that mirror real-world workflows, not learning them in isolation.
#mariadb-acquisition
Business intelligence
fromInfoWorld
3 weeks ago

MariaDB taps GridGain to keep pace with AI-driven data demands

MariaDB's acquisition of GridGain aims to create an integrated platform combining relational database reliability with in-memory computing speed to compete with hyperscaler offerings.
Business intelligence
fromInfoWorld
3 weeks ago

MariaDB taps GridGain to keep pace with AI-driven data demands

MariaDB's acquisition of GridGain aims to create an integrated platform combining relational database reliability with in-memory computing speed to compete with hyperscaler offerings.
Software development
fromInfoQ
3 weeks ago

How Datadog Cut the Size of Its Agent Go Binaries by 77%

Datadog reduced its Agent binary from 1.22 GiB by auditing imports, using build tags, isolating optional code, and eliminating reflection pitfalls to remove unnecessary dependencies and compiler bloat.
DevOps
fromInfoQ
3 weeks ago

Elastic Releases Version 9.3.0 With Enhanced AI Tools and OTel Support

Elastic 9.3.0 introduces AI workflow automation, 12x faster vector indexing via NVIDIA GPU acceleration, and OpenTelemetry integration for vendor-neutral observability across hybrid cloud environments.
Business intelligence
fromTechzine Global
3 weeks ago

Dataiku introduces platform for scalable enterprise AI

Dataiku launches Platform for AI Success with three new products designed to move AI initiatives from pilots to measurable business outcomes through unified orchestration across cloud providers.
DevOps
fromInfoQ
3 weeks ago

Running Ray at Scale on AKS

Microsoft and Anyscale provide guidance for running managed Ray service on Azure Kubernetes Service, addressing GPU capacity limits, ML storage challenges, and credential expiry issues through multi-cluster, multi-region deployment strategies.
Artificial intelligence
fromComputerWeekly.com
4 weeks ago

Edge AI: What's working and what isn't | Computer Weekly

Edge AI deployment success depends on identifying efficient, narrow use cases with manageable risks rather than pursuing sophisticated, large-scale models across all applications.
Artificial intelligence
fromInfoWorld
1 month ago

Why AI requires rethinking the storage-compute divide

AI workloads require continuous processing of unstructured multimodal data, causing redundant data movement and transformation that wastes infrastructure costs and data scientist time.
DevOps
fromInfoQ
3 weeks ago

Netflix Automates RDS PostgreSQL to Aurora PostgreSQL Migration Across 400 Production Clusters

Netflix automated RDS to Aurora PostgreSQL migrations across 400 production clusters through infrastructure-level orchestration, eliminating manual intervention while maintaining data integrity and CDC pipeline correctness.
Data science
fromInfoQ
1 month ago

Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks Lakebase is a serverless PostgreSQL OLTP database that separates compute from storage and unifies transactional and analytical capabilities.
Startup companies
fromInfoQ
2 months ago

Etleap Launches Iceberg Pipeline Platform to Simplify Enterprise Adoption of Apache Iceberg

Managed Iceberg pipeline platform unifies ingestion, transformation, orchestration, and table operations inside customers' VPCs, enabling enterprise Iceberg adoption without building custom stacks.
Web development
fromInfoQ
3 months ago

DuckDB's WebAssembly Client Allows Querying Iceberg Datasets in the Browser

DuckDB-Wasm enables browser-based, serverless end-to-end query, read, and write access to Iceberg REST catalogs and object storage without infrastructure setup.
#streamlit
fromInfoQ
2 months ago

350PB, Millions of Events, One System: Inside Uber's Cross-Region Data Lake and Disaster Recovery

Uber has built HiveSync, a sharded batch replication system that keeps Hive and HDFS data synchronized across multiple regions, handling millions of Hive events daily. HiveSync ensures cross-region data consistency, enables Uber's disaster recovery strategy, and eliminates inefficiency caused by the secondary region sitting idle, which previously incurred hardware costs equal to the primary, while still maintaining high availability. Built initially on the open-source Airbnb ReAir project, HiveSync has been extended with sharding, DAG-based orchestration, and a separation of control and data planes.
Tech industry
Data science
fromMedium
2 months ago

The Complete Guide to Optimizing Apache Spark Jobs: From Basics to Production-Ready Performance

Optimize Spark jobs by using lazy evaluation awareness, early filter and column pruning, partition pruning, and appropriate join strategies to minimize shuffles and I/O.
#spark
fromMedium
2 months ago
Software development

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

fromMedium
2 months ago
Data science

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

fromMedium
2 months ago
Software development

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

fromMedium
2 months ago
Data science

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

fromTechzine Global
2 months ago

Sumo Logic launches data pipeline apps for Snowflake and Databricks

Snowflake offers a fully managed data platform, but Sumo Logic users often lack insight into performance, login activity, and operational health. The Sumo Logic Snowflake Logs App analyzes login and access activity to identify anomalies or suspicious behavior. It also optimizes data pipelines with insights into long-running or failing queries. Teams can centralize log data to facilitate correlation across applications, cloud services, and data platforms.
Information security
Artificial intelligence
fromInfoQ
2 months ago

Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark

A Q-learning agent autonomously learns and generalizes optimal Spark configurations by discretizing dataset features and combining with Adaptive Query Execution for superior performance.
Software development
fromInfoQ
1 month ago

Are You Missing a Data Frame? The Power of Data Frames in Java

DataFrames and data-oriented programming promote modeling immutable data separately from behavior, making Java suitable for DataFrame-style data manipulation comparable to Python.
Software development
fromInfoWorld
2 months ago

Why your next microservices should be streaming SQL-driven

Streaming SQL with UDFs, materialized results, and ML/AI integrations enables continuous, stateful processing of event streams for microservices.
#instructed-retriever
fromInfoWorld
2 months ago
Artificial intelligence

Databricks says its Instructed Retriever offers better AI answers than RAG in the enterprise

fromInfoWorld
2 months ago
Artificial intelligence

Databricks says its Instruction Retrieval offers better AI answers than RAG in the enterprise

fromInfoWorld
2 months ago
Artificial intelligence

Databricks says its Instructed Retriever offers better AI answers than RAG in the enterprise

fromInfoWorld
2 months ago
Artificial intelligence

Databricks says its Instruction Retrieval offers better AI answers than RAG in the enterprise

Business intelligence
fromTechzine Global
2 months ago

ClickHouse, the open-source challenger to Snowflake and Databricks

ClickHouse is a high-performance columnar OLAP database rapidly adopted by AI and enterprise users, now valued at $15B and acquiring Langfuse.
Data science
fromDevOps.com
2 months ago

Why Data Contracts Need Apache Kafka and Apache Flink - DevOps.com

Data contracts formalize schemas, types, and quality constraints through early producer-consumer collaboration to prevent pipeline failures and reduce operational downtime.
fromTechzine Global
1 month ago

Databricks makes serverless Postgress service Lakebase available

Databricks today announced the general availability of Lakebase on AWS, a new database architecture that separates compute and storage. The managed serverless Postgres service is designed to help organizations build faster without worrying about infrastructure management. When databases link compute and storage, every query must use the same CPU and memory resources. This can cause a single heavy query to affect all other operations. By separating compute and storage, resources automatically scale with the actual load.
Software development
fromInfoQ
2 months ago

Cloudflare Introduces Aggregations in R2 SQL for Data Analytics

R2 SQL now supports SUM, COUNT, AVG, MIN, and MAX, as well as GROUP BY and HAVING clauses. These aggregation functions let developers run SQL analytics directly on data stored in R2 via the R2 Data Catalog, enabling them to quickly summarize data, spot trends, generate reports, and identify unusual patterns in logs. In addition to aggregations, the update introduces schema discovery commands, including SHOW TABLES and DESCRIBE.
Software development
Artificial intelligence
fromFortune
2 months ago

Want to get AI agents to work better? Improve how they retrieve data, Databricks says | Fortune

Engineering complete AI-agent workflows and providing access to correct information are essential for moving AI agents beyond pilot phase.
Data science
fromInfoQ
1 month ago

Beyond the Warehouse: Why BigQuery Alone Won't Solve Your Data Problems

Data warehouses like BigQuery perform well initially but become slow, costly, and disorganized at scale, undermining low-latency operational use and innovation.
fromInfoWorld
1 month ago

Databricks adds MemAlign to MLflow to cut cost and latency of LLM evaluation

By replacing repeated fine‑tuning with a dual‑memory system, MemAlign reduces the cost and instability of training LLM judges, offering faster adaptation to new domains and changing business policies. Databricks' Mosaic AI Research team has added a new framework, MemAlign, to MLflow, its managed machine learning and generative AI lifecycle development service. MemAlign is designed to help enterprises lower the cost and latency of training LLM-based judges, in turn making AI evaluation scalable and trustworthy enough for production deployments.
Artificial intelligence
Data science
fromComputerworld
2 months ago

Great R packages for data import, wrangling, and visualization

A set of R packages (dplyr, purrr, readr/vroom, datapasta, Hmisc) streamline data wrangling, importing, and analysis with faster, standardized, and reproducible tools.
Software development
fromMedium
1 month ago

The Complete Database Scaling Playbook: From 1 to 10,000 Queries Per Second

Database scaling to 10,000 QPS requires staged architectural strategies timed to traffic thresholds to avoid outages or unnecessary cost.
fromInfoWorld
2 months ago

AI is changing the way we think about databases

Developers have spent the past decade trying to forget databases exist. Not literally, of course. We still store petabytes. But for the average developer, the database became an implementation detail; an essential but staid utility layer we worked hard not to think about. We abstracted it behind object-relational mappers (ORM). We wrapped it in APIs. We stuffed semi-structured objects into columns and told ourselves it was flexible.
Software development
Data science
fromMedium
3 months ago

Migrating from Historical Batch Processing to Incremental CDC Using Apache Iceberg (Glue 4...

Use Apache Iceberg Copy-on-Write tables in AWS Glue 4 to migrate from full historical batch reprocessing to incremental CDC, reducing redundant computation, I/O, and costs.
fromTechzine Global
1 month ago

Databricks shows how AI strengthens the SaaS model

The rise of generative AI is often seen as an existential threat to the SaaS model. Interfaces would disappear, software would fade away, and existing players would become irrelevant. However, new figures from Databricks paint a different picture. Rather than undermining SaaS, AI appears to be increasing its use. This week, Databricks reported a revenue run rate of $5.4 billion, a 65 percent year-on-year increase. More than a quarter of that now comes from AI-related products.
Artificial intelligence
Software development
fromMedium
1 month ago

When Kafka Lag Lies: A Production Debugging Story

Uncommitted Kafka offsets can cause persistent consumer-group lag even when ingestion is low, databases are idle, and no errors are observed.
Data science
fromInfoWorld
2 months ago

Snowflake debuts Cortex Code, an AI agent that understands enterprise data context

Cortex Code enables developers to use natural language to build, optimize, and deploy governed, production-ready data pipelines, analytics, ML workloads, and AI agents.
fromthenewstack.io
2 months ago

Why Most APIs Fail in AI Systems and How To Fix It

Over the past few years, I've reviewed thousands of APIs across startups, enterprises and global platforms. Almost all shipped OpenAPI documents. On paper, they should be well-defined and interoperable. In practice, most fail when consumed predictably by AI systems. They were designed for human readers, not machines that need to reason, plan and safely execute actions. When APIs are ambiguous, inconsistent or structurally unreliable, AI systems struggle or fail outright.
Software development
Artificial intelligence
fromTheregister
2 months ago

Nvidia says DGX Spark is now 2.5x faster than at launch

Nvidia's DGX Spark and GB10 systems gain significant software-driven performance improvements and broader software integrations, boosting prefill compute performance for genAI workflows.
Data science
fromCIO
2 months ago

5 perspectives on modern data analytics

Data/business analytics is the top IT investment priority, yet analytics projects often fail due to poor data, vague objectives, and one-size-fits-all solutions.
Software development
fromInfoQ
2 months ago

AWS Adds Intelligent-Tiering and Replication for S3 Tables

S3 Tables now support Intelligent-Tiering automatic cost optimization and cross-region/account Apache Iceberg table replication without manual synchronization.
Artificial intelligence
fromInfoWorld
2 months ago

Edge AI: The future of AI inference is smarter local compute

Edge AI shifts computation from cloud to devices, enabling low-latency, cost-efficient, and privacy-preserving AI inference while facing performance and ecosystem challenges.
Software development
fromInfoQ
1 month ago

OpenAI Scales Single Primary Postgresql to Millions of Queries per Second for ChatGPT

OpenAI scaled a single-primary PostgreSQL to millions of queries per second by optimizing instance size, query patterns, read replicas, and offloading write-heavy workloads.
#ai
fromInfoQ
1 month ago

Building Embedding Models for Large-Scale Real-World Applications

What happens under the hood? How is the search engine able to take that simple query, look for images in the billions, trillions of images that are available online? How is it able to find this one or similar photos from all that? Usually, there is an embedding model that is doing this work behind the hood.
Artificial intelligence
Artificial intelligence
fromInfoWorld
2 months ago

Teradata unveils enterprise AgentStack to push AI agents into production

Teradata positions Enterprise AgentStack as a vendor-agnostic execution layer across hybrid environments, contrasting platform-tied AI approaches from Snowflake and Databricks.
fromCointelegraph
2 months ago

What Role Is Left for Decentralized GPU Networks in AI?

What we are beginning to see is that many open-source and other models are becoming compact enough and sufficiently optimized to run very efficiently on consumer GPUs,
Artificial intelligence
Artificial intelligence
fromInfoQ
2 months ago

Google BigQuery Adds SQL-Native Managed Inference for Hugging Face Models

BigQuery lets data teams deploy and run Hugging Face or Vertex AI open models with plain SQL, auto-provisioning compute and managing endpoints.
[ Load more ]