#pg_lake

Django

2 days ago

Snowflake Supports Directory Imports

Easier package imports into Snowflake functions and procedures from stage directories and SnowGit directories streamline development and deployment.

Snowflake's ongoing pitch: bring AI to data, not vice versa

Snowflake is enhancing its platform for AI integration through strategic partnerships and acquisitions, focusing on customer ROI and data management efficiency.

Artificial intelligence

Snowflake adds Google Gemini support

4 days ago

Microsoft Fabric Database Hub dubbed 'partial' solution

Microsoft's Fabric Database Hub offers a centralized management solution for its database services but lacks support for non-Microsoft databases.

Node JS

from howtocenterdiv.com

Database Performance Bottlenecks: N+1 Queries, Missing Indexes, and Connection Pools

Database issues, like missing indexes and N+1 queries, are often overlooked in software engineering, leading to persistent performance problems.
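As a minimal sketch of the N+1 pattern named above, the example below uses Python's built-in `sqlite3` module with a hypothetical `authors`/`posts` schema (the table names, index name, and helper functions are illustrative assumptions, not taken from the article). It counts the queries issued by a per-row loop and by a single JOIN that returns the same data.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO posts VALUES (1, 1, 'Indexes'), (2, 1, 'Pools'), (3, 2, 'Joins');
    -- Index on the lookup column, so per-author lookups avoid a full scan.
    CREATE INDEX idx_posts_author ON posts(author_id);
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one extra query per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1


looped, looped_queries = titles_n_plus_one(conn)
joined, joined_queries = titles_single_join(conn)
```

With two authors the loop issues three queries and the JOIN issues one. The gap grows linearly with the number of parent rows, which is why the pattern tends to surface only under production data volumes.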

Information security

Databricks pitches Lakewatch as a cheaper SIEM - but is it really?

Translating benefits into buy-in from CIOs and CISOs may be challenging for Databricks despite its intent and acquisitions.

from New Relic

Business intelligence

Optimize Databricks: Full Visibility with New Relic

ProxySQL Introduces Multi-Tier Release Strategy With Stable, Innovative, and AI Tracks

ProxySQL 3.0.6 introduces a multi-tier release strategy focusing on stability, innovation, and AI capabilities for diverse user needs.

Migrating to the Lakehouse Without the Big Bang: An Incremental Approach

Query federation enables safe, incremental lakehouse migration by allowing simultaneous queries across legacy warehouses and new lakehouse systems without risky big bang cutover approaches.

Information security

Databricks launches Lakewatch: agentic SIEM on the Lakehouse

Lakewatch is an open SIEM platform that consolidates security, IT, and business data, enabling rapid threat detection and response using AI agents.

Building Consistent Data Foundations at Scale

Building consistent data foundations through intentional architecture, engineering, and governance is essential to prevent fragmentation, support AI adoption, ensure regulatory compliance, and enable reliable organizational decisions at scale.

Snowflake's new 'autonomous' AI layer aims to do the work, not just answer questions

Project SnowWork is Snowflake's autonomous AI layer that automates data analysis tasks like forecasting, churn analysis, and report generation without requiring data team intervention.

AWS Expands Aurora DSQL with Playground, New Tool Integrations, and Driver Connectors

Amazon Aurora DSQL introduces usability enhancements, including a browser-based playground and integrations with popular SQL tools for improved developer experience.

How Datadog Cut the Size of Its Agent Go Binaries by 77%

Datadog reduced its Agent binary from 1.22 GiB by auditing imports, using build tags, isolating optional code, and eliminating reflection pitfalls to remove unnecessary dependencies and compiler bloat.

Oracle moves to assure MySQL community it really does care

Oracle firmly believes that MySQL's enduring strength arises from this vibrant global community. We are excited to work with the MySQL Community on the strategy we announced in Belgium, January 29, 2026, including adding more features and functionality, accelerating innovation directly in the MySQL core.

Online Community Development

Microsoft promises multi database wrangling hub on Fabric

Microsoft launched Database Hub, a unified management tool within Fabric that consolidates multiple database services across on-premises, PaaS, and SaaS environments with AI-assisted capabilities.

AWS spurs Catch-22, ending PostgreSQL 13 support for RDS

AWS RDS PostgreSQL 13 end of support forces upgrades to PostgreSQL 14+, but this breaks AWS Glue ETL service due to incompatible authentication schemes, creating a production environment conflict.

#mariadb

Online Community Development

MariaDB backs down on Galera removal after community outcry

Software development

The best new features in MariaDB

Why Postgres has won as the de facto database: Today and for the agentic future

Leading enterprises achieve 5x ROI by adopting open source databases like PostgreSQL to unify structured and unstructured data for agentic AI, with 81% of successful enterprises committed to open source strategies.

Miscellaneous

Google Cloud Brings Full OpenTelemetry Support to Cloud Monitoring Metrics

Google Cloud now supports OpenTelemetry Protocol (OTLP) for metrics in Cloud Monitoring, enabling vendor-agnostic telemetry collection alongside traces and logs through a unified pipeline.

Update your databases now to avoid data debt

Multiple major open source databases reach end-of-life in 2026, requiring teams to plan upgrades and migrations to avoid security risks and higher costs.

100 Scala Interview Questions and Answers for Data Engineers

Structured Scala and Apache Spark interview preparation requires understanding distributed systems, performance trade-offs, and pipeline design beyond theoretical knowledge.

Why AI requires rethinking the storage-compute divide

AI workloads require continuous processing of unstructured multimodal data, causing redundant data movement and transformation that wastes infrastructure costs and data scientist time.

from ComputerWeekly.com

Everpure's Evergreen One for AI brings Exa flash and GPU-based service-level agreements

Everpure launches Evergreen One for AI, a consumption model with GPU-count-based SLAs for FlashBlade//Exa storage to optimize AI workload performance.

#mariadb-acquisition

MariaDB taps GridGain to keep pace with AI-driven data demands

MariaDB's acquisition of GridGain aims to create an integrated platform combining relational database reliability with in-memory computing speed to compete with hyperscaler offerings.

DevOps

MariaDB acquires GridGain for agentic AI data

4 weeks ago

The revenge of SQL: How a 50-year-old language reinvents itself

SQL has experienced a major comeback driven by SQLite in browsers, improved language tools, and PostgreSQL's jsonb type, making it both traditional and exciting for modern development.

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned MySQL infrastructure using Group Replication to reduce failover time from minutes to seconds while maintaining strong consistency across thousands of clusters.

Netflix Automates RDS PostgreSQL to Aurora PostgreSQL Migration Across 400 Production Clusters

Netflix automated RDS to Aurora PostgreSQL migrations across 400 production clusters through infrastructure-level orchestration, eliminating manual intervention while maintaining data integrity and CDC pipeline correctness.

Buyer's guide: Comparing the leading cloud data platforms

Five leading cloud data platforms (Databricks, Snowflake, Amazon Redshift, Google BigQuery, and Microsoft Fabric) offer distinct architectural approaches for enterprise data storage, analytics, and AI workloads.

Google BigQuery Previews Cross-Region SQL Queries for Distributed Data

BigQuery's global queries feature enables SQL queries across multiple geographic regions without data movement, eliminating ETL pipelines for distributed analytics.

Java

GlassFish 8 Java server boosts data access, concurrency

GlassFish 8 adds virtual threads for massive concurrency, integrates Jakarta Security with MicroProfile JWT for flexible authentication, and supports JMX monitoring in Embedded mode.

Tech industry

Snowflake plugs PostgreSQL into its AI Data Cloud

Snowflake now offers a native PostgreSQL DBaaS in its AI Data Cloud to run transactional workloads alongside analytics and AI under unified governance.

Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks Lakebase is a serverless PostgreSQL OLTP database that separates compute from storage and unifies transactional and analytical capabilities.

#mysql

Software development

Dear Oracle, we need to talk about the future of MySQL

Tech industry

Oracle promises new approach to MySQL

Software development

What is the future for MySQL?

Software development

Devs assessing options for MySQL's future beyond Oracle

Etleap Launches Iceberg Pipeline Platform to Simplify Enterprise Adoption of Apache Iceberg

Managed Iceberg pipeline platform unifies ingestion, transformation, orchestration, and table operations inside customers' VPCs, enabling enterprise Iceberg adoption without building custom stacks.

Web development

DuckDB's WebAssembly Client Allows Querying Iceberg Datasets in the Browser

DuckDB-Wasm enables browser-based, serverless end-to-end query, read, and write access to Iceberg REST catalogs and object storage without infrastructure setup.

from Moz

Why Export GA4 Data to BigQuery?

Then coming on to the next point, which is you can create your own sessions and user properties. Now you can do this in the GA4 interface under Explorations.

Marketing tech

#ai

Artificial intelligence

With AI, the database matters again

Artificial intelligence

AI makes the database matter again

libgd-gis continues to grow - now with styles and more

libgd-gis is a Ruby raster GIS engine on libgd that supports cartographic styles, layered GeoJSON, full labeling, and direct image composition.

Databricks makes serverless Postgres service Lakebase available

Databricks today announced the general availability of Lakebase on AWS, a new database architecture that separates compute and storage. The managed serverless Postgres service is designed to help organizations build faster without worrying about infrastructure management. When databases link compute and storage, every query must use the same CPU and memory resources. This can cause a single heavy query to affect all other operations. By separating compute and storage, resources automatically scale with the actual load.

Software development

Starburst: Chewing through data access is key to AI adoption

AI adoption is bottlenecked by lack of access to contextual, current, and governed data; without that, AI cannot reliably increase productivity.

Tech industry

Google Introduces Managed Connection Pooling for AlloyDB

AlloyDB's managed connection pooling increases client connections and transactional throughput while reducing operational burden and latency for high-concurrency and serverless workloads.
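To illustrate the general idea behind connection pooling (not AlloyDB's managed implementation, whose internals are not described here), the following is a minimal client-side sketch in Python. The `ConnectionPool` class and its `created` counter are hypothetical names, and an in-memory SQLite database stands in for a real server.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, factory):
        self.created = 0
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())
            self.created += 1

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is checked out; raises queue.Empty
        # if a timeout is given and it expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection so other callers can reuse it.
            self._idle.put(conn)


# SQLite stands in for a real database server in this sketch.
pool = ConnectionPool(size=2, factory=lambda: sqlite3.connect(":memory:"))

results = []
for i in range(10):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT ? * 2", (i,)).fetchone()[0])
```

Ten requests are served by two connections, so the cost of opening a connection is paid twice rather than ten times. A managed pooler applies the same reuse on the server side, which is what lets many short-lived clients share a bounded number of backend connections.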

Generative AI and the future of databases

Databases must evolve into AI-native systems that securely federate with LLMs, support real-time access, granular permissions, and tools for safe natural-language-to-SQL integration.

The Complete Database Scaling Playbook: From 1 to 10,000 Queries Per Second

Database scaling to 10,000 QPS requires staged architectural strategies timed to traffic thresholds to avoid outages or unnecessary cost.

#clickhouse

Business intelligence

ClickHouse, the open-source challenger to Snowflake and Databricks

DevOps

Stop Paying for Expensive Logging: Self-Hosted ClickHouse on Kubernetes

350PB, Millions of Events, One System: Inside Uber's Cross-Region Data Lake and Disaster Recovery

Uber has built HiveSync, a sharded batch replication system that keeps Hive and HDFS data synchronized across multiple regions, handling millions of Hive events daily. HiveSync ensures cross-region data consistency, enables Uber's disaster recovery strategy, and eliminates inefficiency caused by the secondary region sitting idle, which previously incurred hardware costs equal to the primary, while still maintaining high availability. Built initially on the open-source Airbnb ReAir project, HiveSync has been extended with sharding, DAG-based orchestration, and a separation of control and data planes.

Tech industry

Beyond the Warehouse: Why BigQuery Alone Won't Solve Your Data Problems

Data warehouses like BigQuery perform well initially but become slow, costly, and disorganized at scale, undermining low-latency operational use and innovation.

If You Don't Have Database Delivery Automation, Brace Yourself for These 10 Problems

Manual database processes break DevOps pipelines; only 12% deploy database changes daily, causing configuration drift, frequent errors, slower time-to-market, and reduced productivity.

Tech industry

Expired Oracle Patent Opens Fast Sorting Algorithm to Open Source Databases

An expired Oracle patent enables open-source databases to implement an adaptive "Orasort" that speeds sorting of similar keys by skipping common prefixes and caching substrings.

LangGrant Unveils LEDGE MCP Server to Enable Agentic AI on Enterprise Databases

LEDGE MCP Server enables LLMs to generate multi-step analytics across enterprise databases securely without exposing raw data, reducing token costs and preserving governance.

Google tests BigQuery feature to generate SQL queries from English

Google allows natural language expressions inside SQL comments to speed translation of intent into executable queries, reducing query-writing time and easing analytics workflows.

OpenAI Scales Single Primary PostgreSQL to Millions of Queries per Second for ChatGPT

OpenAI scaled a single-primary PostgreSQL to millions of queries per second by optimizing instance size, query patterns, read replicas, and offloading write-heavy workloads.

Migrating from Historical Batch Processing to Incremental CDC Using Apache Iceberg (Glue 4...

Use Apache Iceberg Copy-on-Write tables in AWS Glue 4 to migrate from full historical batch reprocessing to incremental CDC, reducing redundant computation, I/O, and costs.

MongoDB Introduces Embedding and Reranking API on Atlas

MongoDB Atlas now offers an Embedding and Reranking API with Voyage AI models, enabling unified semantic search, automated embeddings, and integrated monitoring and billing.

AI is changing the way we think about databases

Developers have spent the past decade trying to forget databases exist. Not literally, of course. We still store petabytes. But for the average developer, the database became an implementation detail; an essential but staid utility layer we worked hard not to think about. We abstracted it behind object-relational mappers (ORM). We wrapped it in APIs. We stuffed semi-structured objects into columns and told ourselves it was flexible.

Software development

The Complete Guide to Optimizing Apache Spark Jobs: From Basics to Production-Ready Performance

Optimize Spark jobs by using lazy evaluation awareness, early filter and column pruning, partition pruning, and appropriate join strategies to minimize shuffles and I/O.

Memgraph founder: Don't get too loose with your use of MCP

MCP is an open standard enabling AI models to connect with external data, APIs, and services, creating a universal framework across language models.

Snowflake and Google Cloud integrate Gemini into AI Data Cloud

Snowflake and Google Cloud are deepening their collaboration by integrating the Google Gemini 3 model into Snowflake Cortex AI. Companies can now develop generative AI applications without moving data between platforms. The integration of Gemini 3 into Snowflake Cortex AI marks a significant step forward in both parties' AI strategy. Developers will have access to Google's large language model within Snowflake's secure data environment. This enables building, deploying, and scaling AI agents and generative AI applications without copying or moving data.

Artificial intelligence

Snowflake updates developer tools, adds observability features

The company also added new observability features in the form of Snowflake Trail, which provides visibility into data quality, pipelines, and applications, enabling developers to monitor, troubleshoot, and optimize their workflows. It is built with OpenTelemetry standards so developers can integrate with popular observability and alert platforms including Datadog, Grafana, Metaplane, PagerDuty, and Slack, among others.

DevOps

4 self-contained databases for your apps

XAMPP provides a complete local web stack (MariaDB, Apache, PHP, Mercury SMTP, OpenSSL) while PostgreSQL can be run standalone or embedded via pgserver in Python.

from New Relic

The Power and Cost of Data Cardinality

The more attributes you add to your metrics, the more complex and valuable questions you can answer. Every additional attribute provides a new dimension for analysis and troubleshooting. For instance, adding an infrastructure attribute such as region can help you determine if a performance issue is isolated to a specific geographic area or is widespread. Similarly, adding business context, like a store location attribute for an e-commerce platform, allows you to understand if an issue is specific to a particular set of stores.
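The cost side of this trade-off can be sketched with simple arithmetic: in the worst case, the number of distinct time series for one metric is the product of the distinct values of each attribute. The attribute names and value counts below are illustrative assumptions, not figures from the article.

```python
from math import prod


def max_series(attribute_values):
    """Upper bound on distinct series: product of distinct values per attribute."""
    return prod(len(set(values)) for values in attribute_values.values())


# Hypothetical attributes for a single request-latency metric.
attributes = {
    "region": ["us-east", "us-west", "eu"],
    "status": ["2xx", "4xx", "5xx"],
}
before = max_series(attributes)

# Adding a store-location attribute with 200 hypothetical stores.
with_store = dict(attributes, store_location=[f"store-{i}" for i in range(200)])
after = max_series(with_store)
```

Here the bound grows from 9 series to 1,800 after adding one attribute. Real systems usually see fewer series than the bound, because not every combination of values occurs, but the multiplicative growth is why high-cardinality attributes dominate storage and query cost.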

Data science

#database-devops

Software development

Database DevOps - Where Do I Start?

Software development

Database delivery automation with GitLab: a deep dive

Snowflake launches Cortex Code agent for understanding data context

Cortex Code is an AI agent that converts complex data engineering, ML, and analytics tasks into natural-language workflows integrated into Snowflake and developer tools.

5 years ago

Database Delivery Automation in the Multi-Cloud World

The main advantage of going the Multi-Cloud way is that organizations can "put their eggs in different baskets" and be more versatile in their approach to how they do things. For example, they can mix it up and opt for a cloud-based Platform-as-a-Service (PaaS) solution when it comes to the database, while going the Software-as-a-Service (SaaS) route for their application endeavors.

DevOps

Why your next microservices should be streaming SQL-driven

Streaming SQL with UDFs, materialized results, and ML/AI integrations enables continuous, stateful processing of event streams for microservices.

AWS Adds Intelligent-Tiering and Replication for S3 Tables

S3 Tables now support Intelligent-Tiering automatic cost optimization and cross-region/account Apache Iceberg table replication without manual synchronization.

VillageSQL Launches as an Extension-Focused MySQL Fork

A new open-source project, VillageSQL, has been introduced as a tracking fork of MySQL aimed at expanding extensibility and addressing feature gaps increasingly relevant to AI and agent-based workloads. Announced by founder Dominic Preuss, VillageSQL Server for MySQL is positioned as a drop-in replacement that maintains compatibility with upstream MySQL while adding a structured extension framework. The alpha release is now available for experimentation.

Software development

Firestore Adds Pipeline Operations with Over 100 New Query Features

Google has overhauled Firestore Enterprise edition's query engine, adding Pipeline operations that let developers chain together multiple query stages for complex aggregations, array operations, and regex matching. The update removes Firestore's longstanding query limitations and makes indexes optional, putting the database on par with other major NoSQL platforms. Pipeline operations work through sequential stages that transform data inside the database.

Software development

What's new in MySQL 9.0

MySQL 9.0 still contains useful things and can be upgraded to from MySQL 8.4 LTS; the MySQL Configurator automatically does the upgrade without user intervention during MSI installations on Windows. The major changes include a new Vector data type, supported in CREATE and ALTER statements, and JavaScript Stored Programs, which bring JavaScript-based stored programs and functions to MySQL Enterprise Edition. JavaScript Stored Programs can call SQL, and SQL can call them.

Software development

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

"The job didn't fail. It just... never finished." That was the worst part. No errors. No stack traces. Just a Spark job running forever in production, blocking downstream pipelines, delaying reports, and waking up on-call engineers at 2 AM. This is the story of how I diagnosed a real Spark performance issue in production and fixed it, not by adding more machines, but by understanding Spark properly.