Moving data across clouds sounds simple until the bill arrives. A query runs in one environment. Storage sits in another. Before long, egress charges start piling up, data gets copied across platforms, and migrating workloads becomes harder than expected.
If you're building a multi-cloud data platform, those problems can limit the flexibility that multi-cloud was supposed to provide.
The x-Lake architecture addresses them through open storage, shared metadata, and flexible compute, giving you the benefits of a decoupled data architecture without the lock-in and operational overhead that often follow.
What is x-Lake and What Problem Does It Solve?
At its core, x-Lake is an open multi-cloud data platform designed to separate storage, compute, and governance. It uses S3-compatible object storage as the persistence layer, open file formats (Parquet and ORC) and open table formats (Iceberg and Delta) for data management, and a flexible processing layer in which Spark and Trino provide compute and Airflow provides orchestration.
The result is an open architecture data platform that lets you run different workloads without tying your data to a single engine or cloud. This design addresses three common limitations found in many managed data platforms:
- Proprietary storage formats can restrict which engines can access your data and make migrations more complex.
- Usage-based pricing models often become more expensive as data volumes, workloads, and teams grow.
- Data egress charges can increase costs when data moves across cloud environments or platform boundaries.
The x-Lake architecture takes a different approach. Open table formats reduce engine lock-in. S3-native storage removes dependence on proprietary storage layers.
A VPC-native deployment model keeps data and workloads within your environment, helping create a zero egress data platform for operations while giving teams the flexibility of a multi-engine data platform.
S3-Compatible Storage as the Foundation
An S3-compatible storage platform is a practical foundation for modern data infrastructure because the S3 API is widely supported across cloud providers, most data engines, and leading open table formats. That makes it a sensible starting point for a multi-cloud data platform, where storage needs to remain accessible regardless of where workloads run.
Using object storage as the persistence layer also supports a decoupled data architecture. Storage remains independent, while compute scales up or down as needed. Teams can process data with different engines without moving it or converting formats.
This approach enables:
- Compute clusters that start and stop per workload instead of running continuously.
- Persistent storage that remains available at object-storage economics.
- Data that can be read by multiple engines through a shared metadata layer.
- Less operational overhead as environments grow and data complexity increases.
The multi-cloud advantage comes from consistency. Amazon S3, Azure Data Lake Storage, and Google Cloud Storage offer comparable object storage semantics, and engines such as Spark and Trino ship connectors for each of them, so workloads can move across cloud environments without re-engineering the data.
Combined with open table formats, that portability becomes the foundation of a multi-engine query lakehouse that can evolve without locking teams into a single cloud or platform.
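To make the portability point concrete, here is a minimal Python sketch of how a table's location might be expressed per cloud. The helper name table_location and the provider keys are hypothetical and not part of any x-Lake API; the URI schemes (s3a, gs, abfss) are the ones commonly used by Hadoop-compatible connectors. The idea is that only the location prefix changes, while the table layout underneath stays the same.

```python
# Hypothetical helper for illustration; not part of any x-Lake API.
# URI schemes commonly used by Hadoop-compatible connectors.
SCHEMES = {
    "aws": "s3a",      # Amazon S3
    "gcp": "gs",       # Google Cloud Storage
    "azure": "abfss",  # Azure Data Lake Storage Gen2 over TLS
}


def table_location(provider, container, path, account=""):
    """Build the storage URI for a table directory on a given cloud."""
    if provider not in SCHEMES:
        raise ValueError(f"unknown provider: {provider}")
    clean_path = path.strip("/")
    if provider == "azure":
        # ADLS Gen2 URIs also carry the storage account name.
        if not account:
            raise ValueError("Azure locations need a storage account name")
        return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"
    return f"{SCHEMES[provider]}://{container}/{clean_path}"


if __name__ == "__main__":
    for provider, account in [("aws", ""), ("gcp", ""), ("azure", "acct")]:
        print(table_location(provider, "lake", "sales/orders", account))
```

The relative path (sales/orders here, a placeholder) is identical on every cloud, which is what lets table metadata and data files move without being rewritten.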
Multi-Engine Query: Spark, Trino, and Airflow on the Same Storage Layer
Most enterprise teams don't use a single engine for every workload. Data engineers need one set of tools to process large datasets. Analytics teams need another to answer business questions quickly. Pipeline teams need a reliable way to orchestrate everything in between.
A platform built on Trino, Spark, and Airflow allows all three teams to work from the same storage layer. Instead of creating separate data copies for each engine, every workload reads from and writes to the same S3-native foundation. That reduces operational overhead and supports the flexibility expected from a multi-engine data platform.
The benefit is simple: each team can use the engine best suited to its work without managing separate datasets. Analytics teams can improve SQL efficiency in Trino while data engineers process the same data in Spark. Airflow orchestrates the workflows that connect them.
Apache Iceberg makes this possible. As an open table format, it maintains shared table metadata, schemas, and partition information that every compatible engine can read. That creates a multi-engine query lakehouse where teams get consistent results regardless of which engine they use, without relying on format-specific connectors or proprietary integrations.
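As a rough illustration of what "shared metadata" can look like in practice, the Python sketch below generates catalog settings for Spark and Trino that point at the same Iceberg catalog. It assumes an Iceberg REST catalog, which the article does not specify; the catalog name, URI, and warehouse path are placeholders. The property keys are the standard ones from the Iceberg Spark integration and the Trino Iceberg connector.

```python
def spark_iceberg_conf(catalog, rest_uri, warehouse):
    """Spark session settings for an Iceberg REST catalog (assumed setup)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
    }


def trino_iceberg_properties(rest_uri, warehouse):
    """Contents of a Trino catalog properties file for the same catalog."""
    lines = [
        "connector.name=iceberg",
        "iceberg.catalog.type=rest",
        f"iceberg.rest-catalog.uri={rest_uri}",
        f"iceberg.rest-catalog.warehouse={warehouse}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; substitute the catalog endpoint of your deployment.
    uri = "http://catalog.example:8181"
    warehouse = "s3://lake/warehouse"
    for key, value in spark_iceberg_conf("lake", uri, warehouse).items():
        print(f"{key}={value}")
    print(trino_iceberg_properties(uri, warehouse))
```

Because both engines resolve tables through one catalog and one warehouse location, a table written by a Spark job is visible to Trino queries without a copy or conversion step.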
Zero Egress Architecture: How x-Lake Eliminates Data Movement Costs
Data movement is one of the hidden costs of modern data platforms. When storage and compute run on vendor-controlled infrastructure, data often moves between the platform and your environment for processing, analytics, or orchestration. As workloads grow, those transfers can create a recurring category of egress charges.
x-Lake is designed as a zero egress data platform. Its VPC-native architecture keeps both storage and compute inside your environment, so data stays within your network boundary during platform operations. That removes the need to transfer data between vendor infrastructure and customer-managed environments.
This approach delivers three practical benefits:
- Lower operating costs: Eliminates egress fees associated with platform-level data movement.
- Simpler operations: Teams can run data workflows close to the data itself instead of moving datasets between environments.
- Stronger data residency controls: Data processing remains within the customer-controlled boundary, helping satisfy regional and industry-specific requirements.
For organizations building a multi-cloud data platform, reducing data movement is about more than cost. It helps create a more predictable operating model where governance, compliance, and processing stay aligned with where the data resides.
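A back-of-the-envelope calculation shows why recurring transfers add up. The Python sketch below is illustrative only: the function name is hypothetical, and the daily volume and per-GB rate are placeholder numbers, not quoted prices from any provider.

```python
def monthly_egress_cost(gb_moved_per_day, usd_per_gb, days=30):
    """Estimate a month of egress charges for a steady daily transfer volume."""
    if gb_moved_per_day < 0 or usd_per_gb < 0:
        raise ValueError("volume and rate must be non-negative")
    return gb_moved_per_day * days * usd_per_gb


if __name__ == "__main__":
    # Placeholder inputs: 500 GB per day at an assumed 0.09 USD per GB.
    cost = monthly_egress_cost(500, 0.09)
    print(f"Estimated monthly egress: {cost:.2f} USD")
    # With data kept inside the network boundary, the transferred volume
    # for platform operations is zero, and so is this line item.
    print(f"In-VPC equivalent: {monthly_egress_cost(0, 0.09):.2f} USD")
```

The cost scales linearly with volume, so a workload that doubles its cross-boundary traffic doubles this charge, while in-boundary processing leaves it unchanged.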
Open Table Formats: How x-Lake Eliminates Vendor Lock-In
Vendor lock-in often starts at the storage layer. When data is stored in proprietary formats, organizations become dependent on the engines and services that support those formats. Over time, moving workloads or changing platforms can require costly migrations, data conversion projects, and application changes.
x-Lake stores data in open formats by default: Parquet and ORC at the file level, and Iceberg and Delta at the table level. That means data remains accessible to a broad ecosystem of engines, helping organizations maintain flexibility as requirements change.
This approach delivers three advantages:
- Engine freedom: Spark, Trino, Flink, Hive, Presto, and other compatible engines can read the same data without proprietary connectors or format conversion.
- Simpler platform evolution: Teams can adopt new tools and workloads without rebuilding the underlying storage layer.
- Open governance: Open formats work alongside catalog and access-control services such as Apache Gravitino and Apache Ranger, allowing governance policies to remain consistent without creating format dependencies.
The result is an open architecture data platform that supports long-term portability. Combined with shared metadata and open storage, it gives organizations the flexibility to evolve their multi-cloud data platform without being tied to a single vendor's ecosystem.
x-Lake Is What Multi-Cloud Data Should Have Always Looked Like
Multi-cloud promises flexibility, portability, and cost control. The challenge is that many platforms achieve those goals by introducing new dependencies, whether through proprietary formats, data movement costs, or tightly coupled infrastructure.
x-Lake takes a different approach. It combines three architectural foundations that help enterprise teams keep data open, portable, and accessible across environments:
- S3-native storage that separates persistence from compute.
- Open table formats that prevent engine and platform lock-in.
- Multi-engine compute that allows teams to use Spark, Trino, and Airflow on the same data foundation.
Together, these capabilities create a multi-cloud data platform that reduces egress costs, supports workload portability, and gives teams more freedom to evolve their architecture over time. As the storage and data platform layer within Acceldata's xLake platform, x-Lake provides the open, multi-cloud foundation that organizations can deploy inside their own VPC.
See how xLake eliminates egress costs and vendor lock-in. Book a demo to build a more portable, cost-efficient multi-cloud data platform.
x-Lake Architecture: Frequently Asked Questions
What is x-Lake?
x-Lake is Acceldata's open, multi-cloud data platform architecture. It combines S3-native storage, open formats such as Iceberg and Parquet, and multi-engine compute to reduce vendor lock-in, improve portability, and eliminate platform-related egress costs.
How does x-Lake eliminate data egress costs?
x-Lake runs storage and compute inside your VPC, keeping data within your network boundary. This removes the platform-level data movement common in managed services and helps eliminate the egress charges associated with transferring data between environments.
What table formats does x-Lake support?
x-Lake supports the major open formats: Apache Iceberg and Delta Lake as table formats, and Parquet and ORC as file formats. Because it avoids proprietary formats, compatible engines can access the same data without format conversion, helping preserve portability and interoperability.
What compute engines can query x-Lake data?
x-Lake supports engines that work with open table formats and S3-compatible storage, including Apache Spark, Trino, Flink, Hive, and Presto. Teams can use the engine best suited to their workload while sharing the same underlying data.
How is x-Lake different from a standard data lakehouse?
Unlike managed lakehouse platforms, x-Lake operates entirely within the customer's VPC. This VPC-native architecture keeps data under customer control, reduces egress costs, limits vendor access, and supports an open, multi-cloud data platform strategy.









