Modern data architectures: a dive into different approaches to building data warehouses

In the era of digitalization of business, when every click and every transaction turns into data, the importance of its architecture becomes key to the success of a company. Modern data architectures are not just storage, they are complex systems that provide fast access, security and the ability to analyze huge amounts of information.
Data architecture is the skeleton of any information system. It defines how data is collected, stored, processed and transmitted. The history of data architectures begins with simple file systems and reaches all the way to modern cloud and distributed solutions. This evolutionary path has led to a variety of approaches to data warehouse architecture, each with its own unique features and benefits. The most iconic in this context are the classic approaches developed by William Inmon and Ralph Kimball.
Types of data warehouse architecture

  • Classical types: Inmon and Kimball
Inmon's architecture (3NF) focuses on creating a single, integrated corporate data warehouse, while Kimball's architecture (star schemas, dimensional modeling) focuses on building multidimensional models for analytics. Alongside them, an important place is occupied by the so-called modern data stack, or big data architecture, which offers flexible and scalable solutions.

Inmon’s data storage model
Kimball’s data storage model
Data warehouses may include several layers, typically a staging area, an operational data store, a detail layer, and a presentation layer, although in some cases there may be more.
For example, the following layers may be present: stg lake, lake, stg ods, ods, stg dds, dds, stg cdm, cdm, and a specialized layer for machine learning. This provides a wide range of possibilities for organizing and analyzing data in different usage scenarios.
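As a rough illustration of how a record moves through such layers, here is a minimal sketch in plain Python. The layer names follow the stg/ods/dds/cdm convention mentioned above, but the field names and transformations are hypothetical, not a real ETL framework.

```python
# A hypothetical record flowing through warehouse layers: stg -> ods -> dds -> cdm.

def to_stg(raw: dict) -> dict:
    """Staging: land the raw record as-is, adding a load timestamp."""
    return {**raw, "_loaded_at": "2024-01-01T00:00:00Z"}

def to_ods(stg: dict) -> dict:
    """ODS: clean and type the fields (trim strings, cast numbers)."""
    return {
        "order_id": int(stg["order_id"]),
        "customer": stg["customer"].strip().lower(),
        "amount": float(stg["amount"]),
    }

def to_dds(ods: dict) -> dict:
    """Detail layer: conform the record to the warehouse model."""
    return {**ods, "amount_cents": round(ods["amount"] * 100)}

def to_cdm(dds: dict) -> dict:
    """Common data marts: expose only the fields consumers need."""
    return {"order_id": dds["order_id"], "amount_cents": dds["amount_cents"]}

raw = {"order_id": "42", "customer": "  Alice ", "amount": "19.99"}
record = to_cdm(to_dds(to_ods(to_stg(raw))))
```

Each layer has a narrow responsibility, which is exactly what makes multi-layer warehouses easy to test and evolve independently.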

Examples of three-tier data architecture
Lambda and Kappa architectures
In the context of modern data architectures, Lambda and Kappa architectures stand out for their specificity. They represent approaches to building ETL (Extract, Transform, Load) pipelines, which is a key element in data processing.

Lambda Architecture: Adapting to Market Dynamics
Lambda architecture is a hybrid approach that combines batch and stream processing, allowing a system to adapt quickly to market changes and keep offerings relevant. It becomes particularly useful for tasks such as sending personalized discount offers, where both historical customer data and a customer's current location need to be taken into account.

Components of the Lambda Architecture
  • Batch Layer - "cold path": Here, data is stored in its raw form and processed with a delay. This layer contains "raw" data, including regulatory and reference information that changes infrequently. Using machine learning techniques, the batch layer analyzes historical data to segment customers and create predictive models.
  • Speed Layer - "hot path": This layer provides real-time data analysis with minimal latency. It uses stream processing frameworks such as Apache Spark, Apache Storm, or Apache Flink to process information with a short lifecycle. In this layer, some accuracy is traded for speed.
  • Serving Layer: The Serving Layer provides an interface for combining data from the batch and speed layers. It allows users to access consolidated, up-to-date data for analytical purposes.
The Lambda architecture is a flexible approach that allows you to quickly adapt to changes and ensure relevance of offerings, which is critical in a dynamic market environment.
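The interplay of the three layers can be sketched in a few lines of plain Python. The metric ("total spend per customer") and the event data are illustrative assumptions; a real system would run the batch layer on Hadoop or Spark and the speed layer on a stream processor.

```python
# Hypothetical inputs: full history (cold path) vs. most recent events (hot path).
historical_events = [("alice", 10), ("bob", 5), ("alice", 7)]
recent_events = [("alice", 3), ("bob", 2)]

def batch_layer(events):
    """Recompute the complete view from all historical data (accurate, slow)."""
    view = {}
    for user, amount in events:
        view[user] = view.get(user, 0) + amount
    return view

def speed_layer(events):
    """Incrementally aggregate only the newest events (fast, approximate)."""
    view = {}
    for user, amount in events:
        view[user] = view.get(user, 0) + amount
    return view

def serving_layer(batch_view, realtime_view):
    """Merge both views so queries see up-to-date totals."""
    merged = dict(batch_view)
    for user, amount in realtime_view.items():
        merged[user] = merged.get(user, 0) + amount
    return merged

totals = serving_layer(batch_layer(historical_events), speed_layer(recent_events))
```

When the next batch run completes, its view absorbs the recent events and the speed layer's partial results are discarded, which is how Lambda reconciles speed with accuracy.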

Application of λ-architecture in Big Data practice
Lambda architecture enables processing of large volumes of data with minimal latency, standing out for its fault tolerance and scalability. This is achieved through a combination of batch processing and high-speed analytics, enabling the integration of new data while maintaining the integrity and availability of historical information. Such qualities make the lambda architecture in demand in real-world Big Data projects, as evidenced by its use by leading companies such as Twitter, Netflix and Yahoo.
Key usage scenarios for λ-architecture include:
  • Ad-hoc query processing over an immutable data store.
  • Rapid response to, and integration of, new data streams.
  • Preserving historical data without deletion, while still being able to append updates.
Kappa architecture in Big Data: Simplification and Efficiency
Kappa architecture is a streamlined data processing model in which all data is treated as a sequential stream of events. These events are recorded in a durable event log, and each new event updates the current state. Unlike the lambda architecture, Kappa eliminates the batch processing layer, focusing on real-time stream processing and storing data as a continuously updated view.
The Kappa architecture simplifies the design of Big Data systems by removing the need for batch processing and relying on an immutable event log to serve as the basis for all operations. This approach allows stream computing systems to process data directly from the log, supplementing it with information from auxiliary stores as needed.
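The core Kappa idea, that current state is just a replay (fold) over an append-only log, can be shown with a minimal sketch. The event shape and account names are illustrative assumptions, not part of any particular framework.

```python
# A single append-only event log is the source of truth; events are
# never updated in place.
event_log = []

def append(event: dict) -> None:
    event_log.append(event)

def replay(log):
    """Rebuild the materialized view from scratch by replaying the log."""
    balances = {}
    for e in log:
        balances[e["account"]] = balances.get(e["account"], 0) + e["delta"]
    return balances

append({"account": "a1", "delta": +100})
append({"account": "a1", "delta": -30})
append({"account": "b2", "delta": +50})

# "Reprocessing" in Kappa is simply replaying the same log again,
# possibly with new computation logic.
state = replay(event_log)
```

This is why Kappa has no separate batch layer: recomputation and streaming are the same code path, differing only in where in the log you start reading.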

Technologies utilized in the Kappa architecture include:
  • Continuous Event Logging systems such as Apache Kafka and Amazon Kinesis.
  • Frameworks for stream computing, including Apache Spark and Flink.
  • A variety of serving-layer databases, from in-memory stores to specialized full-text search solutions.
Applications of K-architecture and Apache Kafka importance
K-architecture finds its application in scenarios where:
  • A queue of events and requests in a distributed file system needs to be managed.
  • The order of events is not fixed and streaming frameworks can interact with data at any time.
  • High availability and resiliency are critical because data processing occurs on every node in the system.
Apache Kafka, as a powerful message broker, supports these requirements by providing a fast, reliable and scalable platform for data collection and aggregation. This makes the Kafka-based Kappa architecture ideal for projects like LinkedIn, where large amounts of data need to be processed and stored to serve multiple identical requests.
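To make the "multiple identical consumers over the same data" point concrete, here is an in-process simulation of the consumption model the text attributes to Kafka: an append-only partitioned log plus per-consumer offsets. This is a sketch of the idea only, not the actual Kafka API; the partitioning function and event names are assumptions.

```python
class PartitionedLog:
    """An append-only log split into partitions, as in a Kafka topic."""
    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> None:
        # Messages with the same key land in the same partition,
        # which preserves per-key ordering.
        p = sum(ord(c) for c in key) % len(self.partitions)
        self.partitions[p].append((key, value))

class Consumer:
    """Each consumer tracks its own offsets, so reads are independent."""
    def __init__(self, log: PartitionedLog):
        self.log = log
        self.offsets = [0] * len(log.partitions)

    def poll(self):
        """Return every message appended since this consumer's offsets."""
        out = []
        for p, msgs in enumerate(self.log.partitions):
            out.extend(msgs[self.offsets[p]:])
            self.offsets[p] = len(msgs)
        return out

log = PartitionedLog(num_partitions=2)
log.produce("alice", "login")
log.produce("bob", "login")
log.produce("alice", "purchase")

a, b = Consumer(log), Consumer(log)
first = a.poll()   # consumer A reads all three messages
second = b.poll()  # consumer B independently reads the same three
```

Because consumption only advances a consumer's own offsets and never mutates the log, any number of identical readers can serve the same data, which is the property that makes the log a good backbone for Kappa.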
The difference between Lambda and Kappa architecture
Conclusion: synthesizing popular data architectures
Exploring different data architectures, from the classic Inmon and Kimball models to the lambda and kappa architectures, opens up a world of possibilities for managing and analyzing information. These approaches provide the foundation on which to build powerful and flexible data systems that can transform business processes and decision making.
However, mastering these technologies requires not only time and effort, but also specialized knowledge. In this context, choosing the right partner or platform to implement and support your data architecture can be a critical success factor.