Primary Key Tables: Unifying Log and Cache for 🚀 Streaming
Modern data platforms have traditionally relied on two foundational components: a log for durable, ordered event storage and a cache for low-latency access. Common architectures include combinations such as Kafka with Redis, or Debezium feeding changes into a key-value store. While these patterns underpin a significant portion of production infrastructure, they also introduce complexity, fragility, and operational overhead.
Apache Fluss (Incubating) addresses this challenge with an elegant solution: Primary Key Tables (PK Tables). These persistent state tables provide the same semantics as running both a log and a cache, without needing two separate systems. Every write produces a durable log entry and an immediately consistent key-value update. Snapshots and log replay guarantee deterministic recovery, while clients benefit from the simplicity of interacting with one system for reads, writes, and queries.
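To make the "one write, two effects" idea concrete, here is a minimal in-memory sketch of the concept. It is a toy model and not the Fluss API: the class `ToyPkTable` and its methods `upsert`, `get`, `snapshot`, and `recover` are hypothetical names chosen for illustration, and deletes, partitioning, and durability are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ToyPkTable:
    """Conceptual model of a primary key table (hypothetical, not the Fluss API)."""

    # Append-only changelog of (key, value) records; the list index is the offset.
    log: List[Tuple[Any, Any]] = field(default_factory=list)
    # Latest value per primary key, serving point lookups.
    kv: Dict[Any, Any] = field(default_factory=dict)

    def upsert(self, key: Any, value: Any) -> int:
        """Apply one write: append to the log and update the key-value view."""
        self.log.append((key, value))
        self.kv[key] = value
        return len(self.log) - 1  # offset of the record just written

    def get(self, key: Any) -> Any:
        """Point lookup against the key-value view."""
        return self.kv.get(key)

    def snapshot(self) -> Tuple[Dict[Any, Any], int]:
        """Copy the current state together with the next log offset to replay from."""
        return dict(self.kv), len(self.log)

    @classmethod
    def recover(cls, snapshot: Tuple[Dict[Any, Any], int],
                log: List[Tuple[Any, Any]]) -> "ToyPkTable":
        """Rebuild state from a snapshot plus the log records written after it."""
        state, next_offset = snapshot
        table = cls(log=list(log), kv=dict(state))
        for key, value in log[next_offset:]:
            table.kv[key] = value
        return table


table = ToyPkTable()
table.upsert("user-1", {"plan": "free"})
snap = table.snapshot()
table.upsert("user-1", {"plan": "pro"})   # written after the snapshot
table.upsert("user-2", {"plan": "free"})

# Recovery replays only the records after the snapshot offset.
recovered = ToyPkTable.recover(snap, table.log)
assert recovered.kv == table.kv
```

Because the key-value view is derived from the log in order, replaying the same records on top of the same snapshot always yields the same state, which is the property the recovery guarantee relies on.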
In this post, we will explore how Fluss PK Tables work, why unifying log and cache into a persistent design is a critical advancement, and how this model resolves long-standing challenges of maintaining consistency across multiple systems.