A product-focused comparison of Kafka and Google Pub/Sub for real-time ML workloads, feature stores, model serving, inference pipelines, and event-driven AI. Covers operational cost, latency guarantees, ordering semantics, schema evolution, stream processing, and the trade-offs that matter at scale.
Every production ML system beyond batch prediction depends on event streaming. Feature stores need real-time feature computation from user activity streams. Model serving pipelines need request routing, A/B test allocation, and fallback handling. Online inference needs event triggers (a user action, a sensor reading, a transaction) that kick off a prediction within milliseconds. The event streaming platform you choose becomes the central nervous system of your ML infrastructure.
Apache Kafka and Google Pub/Sub are the two platforms that ML teams in Bengaluru and across India most frequently evaluate. They solve the same fundamental problem, reliably moving events between producers and consumers, but they make radically different architectural trade-offs that affect your ML pipeline's performance, cost, and operational burden.
Duplicate events in an ML pipeline cause feature skew. If a user click event is processed twice, the feature store records two clicks instead of one, and the model makes predictions based on incorrect feature values. In a fraud detection system, duplicate transaction events can trigger false positives that block legitimate payments. The cost of duplicate processing in ML is not just wasted compute; it is degraded model accuracy that is difficult to diagnose because the features look plausible.
Kafka provides exactly-once semantics (EOS) through idempotent producers and transactional consumers. With enable.idempotence=true, the broker deduplicates messages using per-partition sequence numbers tied to the producer session, preventing duplicates caused by network retries. Transactional producers wrap a batch of messages and consumer offset commits in an atomic transaction, ensuring that a consume-process-produce pipeline either fully succeeds or fully rolls back. The latency overhead of EOS is typically 5-15% compared to at-least-once delivery, which is acceptable for feature computation pipelines where correctness matters more than throughput.
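As a sketch, the EOS settings above map to a handful of producer properties. The property names follow the librdkafka/confluent-kafka convention; the broker address and transactional ID here are placeholders, not values from this article.

```python
# Sketch: Kafka producer properties for exactly-once semantics (EOS).
# Property names follow librdkafka/confluent-kafka; the broker address
# and transactional.id are placeholders.
eos_config = {
    "bootstrap.servers": "broker:9092",            # placeholder broker address
    "enable.idempotence": True,                    # broker dedupes via per-producer sequence numbers
    "acks": "all",                                 # required for idempotence: wait for in-sync replicas
    "transactional.id": "feature-pipeline-tx-1",   # stable id enables transactions across restarts
}

# With confluent_kafka installed, the consume-process-produce flow is roughly:
#   producer = Producer(eos_config)
#   producer.init_transactions()
#   producer.begin_transaction()
#   producer.produce("features", key=user_id, value=payload)
#   producer.send_offsets_to_transaction(offsets, group_metadata)
#   producer.commit_transaction()   # offsets and output commit atomically
```

The key design point is the last commented line: committing the consumer offsets inside the same transaction as the produced output is what makes the pipeline atomic end to end.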
Google Pub/Sub provides at-least-once delivery by default. Exactly-once delivery is available as a per-subscription setting on standard Pub/Sub (a separate feature from Pub/Sub Lite and from ordering keys), but it is scoped to a single region and requires careful subscriber configuration. For teams that need deduplication beyond that guarantee, the standard approach is client-side deduplication using a message ID cache in Redis or Memorystore, checking each message ID against the cache before processing. This adds 1-3 ms of latency per message and requires maintaining the deduplication cache, which at 100K messages per minute needs approximately 500 MB of Redis memory for a 10-minute deduplication window.
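The client-side deduplication pattern can be sketched in a few lines. This stand-in keeps the message-ID cache in an in-process dict; in production the same semantics come from a Redis SET with NX and a TTL, shared across workers. The class and message IDs are illustrative, not names from any client library.

```python
class DedupCache:
    """Sketch of client-side deduplication for at-least-once delivery.
    An in-process dict stands in for Redis/Memorystore; a Redis SET with
    NX and a TTL gives the same semantics across multiple workers."""

    def __init__(self, window_seconds: float = 600.0):
        self.window = window_seconds
        self._seen = {}  # message_id -> first-seen timestamp

    def first_delivery(self, message_id: str, now: float) -> bool:
        # Evict entries older than the dedup window (Redis would use TTL).
        self._seen = {m: t for m, t in self._seen.items() if now - t < self.window}
        if message_id in self._seen:
            return False          # duplicate redelivery: skip processing
        self._seen[message_id] = now
        return True               # first sighting of this id: process it

# Callers would pass time.time() as `now` in production.
cache = DedupCache(window_seconds=600)
assert cache.first_delivery("msg-1", now=0.0) is True
assert cache.first_delivery("msg-1", now=5.0) is False    # duplicate inside window
assert cache.first_delivery("msg-1", now=700.0) is True   # window expired, treated as new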
Kafka Connect's CDC (Change Data Capture) connectors, particularly Debezium, stream database changes to Kafka topics in real time. A Debezium connector on a PostgreSQL database captures every INSERT, UPDATE, and DELETE as a Kafka event with a latency of 50-200 ms from the database WAL. This enables real-time feature computation: when a user updates their profile, the change flows through Kafka to a feature computation job that updates the feature store within seconds, not hours as with batch ETL.
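As a sketch of the source side, a Debezium PostgreSQL connector is registered with Kafka Connect as a JSON config like the one below. The hostname, credentials, and table list are placeholders; key names follow current Debezium conventions (older versions used database.server.name instead of topic.prefix).

```python
# Sketch: Debezium PostgreSQL source connector config for Kafka Connect.
# Host, credentials, and table names are placeholders.
debezium_config = {
    "name": "users-cdc",   # connector name within Kafka Connect
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",             # logical decoding plugin built into Postgres 10+
        "database.hostname": "pg.internal",    # placeholder host
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "change-me",
        "database.dbname": "app",
        "topic.prefix": "appdb",               # topics become appdb.<schema>.<table>
        "table.include.list": "public.users",  # capture only the tables you need
    },
}

# Registered by POSTing to the Kafka Connect REST API, e.g.:
#   curl -X POST http://connect:8083/connectors \
#        -H 'Content-Type: application/json' -d @users-cdc.json
```

Restricting `table.include.list` to the tables that actually feed features keeps WAL decoding overhead and topic sprawl down.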
Kafka Connect also provides sink connectors for popular feature stores: Redis for low-latency feature serving, BigQuery for offline feature analysis, and Feast-compatible stores for unified online/offline feature management. A typical ML pipeline uses Debezium as a source connector, Kafka Streams for feature transformation, and a Redis sink connector for the online feature store, all managed through Kafka Connect's declarative configuration.
Standard Pub/Sub charges $40 per TiB of message data delivered. For an ML pipeline processing 50,000 events per second with an average event size of 1 KB, that is approximately 4.3 TB (about 3.9 TiB) per day, costing roughly $157/day or $4,700/month in message delivery alone. At this throughput, each subscriber (feature computation, model serving, analytics) multiplies the delivery cost because Pub/Sub charges per delivery, not per publish. Three subscribers triple the cost to roughly $14,100/month.
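A quick arithmetic check of that figure, assuming 1 KB = 1,000 bytes and 30-day months. The gap between decimal terabytes and binary tebibytes is why a shortcut of 4.3 x $40 slightly overstates the daily cost.

```python
# Back-of-envelope check of standard Pub/Sub delivery cost.
# Assumptions: 1 KB = 1,000 bytes, 30-day month, $40 per TiB per subscriber.
events_per_sec = 50_000
bytes_per_event = 1_000
price_per_tib = 40.0

bytes_per_day = events_per_sec * bytes_per_event * 86_400
tib_per_day = bytes_per_day / 2**40            # ~3.93 TiB (about 4.3 decimal TB)
cost_per_day = tib_per_day * price_per_tib     # ~$157 per subscriber per day
cost_per_month = cost_per_day * 30             # ~$4,700 per subscriber per month
cost_three_subs = 3 * cost_per_month           # delivery is charged per subscriber
```

Because the charge is per delivery, every additional subscription on the same topic pays this full amount again, which is what drives the multiplication in the text.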
Pub/Sub Lite uses a capacity-based pricing model similar to Kafka: you pay for provisioned throughput and storage, not per-message delivery. At the same 50,000 events/second workload, Pub/Sub Lite costs approximately $2,500-3,500/month for provisioned publish and subscribe capacity plus storage. The savings are dramatic, roughly 70-80% less than standard Pub/Sub for high-throughput workloads. The trade-off is that Pub/Sub Lite requires capacity planning (you must provision throughput units upfront) and is limited to a single region, unlike standard Pub/Sub's automatic global replication. Note that Google has announced the deprecation of Pub/Sub Lite; verify its current status before committing new workloads to it.
ML feature events evolve constantly. New features get added, deprecated features are removed, and feature data types change as models are retrained. Avro and Protobuf schemas provide backward and forward compatibility, allowing producers and consumers to evolve independently. Avro is the more common choice in Kafka ecosystems due to Confluent Schema Registry's native Avro support. Protobuf is preferred in GCP ecosystems and when the same event schema is shared with gRPC services.
Confluent Schema Registry enforces schema compatibility rules at schema-registration time. When a producer tries to register a schema that breaks backward compatibility, the registry rejects the registration before any message is published. This catches schema-breaking changes in CI/CD rather than in production. For ML pipelines, we configure BACKWARD_TRANSITIVE compatibility mode, which ensures that a consumer on the latest schema can read messages produced with any older schema version, critical when you have consumer services that cannot all be deployed simultaneously.
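To make the compatibility rule concrete, here is the kind of Avro evolution that BACKWARD_TRANSITIVE accepts, shown as plain Python dicts rather than a specific registry client. The record and field names are hypothetical.

```python
# Illustration: an Avro schema evolution that BACKWARD_TRANSITIVE accepts.
# Record and field names are hypothetical examples.
schema_v1 = {
    "type": "record",
    "name": "UserActivityFeature",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "click_count_5m", "type": "int"},
    ],
}

schema_v2 = {
    "type": "record",
    "name": "UserActivityFeature",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "click_count_5m", "type": "int"},
        # New nullable field with a default: consumers on v2 can still decode
        # v1 records (the default fills the gap). Removing or renaming an
        # existing field without a default would be rejected.
        {"name": "session_duration_s", "type": ["null", "double"], "default": None},
    ],
}
```

The rule of thumb this encodes: additions carry defaults, and nothing existing is renamed or removed in a single step.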
Pub/Sub supports Avro and Protobuf schemas through its Schema service. Topics can be configured with a schema and encoding (JSON or binary), and the service validates messages against the schema at publish time. Schema revisions support backward and forward compatibility checks. The main gap compared to Kafka's Schema Registry is ecosystem tooling: there is no equivalent of Confluent's schema compatibility testing in CI/CD pipelines, so teams typically build custom schema validation steps in their deployment process.
When an ML training pipeline consumes events slower than they are produced, Kafka consumer lag increases. This lag is visible per partition and per consumer group, giving you precise insight into which pipeline stage is the bottleneck. Kafka retains messages for the configured retention period (commonly 7 days), so a slow consumer never loses data within the retention window; it just falls behind. This natural backpressure mechanism is ideal for training pipelines where throughput variation is expected: a GPU training job might pause consumption during gradient computation and resume during data loading.
Pub/Sub uses acknowledgement deadlines for flow control. If a subscriber does not acknowledge a message within the deadline (default 10 seconds, configurable up to 600 seconds), Pub/Sub redelivers it. For ML training pipelines that process batches slowly, the subscriber must extend the acknowledgement deadline periodically using modifyAckDeadline. Failing to extend causes redelivery, which without deduplication leads to duplicate processing. The maximum backlog retention for a subscription is 7 days, after which unacknowledged messages are discarded.
Kafka Streams is a Java/Kotlin library (not a separate cluster) that processes Kafka topics with stateful operations: windowed aggregations, joins, and exactly-once processing. For feature engineering, a common pattern is computing rolling 5-minute averages of user activity (page views, clicks, searches) from event streams and writing the aggregated features to a feature store. Kafka Streams handles this with KTable aggregations that maintain local state backed by RocksDB, processing 50,000 events per second on a single 4-core instance.
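The rolling-average pattern above can be sketched in plain Python to show the mechanics. This is a stand-in, not Kafka Streams: a real deployment would express it as a windowed aggregation with RocksDB-backed state, while this version keeps per-entity deques in memory. Entity IDs and values are illustrative.

```python
from collections import deque

class RollingWindowAverage:
    """Plain-Python stand-in for a 5-minute rolling average over an event
    stream. Kafka Streams would hold this state in a RocksDB-backed
    windowed KTable; here it lives in per-entity in-memory deques."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self.events = {}   # entity_id -> deque of (timestamp, value)

    def update(self, entity_id: str, ts: float, value: float) -> float:
        q = self.events.setdefault(entity_id, deque())
        q.append((ts, value))
        while q and ts - q[0][0] > self.window:   # drop events outside the window
            q.popleft()
        return sum(v for _, v in q) / len(q)       # current rolling average

agg = RollingWindowAverage(window_seconds=300)
agg.update("user-42", ts=0, value=1.0)
assert agg.update("user-42", ts=100, value=3.0) == 2.0   # both events in window
assert agg.update("user-42", ts=400, value=5.0) == 4.0   # ts=0 event expired
```

Note that the state is keyed by entity, which is exactly why partition-key choice (covered below for Kafka) has to keep each entity's events on one consumer.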
The advantage of Kafka Streams over a separate processing framework is deployment simplicity: it runs as a regular JVM application, scales by adding instances, and uses Kafka itself for coordination and state backup. There is no separate cluster to manage (unlike Flink or Spark Streaming). The disadvantage is that it only reads from and writes to Kafka topics, so if your feature engineering involves data from non-Kafka sources, you need Kafka Connect to bring that data into Kafka first.
Google Dataflow (Apache Beam runner) is the natural stream processing choice for Pub/Sub. It provides windowed aggregations, stateful processing, and exactly-once delivery semantics. Dataflow auto-scales workers based on backlog size, which is valuable for ML workloads with variable throughput. A Dataflow job processing 50,000 events per second for feature computation typically uses 4-8 n1-standard-4 workers, costing approximately $400-800/month. Dataflow's advantage is unified batch and stream processing: the same Beam pipeline can process historical data for training and real-time data for serving, ensuring feature parity between offline and online features.
Kafka partition keys determine message ordering and consumer parallelism. For ML feature computation, the partition key is typically the entity ID (user ID, device ID, session ID) so that all events for a single entity are processed in order by the same consumer. This ensures that feature aggregations (rolling windows, counters, last-N events) are computed correctly. A common mistake is using a high-cardinality key like event_id, which distributes messages evenly but loses entity-level ordering, making stateful feature computation impossible without additional sorting.
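The property that matters can be shown with a stand-in hash (Kafka's default partitioner actually uses murmur2 on the key bytes): the same key always lands on the same partition, while a unique per-event key scatters an entity's events.

```python
import hashlib

NUM_PARTITIONS = 32

def partition_for(key: str) -> int:
    """Stand-in partitioner: stable hash of the key modulo partition count.
    (Kafka's default partitioner uses murmur2, but the property is the same.)"""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Keying by entity id: every event for user-42 lands on one partition,
# so a single consumer sees them in order and stateful rolling-window
# features stay correct.
p = partition_for("user-42")
assert all(partition_for("user-42") == p for _ in range(100))

# Keying by a unique event_id: events for the same entity scatter across
# partitions, destroying per-entity ordering.
partitions = {partition_for(f"evt-{i}") for i in range(1000)}
assert len(partitions) > 1
```

A practical corollary: changing the partition count remaps keys, so resize partitions before accumulating entity state, not after.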
For model serving pipelines where ordering does not matter and maximum throughput is the goal, using a round-robin partition strategy (null key) maximizes consumer parallelism. With 32 partitions and 32 consumer instances, each consumer processes 1/32 of the events independently. Scaling from 16 to 32 consumers is as simple as deploying more instances; Kafka rebalances partitions automatically, though the rebalance pause (10-30 seconds) means brief throughput drops during scale-up events.
A 3-broker Kafka cluster running on Kubernetes with Strimzi, each broker on an n2-standard-8 instance (8 vCPUs, 32 GB RAM) with 500 GB SSD persistent disks, sustains approximately 200,000 messages per second with 1 KB average message size. This costs roughly $750/month for compute and $150/month for storage on GCP, totaling $900/month. A ZooKeeper ensemble adds 3 small instances at approximately $200/month; KRaft mode (production-ready since Kafka 3.3) replaces ZooKeeper and can co-locate controllers on the brokers, removing that line item. Total infrastructure cost: approximately $1,100/month for a cluster handling 200K msg/s.
Confluent Cloud on the Standard tier charges based on CKU (Confluent Kafka Units) starting at $1.50/hour for a single CKU that supports approximately 100 MB/s throughput. At 200K messages/second with 1 KB messages (200 MB/s), you need 2 CKUs at $2,160/month, plus $0.10/GB for data transfer, adding approximately $500/month. Total: roughly $2,660/month, about 2.4x the self-managed cost. The trade-off is zero operational burden: no broker management, no ZooKeeper, no capacity planning, automatic upgrades and security patches.
Standard Pub/Sub at 200K msg/s with 1 KB messages delivers roughly 17 TB (about 15.7 TiB) per day. At $40/TiB for message delivery (per subscriber), a single subscriber costs approximately $630/day or $18,900/month. Two subscribers double this to roughly $37,800/month. This makes standard Pub/Sub prohibitively expensive for high-throughput ML pipelines. Pub/Sub Lite at the same throughput costs approximately $4,000-5,000/month, comparable to Confluent Cloud and roughly 4x the self-managed Kafka cost.
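The same back-of-envelope arithmetic at 200K msg/s, using 1 KB = 1,000 bytes, 30-day months, and the self-managed and Confluent estimates quoted in this comparison. As before, the decimal-TB vs binary-TiB distinction is why a 17 x $40 shortcut overshoots the daily figure.

```python
# Back-of-envelope cost comparison at 200,000 msg/s x 1 KB events.
# Assumptions: 1 KB = 1,000 bytes, 30-day month, $40/TiB delivered per
# subscriber; self-managed and Confluent figures are estimates from the text.
msg_per_sec = 200_000
tib_per_day = msg_per_sec * 1_000 * 86_400 / 2**40   # ~15.7 TiB/day
pubsub_per_subscriber = tib_per_day * 40 * 30        # ~$18,900/month per subscriber
self_managed_kafka = 1_100                            # 3 brokers + controllers + disks
confluent_cloud = 2_660                               # 2 CKUs + data transfer
standard_vs_self_managed = pubsub_per_subscriber / self_managed_kafka  # ~17x
```

The ratio in the last line is the headline number: at this throughput, per-delivery pricing is more than an order of magnitude above a self-managed cluster, before counting a second subscriber.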
Choose Kafka when your ML pipeline requires exactly-once processing for feature computation, when you need Kafka Connect for CDC-based feature ingestion from databases, when throughput exceeds 10,000 events per second and cost efficiency matters, when you need Kafka Streams for in-stream feature transformations without a separate processing cluster, or when your team has Kafka operational experience or uses Confluent Cloud.
Choose Pub/Sub when your ML infrastructure runs primarily on GCP and tight integration with Vertex AI, BigQuery, and Dataflow matters more than per-message cost, when your throughput is under 10,000 events per second where standard Pub/Sub pricing is reasonable, when your team has no Kafka operational experience and prefers fully managed infrastructure, or when you need built-in global message delivery without managing cross-region replication.
For ML teams in Bengaluru building their first real-time feature pipeline, we recommend starting with Confluent Cloud if the team is multi-cloud or AWS-centric, and Pub/Sub Lite if the team is GCP-native. Self-managed Kafka makes sense only when throughput exceeds 100K msg/s and the team has dedicated DevOps capacity to manage broker upgrades, partition rebalancing, and storage expansion.
Yes, standard Pub/Sub auto-scales to millions of messages per second without capacity planning. The constraint is cost, not throughput. At 50,000+ messages per second, standard Pub/Sub's per-delivery pricing becomes expensive relative to Kafka. Pub/Sub Lite offers capacity-based pricing similar to Kafka at roughly 70-80% lower cost than standard Pub/Sub for high-throughput workloads.
Exactly-once semantics prevent duplicate event processing, which directly impacts feature accuracy. In feature computation pipelines, a duplicated click event means the model sees inflated click counts. A duplicated purchase event skews revenue features. Kafka's exactly-once guarantees ensure each event is processed once and only once in a consume-transform-produce pipeline, maintaining feature integrity. The performance overhead is 5-15% compared to at-least-once delivery.
Kafka Streams is simpler to deploy (runs as a regular JVM app) and sufficient for most feature engineering: windowed aggregations, joins, and stateful transformations at up to 100K events per second per instance. Flink is better for complex event processing with multiple input streams, large state sizes exceeding available memory, and exactly-once guarantees across non-Kafka sinks. Choose Kafka Streams for simplicity, Flink for complex topologies.
Use Avro or Protobuf schemas with a schema registry (Confluent Schema Registry for Kafka, the Pub/Sub Schema service on GCP). Configure BACKWARD_TRANSITIVE compatibility, which ensures that consumers upgraded to the newest schema can still read data produced under every earlier version. Add new fields with defaults; never rename or remove fields in a single step. For breaking changes, publish to a new topic version and migrate consumers incrementally.
A self-managed 3-broker Kafka cluster on GCP (n2-standard-8 instances) handling 200K messages per second costs approximately $1,100/month including ZooKeeper and storage. Confluent Cloud at the same throughput costs approximately $2,600/month. These costs are for the messaging infrastructure only and do not include stream processing (Kafka Streams, Flink) or feature store compute.
Yes, using a dual-write pattern. Produce events to both Pub/Sub and Kafka simultaneously during a migration window. Migrate consumers from Pub/Sub subscriptions to Kafka consumer groups one at a time, verifying each consumer processes correctly from Kafka before decommissioning its Pub/Sub subscription. The migration window typically lasts 1-2 weeks per consumer service. The dual-write period doubles your messaging costs but ensures zero data loss.