# Realtime Data Subscription

## 1. Overview
Kafka topic partitioning is deterministic:

```
partition = hash(key) % number_of_partitions
```

The topic partition count and the publisher's partitioning logic MUST match. If a topic is created with 10 partitions, messages with the same key will always go to the same partition.
All messages MUST have a key, and the key MUST be the symbol name.
This ensures:
- Ordering per symbol: All messages for a symbol are processed in order
- Deterministic routing: Same key always goes to same partition
- Even distribution: Keys distribute evenly across partitions
- Consistent replay: Reprocessing produces identical partition assignments
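The determinism property above can be sketched in Python. This is illustrative only: MD5 is used as a stable stand-in for Kafka's actual Murmur2 partitioner, and the partition count is an assumed example value.

```python
import hashlib

NUM_PARTITIONS = 10  # assumption: topic created with 10 partitions

def partition_for(key: str) -> int:
    # Stable stand-in hash; Kafka actually uses Murmur2 (see section 5)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Deterministic: the same key always lands on the same partition,
# so all messages for one symbol stay in order on one partition.
assert partition_for("AAPL") == partition_for("AAPL")
```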
## 2. Keying Standard
All messages must populate the key field. The key must always be the symbol name.
Examples:
| Asset Class | Key Format | Examples |
|---|---|---|
| Equity | Symbol name | AAPL, TSLA, MSFT |
| Futures | Contract symbol | NIFTY_FUT, ESZ25, NQH25 |
| Options | Root symbol | SPXW, SPY, AAPL |
Key Properties:
- Must be non-empty: Empty keys cause uneven distribution
- Must be consistent: Same symbol = same key across all producers
- Case-sensitive: `AAPL` ≠ `aapl` (use consistent casing)
- Immutable: Key never changes for a given message
### Special Case: Options Keying

For options, the key is the root symbol, not the full OCC option symbol:

```
Key: "SPXW" (root symbol)
Payload: {
    "symbol": "SPXW",
    "chain": "SPXW 251028C06875000",   // full OCC symbol in payload
    "strike": 6875.0,
    "put_or_call": "C",
    ...
}
```

Rationale:
- Cache Locality: All options for a root (all strikes, expirations, calls/puts) stay together
- Simpler Subscriptions: Clients can subscribe to "all SPXW options" via single key
- Message Ordering: Maintains ordering for all contracts of a given root
- Implementation: See `src/publishers/mock_market_data_publisher.cpp:310`
Trade-off: High-volume roots (SPX, SPY) may bottleneck on single partition. For extreme volume scenarios, consider using full OCC symbol as key.
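Deriving the key from a full OCC option symbol can then be sketched as below. `occ_root` is a hypothetical helper, and the format assumption (the root is the token before the space) comes from the examples above.

```python
def occ_root(occ_symbol: str) -> str:
    # OCC symbols in this document are "<root> <yymmdd><C|P><strike>",
    # e.g. "SPXW 251028C06875000", so the root is the token before the space.
    return occ_symbol.split(" ", 1)[0]

key = occ_root("SPXW 251028C06875000")   # key = "SPXW"
```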
## 3. Partitioning Rules
Core Principles:
- Producer provides key: Kafka's client library computes the partition automatically
- Consistent hashing: The Java client and kafka-python default to the Murmur2 hash; librdkafka defaults to `consistent_random` (CRC32), so set `partitioner=murmur2_random` when cross-client consistency matters
- Consumers don't compute: Kafka assigns partitions to consumers via consumer groups
- Fixed partition count: Changing partition count after production begins breaks ordering guarantees
- Configuration must match: Topic partition count and producer configuration must align
What NOT to Do:
- ❌ Manually compute partition in application code
- ❌ Change partition count on existing topics with data
- ❌ Use different hashing algorithms across producers
- ❌ Send messages without keys (unless order doesn't matter)
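A producer-side guard for the "no empty keys" rule could look like the following. `validate_key` is a hypothetical helper, not part of the HiveQ codebase.

```python
def validate_key(key: object) -> str:
    # Enforce the keying standard: a non-empty string with no stray whitespace,
    # since empty keys lose per-symbol ordering and inconsistent casing splits
    # one symbol across partitions.
    if not isinstance(key, str) or not key or key != key.strip():
        raise ValueError(f"invalid message key: {key!r}")
    return key
```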
## 4. Topic Specifications

### Equity Topics

```
market_data.equity.tbbo      -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1s   -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1m   -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1d   -> key = symbol, partition = hash(symbol) % N
market_data.equity.imbalance -> key = symbol, partition = hash(symbol) % N
```

### Futures Topics

```
market_data.futures.tbbo    -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1s -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1m -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1d -> key = symbol, partition = hash(symbol) % N
```

### Options Topics

```
market_data.options.snaps_1s -> key = root symbol, partition = hash(root) % N
```

Options Keying Note:
- Key is the root symbol (e.g., `SPXW`, `SPY`, `AAPL`)
- NOT the full OCC symbol (e.g., `SPXW 251028C06875000`)
- Full OCC symbol is included in the message payload as the `chain` field
- This groups all strikes/expirations for a root in the same partition
- Ensures cache locality and a simpler subscription model (subscribe to "all SPXW options")
- Trade-off: Single partition per root (may bottleneck for very high-volume roots)

### Signals Topics

```
signals.khawk.quant_features -> key = symbol, partition = hash(symbol) % N
```

Note: N = number of partitions configured for the topic (e.g., 10, 20, 50)
## 5. Producer Implementation

### Best Practice (Recommended)

Let Kafka's producer library handle partition assignment by providing the key:
#### C++ (librdkafka)

```cpp
// Configuration
const std::string topic = "market_data.equity.tbbo";
const std::string key = symbol;         // e.g., "AAPL"
const std::string payload = json_data;

// Produce message - Kafka computes partition automatically
RdKafka::ErrorCode err = producer->produce(
    topic,
    RdKafka::Topic::PARTITION_UA,       // PARTITION_UA = Unassigned (let Kafka decide)
    RdKafka::Producer::RK_MSG_COPY,
    const_cast<void*>(static_cast<const void*>(payload.data())),
    payload.size(),
    key.c_str(),                        // Key for partitioning
    key.size(),
    0,
    nullptr);

// Kafka internally computes: partition = murmur2(key) % num_partitions
```

#### Python (kafka-python)
```python
from kafka import KafkaProducer

# Configuration
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: v.encode('utf-8'))

# Produce message - Kafka computes partition automatically
topic = "market_data.equity.tbbo"
key = symbol  # e.g., "AAPL"
payload = json_data

producer.send(
    topic,
    key=key,       # Kafka uses this to compute partition
    value=payload)

# Kafka internally computes: partition = murmur2(key) % num_partitions
```

### What Kafka Does Internally
When you provide a key with `PARTITION_UA`:

1. Kafka computes `hash = murmur2(key.bytes)`
2. Then `partition = hash % num_partitions`
3. The message is sent to the computed partition
4. Ordering is guaranteed for all messages with the same key
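For verification and debugging only (never for production routing), the computation can be reproduced in Python. This is a sketch ported from the Java client's `Utils.murmur2`; the sign-bit mask before the modulo mirrors the client's `toPositive` step.

```python
def murmur2(data: bytes) -> int:
    """Sketch port of the Kafka Java client's Utils.murmur2."""
    length = len(data)
    seed, m, r = 0x9747B28C, 0x5BD1E995, 24
    h = (seed ^ length) & 0xFFFFFFFF
    # Process the input four bytes at a time (little-endian words)
    for i in range(0, length - (length % 4), 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = ((h * m) & 0xFFFFFFFF) ^ k
    # Fold in the 1-3 remaining bytes (mirrors the Java switch fall-through)
    tail, rem = length & ~3, length % 4
    if rem >= 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & 0xFFFFFFFF
    # Final avalanche mixing
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h

def partition_for(key: str, num_partitions: int) -> int:
    # Mask the sign bit (Java's toPositive), then take the modulo
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Useful for spot-checking which partition a given symbol should land on when diagnosing distribution or ordering issues.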
Manual Partitioning (NOT Recommended)
// ❌ AVOID: Manual partition computationint partition = custom_hash(key) % num_partitions;
producer.produce(topic, partition, payload, key);
// Problems:
// - Must replicate Kafka's Murmur2 hash exactly
// - Breaks if Kafka changes hashing algorithm
// - Error-prone and difficult to maintain
// - Inconsistent with standard Kafka practicesKey Takeaways
✅ DO: Provide key, use PARTITION_UA, let Kafka compute partition
❌ DON'T: Manually compute partition in application code
✅ DO: Trust Kafka's built-in Murmur2 partitioner
❌ DON'T: Implement custom partitioning logic
## 6. Topic Creation Convention

### Development (Single Node)

```bash
kafka-topics.sh --create \
  --topic market_data.equity.tbbo \
  --bootstrap-server localhost:9092 \
  --partitions 10 \
  --replication-factor 1 \
  --if-not-exists
```
### Production (Cluster)
```bash
kafka-topics.sh --create \
  --topic market_data.equity.tbbo \
  --bootstrap-server kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092 \
  --partitions 20 \
  --replication-factor 3 \
  --if-not-exists
```
**Important Notes:**
- **Development**: `--replication-factor 1` (single broker)
- **Production**: `--replication-factor 3` (fault tolerance)
- **Partition count**: Choose based on throughput needs (10-50 typical)
- **Cannot decrease**: Partition count can only be increased; increasing it remaps keys to different partitions and breaks existing per-key ordering
### Partition Count Guidelines
| Expected Throughput | Symbols | Recommended Partitions |
|---------------------|---------|------------------------|
| Low (< 1k msg/s) | < 100 | 10 |
| Medium (1k-10k msg/s) | 100-1000 | 20 |
| High (10k-100k msg/s) | 1000+ | 50 |
| Very High (> 100k msg/s) | 10000+ | 100+ |
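The guideline table above can be expressed as a small lookup helper. The exact threshold boundaries are an assumption, since the table gives ranges rather than precise cutoffs.

```python
def recommended_partitions(msgs_per_sec: float) -> int:
    # Thresholds mirror the partition-count guideline table (assumed cutoffs)
    if msgs_per_sec < 1_000:
        return 10    # Low throughput
    if msgs_per_sec < 10_000:
        return 20    # Medium throughput
    if msgs_per_sec < 100_000:
        return 50    # High throughput
    return 100       # Very high throughput
```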
---
## 7. Consumer Behavior
### How Consumers Work
**Consumers DO NOT compute partitions.**
Kafka assigns partitions to consumers in a consumer group automatically.
### Consumer Group Mechanics

```
Topic: market_data.equity.tbbo (10 partitions)
Consumer Group: hiveq-dispatcher-group (3 consumers)
```
Kafka assigns:
- Consumer 1: partitions 0, 1, 2, 3
- Consumer 2: partitions 4, 5, 6
- Consumer 3: partitions 7, 8, 9
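The layout above is what a range-style assignment produces: partitions are split into contiguous blocks, and the first consumers absorb the remainder. An illustrative sketch of that logic (not the actual broker code):

```python
def range_assign(partitions: list, consumers: list) -> dict:
    # Split partitions into contiguous blocks; the first `extra`
    # consumers (in sorted member order) each get one extra partition.
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment
```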
### Consumer Code (Simplified)
```cpp
// Create consumer
auto consumer = std::make_shared<KafkaConsumer>(
    dispatcher_pool,
    "localhost:9092",
    "hiveq-dispatcher-group",
    num_kafka_partitions);  // For reference only, not used for partition assignment
// Subscribe to topics - Kafka assigns partitions automatically
consumer->subscribe({"market_data.equity.tbbo", "market_data.equity.bars_1s"});
// Start consuming - reads from assigned partitions
consumer->start();
// Consumer loop
while (running) {
    message = consumer->poll();
    // Extract topic, key, payload
    // Process message
}
```

### Rebalancing
When consumers join/leave the group:
- Kafka automatically rebalances partition assignments
- Each partition is assigned to exactly one consumer in the group
- Ordering per partition is maintained
## 8. HiveQ Architecture Notes

### Two-Level Routing
HiveQ uses two independent routing mechanisms:
**1. Kafka Partitioning (Broker-Level)**

```
Producer → hash(key) % num_kafka_partitions → Kafka Partition
```

- Determines which Kafka partition stores the message
- Computed by the producer's Kafka client library using the Murmur2 hash
- Ensures ordering per symbol within Kafka
**2. Dispatcher Routing (Application-Level)**

```
Consumer → hash(key) % num_dispatcher_workers → Worker Thread
```

- Determines which dispatcher worker processes the message
- Handled by the HiveQ application using `std::hash`
- Distributes processing load across threads
These are independent:
- Message in Kafka partition 5 might go to dispatcher worker 2
- Different hash functions (Murmur2 vs std::hash)
- Different modulos (num_partitions vs num_workers)
- Both contribute to overall system performance
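The independence of the two levels can be sketched as follows. MD5 and SHA-1 stand in for Murmur2 and `std::hash`, and both counts are assumed example values.

```python
import hashlib

NUM_KAFKA_PARTITIONS = 10   # assumption: broker-side topic configuration
NUM_DISPATCHER_WORKERS = 4  # assumption: HiveQ worker-thread count

def kafka_partition(key: str) -> int:
    # Broker-level routing; MD5 stands in for Murmur2 in this sketch
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return h % NUM_KAFKA_PARTITIONS

def dispatcher_worker(key: str) -> int:
    # Application-level routing; SHA-1 stands in for std::hash in this sketch
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")
    return h % NUM_DISPATCHER_WORKERS

# Different hash functions, different modulos: the two indices are computed
# independently, yet each is deterministic per key, so per-symbol ordering
# is preserved at both levels.
```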
## 9. Summary Checklist
Before Production:
- Every topic has a fixed number of partitions
- All messages include a key (symbol name)
- Producers use
PARTITION_UAto let Kafka compute partitions - Topics created with appropriate replication factor (3 for production)
- Partition count matches expected throughput
- All producers in the ecosystem use standard Kafka clients
During Production:
- Monitor partition distribution (should be even)
- Monitor consumer lag per partition
- Never change partition count on active topics
- Ensure all messages have keys
- Verify ordering guarantees hold per symbol
Verification Commands:

```bash
# List topics and partition counts
kafka-topics.sh --list --bootstrap-server localhost:9092

# Describe topic details
kafka-topics.sh --describe --topic market_data.equity.tbbo --bootstrap-server localhost:9092

# Check consumer group status
kafka-consumer-groups.sh --describe --group hiveq-dispatcher-group --bootstrap-server localhost:9092

# Monitor message flow
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic market_data.equity.tbbo --property print.key=true
```

## 10. Troubleshooting
**Problem: Uneven partition distribution**
- Cause: Keys not distributed evenly (e.g., only a few symbols)
- Solution: Use more diverse keys or reduce the partition count

**Problem: Out-of-order messages for the same symbol**
- Cause: Producer not setting a key, or manually assigning the wrong partition
- Solution: Verify the key is set and `PARTITION_UA` is used

**Problem: "Unknown topic or partition" error**
- Cause: Topic not created before producing
- Solution: Use the `--if-not-exists` flag or auto-create topics (as implemented in HiveQ)

**Problem: Consumer lag increasing**
- Cause: Too few partitions for the throughput, or too few consumers
- Solution: Increase partitions (on a new topic) or add more consumers to the group
## References

- Kafka Partitioning Documentation
- librdkafka Configuration
- HiveQ Implementation: `src/kafka/kafka_publisher.cpp:256` (uses `PARTITION_UA`)
- HiveQ Implementation: `src/kafka/kafka_consumer.cpp:123` (subscribes to all partitions)

---

**Document Version:** 1.0
**Last Updated:** 2025-11-19
**Maintained By:** HiveQ Team