
Realtime Data Subscription

1. Overview

Kafka topic partitioning is deterministic:

partition = hash(key) % number_of_partitions

All publishers MUST use the same partitioning logic, and the topic's partition count MUST stay fixed once created. If a topic is created with 10 partitions and keeps that count, messages with the same key will always go to the same partition.

All messages MUST have a key, and the key MUST be the symbol name.

This ensures:

  • Ordering per symbol: All messages for a symbol are processed in order
  • Deterministic routing: Same key always goes to same partition
  • Even distribution: Keys distribute evenly across partitions
  • Consistent replay: Reprocessing produces identical partition assignments
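
These guarantees all follow from the determinism of the formula above. A minimal Python sketch illustrates the idea, using `hashlib.md5` as a stand-in hash (real Kafka clients use Murmur2): replaying the same keyed messages always reproduces the same partition assignments.

```python
import hashlib

def assign_partition(key: str, num_partitions: int) -> int:
    """Stand-in for Kafka's partitioner: a stable hash of the key, mod N.
    (Kafka itself uses Murmur2; md5 is used here only for illustration.)"""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return h % num_partitions

symbols = ["AAPL", "TSLA", "MSFT", "AAPL", "TSLA"]

# A first pass and a "replay" pass produce identical partition assignments
first = [assign_partition(s, 10) for s in symbols]
replay = [assign_partition(s, 10) for s in symbols]
assert first == replay
```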

2. Keying Standard

All messages must populate the key field. The key must always be the symbol name.

Examples:

| Asset Class | Key Format | Examples |
|-------------|------------|----------|
| Equity | Symbol name | AAPL, TSLA, MSFT |
| Futures | Contract symbol | NIFTY_FUT, ESZ25, NQH25 |
| Options | Root symbol | SPXW, SPY, AAPL |

Key Properties:

  • Must be non-empty: Empty keys cause uneven distribution
  • Must be consistent: Same symbol = same key across all producers
  • Case-sensitive: AAPL ≠ aapl (use consistent casing)
  • Immutable: Key never changes for a given message
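
The properties above can be enforced producer-side. A hypothetical guard function (the helper name and the upper-case convention are assumptions for illustration, not part of HiveQ's API):

```python
def is_valid_key(key: str) -> bool:
    """Hypothetical producer-side guard for the key properties above."""
    if not key:                 # empty keys all hash to one partition -> hotspot
        return False
    if key != key.strip():      # surrounding whitespace breaks consistency
        return False
    if key != key.upper():      # casing convention assumed here: upper-case symbols
        return False
    return True

assert is_valid_key("AAPL")
assert not is_valid_key("aapl")   # AAPL != aapl: casing must be consistent
assert not is_valid_key("")
```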

Special Case: Options Keying

For options, the key is the root symbol, not the full OCC option symbol:

Key:     "SPXW"                    (root symbol)
Payload: {
  "symbol": "SPXW",
  "chain": "SPXW  251028C06875000",   (full OCC symbol in payload)
  "strike": 6875.0,
  "put_or_call": "C",
  ...
}

Rationale:

  • Cache Locality: All options for a root (all strikes, expirations, calls/puts) stay together
  • Simpler Subscriptions: Clients can subscribe to "all SPXW options" via single key
  • Message Ordering: Maintains ordering for all contracts of a given root
  • Implementation: See src/publishers/mock_market_data_publisher.cpp:310

Trade-off: High-volume roots (SPX, SPY) may bottleneck on single partition. For extreme volume scenarios, consider using full OCC symbol as key.
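
Since the payload carries the full OCC symbol while the key carries only the root, producers need to derive one from the other. A sketch of that parsing, assuming the standard 21-character OSI layout (6-character space-padded root, YYMMDD expiry, C/P flag, strike price × 1000):

```python
def parse_occ_symbol(occ: str) -> dict:
    """Split a 21-char OCC/OSI option symbol into its components."""
    root = occ[:6].rstrip()            # space-padded root, e.g. "SPXW  " -> "SPXW"
    expiry = occ[6:12]                 # YYMMDD
    put_or_call = occ[12]              # "C" or "P"
    strike = int(occ[13:21]) / 1000.0  # zero-padded strike price * 1000
    return {"root": root, "expiry": expiry,
            "put_or_call": put_or_call, "strike": strike}

parsed = parse_occ_symbol("SPXW  251028C06875000")
key = parsed["root"]  # "SPXW" -> used as the Kafka key
```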


3. Partitioning Rules

Core Principles:

  1. Producer provides key: Kafka's client library computes the partition automatically
  2. Consistent hashing: All Kafka producers use Murmur2 hash algorithm by default
  3. Consumers don't compute: Kafka assigns partitions to consumers via consumer groups
  4. Fixed partition count: Changing partition count after production begins breaks ordering guarantees
  5. Configuration must match: Topic partition count and producer configuration must align

What NOT to Do:

  • ❌ Manually compute partition in application code
  • ❌ Change partition count on existing topics with data
  • ❌ Use different hashing algorithms across producers
  • ❌ Send messages without keys (unless order doesn't matter)

4. Topic Specifications

Equity Topics

market_data.equity.tbbo         -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1s      -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1m      -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1d      -> key = symbol, partition = hash(symbol) % N
market_data.equity.imbalance    -> key = symbol, partition = hash(symbol) % N

Futures Topics

market_data.futures.tbbo        -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1s     -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1m     -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1d     -> key = symbol, partition = hash(symbol) % N

Options Topics

market_data.options.snaps_1s    -> key = root symbol, partition = hash(root) % N

Options Keying Note:

  • Key is the root symbol (e.g., SPXW, SPY, AAPL)
  • NOT the full OCC symbol (e.g., SPXW 251028C06875000)
  • Full OCC symbol is included in message payload as chain field
  • This groups all strikes/expirations for a root in the same partition
  • Ensures cache locality and simpler subscription model (subscribe to "all SPXW options")
  • Trade-off: Single partition per root (may bottleneck for very high-volume roots)

Signals Topics

signals.khawk.quant_features    -> key = symbol, partition = hash(symbol) % N

Note: N = number of partitions configured for the topic (e.g., 10, 20, 50)
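
The listings above imply a `market_data.<asset_class>.<data_type>` naming convention. A trivial helper (illustrative only; not part of HiveQ's API) makes the convention explicit:

```python
def topic_name(asset_class: str, data_type: str) -> str:
    """Build a market-data topic name following the convention above."""
    return f"market_data.{asset_class}.{data_type}"

assert topic_name("equity", "tbbo") == "market_data.equity.tbbo"
assert topic_name("options", "snaps_1s") == "market_data.options.snaps_1s"
```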


5. Producer Implementation

Let Kafka's producer library handle partition assignment by providing the key:

C++ (librdkafka)

// Configuration
const std::string topic = "market_data.equity.tbbo";
const std::string key = symbol;  // e.g., "AAPL"
const std::string payload = json_data;

// Produce message - Kafka computes partition automatically
RdKafka::ErrorCode err = producer->produce(
    topic,
    RdKafka::Topic::PARTITION_UA,   // PARTITION_UA = Unassigned (let Kafka decide)
    RdKafka::Producer::RK_MSG_COPY,
    const_cast<void*>(static_cast<const void*>(payload.data())),
    payload.size(),
    key.c_str(),                    // Key for partitioning
    key.size(),
    0,                              // Timestamp (0 = current time)
    nullptr);                       // Opaque pointer (unused)

// Kafka internally computes: partition = murmur2(key) % num_partitions

Python (kafka-python)

from kafka import KafkaProducer

# Configuration
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: v.encode('utf-8'))

# Produce message - Kafka computes partition automatically
topic = "market_data.equity.tbbo"
key = symbol  # e.g., "AAPL"
payload = json_data

producer.send(
    topic,
    key=key,        # Kafka uses this to compute partition
    value=payload)

# Kafka internally computes: partition = murmur2(key) % num_partitions

What Kafka Does Internally

When you provide a key with PARTITION_UA:

  1. Kafka computes: hash = murmur2(key.bytes)
  2. Then: partition = hash % num_partitions
  3. Sends message to computed partition
  4. Guarantees ordering for all messages with same key
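
The steps above can be sketched in Python. This is a from-scratch reimplementation of the Murmur2 variant used by Kafka's default partitioner (including the sign-bit masking step the clients apply), for illustration only; in practice the client library does all of this for you:

```python
def murmur2(data: bytes) -> int:
    """Kafka's Murmur2 hash (seed 0x9747b28c), reimplemented for illustration."""
    m, r = 0x5bd1e995, 24
    h = 0x9747b28c ^ len(data)
    i = 0
    while len(data) - i >= 4:                       # process 4-byte chunks
        k = int.from_bytes(data[i:i + 4], "little")
        i += 4
        k = (k * m) & 0xffffffff
        k ^= k >> r
        k = (k * m) & 0xffffffff
        h = (h * m) & 0xffffffff
        h ^= k
    tail = data[i:]                                 # 0-3 trailing bytes
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & 0xffffffff
    h ^= h >> 13                                    # final avalanche
    h = (h * m) & 0xffffffff
    h ^= h >> 15
    return h

def partition_for(key: str, num_partitions: int) -> int:
    """Mirror of the default partitioner: mask the sign bit, then modulo."""
    return (murmur2(key.encode("utf-8")) & 0x7fffffff) % num_partitions
```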
// ❌ AVOID: Manual partition computation
int partition = custom_hash(key) % num_partitions;
producer.produce(topic, partition, payload, key);

// Problems:
// - Must replicate Kafka's Murmur2 hash exactly
// - Breaks if Kafka changes hashing algorithm
// - Error-prone and difficult to maintain
// - Inconsistent with standard Kafka practices

Key Takeaways

✅ DO: Provide key, use PARTITION_UA, let Kafka compute partition
❌ DON'T: Manually compute partition in application code
✅ DO: Trust Kafka's built-in Murmur2 partitioner
❌ DON'T: Implement custom partitioning logic


6. Topic Creation Convention

### Development (Single Node)

```bash
kafka-topics.sh --create \
  --topic market_data.equity.tbbo \
  --bootstrap-server localhost:9092 \
  --partitions 10 \
  --replication-factor 1 \
  --if-not-exists
```

### Production (Cluster)
```bash
kafka-topics.sh --create \
  --topic market_data.equity.tbbo \
  --bootstrap-server kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092 \
  --partitions 20 \
  --replication-factor 3 \
  --if-not-exists
```

**Important Notes:**
- **Development**: `--replication-factor 1` (single broker)
- **Production**: `--replication-factor 3` (fault tolerance)
- **Partition count**: Choose based on throughput needs (10-50 typical)
- **Cannot decrease**: Partitions can only be increased, never decreased; increasing the count remaps keys to different partitions and breaks ordering for existing data

### Partition Count Guidelines

| Expected Throughput | Symbols | Recommended Partitions |
|---------------------|---------|------------------------|
| Low (< 1k msg/s) | < 100 | 10 |
| Medium (1k-10k msg/s) | 100-1000 | 20 |
| High (10k-100k msg/s) | 1000+ | 50 |
| Very High (> 100k msg/s) | 10000+ | 100+ |
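
The guideline table can be encoded as a small helper (illustrative; the thresholds come straight from the table above):

```python
def recommended_partitions(msgs_per_sec: float) -> int:
    """Map expected throughput to a partition count per the guideline table."""
    if msgs_per_sec < 1_000:
        return 10       # Low
    if msgs_per_sec < 10_000:
        return 20       # Medium
    if msgs_per_sec < 100_000:
        return 50       # High
    return 100          # Very High (100+)

assert recommended_partitions(500) == 10
assert recommended_partitions(50_000) == 50
```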

---

## 7. Consumer Behavior

### How Consumers Work

**Consumers DO NOT compute partitions.**
Kafka assigns partitions to consumers in a consumer group automatically.

### Consumer Group Mechanics

```
Topic:          market_data.equity.tbbo   (10 partitions)
Consumer Group: hiveq-dispatcher-group    (3 consumers)
```

Kafka assigns:

  • Consumer 1: partitions 0, 1, 2, 3
  • Consumer 2: partitions 4, 5, 6
  • Consumer 3: partitions 7, 8, 9
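
The assignment above matches what Kafka's default range assignor produces for a single topic: partitions are split as evenly as possible, with the first consumers (in sorted member order) taking one extra when the count doesn't divide evenly. A simplified sketch:

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Approximation of Kafka's range assignor for a single topic."""
    per = num_partitions // len(consumers)
    extra = num_partitions % len(consumers)
    assignment, start = {}, 0
    for i, c in enumerate(sorted(consumers)):
        count = per + (1 if i < extra else 0)  # first `extra` consumers get one more
        assignment[c] = list(range(start, start + count))
        start += count
    return assignment

# 10 partitions across 3 consumers -> 4 / 3 / 3, as in the example above
assert range_assign(10, ["c1", "c2", "c3"]) == {
    "c1": [0, 1, 2, 3], "c2": [4, 5, 6], "c3": [7, 8, 9]}
```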

### Consumer Code (Simplified)

```cpp
// Create consumer
auto consumer = std::make_shared<KafkaConsumer>(
    dispatcher_pool,
    "localhost:9092",
    "hiveq-dispatcher-group",
    num_kafka_partitions);  // For reference only, not used for partition assignment

// Subscribe to topics - Kafka assigns partitions automatically
consumer->subscribe({"market_data.equity.tbbo", "market_data.equity.bars_1s"});

// Start consuming - reads from assigned partitions
consumer->start();

// Consumer loop
while (running) {
    auto message = consumer->poll();
    // Extract topic, key, and payload, then process the message
}
```

### Rebalancing

When consumers join/leave the group:

  • Kafka automatically rebalances partition assignments
  • Each partition is assigned to exactly one consumer in the group
  • Ordering per partition is maintained

8. HiveQ Architecture Notes

Two-Level Routing

HiveQ uses two independent routing mechanisms:

1. Kafka Partitioning (Broker-Level)

Producer → hash(key) % num_kafka_partitions → Kafka Partition
  • Determines which Kafka partition stores the message
  • Handled by Kafka broker using Murmur2 hash
  • Ensures ordering per symbol within Kafka

2. Dispatcher Routing (Application-Level)

Consumer → hash(key) % num_dispatcher_workers → Worker Thread
  • Determines which dispatcher worker processes the message
  • Handled by HiveQ application using std::hash
  • Distributes processing load across threads

These are independent:

  • Message in Kafka partition 5 might go to dispatcher worker 2
  • Different hash functions (Murmur2 vs std::hash)
  • Different modulos (num_partitions vs num_workers)
  • Both contribute to overall system performance
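
The independence of the two levels can be illustrated with a Python sketch. Two different stand-in hashes are used here purely for demonstration (in the real system, level 1 is Murmur2 inside the Kafka client and level 2 is `std::hash` inside HiveQ):

```python
import hashlib

NUM_KAFKA_PARTITIONS = 10    # example values, not HiveQ configuration
NUM_DISPATCHER_WORKERS = 4

def kafka_partition(key: str) -> int:
    """Stand-in for level 1: broker-level routing (really Murmur2)."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return h % NUM_KAFKA_PARTITIONS

def dispatcher_worker(key: str) -> int:
    """Stand-in for level 2: application-level routing (really std::hash)."""
    h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")
    return h % NUM_DISPATCHER_WORKERS

# Different hash functions and different modulos: the two assignments are
# computed independently, and only the key ties them together.
for sym in ["AAPL", "TSLA", "SPXW"]:
    p, w = kafka_partition(sym), dispatcher_worker(sym)
    assert 0 <= p < NUM_KAFKA_PARTITIONS
    assert 0 <= w < NUM_DISPATCHER_WORKERS
```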

9. Summary Checklist

Before Production:

  • Every topic has a fixed number of partitions
  • All messages include a key (symbol name)
  • Producers use PARTITION_UA to let Kafka compute partitions
  • Topics created with appropriate replication factor (3 for production)
  • Partition count matches expected throughput
  • All producers in the ecosystem use standard Kafka clients

During Production:

  • Monitor partition distribution (should be even)
  • Monitor consumer lag per partition
  • Never change partition count on active topics
  • Ensure all messages have keys
  • Verify ordering guarantees hold per symbol

Verification Commands:

# List topics and partition counts
kafka-topics.sh --list --bootstrap-server localhost:9092

# Describe topic details
kafka-topics.sh --describe --topic market_data.equity.tbbo --bootstrap-server localhost:9092

# Check consumer group status
kafka-consumer-groups.sh --describe --group hiveq-dispatcher-group --bootstrap-server localhost:9092

# Monitor message flow
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic market_data.equity.tbbo --property print.key=true

10. Troubleshooting

Problem: Uneven partition distribution

Cause: Keys not distributed evenly (e.g., only a few symbols)
Solution: Use more diverse keys or reduce partition count

Problem: Out-of-order messages for same symbol

Cause: Producer not setting a key, or manually assigning partitions
Solution: Verify the key is set and PARTITION_UA is used

Problem: "Unknown topic or partition" error

Cause: Topic not created before producing
Solution: Use the --if-not-exists flag or auto-create topics (as implemented in HiveQ)

Problem: Consumer lag increasing

Cause: Too few partitions for the throughput, or too few consumers
Solution: Increase partitions (on a new topic) or add more consumers to the group


Document Version: 1.0
Last Updated: 2025-11-19
Maintained By: HiveQ Team