ring.h — Fixed-Size Ring Buffer

Introduction

ring.h provides xRingBuffer, a fixed-capacity circular buffer that never reallocates. It is ideal for bounded producer-consumer scenarios where a fixed memory budget is required. The capacity is rounded up to the next power of two internally, enabling bitmask indexing instead of expensive modulo operations.

Design Philosophy

  1. Fixed Capacity, Zero Reallocation — Once created, the ring buffer never grows. A write larger than the remaining writable space returns xErrno_NoMemory. This makes memory usage predictable and avoids allocation latency spikes.

  2. Power-of-Two Masking — The internal capacity is always a power of two. Index computation uses head & mask instead of head % cap, which is significantly faster on most architectures.

  3. Monotonic Cursors — head (write) and tail (read) grow monotonically and are never reduced modulo the capacity. The actual array index is computed via bitmask. This simplifies the full/empty distinction: head - tail gives the exact readable byte count.

  4. Single Allocation — Like xBuffer, the header and data area are allocated together using a flexible array member.

  5. Scatter-Gather I/O — The ring buffer provides ReadIov/WriteIov helpers that fill iovec arrays for efficient readv()/writev() syscalls, handling the wrap-around transparently.
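
The monotonic-cursor idea from item 3 can be sketched in isolation. This is a minimal illustration, not the library's code: demo_cursors and its helper functions are hypothetical names, and only the arithmetic follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cursor fields of a ring buffer. */
typedef struct {
    size_t cap;   /* capacity, a power of two */
    size_t head;  /* total bytes ever written */
    size_t tail;  /* total bytes ever read */
} demo_cursors;

/* Readable bytes. Unsigned subtraction stays correct even after head
 * has overflowed size_t and tail has not yet. */
static size_t demo_len(const demo_cursors *c) { return c->head - c->tail; }

static size_t demo_writable(const demo_cursors *c) { return c->cap - demo_len(c); }
static bool   demo_empty(const demo_cursors *c)    { return c->head == c->tail; }
static bool   demo_full(const demo_cursors *c)     { return demo_len(c) == c->cap; }

/* Array index of the next write. Only here is the cursor reduced to [0, cap). */
static size_t demo_write_index(const demo_cursors *c) {
    assert((c->cap & (c->cap - 1)) == 0);  /* cap must be a power of two */
    return c->head & (c->cap - 1);
}
```

Because the cursors are not wrapped, a full buffer (head - tail == cap) and an empty one (head == tail) are distinguishable without a spare slot or a separate count field.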

Architecture

graph LR
    PRODUCER["Producer"] -->|"xRingBufferWrite"| RB["xRingBuffer<br/>(fixed capacity)"]
    RB -->|"xRingBufferRead"| CONSUMER["Consumer"]

    RB -->|"xRingBufferReadIov"| IOV1["iovec[2]"] -->|"writev()"| FD1["fd"]
    FD2["fd"] -->|"readv()"| IOV2["iovec[2]"] -->|"xRingBufferWriteIov"| RB

    style RB fill:#f5a623,color:#fff

Implementation Details

Memory Layout

Single malloc() allocation:
┌───────────────────────┬──────────────────────────────────────┐
│  xRingBuffer_ header  │  data[cap]  (flexible array member)  │
│  cap, mask, head, tail│                                      │
└───────────────────────┴──────────────────────────────────────┘

Circular data layout (cap=8, mask=7):
         tail & mask          head & mask
              ↓                    ↓
  ┌───┬───┬───┬───┬───┬───┬───┬───┐
  │   │   │ R │ R │ R │ W │   │   │
  └───┴───┴───┴───┴───┴───┴───┴───┘
  0   1   2   3   4   5   6   7

  R = readable data (tail..head)
  W = next write position

Internal Structure

XDEF_STRUCT(xRingBuffer_) {
    size_t cap;   // Capacity (power of two)
    size_t mask;  // cap - 1 (for bitmask indexing)
    size_t head;  // Write cursor (monotonic)
    size_t tail;  // Read cursor (monotonic)
    char   data[];// Flexible array member
};
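
A single-allocation layout like the one above could be created roughly as follows. This is a sketch under assumptions: demo_ring, demo_ring_create, and demo_ring_destroy are hypothetical names, and the real xRingBufferCreate may differ in its checks and rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header mirroring the fields listed above. */
typedef struct demo_ring {
    size_t cap;
    size_t mask;
    size_t head;
    size_t tail;
    char   data[];  /* flexible array member: storage follows the header */
} demo_ring;

/* Allocate header and data with one malloc(). cap must already be a power of two. */
static demo_ring *demo_ring_create(size_t cap) {
    assert(cap != 0 && (cap & (cap - 1)) == 0);
    if (cap > SIZE_MAX - sizeof(demo_ring)) return NULL;  /* size would overflow */
    demo_ring *rb = malloc(sizeof(demo_ring) + cap);
    if (rb == NULL) return NULL;
    rb->cap  = cap;
    rb->mask = cap - 1;
    rb->head = 0;
    rb->tail = 0;
    return rb;
}

/* One free() releases header and data together; free(NULL) is a no-op. */
static void demo_ring_destroy(demo_ring *rb) { free(rb); }
```

Keeping the header and the data in one block means one allocation, one deallocation, and a data area adjacent in memory to the cursors that index it.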

Power-of-Two Rounding

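static size_t next_pow2(size_t v) {
    if (v < 16) v = 16;   // Enforce a minimum capacity of 16
    v--;                  // So exact powers of two map to themselves
    v |= v >> 1;          // Smear the highest set bit into every lower bit
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
#if SIZE_MAX > 0xFFFFFFFFu   // SIZE_MAX comes from <stdint.h>
    v |= v >> 32;         // Needed only when size_t is 64-bit
#endif
    return v + 1;
}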

This ensures cap is always a power of two, so mask = cap - 1 produces a valid bitmask. For example, cap = 8 gives mask = 0b111.
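
To see the rounding on concrete values, take a request of 1000: after the decrement v is 999 (0b1111100111), the shifts fill every bit below the highest set bit to give 1023, and adding one yields 1024. The standalone copy below makes that checkable. The names demo_next_pow2 and demo_is_pow2 are hypothetical, and the 64-bit shift is guarded by a preprocessor test as one possible way to handle both word sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-smearing round-up with a floor of 16, as described above. */
static size_t demo_next_pow2(size_t v) {
    if (v < 16) v = 16;  /* minimum capacity */
    v--;                 /* so exact powers of two map to themselves */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
#if SIZE_MAX > 0xFFFFFFFFu
    v |= v >> 32;        /* needed only when size_t is wider than 32 bits */
#endif
    return v + 1;
}

/* True if x is a non-zero power of two. */
static int demo_is_pow2(size_t x) {
    assert(sizeof(size_t) >= 4);
    return x != 0 && (x & (x - 1)) == 0;
}
```

Requests of 0 through 16 all map to 16, 17 maps to 32, 1000 and 1024 map to 1024, and 1025 maps to 2048.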

Bitmask Indexing

Instead of:

size_t idx = head % cap;  // Expensive division

The ring buffer uses:

size_t idx = head & mask;  // Single AND instruction

This works because cap is a power of two: x % (2^n) == x & (2^n - 1).
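
The identity is easy to check by brute force, and the same check shows why it fails for a capacity that is not a power of two. The helper name demo_mask_matches_modulo is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if x % cap == (x & (cap - 1)) for every x in [0, limit), else 0. */
static int demo_mask_matches_modulo(size_t cap, size_t limit) {
    assert(cap != 0);
    for (size_t x = 0; x < limit; x++) {
        if ((x % cap) != (x & (cap - 1))) return 0;
    }
    return 1;
}
```

With cap = 8 every value agrees. With cap = 6 the mask is 0b101, so x = 2 gives 2 % 6 = 2 but 2 & 5 = 0, which is why the capacity must be rounded to a power of two before masking.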

Wrap-Around Write

flowchart TD
    WRITE["xRingBufferWrite(rb, data, len)"]
    CHECK{"len <= writable?"}
    FAIL["Return xErrno_NoMemory"]
    POS["pos = head & mask"]
    FIRST["first = cap - pos"]
    WRAP{"len <= first?"}
    SINGLE["memcpy(data+pos, src, len)"]
    SPLIT["memcpy(data+pos, src, first)<br/>memcpy(data, src+first, len-first)"]
    ADVANCE["head += len"]

    WRITE --> CHECK
    CHECK -->|No| FAIL
    CHECK -->|Yes| POS --> FIRST --> WRAP
    WRAP -->|Yes| SINGLE --> ADVANCE
    WRAP -->|No| SPLIT --> ADVANCE

    style FAIL fill:#e74c3c,color:#fff
    style ADVANCE fill:#50b86c,color:#fff
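
The flow above translates almost line for line into C. The sketch below is not the library's source: demo_ring and demo_write are hypothetical, the capacity is a compile-time constant for brevity, and a plain int status stands in for xErrno.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_CAP = 8 };  /* must be a power of two */

/* Hypothetical fixed-size ring used only for this illustration. */
typedef struct {
    size_t head;
    size_t tail;
    char   data[DEMO_CAP];
} demo_ring;

/* Returns 0 on success, -1 if len exceeds the writable space (all-or-nothing). */
static int demo_write(demo_ring *rb, const void *src, size_t len) {
    assert(rb != NULL && src != NULL);
    size_t writable = DEMO_CAP - (rb->head - rb->tail);
    if (len > writable) return -1;

    size_t pos   = rb->head & (DEMO_CAP - 1);
    size_t first = DEMO_CAP - pos;  /* contiguous bytes before the array end */
    if (len <= first) {
        memcpy(rb->data + pos, src, len);
    } else {
        memcpy(rb->data + pos, src, first);                        /* tail end */
        memcpy(rb->data, (const char *)src + first, len - first);  /* wrapped part */
    }
    rb->head += len;
    return 0;
}
```

Starting from an empty ring whose cursors sit at offset 6, writing "ABCDE" places "AB" in slots 6 and 7 and "CDE" in slots 0 to 2. A following 4-byte write is rejected because only 3 bytes remain free.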

Operations and Complexity

| Operation | Time Complexity | Notes |
|---|---|---|
| xRingBufferWrite | O(n) | Up to 2 memcpy calls |
| xRingBufferRead | O(n) | Up to 2 memcpy calls |
| xRingBufferPeek | O(n) | Like Read but doesn't advance tail |
| xRingBufferDiscard | O(1) | Just advances tail |
| xRingBufferLen | O(1) | head - tail |
| xRingBufferReadFd | O(1) | Single readv() syscall |
| xRingBufferWriteFd | O(1) | Single writev() syscall |

API Reference

Lifecycle

| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| xRingBufferCreate | xRingBuffer xRingBufferCreate(size_t min_cap) | Create a ring buffer. Capacity rounded up to power of 2. | Not thread-safe |
| xRingBufferDestroy | void xRingBufferDestroy(xRingBuffer rb) | Free the ring buffer. NULL is a no-op. | Not thread-safe |
| xRingBufferReset | void xRingBufferReset(xRingBuffer rb) | Discard all data, keep memory. | Not thread-safe |

Query

| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| xRingBufferLen | size_t xRingBufferLen(xRingBuffer rb) | Readable bytes. | Not thread-safe |
| xRingBufferCap | size_t xRingBufferCap(xRingBuffer rb) | Total capacity. | Not thread-safe |
| xRingBufferWritable | size_t xRingBufferWritable(xRingBuffer rb) | Writable bytes. | Not thread-safe |
| xRingBufferEmpty | bool xRingBufferEmpty(xRingBuffer rb) | True if no readable data. | Not thread-safe |
| xRingBufferFull | bool xRingBufferFull(xRingBuffer rb) | True if no writable space. | Not thread-safe |

Write

| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| xRingBufferWrite | xErrno xRingBufferWrite(xRingBuffer rb, const void *data, size_t len) | Write bytes, all or nothing. Returns xErrno_NoMemory if len exceeds the writable space. | Not thread-safe |

Read

| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| xRingBufferRead | size_t xRingBufferRead(xRingBuffer rb, void *out, size_t len) | Read and consume bytes. Returns actual count. | Not thread-safe |
| xRingBufferPeek | size_t xRingBufferPeek(xRingBuffer rb, void *out, size_t len) | Read without consuming. | Not thread-safe |
| xRingBufferDiscard | size_t xRingBufferDiscard(xRingBuffer rb, size_t n) | Discard bytes without copying. | Not thread-safe |
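
The relationship between the three read functions can be sketched as follows: Peek copies without moving tail, Discard moves tail without copying, and Read is the two combined. The demo_ names are hypothetical. Clamping to the readable length and returning the clamped count are assumptions drawn from the "Returns actual count" description, not confirmed behavior of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_CAP = 8 };  /* power of two */

/* Hypothetical fixed-size ring used only for this illustration. */
typedef struct {
    size_t head;
    size_t tail;
    char   data[DEMO_CAP];
} demo_ring;

/* Copy up to len readable bytes into out without moving tail. */
static size_t demo_peek(const demo_ring *rb, void *out, size_t len) {
    assert(rb != NULL && out != NULL);
    size_t avail = rb->head - rb->tail;
    if (len > avail) len = avail;  /* clamp to what is readable */
    size_t pos   = rb->tail & (DEMO_CAP - 1);
    size_t first = DEMO_CAP - pos;
    if (len <= first) {
        memcpy(out, rb->data + pos, len);
    } else {
        memcpy(out, rb->data + pos, first);
        memcpy((char *)out + first, rb->data, len - first);
    }
    return len;
}

/* Drop up to n bytes: O(1), no copying. */
static size_t demo_discard(demo_ring *rb, size_t n) {
    size_t avail = rb->head - rb->tail;
    if (n > avail) n = avail;
    rb->tail += n;
    return n;
}

/* Read = peek, then advance tail by the number of bytes copied. */
static size_t demo_read(demo_ring *rb, void *out, size_t len) {
    return demo_discard(rb, demo_peek(rb, out, len));
}
```

A typical use of the Peek/Discard pair is protocol parsing: peek at a header, and discard it only once the full message has arrived.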

I/O Helpers

| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| xRingBufferReadIov | int xRingBufferReadIov(xRingBuffer rb, struct iovec iov[2]) | Fill iovecs with readable regions (for writev). | Not thread-safe |
| xRingBufferWriteIov | int xRingBufferWriteIov(xRingBuffer rb, struct iovec iov[2]) | Fill iovecs with writable regions (for readv). | Not thread-safe |
| xRingBufferReadFd | ssize_t xRingBufferReadFd(xRingBuffer rb, int fd) | Read from fd using readv(). | Not thread-safe |
| xRingBufferWriteFd | ssize_t xRingBufferWriteFd(xRingBuffer rb, int fd) | Write to fd using writev(). | Not thread-safe |
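
Filling an iovec pair for the readable region might look like the sketch below. It assumes the int return value is the number of iovec entries filled, which the table above does not state. demo_ring and demo_read_iov are hypothetical names, and struct iovec comes from the POSIX header <sys/uio.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>  /* struct iovec (POSIX) */

enum { DEMO_CAP = 8 };  /* power of two */

/* Hypothetical fixed-size ring used only for this illustration. */
typedef struct {
    size_t head;
    size_t tail;
    char   data[DEMO_CAP];
} demo_ring;

/* Describe the readable bytes as at most two contiguous segments.
 * Returns the number of iovec entries filled: 0, 1, or 2 (assumed convention). */
static int demo_read_iov(demo_ring *rb, struct iovec iov[2]) {
    assert(rb != NULL && iov != NULL);
    size_t len = rb->head - rb->tail;
    if (len == 0) return 0;
    size_t pos   = rb->tail & (DEMO_CAP - 1);
    size_t first = DEMO_CAP - pos;
    iov[0].iov_base = rb->data + pos;
    if (len <= first) {          /* data does not wrap: one segment */
        iov[0].iov_len = len;
        return 1;
    }
    iov[0].iov_len  = first;     /* segment up to the end of the array */
    iov[1].iov_base = rb->data;  /* wrapped segment at the start */
    iov[1].iov_len  = len - first;
    return 2;
}
```

The returned count can be passed directly as the iovcnt argument of writev(); after the call, tail would be advanced by the number of bytes the kernel reports as written, which may be fewer than the total described.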

Usage Examples

Basic FIFO

#include <stdio.h>
#include <string.h>
#include <xbuf/ring.h>

int main(void) {
    // Request 1000 bytes; actual capacity will be 1024 (next power of 2)
    xRingBuffer rb = xRingBufferCreate(1000);
    printf("Capacity: %zu\n", xRingBufferCap(rb)); // 1024

    // Write data; the write is rejected as a whole if it does not fit
    const char *msg = "Hello, Ring!";
    if (xRingBufferWrite(rb, msg, strlen(msg)) == xErrno_NoMemory) {
        fprintf(stderr, "ring buffer full\n");
    }

    // Read data
    char out[32];
    size_t n = xRingBufferRead(rb, out, sizeof(out));
    printf("Read %zu bytes: %.*s\n", n, (int)n, out);

    xRingBufferDestroy(rb);
    return 0;
}

Network Socket Buffer

#include <xbuf/ring.h>

void event_loop_handler(int sockfd) {
    xRingBuffer rb = xRingBufferCreate(65536); // 64KB ring

    // Read from socket into ring buffer
    ssize_t n = xRingBufferReadFd(rb, sockfd);
    if (n > 0) {
        // Process data...
        // Drain the bytes still held in the ring back to the socket
        xRingBufferWriteFd(rb, sockfd);
    }

    xRingBufferDestroy(rb);
}

Use Cases

  1. Fixed-Budget Network Buffers — When you need predictable memory usage per connection (e.g., 64KB per socket), the ring buffer provides a hard capacity limit.

  2. Logging Ring Buffer — Capture the last N bytes of log output. The buffer never overwrites on its own, so the writer calls xRingBufferDiscard to drop the oldest bytes whenever a new record does not fit.

  3. Inter-Thread Communication — With external synchronization, a ring buffer can serve as a bounded channel between producer and consumer threads.
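
For the logging use case, a keep-the-newest policy has to be layered on top of the fixed-capacity behavior. The sketch below shows the idea on a standalone ring. demo_log and its functions are hypothetical; with the real API the same effect would come from calling xRingBufferDiscard before xRingBufferWrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_CAP = 16 };  /* power of two */

/* Hypothetical log ring used only for this illustration. */
typedef struct {
    size_t head;
    size_t tail;
    char   data[DEMO_CAP];
} demo_log;

/* Append len bytes, dropping the oldest bytes first if space is short.
 * Records longer than the capacity keep only their last DEMO_CAP bytes. */
static void demo_log_append(demo_log *lg, const char *src, size_t len) {
    assert(lg != NULL && src != NULL);
    if (len > DEMO_CAP) {
        src += len - DEMO_CAP;
        len  = DEMO_CAP;
    }
    size_t writable = DEMO_CAP - (lg->head - lg->tail);
    if (len > writable) lg->tail += len - writable;  /* discard oldest bytes */

    size_t pos   = lg->head & (DEMO_CAP - 1);
    size_t first = DEMO_CAP - pos;
    if (len <= first) {
        memcpy(lg->data + pos, src, len);
    } else {
        memcpy(lg->data + pos, src, first);
        memcpy(lg->data, src + first, len - first);
    }
    lg->head += len;
}

/* Copy the retained bytes, oldest first, into out (needs DEMO_CAP bytes). */
static size_t demo_log_snapshot(const demo_log *lg, char *out) {
    size_t len = lg->head - lg->tail;
    for (size_t i = 0; i < len; i++)
        out[i] = lg->data[(lg->tail + i) & (DEMO_CAP - 1)];
    return len;
}
```

Appending "0123456789" and then "abcdefghij" to the 16-byte ring leaves "456789abcdefghij": the four oldest bytes were discarded to make room.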

Best Practices

  • Choose capacity carefully. The ring buffer never grows. A write larger than the remaining writable space fails as a whole, so size the buffer for your worst-case scenario.
  • Use scatter-gather I/O. xRingBufferReadFd/WriteFd use readv()/writev() to handle wrap-around in a single syscall, avoiding the need to linearize data.
  • Be aware of power-of-two rounding. Requesting 1000 bytes gives you 1024. Requesting 1025 gives you 2048. Plan accordingly.
  • xRingBufferWrite is all-or-nothing. To accept partial writes, check xRingBufferWritable() first and clamp the length to the space available.
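
The last point can be sketched as a small wrapper that clamps the request to the free space before calling an all-or-nothing write. Everything here is hypothetical: demo_write stands in for xRingBufferWrite, demo_writable for xRingBufferWritable, and the byte-by-byte copy is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_CAP = 8 };  /* power of two */

/* Hypothetical fixed-size ring used only for this illustration. */
typedef struct {
    size_t head;
    size_t tail;
    char   data[DEMO_CAP];
} demo_ring;

static size_t demo_writable(const demo_ring *rb) {
    return DEMO_CAP - (rb->head - rb->tail);
}

/* All-or-nothing write: returns 0 on success, -1 if len does not fit. */
static int demo_write(demo_ring *rb, const char *src, size_t len) {
    if (len > demo_writable(rb)) return -1;
    for (size_t i = 0; i < len; i++)
        rb->data[(rb->head + i) & (DEMO_CAP - 1)] = src[i];
    rb->head += len;
    return 0;
}

/* Partial write: accept as many bytes as fit and report how many were taken. */
static size_t demo_write_some(demo_ring *rb, const char *src, size_t len) {
    assert(rb != NULL && src != NULL);
    size_t n = demo_writable(rb);
    if (n > len) n = len;  /* clamp to the request */
    if (n > 0 && demo_write(rb, src, n) != 0) return 0;  /* not expected after clamping */
    return n;
}
```

The caller keeps the unwritten remainder (src + n, len - n) and retries once the consumer has drained some data.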

Comparison with Other Libraries

| Feature | xbuf ring.h | Linux kfifo | Boost circular_buffer | DPDK rte_ring |
|---|---|---|---|---|
| Capacity | Fixed, power-of-2 | Fixed, power-of-2 | Fixed, any size | Fixed, power-of-2 |
| Indexing | Bitmask | Bitmask | Modulo | Bitmask |
| Layout | FAM (single alloc) | Separate alloc | Heap array | Huge pages |
| Thread Safety | Not thread-safe | Single-producer/single-consumer | Not thread-safe | Multi-producer/multi-consumer |
| I/O Helpers | readv/writev | kfifo_to_user/kfifo_from_user | No | No (packet-oriented) |
| Language | C99 | C (kernel) | C++ | C |

Key Differentiator: xbuf's ring buffer combines the power-of-two bitmask optimization (like kfifo) with scatter-gather I/O helpers (readv/writev) in a single-allocation design. It's purpose-built for event-driven network programming where fixed memory budgets and efficient syscalls are essential.

Benchmark

Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbuf/ring_bench.cpp

| Benchmark | Size (bytes) | Time (ns) | CPU (ns) | Throughput |
|---|---|---|---|---|
| BM_Ring_WriteRead | 64 | 6.05 | 6.05 | 19.7 GiB/s |
| BM_Ring_WriteRead | 256 | 16.8 | 16.8 | 28.4 GiB/s |
| BM_Ring_WriteRead | 1,024 | 27.4 | 27.4 | 69.6 GiB/s |
| BM_Ring_WriteRead | 4,096 | 99.2 | 99.2 | 76.9 GiB/s |
| BM_Ring_Throughput | 4,096 | 225 | 225 | 17.0 GiB/s |
| BM_Ring_Throughput | 16,384 | 806 | 806 | 18.9 GiB/s |
| BM_Ring_Throughput | 65,536 | 3,198 | 3,198 | 19.1 GiB/s |

Key Observations:

  • WriteRead (single write + read cycle) achieves up to ~77 GiB/s at 4KB chunks, demonstrating the efficiency of the bitmask-based wrap-around and memcpy for larger transfers.
  • Throughput (sustained writes until full) stays between 17.0 and 19.1 GiB/s for capacities from 4 KB to 64 KB, so performance holds steady as the ring scales.
  • Bitmask indexing keeps the fixed per-operation cost low: a 64-byte write+read cycle takes about 6 ns.