xKit
Welcome to the xKit documentation. xKit is a collection of low-level C building blocks for event-driven, asynchronous programming on macOS and Linux. (Windows is on the roadmap but not a near-term priority).
- Designed and reviewed by Leo X.
- Coded by Codebuddy with claude-4.6-opus
Architecture Overview
graph TD
subgraph "Application Layer"
APP["User Application"]
end
subgraph "High-Level Modules"
XHTTP["xhttp<br/>HTTP Client & Server & WebSocket"]
XLOG["xlog<br/>Async Logging"]
end
subgraph "Networking Layer"
XNET["xnet<br/>URL / DNS / TLS Config / TCP"]
end
subgraph "Buffer Layer"
XBUF["xbuf<br/>Buffer Primitives"]
end
subgraph "Core Layer"
XBASE["xbase<br/>Core Primitives"]
end
APP --> XHTTP
APP --> XLOG
APP --> XNET
APP --> XBUF
APP --> XBASE
XHTTP --> XNET
XHTTP --> XBASE
XHTTP --> XBUF
XNET --> XBASE
XLOG --> XBASE
XBUF -->|"atomic.h"| XBASE
style XBASE fill:#50b86c,color:#fff
style XBUF fill:#4a90d9,color:#fff
style XNET fill:#e74c3c,color:#fff
style XHTTP fill:#f5a623,color:#fff
style XLOG fill:#9b59b6,color:#fff
Module Index
xbase — Core Primitives
The foundation of xKit. Provides event loop, timers, tasks, async sockets, memory management, and lock-free data structures.
| Sub-Module | Description |
|---|---|
| event.h | Cross-platform event loop — kqueue (macOS) / epoll (Linux) / poll (fallback) |
| timer.h | Monotonic timer with Push (thread-pool) and Poll (lock-free MPSC) fire modes |
| task.h | N:M task model — lightweight tasks multiplexed onto a thread pool |
| socket.h | Async socket abstraction with idle-timeout support |
| memory.h | Reference-counted allocation with vtable-driven lifecycle |
| error.h | Unified error codes and human-readable messages |
| heap.h | Min-heap with index tracking (used by timer subsystem) |
| mpsc.h | Lock-free multi-producer / single-consumer queue |
| atomic.h | Compiler-portable atomic operations (GCC/Clang builtins) |
| log.h | Per-thread callback-based logging with optional backtrace |
| backtrace.h | Platform-adaptive stack trace (libunwind > execinfo > stub) |
| time.h | Time utilities: xMonoMs() (monotonic) and xWallMs() (wall-clock) |
xbuf — Buffer Primitives
Three buffer types for different I/O patterns — linear, ring, and block-chain.
| Sub-Module | Description |
|---|---|
| buf.h | Linear auto-growing byte buffer with 2× expansion |
| ring.h | Fixed-size ring buffer with power-of-2 mask indexing |
| io.h | Reference-counted block-chain I/O buffer with zero-copy split/cut |
xnet — Networking Primitives
Shared networking utilities: URL parser, async DNS resolver, and TLS configuration types used by higher-level modules.
| Sub-Module | Description |
|---|---|
| url.h | Lightweight URL parser with zero-copy component extraction |
| dns.h | Async DNS resolution via thread-pool offload |
| tls.h | Shared TLS configuration types (client & server) |
| tcp.h | Async TCP connection, connector & listener with optional TLS |
xhttp — Async HTTP Client & Server & WebSocket
Full-featured async HTTP framework: libcurl-powered client with SSE streaming, event-driven server with HTTP/1.1 & HTTP/2 (h2c), TLS support (OpenSSL / mbedTLS), and RFC 6455 WebSocket (server & client).
| Sub-Module | Description |
|---|---|
| client.h | Async HTTP client (GET / POST / PUT / DELETE / PATCH / HEAD) |
| sse.c | SSE streaming client with W3C-compliant event parsing |
| server.h | Event-driven HTTP server with HTTP/1.1 and HTTP/2 (h2c) |
| ws.h | RFC 6455 WebSocket server with handler-initiated upgrade |
| ws.h | RFC 6455 WebSocket client with async connect |
| transport.h | Pluggable TLS transport layer (OpenSSL / mbedTLS / plain) |
xlog — Async Logging
High-performance async logger with MPSC queue, three flush modes, and file rotation.
| Sub-Module | Description |
|---|---|
| logger.h | Async logger with Timer / Notify / Mixed modes and XLOG_* macros |
bench — End-to-End Benchmarks
End-to-end benchmark results comparing xKit against other frameworks in real-world scenarios.
| Benchmark | Description |
|---|---|
| HTTP/1.1 Server | xKit single-threaded HTTP/1.1 server vs Go net/http — GET/POST throughput and latency |
| HTTP/2 Server | xKit single-threaded HTTP/2 (h2c) server vs Go net/http h2c — GET/POST throughput and latency |
| HTTPS Server | xKit single-threaded HTTPS (TLS 1.3) server vs Go net/http — GET/POST throughput and latency |
Quick Navigation Guide
By Use Case
| I want to... | Start here |
|---|---|
| Build an event-driven server | xbase/event.h → xbase/socket.h |
| Schedule timers | xbase/timer.h |
| Run tasks on a thread pool | xbase/task.h |
| Make async HTTP requests | xhttp/client.h |
| Stream LLM API responses (SSE) | xhttp/sse.c |
| Build an HTTP server | xhttp/server.h |
| Add WebSocket server | xhttp/ws.h |
| Connect as WebSocket client | xhttp/ws.h |
| Parse a URL | xnet/url.h |
| Resolve DNS asynchronously | xnet/dns.h |
| Make async TCP connections | xnet/tcp.h |
| Build a TCP server | xnet/tcp.h |
| Configure TLS | xnet/tls.h |
| Enable TLS (HTTPS) | xhttp/transport.h |
| Add async logging | xlog/logger.h |
| Manage object lifecycles | xbase/memory.h |
| Choose the right buffer type | xbuf overview |
| Build a lock-free producer/consumer pipeline | xbase/mpsc.h |
| See micro-benchmark results | Each module doc has a Benchmark section (e.g. mpsc.h) |
| See HTTP server benchmarks | HTTP/1.1 · HTTP/2 · HTTPS |
By Dependency Level
Level 0 (no deps) : atomic.h, error.h, time.h
Level 1 (atomic only) : heap.h, mpsc.h
Level 2 (Level 0-1) : memory.h, log.h, backtrace.h, buf.h, ring.h
Level 3 (Level 0-2) : event.h, io.h, url.h, tls.h
Level 4 (event loop) : timer.h, task.h, socket.h, dns.h, tcp.h, logger.h, client.h, server.h, ws.h
Module Dependency Graph
graph BT
subgraph "Level 0"
ATOMIC["atomic.h"]
ERROR["error.h"]
TIME["time.h"]
end
subgraph "Level 1"
HEAP["heap.h"]
MPSC["mpsc.h"]
end
subgraph "Level 2"
MEMORY["memory.h"]
LOG["log.h"]
BT_["backtrace.h"]
BUF["buf.h"]
RING["ring.h"]
end
subgraph "Level 3"
EVENT["event.h"]
IO["io.h"]
URL["url.h"]
TLS_CONF["tls.h"]
end
subgraph "Level 4"
TIMER["timer.h"]
TASK["task.h"]
SOCKET["socket.h"]
DNS["dns.h"]
TCP["tcp.h"]
LOGGER["logger.h"]
CLIENT["client.h"]
SERVER["server.h"]
WS["ws.h"]
end
HEAP --> ATOMIC
MPSC --> ATOMIC
MEMORY --> ERROR
LOG --> BT_
IO --> ATOMIC
IO --> BUF
EVENT --> HEAP
EVENT --> MPSC
EVENT --> TIME
TIMER --> EVENT
TASK --> EVENT
SOCKET --> EVENT
DNS --> EVENT
TCP --> EVENT
TCP --> DNS
TCP --> SOCKET
TCP --> TLS_CONF
LOGGER --> EVENT
LOGGER --> MPSC
LOGGER --> LOG
CLIENT --> EVENT
CLIENT --> BUF
CLIENT --> URL
CLIENT --> DNS
CLIENT --> TLS_CONF
SERVER --> SOCKET
SERVER --> BUF
SERVER --> TLS_CONF
WS --> SERVER
WS --> URL
style EVENT fill:#50b86c,color:#fff
style URL fill:#e74c3c,color:#fff
style DNS fill:#e74c3c,color:#fff
style TCP fill:#e74c3c,color:#fff
style TLS_CONF fill:#e74c3c,color:#fff
style CLIENT fill:#f5a623,color:#fff
style SERVER fill:#f5a623,color:#fff
style WS fill:#f5a623,color:#fff
style LOGGER fill:#9b59b6,color:#fff
Build & Test
# Build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel
# Test
ctest --test-dir build --output-on-failure --parallel 4
See the project README for full build instructions, prerequisites, and container-based Linux testing.
Benchmark
Micro-benchmark results are included in each module's documentation page (see the Benchmark section at the bottom of each page, e.g. mpsc.h, buf.h).
End-to-end benchmarks:
| Benchmark | Description |
|---|---|
| HTTP/1.1 Server | xKit vs Go net/http — 152K req/s single-threaded, 15–60% faster across all scenarios |
| HTTP/2 Server | xKit vs Go h2c — single-threaded HTTP/2 (h2c) throughput comparison |
| HTTPS Server | xKit vs Go HTTPS — single-threaded TLS 1.3 throughput comparison |
License
MIT © 2025-present Leo X. and xKit contributors
Modules
xKit is organized into five modules, layered from low-level core primitives up to high-level async networking.
┌─────────────────────────────────────────────┐
│ Application Layer │
├──────────────────────┬──────────────────────┤
│ xhttp │ xlog │
│ HTTP Client/Server │ Async Logging │
│ WebSocket │ │
├──────────────────────┴──────────────────────┤
│ xnet — URL / DNS / TCP / TLS Config │
├─────────────────────────────────────────────┤
│ xbuf — Linear / Ring / Block-Chain Buffer │
├─────────────────────────────────────────────┤
│ xbase — Event Loop / Timer / Task / │
│ Memory / Atomic / MPSC Queue │
└─────────────────────────────────────────────┘
Overview
| Module | Description |
|---|---|
| xbase | Core primitives — event loop, timers, tasks, async sockets, memory, lock-free data structures |
| xbuf | Buffer primitives — linear, ring, and block-chain I/O buffers |
| xnet | Networking primitives — URL parser, async DNS resolver, TCP, shared TLS configuration types |
| xhttp | Async HTTP client & server — libcurl multi-socket client with SSE streaming, HTTP/1.1 & HTTP/2 async server with TLS, WebSocket server & client |
| xlog | Async logging — MPSC queue, timer/pipe flush, log rotation |
Dependency Order
Level 0 (no deps) : atomic.h, error.h, time.h
Level 1 (atomic only) : heap.h, mpsc.h
Level 2 (Level 0-1) : memory.h, log.h, backtrace.h, buf.h, ring.h
Level 3 (Level 0-2) : event.h, io.h, url.h, tls.h
Level 4 (event loop) : timer.h, task.h, socket.h, dns.h, tcp.h, logger.h, client.h, server.h, ws.h
xbase — Event-Driven Async Foundation
Introduction
xbase is the foundational module of xKit, providing the core primitives for building event-driven, asynchronous C applications on macOS and Linux. It delivers a cross-platform event loop, monotonic timers, an N:M task model (thread pool), async sockets, reference-counted memory management, lock-free data structures, and essential utilities — all in a minimal, zero-dependency C99 package.
xbase is designed to be the "kernel" that higher-level xKit modules (xbuf, xhttp, xlog) build upon. Every I/O-bound or timer-driven feature in xKit ultimately relies on xbase's event loop and concurrency primitives.
Design Philosophy
- Edge-Triggered by Default — The event loop operates in edge-triggered mode across all backends (kqueue, epoll, poll), encouraging callers to drain file descriptors completely. This yields higher throughput and fewer spurious wakeups than level-triggered designs.
- Layered Abstraction — Low-level primitives (atomic, mpsc, heap) are composed into mid-level services (timer, task), which are in turn integrated into the high-level event loop. Each layer is independently usable.
- Zero Allocation in the Hot Path — Data structures like the MPSC queue and min-heap are designed to avoid dynamic allocation during normal operation. Memory is pre-allocated or embedded in user structs.
- Thread-Safety Where It Matters — APIs expected to be called cross-thread (e.g., xEventWake, xTimerSubmitAfter, xMpscPush) are explicitly designed to be thread-safe. Single-threaded APIs are documented as such.
- vtable-Driven Lifecycle — The memory module uses a virtual table pattern (ctor/dtor/retain/release) to provide reference-counted object management in pure C, inspired by Objective-C's retain/release model.
- Platform Adaptation at Build Time — Platform-specific code (kqueue vs. epoll, libunwind vs. execinfo) is selected via compile-time macros, keeping runtime overhead at zero.
Architecture
graph TD
subgraph "High-Level Services"
EVENT["event.h<br/>Event Loop"]
TIMER["timer.h<br/>Monotonic Timer"]
TASK["task.h<br/>N:M Task Model"]
SOCKET["socket.h<br/>Async Socket"]
end
subgraph "Infrastructure"
MEMORY["memory.h<br/>Ref-Counted Memory"]
LOG["log.h<br/>Thread-Local Log"]
BACKTRACE["backtrace.h<br/>Stack Backtrace"]
ERROR["error.h<br/>Error Codes"]
TIME["time.h<br/>Time Utilities"]
end
subgraph "Data Structures & Concurrency"
HEAP["heap.h<br/>Min-Heap"]
MPSC["mpsc.h<br/>Lock-Free MPSC Queue"]
ATOMIC["atomic.h<br/>Atomic Operations"]
end
EVENT -->|"registers timers"| TIMER
EVENT -->|"offloads work"| TASK
EVENT -->|"wraps fd"| SOCKET
SOCKET -->|"monitors I/O"| EVENT
SOCKET -->|"idle timeout"| EVENT
TIMER -->|"schedules entries"| HEAP
TIMER -->|"poll-mode queue"| MPSC
TIMER -->|"push-mode dispatch"| TASK
TIMER -->|"reads clock"| TIME
MPSC -->|"CAS operations"| ATOMIC
MEMORY -->|"atomic refcount"| ATOMIC
LOG -->|"fatal backtrace"| BACKTRACE
LOG -->|"error formatting"| ERROR
EVENT -->|"reads clock"| TIME
style EVENT fill:#4a90d9,color:#fff
style TIMER fill:#4a90d9,color:#fff
style TASK fill:#4a90d9,color:#fff
style SOCKET fill:#4a90d9,color:#fff
style MEMORY fill:#50b86c,color:#fff
style LOG fill:#50b86c,color:#fff
style BACKTRACE fill:#50b86c,color:#fff
style ERROR fill:#50b86c,color:#fff
style TIME fill:#50b86c,color:#fff
style HEAP fill:#f5a623,color:#fff
style MPSC fill:#f5a623,color:#fff
style ATOMIC fill:#f5a623,color:#fff
Sub-Module Overview
| Header | Document | Description |
|---|---|---|
| event.h | event.md | Cross-platform event loop (edge-triggered) — kqueue / epoll / poll backends with built-in timer and thread-pool integration |
| timer.h | timer.md | Monotonic timer with push (thread-pool) and poll (lock-free MPSC) fire modes |
| task.h | task.md | N:M task model — lightweight tasks multiplexed onto a configurable thread pool |
| socket.h | socket.md | Async socket abstraction with idle-timeout support over xEventLoop |
| memory.h | memory.md | Reference-counted allocation with vtable-driven lifecycle (ctor/dtor/retain/release) |
| log.h | log.md | Per-thread callback-based logging with optional backtrace on fatal |
| backtrace.h | backtrace.md | Platform-adaptive stack trace capture (libunwind > execinfo > stub) |
| error.h | error.md | Unified error codes (xErrno) and human-readable messages |
| heap.h | heap.md | Generic min-heap with O(log n) insert/remove, used internally by the timer subsystem |
| mpsc.h | mpsc.md | Lock-free multi-producer / single-consumer intrusive queue |
| atomic.h | atomic.md | Compiler-portable atomic operations (GCC/Clang __atomic builtins) |
| io.h | io.md | Abstract I/O interfaces (Reader, Writer, Seeker, Closer) with convenience helpers (xReadFull, xReadAll, xWritev, etc.) |
| time.h | — | Time utilities: xMonoMs() (monotonic) and xWallMs() (wall-clock) in milliseconds |
How to Choose
| I need to… | Use |
|---|---|
| React to I/O readiness on file descriptors | event.h — register fds and get edge-triggered callbacks |
| Schedule delayed or periodic work | timer.h — standalone timer, or use xEventLoopTimerAfter() for event-loop-integrated timers |
| Run CPU-bound work off the main thread | task.h — submit to a thread pool, optionally collect results |
| Manage non-blocking TCP/UDP connections | socket.h — wraps socket + event loop + idle timeout |
| Allocate objects with automatic cleanup | memory.h — XMALLOC(T) + xRetain/xRelease |
| Report errors from library internals | log.h — thread-local callback, or stderr fallback |
| Capture a stack trace for debugging | backtrace.h — xBacktrace() fills a buffer |
| Handle error codes uniformly | error.h — xErrno enum + xstrerror() |
| Build a priority queue | heap.h — generic min-heap with index tracking |
| Pass messages between threads lock-free | mpsc.h — intrusive MPSC queue |
| Perform atomic read-modify-write | atomic.h — macro wrappers over compiler builtins |
| Get current time in milliseconds | time.h — xMonoMs() for elapsed time, xWallMs() for wall-clock |
| Read/write through abstract I/O interfaces | io.h — xReader / xWriter + helpers like xReadFull, xReadAll |
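The time.h helpers map onto clock_gettime(2); a sketch of what such millisecond helpers typically look like (illustrative, not the actual xbase implementation — mono_ms and wall_ms are hypothetical names):

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic millisecond clock, similar in spirit to xMonoMs().
 * CLOCK_MONOTONIC is unaffected by wall-clock adjustments (NTP, manual set),
 * which makes it the right choice for timeouts and elapsed-time measurement. */
static uint64_t mono_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)(ts.tv_nsec / 1000000L);
}

/* Wall-clock variant, analogous to xWallMs(); use for timestamps, not timeouts. */
static uint64_t wall_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)(ts.tv_nsec / 1000000L);
}
```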
Quick Start
A minimal example that creates an event loop, schedules a one-shot timer, and runs until the timer fires:
#include <stdio.h>
#include <xbase/event.h>
static void on_timer(void *arg) {
    printf("Timer fired!\n");
    xEventLoopStop((xEventLoop)arg);
}

int main(void) {
    // Create an event loop
    xEventLoop loop = xEventLoopCreate();
    if (!loop) return 1;

    // Schedule a timer to fire after 1 second
    xEventLoopTimerAfter(loop, on_timer, loop, 1000);

    // Run the event loop (blocks until xEventLoopStop is called)
    xEventLoopRun(loop);

    // Clean up
    xEventLoopDestroy(loop);
    return 0;
}
Compile with:
gcc -o example example.c -I/path/to/xkit -lxbase -lpthread
Relationship with Other Modules
graph LR
XBASE["xbase"]
XBUF["xbuf"]
XHTTP["xhttp"]
XLOG["xlog"]
XHTTP -->|"event loop + timer"| XBASE
XHTTP -->|"I/O buffers"| XBUF
XLOG -->|"event loop + MPSC queue"| XBASE
XBUF -->|"atomic.h"| XBASE
XNET["xnet"]
XNET -->|"event loop + thread pool + atomic"| XBASE
XHTTP -->|"URL + DNS + TLS config"| XNET
style XBASE fill:#4a90d9,color:#fff
style XBUF fill:#50b86c,color:#fff
style XHTTP fill:#f5a623,color:#fff
style XLOG fill:#e74c3c,color:#fff
style XNET fill:#e74c3c,color:#fff
- xbuf — Buffer module. xIOBuffer uses xbase's atomic.h for lock-free block pool management. xhttp uses both xbase and xbuf together.
- xhttp — The async HTTP client is built on top of xbase's event loop (xEventLoop) and timer infrastructure, and uses xbuf for response buffering.
- xnet — The networking primitives module. The async DNS resolver uses xbase's event loop for thread-pool offload (xEventLoopSubmit) and atomic.h for the cancellation flag.
- xlog — The async logger uses xbase's event loop for timer-based flushing and the MPSC queue for lock-free log message passing from application threads to the logger thread.
event.h — Cross-Platform Event Loop
Introduction
event.h provides a cross-platform, edge-triggered event loop abstraction for I/O multiplexing. It unifies three OS-specific backends — kqueue (macOS/BSD), epoll (Linux), and poll (POSIX fallback) — behind a single API. The event loop is the central coordination point in xbase: it monitors file descriptors for readiness, dispatches timer callbacks, offloads CPU-bound work to thread pools, and watches for POSIX signals — all from a single thread.
Design Philosophy
- Edge-Triggered Everywhere — All three backends operate in edge-triggered mode. kqueue uses EV_CLEAR, epoll uses EPOLLET, and poll emulates edge-triggered behavior by clearing the event mask after each notification (requiring the caller to re-arm via xEventMod()). This design encourages callers to drain fds completely, reducing spurious wakeups.
- Backend Selection at Compile Time — The backend is chosen via preprocessor macros (XK_HAS_KQUEUE, XK_HAS_EPOLL), with poll as the universal fallback. This means zero runtime dispatch overhead.
- Integrated Timer Heap — Rather than requiring a separate timer facility, the event loop embeds a min-heap of timer entries. xEventWait() automatically adjusts its timeout to fire the earliest timer, providing sub-millisecond timer resolution without a dedicated timer thread.
- Thread-Pool Offload — xEventLoopSubmit() bridges the event loop and the task system: CPU-bound work runs on a worker thread, and the completion callback is dispatched on the event loop thread via a lock-free MPSC queue + wake pipe, ensuring single-threaded callback semantics.
- Self-Pipe Trick for Signals — On the epoll and poll backends, signal delivery uses the self-pipe trick (a sigaction handler writes to a pipe) rather than signalfd, avoiding the fragile requirement of blocking signals in every thread. On kqueue, EVFILT_SIGNAL is used natively.
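Backend selection of this kind reduces to a few preprocessor lines. A simplified sketch using the macro names from the text (the real project selects whole backend .c files at build time rather than branching in one translation unit; xk_backend is a hypothetical helper):

```c
/* Sketch of compile-time backend selection — zero runtime dispatch cost. */
#if defined(XK_HAS_KQUEUE)
#  define XK_BACKEND_NAME "kqueue"   /* macOS / BSD: native edge via EV_CLEAR */
#elif defined(XK_HAS_EPOLL)
#  define XK_BACKEND_NAME "epoll"    /* Linux: native edge via EPOLLET */
#else
#  define XK_BACKEND_NAME "poll"     /* universal POSIX fallback, emulated edge */
#endif

static const char *xk_backend(void) { return XK_BACKEND_NAME; }
```

Because the choice happens in the preprocessor, the resulting binary contains exactly one backend and no branch is taken at runtime.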
Architecture
graph TD
subgraph "Event Loop (single thread)"
WAIT["xEventWait()"]
DISPATCH["Dispatch I/O callbacks"]
TIMERS["Fire expired timers"]
DONE["Drain done-queue"]
SWEEP["Sweep deleted sources"]
end
subgraph "Backend (compile-time)"
KQ["kqueue"]
EP["epoll"]
PO["poll"]
end
subgraph "Cross-Thread"
WAKE["Wake Pipe"]
MPSC_Q["MPSC Done Queue"]
WORKER["Worker Thread Pool"]
end
WAIT --> KQ
WAIT --> EP
WAIT --> PO
KQ --> DISPATCH
EP --> DISPATCH
PO --> DISPATCH
DISPATCH --> TIMERS
TIMERS --> DONE
DONE --> SWEEP
WORKER -->|"push result"| MPSC_Q
MPSC_Q -->|"wake"| WAKE
WAKE -->|"drain"| DONE
style WAIT fill:#4a90d9,color:#fff
style DISPATCH fill:#4a90d9,color:#fff
style TIMERS fill:#f5a623,color:#fff
style DONE fill:#50b86c,color:#fff
Event Loop Lifecycle
sequenceDiagram
participant App
participant EL as xEventLoop
participant Backend as kqueue / epoll / poll
participant Timer as Timer Heap
App->>EL: xEventLoopCreate()
App->>EL: xEventAdd(fd, mask, callback)
App->>EL: xEventLoopTimerAfter(fn, 1000ms)
App->>EL: xEventLoopRun()
loop Main Loop
EL->>Timer: Check earliest deadline
Timer-->>EL: timeout = min(user_timeout, timer_deadline)
EL->>Backend: wait(timeout)
Backend-->>EL: ready events
EL->>App: callback(fd, mask)
EL->>Timer: Pop & fire expired timers
EL->>EL: Sweep deleted sources
end
App->>EL: xEventLoopStop()
App->>EL: xEventLoopDestroy()
Implementation Details
Backend Architecture
Each backend is implemented in a separate .c file that provides the full public API:
| File | Backend | Trigger Mode | Selection |
|---|---|---|---|
| event_kqueue.c | kqueue | EV_CLEAR (native edge) | #ifdef XK_HAS_KQUEUE |
| event_epoll.c | epoll | EPOLLET (native edge) | #ifdef XK_HAS_EPOLL |
| event_poll.c | poll(2) | Emulated edge (mask cleared after dispatch) | Fallback |
All backends share a common base structure (struct xEventLoop_) defined in event_private.h, which contains:
- A dynamic source array with deferred deletion (sweep after dispatch)
- A wake pipe (non-blocking) for cross-thread wakeup
- A min-heap for builtin timers (protected by the timer_mu mutex)
- A lock-free MPSC done-queue for offload completion callbacks
- Signal watch slots (up to XK_SIGNAL_MAX = 64)
Deferred Source Deletion
When xEventDel() is called during a callback dispatch, the source is marked deleted = 1 rather than freed immediately. After the dispatch batch completes, source_array_sweep() frees all deleted sources. This prevents use-after-free when multiple events reference the same source in a single xEventWait() call.
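The mark-then-sweep scheme can be sketched in plain C. The struct layouts below are illustrative stand-ins for the private structures, though source_array_sweep matches the function name used in the text:

```c
#include <stdlib.h>

/* Illustrative sketch of deferred source deletion — not event_private.h. */
typedef struct {
    int fd;
    int deleted;   /* marked during dispatch, freed only in the sweep */
} source;

typedef struct {
    source **items;
    size_t count;
} source_array;

/* What xEventDel()-like code does during a dispatch batch: mark only.
 * The source stays valid for any remaining events in the same batch. */
static void source_mark_deleted(source *s) { s->deleted = 1; }

/* After the batch completes: compact the array and free marked sources. */
static void source_array_sweep(source_array *a) {
    size_t kept = 0;
    for (size_t i = 0; i < a->count; i++) {
        if (a->items[i]->deleted) {
            free(a->items[i]);            /* safe: no callback can still see it */
        } else {
            a->items[kept++] = a->items[i];
        }
    }
    a->count = kept;
}
```

Deferring the free to a single sweep point is what makes it safe for one xEventWait() batch to deliver several events referencing the same source.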
Wake Pipe
A non-blocking pipe (wake_rfd / wake_wfd) is registered with the backend. xEventWake() writes a single byte to the write end; the event loop drains the read end and processes the done-queue. Multiple wakes before the next xEventWait() are coalesced (EAGAIN on a full pipe is treated as success).
Timer Integration
Builtin timers are stored in a min-heap inside the event loop. Before each xEventWait() call, the effective timeout is clamped to the earliest timer deadline. After I/O dispatch, expired timers are popped and fired. Timer operations (xEventLoopTimerAfter, xEventLoopTimerAt, xEventLoopTimerCancel) are thread-safe, protected by timer_mu.
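The timeout-clamping step described above can be sketched as a pure function (clamp_timeout is a hypothetical name for illustration, not part of event.h):

```c
#include <stdint.h>

/* Clamp a user-supplied wait timeout to the earliest timer deadline.
 * user_timeout_ms < 0 means "wait indefinitely" (as with poll(2)).
 * Returns the effective timeout in ms for the backend wait call. */
static int clamp_timeout(int user_timeout_ms, uint64_t now_ms,
                         uint64_t earliest_deadline_ms, int have_timer) {
    if (!have_timer) return user_timeout_ms;          /* nothing to clamp to */
    uint64_t until = earliest_deadline_ms > now_ms
                   ? earliest_deadline_ms - now_ms
                   : 0;                               /* already expired: poll */
    if (until > INT32_MAX) until = INT32_MAX;
    if (user_timeout_ms < 0 || (uint64_t)user_timeout_ms > until)
        return (int)until;                            /* wake in time to fire */
    return user_timeout_ms;
}
```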
Signal Handling
| Backend | Mechanism |
|---|---|
| kqueue | EVFILT_SIGNAL with EV_CLEAR — native kernel support |
| epoll | Self-pipe trick: sigaction handler writes to a per-signal pipe |
| poll | Self-pipe trick: same as epoll |
The self-pipe approach avoids signalfd's requirement to block signals in all threads, which is fragile in the presence of third-party libraries and test frameworks.
API Reference
Types
| Type | Description |
|---|---|
| xEventMask | Bitmask enum: xEvent_Read (1), xEvent_Write (2), xEvent_Timeout (4) |
| xEventFunc | void (*)(int fd, xEventMask mask, void *arg) — I/O callback |
| xEventTimerFunc | void (*)(void *arg) — Timer callback |
| xEventSignalFunc | void (*)(int signo, void *arg) — Signal callback |
| xEventDoneFunc | void (*)(void *arg, void *result) — Offload completion callback |
| xEventLoop | Opaque handle to an event loop |
| xEventSource | Opaque handle to a registered event source |
| xEventTimer | Opaque handle to a builtin timer |
Functions
Lifecycle
| Function | Signature | Thread Safety |
|---|---|---|
| xEventLoopCreate | xEventLoop xEventLoopCreate(void) | Not thread-safe |
| xEventLoopCreateWithGroup | xEventLoop xEventLoopCreateWithGroup(xTaskGroup group) | Not thread-safe |
| xEventLoopDestroy | void xEventLoopDestroy(xEventLoop loop) | Not thread-safe |
| xEventLoopRun | void xEventLoopRun(xEventLoop loop) | Not thread-safe (call from one thread) |
| xEventLoopStop | void xEventLoopStop(xEventLoop loop) | Thread-safe |
I/O Sources
| Function | Signature | Thread Safety |
|---|---|---|
| xEventAdd | xEventSource xEventAdd(xEventLoop loop, int fd, xEventMask mask, xEventFunc fn, void *arg) | Not thread-safe |
| xEventMod | xErrno xEventMod(xEventLoop loop, xEventSource src, xEventMask mask) | Not thread-safe |
| xEventDel | xErrno xEventDel(xEventLoop loop, xEventSource src) | Not thread-safe |
| xEventWait | int xEventWait(xEventLoop loop, int timeout_ms) | Not thread-safe |
Timers
| Function | Signature | Thread Safety |
|---|---|---|
| xEventLoopTimerAfter | xEventTimer xEventLoopTimerAfter(xEventLoop loop, xEventTimerFunc fn, void *arg, uint64_t delay_ms) | Thread-safe |
| xEventLoopTimerAt | xEventTimer xEventLoopTimerAt(xEventLoop loop, xEventTimerFunc fn, void *arg, uint64_t abs_ms) | Thread-safe |
| xEventLoopTimerCancel | xErrno xEventLoopTimerCancel(xEventLoop loop, xEventTimer timer) | Thread-safe |
Cross-Thread
| Function | Signature | Thread Safety |
|---|---|---|
| xEventWake | xErrno xEventWake(xEventLoop loop) | Thread-safe (signal-handler-safe) |
| xEventLoopSubmit | xErrno xEventLoopSubmit(xEventLoop loop, xTaskGroup group, xTaskFunc work_fn, xEventDoneFunc done_fn, void *arg) | Thread-safe |
Signal
| Function | Signature | Thread Safety |
|---|---|---|
| xEventLoopSignalWatch | xErrno xEventLoopSignalWatch(xEventLoop loop, int signo, xEventSignalFunc fn, void *arg) | Not thread-safe |
Deprecated
| Function | Signature | Replacement |
|---|---|---|
| xEventLoopNowMs | uint64_t xEventLoopNowMs(void) | xMonoMs() from <xbase/time.h> |
Usage Examples
Basic Event Loop with Timer
#include <stdio.h>
#include <xbase/event.h>
static void on_timer(void *arg) {
    printf("Timer fired!\n");
    xEventLoopStop((xEventLoop)arg);
}

int main(void) {
    xEventLoop loop = xEventLoopCreate();
    if (!loop) return 1;

    // Fire after 500ms
    xEventLoopTimerAfter(loop, on_timer, loop, 500);

    xEventLoopRun(loop);
    xEventLoopDestroy(loop);
    return 0;
}
Monitoring a File Descriptor
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <xbase/event.h>

static void on_readable(int fd, xEventMask mask, void *arg) {
    char buf[1024];
    ssize_t n;
    // Edge-triggered: drain until read() reports EAGAIN (or EOF)
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        fwrite(buf, 1, (size_t)n, stdout);
    }
    if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
        perror("read");
    }
    (void)mask;
    (void)arg;
}
int main(void) {
    xEventLoop loop = xEventLoopCreate();

    // Monitor stdin for readability
    xEventAdd(loop, STDIN_FILENO, xEvent_Read, on_readable, NULL);

    // Run for up to 10 seconds
    xEventLoopTimerAfter(loop, (xEventTimerFunc)xEventLoopStop, loop, 10000);

    xEventLoopRun(loop);
    xEventLoopDestroy(loop);
    return 0;
}
Offloading Work to a Thread Pool
#include <stdio.h>
#include <xbase/event.h>
static void *heavy_work(void *arg) {
    // Runs on a worker thread
    int *val = (int *)arg;
    *val *= 2;
    return val;
}

static void on_done(void *arg, void *result) {
    // Runs on the event loop thread
    int *val = (int *)result;
    printf("Result: %d\n", *val);
    (void)arg;
}

int main(void) {
    xEventLoop loop = xEventLoopCreate();
    int value = 21;

    xEventLoopSubmit(loop, NULL, heavy_work, on_done, &value);

    // Run briefly to process the completion
    xEventLoopTimerAfter(loop, (xEventTimerFunc)xEventLoopStop, loop, 1000);
    xEventLoopRun(loop);
    xEventLoopDestroy(loop);
    return 0;
}
Use Cases
-
Network Servers — Register listening sockets and accepted connections with the event loop. Use edge-triggered callbacks to read/write data without blocking. Combine with
xSocketfor idle-timeout support. -
Timer-Driven State Machines — Use
xEventLoopTimerAfter()to schedule state transitions, retries, or heartbeat checks. The timer is integrated into the event loop, so no separate timer thread is needed. -
Hybrid I/O + CPU Workloads — Use
xEventLoopSubmit()to offload CPU-intensive parsing or compression to a thread pool, then process results on the event loop thread where I/O state is safely accessible.
Best Practices
- Always drain fds in edge-triggered mode. Read/write until EAGAIN in every callback. If you stop early and leave data buffered, you won't be notified again until new data arrives.
- Never block in callbacks. The event loop is single-threaded; a blocking call stalls all I/O and timer processing. Offload heavy work via xEventLoopSubmit().
- Use xEventLoopRun() for the main loop. It handles timer dispatch and stop-flag checking automatically. Only use xEventWait() directly if you need custom loop logic.
- Cancel timers you no longer need. Uncancelled timers hold memory until they fire. Use xEventLoopTimerCancel() to free them early.
- Be aware of the poll backend's edge emulation. On systems without kqueue or epoll, the poll backend clears the event mask after dispatch. You must call xEventMod() to re-arm.
Comparison with Other Libraries
| Feature | xbase event.h | libevent | libev | libuv |
|---|---|---|---|---|
| Trigger Mode | Edge-triggered only | Level (default), edge optional | Level + edge | Level-triggered |
| Backends | kqueue, epoll, poll | kqueue, epoll, poll, select, devpoll, IOCP | kqueue, epoll, poll, select, port | kqueue, epoll, poll, IOCP |
| Timer Integration | Built-in min-heap | Separate timer API | Built-in | Built-in |
| Thread Pool | Built-in (xEventLoopSubmit) | None (external) | None (external) | Built-in (uv_queue_work) |
| Signal Handling | Self-pipe / EVFILT_SIGNAL | evsignal | ev_signal | uv_signal |
| API Style | Opaque handles, C99 | Struct-based, C89 | Struct-based, C89 | Handle-based, C99 |
| Binary Size | ~15 KB | ~200 KB | ~50 KB | ~500 KB |
| Dependencies | None | None | None | None |
| Windows Support | Not yet | Yes (IOCP) | Yes (select) | Yes (IOCP) |
| Design Goal | Minimal building block | Full-featured framework | Minimal + performant | Cross-platform framework |
Key Differentiator: xbase's event loop is intentionally minimal — it provides the essential primitives (I/O, timers, signals, thread-pool offload) without buffered I/O, DNS resolution, or HTTP parsing. This makes it ideal as a foundation layer for higher-level libraries (like xhttp) rather than a standalone application framework.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2), kqueue backend. Source: xbase/event_bench.cpp
| Benchmark | Time (ns) | CPU (ns) | Iterations |
|---|---|---|---|
| BM_EventLoop_CreateDestroy | 2,663 | 2,663 | 264,113 |
| BM_EventLoop_WakeLatency | 854 | 854 | 814,901 |
| BM_EventLoop_PipeAddDel | 1,107 | 1,107 | 627,088 |
Key Observations:
- Create/Destroy takes ~2.7µs, reflecting the cost of kqueue fd creation and internal structure allocation. Acceptable for long-lived event loops.
- Wake latency is ~854ns per wake+wait cycle, demonstrating efficient cross-thread notification via the internal wake mechanism.
- Add/Del cycle (register + unregister a pipe fd) takes ~1.1µs, showing low overhead for dynamic fd management — important for short-lived connections.
timer.h — Monotonic Timer
Introduction
timer.h provides a standalone monotonic timer that schedules callbacks to fire after a delay or at an absolute time. It supports two fire modes — Push mode (dispatch to a thread pool) and Poll mode (enqueue to a lock-free MPSC queue for caller-driven execution) — making it suitable for both multi-threaded and single-threaded architectures.
Note: For timers integrated directly into an event loop, see xEventLoopTimerAfter() / xEventLoopTimerAt() in event.h. The standalone timer.h is useful when you need timers without an event loop, or when you want explicit control over which thread executes the callbacks.
Design Philosophy
- Dual Fire Modes — Push mode hands expired callbacks to a thread pool for concurrent execution; Poll mode queues them for the caller to drain synchronously. This lets latency-sensitive code (e.g., an event loop) avoid thread-switch overhead by polling, while background services can use push mode for simplicity.
- Dedicated Timer Thread — Each xTimer instance spawns one background thread that sleeps on a condition variable, waking only when the earliest deadline arrives or a new entry is submitted. This avoids busy-waiting and keeps CPU usage near zero when idle.
- Min-Heap for O(log n) Scheduling — Timer entries are stored in a min-heap ordered by deadline. Insert, cancel, and fire-next are all O(log n). The heap is provided by heap.h.
- Lock-Free Poll Queue — In poll mode, expired entries are pushed onto an intrusive MPSC queue (mpsc.h) without holding the mutex, minimizing contention between the timer thread and the polling thread.
Architecture
sequenceDiagram
participant App
participant Timer as xTimer
participant Thread as Timer Thread
participant Heap as Min-Heap
participant Queue as MPSC Queue
App->>Timer: xTimerCreate(group)
Timer->>Thread: spawn
App->>Timer: xTimerSubmitAfter(fn, 1000ms)
Timer->>Heap: push(entry)
Timer->>Thread: signal(cond)
Thread->>Heap: peek → deadline
Note over Thread: sleep until deadline
Thread->>Heap: pop(entry)
alt Push Mode
Thread->>App: xTaskSubmit(fn)
else Poll Mode
Thread->>Queue: xMpscPush(entry)
App->>Queue: xTimerPoll()
Queue-->>App: callback(arg)
end
Implementation Details
Internal Structure
struct xTimerTask_ {
xMpsc node; // Intrusive MPSC node (poll mode)
uint64_t deadline; // Absolute expiry time (CLOCK_MONOTONIC, ms)
xTimerFunc fn; // User callback
void *arg; // User argument
size_t heap_idx; // Position in min-heap (TIMER_INVALID_IDX when not in heap)
int cancelled; // Set to 1 under mutex before removal
};
struct xTimer_ {
xHeap heap; // Min-heap ordered by deadline
xTaskGroup group; // Non-NULL → push mode; NULL → poll mode
xMpsc *mq_head; // Poll-mode MPSC queue head
xMpsc *mq_tail; // Poll-mode MPSC queue tail
pthread_t thread; // Background timer thread
pthread_mutex_t mu; // Protects heap and stopped flag
pthread_cond_t cond; // Wakes timer thread on new entry or stop
int stopped; // Shutdown flag
};
Timer Thread Loop
The background thread follows this algorithm:
- Wait — If the heap is empty, block on pthread_cond_wait().
- Check top — Peek at the minimum-deadline entry.
- Fire or sleep — If deadline ≤ now, pop and fire. Otherwise, pthread_cond_timedwait() until the deadline or a new signal.
- Repeat — Loop until stopped is set.
When a new entry is submitted, pthread_cond_signal() wakes the thread so it can re-evaluate whether the new entry has an earlier deadline.
Push vs. Poll Mode
graph LR
subgraph "Push Mode (group != NULL)"
HEAP_P["Min-Heap"] -->|"pop expired"| FIRE_P["fire()"]
FIRE_P -->|"xTaskSubmit"| POOL["Thread Pool"]
POOL -->|"execute"| CB_P["callback(arg)"]
end
subgraph "Poll Mode (group == NULL)"
HEAP_Q["Min-Heap"] -->|"pop expired"| FIRE_Q["fire()"]
FIRE_Q -->|"xMpscPush"| MPSC["MPSC Queue"]
MPSC -->|"xTimerPoll()"| CB_Q["callback(arg)"]
end
style POOL fill:#4a90d9,color:#fff
style MPSC fill:#f5a623,color:#fff
Cancellation
xTimerCancel() acquires the mutex, checks if the entry is still in the heap (not already fired or cancelled), removes it via xHeapRemove(), marks it cancelled, and frees the memory. If the entry has already fired, xErrno_Cancelled is returned.
Memory Ownership
- Push mode: The timer thread transfers ownership of the xTimerTask_ to the worker thread via xTaskSubmit(). The worker frees it after executing the callback.
- Poll mode: The timer thread pushes the entry to the MPSC queue. xTimerPoll() pops and frees each entry after executing its callback.
- Cancellation: The caller frees the entry immediately.
- Destroy: Remaining heap entries and poll-queue entries are freed without firing.
API Reference
Types
| Type | Description |
|---|---|
xTimerFunc | void (*)(void *arg) — Timer callback signature |
xTimer | Opaque handle to a timer instance |
xTimerTask | Opaque handle to a submitted timer entry |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xTimerCreate | xTimer xTimerCreate(xTaskGroup g) | Create a timer. g != NULL → push mode, g == NULL → poll mode. | Not thread-safe |
xTimerDestroy | void xTimerDestroy(xTimer t) | Stop the timer thread and free all resources. Pending entries are discarded. | Not thread-safe |
xTimerSubmitAfter | xTimerTask xTimerSubmitAfter(xTimer t, xTimerFunc fn, void *arg, uint64_t delay_ms) | Schedule a callback after a relative delay. | Thread-safe |
xTimerSubmitAt | xTimerTask xTimerSubmitAt(xTimer t, xTimerFunc fn, void *arg, uint64_t abs_ms) | Schedule a callback at an absolute monotonic time. | Thread-safe |
xTimerCancel | xErrno xTimerCancel(xTimer t, xTimerTask task) | Cancel a pending entry. Returns xErrno_Ok if cancelled, xErrno_Cancelled if already fired. | Thread-safe |
xTimerPoll | int xTimerPoll(xTimer t) | Execute all due callbacks (poll mode only). Returns count. No-op in push mode. | Not thread-safe |
xTimerNowMs | uint64_t xTimerNowMs(void) | Deprecated. Use xMonoMs() from <xbase/time.h>. | Thread-safe |
Usage Examples
Push Mode (Thread Pool Dispatch)
#include <stdio.h>
#include <xbase/timer.h>
#include <xbase/task.h>
#include <unistd.h>
static void on_timeout(void *arg) {
printf("Timer fired on worker thread! arg=%p\n", arg);
}
int main(void) {
xTaskGroup group = xTaskGroupCreate(NULL);
xTimer timer = xTimerCreate(group);
// Fire after 500ms on a worker thread
xTimerSubmitAfter(timer, on_timeout, NULL, 500);
sleep(1); // Wait for timer to fire
xTimerDestroy(timer);
xTaskGroupDestroy(group);
return 0;
}
Poll Mode (Event Loop Integration)
#include <stdio.h>
#include <xbase/timer.h>
#include <xbase/time.h>
static void on_timeout(void *arg) {
int *count = (int *)arg;
printf("Timer #%d fired on caller thread\n", ++(*count));
}
int main(void) {
xTimer timer = xTimerCreate(NULL); // Poll mode
int count = 0;
// Schedule 3 timers
xTimerSubmitAfter(timer, on_timeout, &count, 100);
xTimerSubmitAfter(timer, on_timeout, &count, 200);
xTimerSubmitAfter(timer, on_timeout, &count, 300);
// Poll loop
uint64_t start = xMonoMs();
while (xMonoMs() - start < 500) {
int n = xTimerPoll(timer);
if (n > 0) printf(" Polled %d timer(s)\n", n);
usleep(10000); // 10ms
}
xTimerDestroy(timer);
return 0;
}
Use Cases
- Event Loop Timer Backend — The event loop's built-in timers (xEventLoopTimerAfter) use the same min-heap approach internally. Use a standalone xTimer when you need timers independent of an event loop.
- Retry / Backoff Logic — Schedule retries with exponential backoff using xTimerSubmitAfter(). Cancel pending retries with xTimerCancel() when a response arrives.
- Periodic Health Checks — In poll mode, integrate xTimerPoll() into your main loop to execute periodic health checks without spawning additional threads.
Best Practices
- Choose the right mode. Use push mode when callbacks are independent and can run concurrently. Use poll mode when callbacks must run on a specific thread (e.g., the event loop thread) or when you want to avoid thread-switch latency.
- Don't use the handle after fire or cancel. Once a timer entry fires or is cancelled, the memory is freed. Accessing the handle is undefined behavior.
- Destroy before the task group. If using push mode, destroy the timer before destroying the task group to ensure all in-flight callbacks complete.
- Prefer xEventLoopTimerAfter() when using an event loop. It avoids the overhead of a separate timer thread and integrates seamlessly with I/O dispatch.
Comparison with Other Libraries
| Feature | xbase timer.h | timerfd (Linux) | POSIX timer (timer_create) | libuv uv_timer |
|---|---|---|---|---|
| Platform | macOS + Linux | Linux only | POSIX (varies) | Cross-platform |
| Fire Mode | Push (thread pool) or Poll (MPSC) | fd-based (integrates with epoll) | Signal or thread | Event loop callback |
| Resolution | Millisecond (CLOCK_MONOTONIC) | Nanosecond | Nanosecond | Millisecond |
| Data Structure | Min-heap (O(log n)) | Kernel-managed | Kernel-managed | Min-heap |
| Thread Safety | Submit/Cancel are thread-safe | fd operations are thread-safe | Varies | Not thread-safe |
| Cancellation | O(log n) via heap index | timerfd_settime(0) | timer_delete() | uv_timer_stop() |
| Overhead | 1 background thread per xTimer | 1 fd per timer | 1 kernel timer per instance | Shared with event loop |
| Dependencies | heap.h, mpsc.h, task.h | Linux kernel | POSIX RT library | libuv |
Key Differentiator: xbase's timer provides a unique dual-mode design (push/poll) that lets you choose between concurrent execution and single-threaded polling without changing your callback code. The poll mode's lock-free MPSC queue makes it ideal for integration with custom event loops.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbase/timer_bench.cpp
| Benchmark | N | Time (ns) | CPU (ns) | Throughput |
|---|---|---|---|---|
BM_Timer_SubmitCancel | — | 149 | 121 | — |
BM_Timer_SubmitBatch | 10 | 1,811 | 1,687 | 5.9 M items/s |
BM_Timer_SubmitBatch | 100 | 11,474 | 9,406 | 10.6 M items/s |
BM_Timer_SubmitBatch | 1,000 | 110,112 | 86,699 | 11.5 M items/s |
BM_Timer_FirePoll | 10 | 3,395 | 3,394 | 2.9 M items/s |
BM_Timer_FirePoll | 100 | 16,897 | 15,534 | 6.4 M items/s |
BM_Timer_FirePoll | 1,000 | 120,411 | 101,190 | 9.9 M items/s |
Key Observations:
- Submit+Cancel cycle takes ~121ns CPU time, reflecting the cost of one heap push + one heap remove. Fast enough for high-frequency timer management.
- Batch submit throughput improves with batch size (5.9M → 11.5M items/s), showing good amortization of per-operation overhead.
- Fire+Poll is slower than submit alone because it includes the MPSC queue transfer and callback invocation. At N=1000, it still achieves ~10M timer fires/s.
task.h — N:M Task Model
Introduction
task.h provides a lightweight N:M concurrent task model where N user tasks are multiplexed onto M OS threads managed by a task group (thread pool). It supports lazy thread creation, configurable queue capacity, per-task result retrieval, and a global shared task group for convenience.
Design Philosophy
- Lazy Thread Spawning — Worker threads are created on demand when tasks are submitted and no idle thread is available, up to the configured maximum. This avoids pre-allocating threads that may never be used, reducing resource consumption for bursty workloads.
- Simple Submit/Wait Model — Tasks are submitted with xTaskSubmit() and optionally awaited with xTaskWait(). This mirrors the future/promise pattern found in higher-level languages, but in pure C with minimal overhead.
- Configurable Capacity — The task group can be configured with a maximum thread count and queue capacity. When the queue is full, xTaskSubmit() returns NULL, giving the caller explicit backpressure.
- Global Shared Group — xTaskGroupGlobal() provides a lazily initialized, process-wide task group with default settings (unlimited threads, no queue cap). It is automatically destroyed at atexit(), making it convenient for fire-and-forget usage.
Architecture
graph TD
subgraph "Task Group"
QUEUE["Task Queue (FIFO)"]
W1["Worker Thread 1"]
W2["Worker Thread 2"]
WN["Worker Thread N"]
end
APP["Application"] -->|"xTaskSubmit()"| QUEUE
QUEUE -->|"dequeue"| W1
QUEUE -->|"dequeue"| W2
QUEUE -->|"dequeue"| WN
W1 -->|"done"| RESULT["xTaskWait() → result"]
W2 -->|"done"| RESULT
WN -->|"done"| RESULT
style APP fill:#4a90d9,color:#fff
style QUEUE fill:#f5a623,color:#fff
style RESULT fill:#50b86c,color:#fff
Implementation Details
Internal Structure
struct xTask_ {
xTaskFunc fn; // User function
void *arg; // User argument
pthread_mutex_t lock; // Protects done/result
pthread_cond_t cond; // Signals completion
bool done; // Completion flag
void *result; // Return value of fn
struct xTask_ *next; // Intrusive queue linkage
};
struct xTaskGroup_ {
pthread_t *workers; // Dynamic array of worker threads
size_t max_threads; // Upper bound (SIZE_MAX if unlimited)
size_t nthreads; // Currently spawned threads
pthread_mutex_t qlock; // Protects the task queue
pthread_cond_t qcond; // Wakes idle workers
struct xTask_ *qhead, *qtail; // FIFO task queue
size_t qsize, qcap; // Current size and capacity
size_t idle; // Number of idle workers
atomic_size_t pending; // Submitted - finished
atomic_size_t done_count; // Tasks completed
pthread_cond_t wcond; // Dedicated cond for xTaskGroupWait()
bool shutdown; // Shutdown flag
};
Worker Loop
Each worker thread runs worker_loop():
- Acquire the lock and increment the idle count.
- Wait on qcond while the queue is empty and not shutting down.
- Dequeue one task and decrement idle.
- Execute task->fn(task->arg).
- Signal completion via pthread_cond_broadcast(&task->cond).
- Update counters — decrement pending and signal wcond if all tasks are done.
Task Submission Flow
flowchart TD
SUBMIT["xTaskSubmit(group, fn, arg)"]
CHECK_CAP{"Queue full?"}
ENQUEUE["Enqueue task"]
CHECK_IDLE{"Idle workers > 0?"}
SIGNAL["Signal qcond"]
CHECK_MAX{"nthreads < max?"}
SPAWN["Spawn new worker"]
DONE["Return task handle"]
FAIL["Return NULL"]
SUBMIT --> CHECK_CAP
CHECK_CAP -->|Yes| FAIL
CHECK_CAP -->|No| ENQUEUE
ENQUEUE --> CHECK_IDLE
CHECK_IDLE -->|Yes| SIGNAL
CHECK_IDLE -->|No| CHECK_MAX
CHECK_MAX -->|Yes| SPAWN
CHECK_MAX -->|No| DONE
SPAWN --> SIGNAL
SIGNAL --> DONE
style SUBMIT fill:#4a90d9,color:#fff
style FAIL fill:#e74c3c,color:#fff
style DONE fill:#50b86c,color:#fff
Separate Wait Conditions
The implementation uses two separate condition variables:
- qcond — Wakes idle workers when a new task arrives.
- wcond — Wakes xTaskGroupWait() callers when all tasks complete.
Using a single condition variable caused lost wakeups: pthread_cond_signal() could wake an idle worker instead of the GroupWait caller, leaving it blocked forever.
Global Task Group
xTaskGroupGlobal() uses pthread_once for thread-safe lazy initialization. The group is registered with atexit() for automatic cleanup. It uses default configuration (unlimited threads, no queue cap).
API Reference
Types
| Type | Description |
|---|---|
xTaskFunc | void *(*)(void *arg) — Task function signature. Returns a result pointer. |
xTask | Opaque handle to a submitted task |
xTaskGroup | Opaque handle to a task group (thread pool) |
xTaskGroupConf | Configuration struct: nthreads (0 = auto), queue_cap (0 = unbounded) |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xTaskGroupCreate | xTaskGroup xTaskGroupCreate(const xTaskGroupConf *conf) | Create a task group. NULL conf = defaults. | Not thread-safe |
xTaskGroupDestroy | void xTaskGroupDestroy(xTaskGroup g) | Wait for pending tasks, then destroy. | Not thread-safe |
xTaskSubmit | xTask xTaskSubmit(xTaskGroup g, xTaskFunc fn, void *arg) | Submit a task. Returns NULL if queue is full. | Thread-safe |
xTaskWait | xErrno xTaskWait(xTask t, void **result) | Block until task completes. Frees the task handle. | Thread-safe |
xTaskGroupWait | xErrno xTaskGroupWait(xTaskGroup g) | Block until all pending tasks complete. | Thread-safe |
xTaskGroupThreads | size_t xTaskGroupThreads(xTaskGroup g) | Return number of spawned worker threads. | Thread-safe (atomic read) |
xTaskGroupPending | size_t xTaskGroupPending(xTaskGroup g) | Return number of pending tasks. | Thread-safe (atomic read) |
xTaskGroupGlobal | xTaskGroup xTaskGroupGlobal(void) | Get the global shared task group (lazy init). | Thread-safe |
Usage Examples
Basic Task Submission
#include <stdio.h>
#include <xbase/task.h>
static void *compute(void *arg) {
int *val = (int *)arg;
*val *= 2;
return val;
}
int main(void) {
xTaskGroup group = xTaskGroupCreate(NULL);
int value = 21;
xTask task = xTaskSubmit(group, compute, &value);
void *result;
xTaskWait(task, &result);
printf("Result: %d\n", *(int *)result); // 42
xTaskGroupDestroy(group);
return 0;
}
Parallel Map
#include <stdio.h>
#include <xbase/task.h>
#define N 8
static void *square(void *arg) {
int *val = (int *)arg;
*val = (*val) * (*val);
return val;
}
int main(void) {
xTaskGroupConf conf = { .nthreads = 4, .queue_cap = 0 };
xTaskGroup group = xTaskGroupCreate(&conf);
int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
xTask tasks[N];
for (int i = 0; i < N; i++)
tasks[i] = xTaskSubmit(group, square, &data[i]);
// Wait for all
xTaskGroupWait(group);
for (int i = 0; i < N; i++)
printf("data[%d] = %d\n", i, data[i]);
// Clean up task handles
for (int i = 0; i < N; i++)
xTaskWait(tasks[i], NULL);
xTaskGroupDestroy(group);
return 0;
}
Using the Global Task Group
#include <stdio.h>
#include <xbase/task.h>
static void *work(void *arg) {
printf("Running on global pool: %s\n", (char *)arg);
return NULL;
}
int main(void) {
xTask t = xTaskSubmit(xTaskGroupGlobal(), work, "hello");
xTaskWait(t, NULL);
// No need to destroy the global group
return 0;
}
Use Cases
- CPU-Bound Parallel Processing — Distribute computation across multiple cores. Use xTaskGroupWait() to synchronize at barriers.
- Event Loop Offload — The event loop's xEventLoopSubmit() uses xTaskGroup internally to run work functions on worker threads, then delivers results back to the loop thread.
- Background I/O — Offload blocking file I/O (e.g., fsync, large reads) to a thread pool to keep the main thread responsive.
Best Practices
- Always call xTaskWait() or let xTaskGroupDestroy() clean up. Each xTaskSubmit() allocates a task struct with a mutex and condvar; xTaskWait() frees them. Leaking task handles leaks resources.
- Set queue_cap for backpressure. Without a cap, unbounded submission can exhaust memory. A bounded queue lets you detect overload via NULL returns from xTaskSubmit().
- Don't destroy the global group. xTaskGroupGlobal() is managed internally and destroyed at atexit(). Passing it to xTaskGroupDestroy() is undefined behavior.
- Use xTaskGroupWait() for barriers, not busy-polling. It uses a dedicated condition variable and blocks efficiently.
Comparison with Other Libraries
| Feature | xbase task.h | pthread | C11 threads | GCD (libdispatch) |
|---|---|---|---|---|
| Abstraction | Task (submit/wait) | Thread (create/join) | Thread (create/join) | Block (dispatch_async) |
| Thread Management | Automatic (lazy spawn) | Manual | Manual | Automatic |
| Queue | Built-in FIFO with cap | N/A | N/A | Built-in (serial/concurrent) |
| Result Retrieval | xTaskWait(t, &result) | pthread_join(t, &result) | thrd_join(t, &result) | Completion handler |
| Group Wait | xTaskGroupWait() | Manual barrier | Manual barrier | dispatch_group_wait() |
| Backpressure | queue_cap → NULL on full | N/A | N/A | N/A (unbounded) |
| Global Pool | xTaskGroupGlobal() | N/A | N/A | dispatch_get_global_queue() |
| Platform | macOS + Linux | POSIX | C11 | macOS + Linux (via libdispatch) |
| Dependencies | pthread | OS | OS | OS / libdispatch |
Key Differentiator: xbase's task model provides a simple, portable thread pool with lazy spawning and explicit backpressure — features that require significant boilerplate with raw pthreads. Unlike GCD, it gives you direct control over thread count and queue capacity.
memory.h — Reference-Counted Memory Management
Introduction
memory.h provides a vtable-driven, reference-counted memory management system for C. It enables object lifecycle management (construction, destruction, retain, release, copy, move) through a virtual table pattern, bringing RAII-like semantics to pure C. The XMALLOC(T) macro allocates an object with an embedded header that tracks the reference count and vtable pointer.
Design Philosophy
- vtable-Driven Lifecycle — Each object type defines a static xVTable with optional function pointers for ctor, dtor, retain, release, copy, and move. This decouples lifecycle logic from the allocation mechanism, similar to C++ virtual destructors or Objective-C's class methods.
- Hidden Header Pattern — A Header struct is prepended to every allocation, storing the type name (for debugging), size, reference count, and vtable pointer. The user receives a pointer past the header, so the header is invisible to normal usage.
- Atomic Reference Counting — xRetain() and xRelease() use atomic operations (__ATOMIC_SEQ_CST) to safely manage reference counts across threads. When the count reaches zero, the destructor is called and the memory is freed.
- Macro Convenience — The XMALLOC(T) and XMALLOCEX(T, sz) macros generate the correct xAlloc() call with the type name string, size, and vtable pointer, reducing boilerplate.
Architecture
graph TD
MACRO["XMALLOC(T) / XMALLOCEX(T, sz)"]
ALLOC["xAlloc(name, size, count, vtab)"]
HEADER["Header + Object"]
RETAIN["xRetain(ptr)<br/>atomic refs++"]
RELEASE["xRelease(ptr)<br/>atomic refs--"]
FREE["xFree(ptr)<br/>dtor + free"]
COPY["xCopy(ptr, other)"]
MOVE["xMove(ptr, other)"]
MACRO --> ALLOC
ALLOC --> HEADER
HEADER --> RETAIN
HEADER --> RELEASE
RELEASE -->|"refs == 0"| FREE
HEADER --> COPY
HEADER --> MOVE
style MACRO fill:#4a90d9,color:#fff
style RELEASE fill:#e74c3c,color:#fff
style FREE fill:#e74c3c,color:#fff
Implementation Details
Memory Layout
graph LR
subgraph "malloc'd block"
HDR["Header<br/>name | size | refs | vtab"]
OBJ["User Object<br/>(sizeof(T) bytes)"]
EXTRA["Extra bytes<br/>(XMALLOCEX only)"]
end
PTR["xAlloc() returns →"] --> OBJ
style HDR fill:#f5a623,color:#fff
style OBJ fill:#4a90d9,color:#fff
style EXTRA fill:#50b86c,color:#fff
The actual memory layout:
┌──────────────────────────────────────────────────────┐
│ Header (hidden) │
│ const char *name — type name string (e.g. "Foo") │
│ size_t size — sizeof(T) │
│ size_t refs — reference count (starts at 1) │
│ xVTable *vtab — pointer to static vtable │
├──────────────────────────────────────────────────────┤
│ User Object (returned pointer) │
│ T fields... │
│ [optional extra bytes from XMALLOCEX] │
└──────────────────────────────────────────────────────┘
XMALLOC / XMALLOCEX Macro Expansion
// Given:
typedef struct Foo Foo;
struct Foo { int x; char buf[]; };
XDEF_VTABLE(Foo) { .ctor = FooCtor, .dtor = FooDtor };
XDEF_CTOR(Foo) { self->x = 0; }
XDEF_DTOR(Foo) { /* cleanup */ }
// XMALLOC(Foo) expands to:
(Foo *)xAlloc("Foo", sizeof(Foo), 1, &FooVTable)
// XMALLOCEX(Foo, 128) expands to:
(Foo *)xAlloc("Foo", sizeof(Foo) + 128, 1, &FooVTable)
Reference Count Lifecycle
sequenceDiagram
participant App
participant Alloc as xAlloc
participant Header
participant VTable
App->>Alloc: XMALLOC(Foo)
Alloc->>Header: malloc(sizeof(Header) + sizeof(Foo))
Alloc->>Header: refs = 1
Alloc->>VTable: vtab->ctor(ptr)
Alloc-->>App: Foo *ptr
App->>Header: xRetain(ptr) → refs = 2
App->>Header: xRelease(ptr) → refs = 1
App->>Header: xRelease(ptr) → refs = 0
Header->>VTable: vtab->release(ptr)
Header->>VTable: vtab->dtor(ptr)
Header->>Header: free(hdr)
Thread Safety
- xRetain() and xRelease() are thread-safe — they use xAtomicAdd/xAtomicSub with sequential-consistency ordering.
- xAlloc(), xFree(), xCopy(), and xMove() are not thread-safe — call them from a single owner or with external synchronization.
API Reference
Macros
| Macro | Expansion | Description |
|---|---|---|
XDEF_VTABLE(T) | static xVTable TVTable = | Define a static vtable for type T |
XDEF_CTOR(T) | static void TCtor(T *self) | Define a constructor for type T |
XDEF_DTOR(T) | static void TDtor(T *self) | Define a destructor for type T |
XMALLOC(T) | (T *)xAlloc("T", sizeof(T), 1, &TVTable) | Allocate one T with vtable |
XMALLOCEX(T, sz) | (T *)xAlloc("T", sizeof(T) + sz, 1, &TVTable) | Allocate T + extra bytes |
Types
| Type | Description |
|---|---|
xVTable | Struct with function pointers: ctor, dtor, retain, release, copy, move |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xAlloc | void *xAlloc(const char *name, size_t size, size_t count, xVTable *vtab) | Allocate object(s) with header and call ctor. | Not thread-safe |
xFree | void xFree(void *ptr) | Call dtor and free. Ignores NULL. | Not thread-safe |
xRetain | void xRetain(void *ptr) | Increment reference count atomically. Calls vtab->retain if set. | Thread-safe |
xRelease | void xRelease(void *ptr) | Decrement reference count atomically. Calls vtab->release then xFree when refs reach 0. | Thread-safe |
xCopy | void xCopy(void *ptr, void *other) | Call vtab->copy if set. | Not thread-safe |
xMove | void xMove(void *ptr, void *other) | Call vtab->move if set. | Not thread-safe |
Usage Examples
Basic Object with Constructor/Destructor
#include <stdio.h>
#include <string.h>
#include <xbase/memory.h>
typedef struct Connection Connection;
struct Connection {
int fd;
char host[256];
};
XDEF_CTOR(Connection) {
self->fd = -1;
memset(self->host, 0, sizeof(self->host));
printf("Connection created\n");
}
XDEF_DTOR(Connection) {
if (self->fd >= 0) {
// close(self->fd);
printf("Connection closed (fd=%d)\n", self->fd);
}
}
XDEF_VTABLE(Connection) {
.ctor = ConnectionCtor,
.dtor = ConnectionDtor,
};
int main(void) {
Connection *conn = XMALLOC(Connection);
conn->fd = 42;
strcpy(conn->host, "example.com");
xRetain(conn); // refs = 2
xRelease(conn); // refs = 1
xRelease(conn); // refs = 0 → dtor called → freed
return 0;
}
Flexible Array Member with XMALLOCEX
#include <stdio.h>
#include <string.h>
#include <xbase/memory.h>
typedef struct Buffer Buffer;
struct Buffer {
size_t len;
char data[]; // flexible array member
};
XDEF_CTOR(Buffer) { self->len = 0; }
XDEF_DTOR(Buffer) { /* nothing to clean up */ }
XDEF_VTABLE(Buffer) { .ctor = BufferCtor, .dtor = BufferDtor };
int main(void) {
// Allocate Buffer + 1024 extra bytes for data[]
Buffer *buf = XMALLOCEX(Buffer, 1024);
memcpy(buf->data, "Hello, xKit!", 12);
buf->len = 12;
printf("Buffer: %.*s\n", (int)buf->len, buf->data);
xRelease(buf); // refs 1 → 0 → freed
return 0;
}
Use Cases
- Shared Ownership — Multiple components hold references to the same object (e.g., a connection shared between a reader and a writer). xRetain/xRelease ensures the object is freed only when the last reference is dropped.
- Plugin/Extension Objects — Define vtables for different object types that share a common interface. The vtable pattern enables polymorphic behavior in C.
- Debug-Friendly Allocation — The name field in the header enables allocation tracking and leak detection by type name.
Best Practices
- Always pair xRetain with xRelease. Every retain must have a corresponding release, or you'll leak memory.
- Use XMALLOC instead of raw xAlloc. The macro handles the type name, size, and vtable automatically.
- Set unused vtable fields to NULL. The implementation checks for NULL before calling each vtable function.
- Don't mix with free(). Objects allocated with xAlloc have a hidden header; calling free() directly on the user pointer corrupts the heap.
- Use XMALLOCEX for flexible array members. It adds extra bytes after the struct for variable-length data.
Comparison with Other Libraries
| Feature | xbase memory.h | C++ RAII | Objective-C ARC | GLib GObject |
|---|---|---|---|---|
| Mechanism | vtable + atomic refcount | Destructor + smart pointers | Compiler-inserted retain/release | GType + refcount |
| Automation | Manual retain/release | Automatic (scope-based) | Automatic (compiler) | Manual ref/unref |
| Thread Safety | Atomic refcount | shared_ptr is atomic | Atomic | Atomic |
| Polymorphism | vtable function pointers | Virtual functions | Method dispatch | Signal/slot + vtable |
| Overhead | 1 header per object (~32 bytes) | 0 (stack) or control block | 1 isa pointer + refcount | Large (GTypeInstance) |
| Flexible Arrays | XMALLOCEX(T, sz) | std::vector | NSMutableData | GArray |
| Debug Info | Type name in header | RTTI | Class name | GType name |
| Language | C99 | C++ | Objective-C | C (with macros) |
Key Differentiator: xbase's memory system brings reference-counted lifecycle management to C with minimal overhead — just a 32-byte header per object. The vtable pattern provides extensibility (custom ctor/dtor/copy/move) without requiring a complex type system like GObject.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbase/memory_bench.cpp
| Benchmark | Size (bytes) | Time (ns) | CPU (ns) | Iterations |
|---|---|---|---|---|
BM_Memory_XAlloc | 16 | 23.3 | 23.3 | 29,809,940 |
BM_Memory_XAlloc | 64 | 21.1 | 21.1 | 32,551,024 |
BM_Memory_XAlloc | 256 | 22.4 | 22.4 | 31,207,508 |
BM_Memory_XAlloc | 1,024 | 20.1 | 20.1 | 34,024,352 |
BM_Memory_XAlloc | 4,096 | 24.2 | 24.2 | 29,002,681 |
BM_Memory_Malloc | 16 | 17.5 | 17.5 | 39,883,995 |
BM_Memory_Malloc | 64 | 18.7 | 18.7 | 37,576,831 |
BM_Memory_Malloc | 256 | 19.0 | 19.0 | 34,505,536 |
BM_Memory_Malloc | 1,024 | 23.0 | 23.0 | 30,557,144 |
BM_Memory_Malloc | 4,096 | 17.7 | 17.7 | 39,849,483 |
BM_Memory_RetainRelease | — | 3.90 | 3.90 | 183,068,277 |
Key Observations:
- xAlloc vs malloc overhead is only ~3–5ns across all sizes. The extra cost covers header initialization, vtable setup, and constructor invocation — negligible for most workloads.
- Retain/Release cycle takes ~3.9ns, dominated by the atomic increment/decrement. This is fast enough for hot-path reference counting.
- Allocation time is nearly constant across sizes (16B–4KB), confirming that the overhead is in the header management, not the underlying malloc.
error.h — Unified Error Codes
Introduction
error.h defines a unified set of error codes (xErrno) used throughout xKit. Every function that can fail returns an xErrno value, providing a consistent error handling pattern across all modules. The companion function xstrerror() converts error codes to human-readable strings for logging and debugging.
Design Philosophy
- Single Error Enum — All xKit modules share one error-code enum, avoiding the confusion of module-specific error types. This makes error handling uniform: check for xErrno_Ok everywhere.
- Descriptive Codes — Each error code maps to a specific failure category (invalid argument, out of memory, wrong state, etc.), giving callers enough information to decide how to handle the error without inspecting errno or platform-specific codes.
- Human-Readable Messages — xstrerror() returns a static string for each code, suitable for direct inclusion in log messages. It never returns NULL.
Architecture
graph LR
MODULES["All xKit Modules"] -->|"return"| ERRNO["xErrno"]
ERRNO -->|"xstrerror()"| MSG["Human-readable string"]
MSG -->|"xLog()"| LOG["Log output"]
style ERRNO fill:#4a90d9,color:#fff
style MSG fill:#50b86c,color:#fff
Implementation Details
Error Code Values
The error codes are defined as an int-based enum (via XDEF_ENUM), starting from 0:
| Code | Value | Meaning |
|---|---|---|
xErrno_Ok | 0 | Success |
xErrno_Unknown | 1 | Unspecified error (legacy / catch-all) |
xErrno_InvalidArg | 2 | NULL or invalid argument |
xErrno_NoMemory | 3 | Memory allocation failed |
xErrno_InvalidState | 4 | Object is in the wrong state for this call |
xErrno_SysError | 5 | Underlying syscall / OS error |
xErrno_NotFound | 6 | Requested item does not exist |
xErrno_AlreadyExists | 7 | Item already registered / bound |
xErrno_Cancelled | 8 | Operation was cancelled |
Usage Pattern
The idiomatic xKit error handling pattern:
xErrno err = xSomeFunction(args);
if (err != xErrno_Ok) {
xLog(false, "operation failed: %s", xstrerror(err));
return err; // propagate
}
Internal Usage
xErrno is used by:
- event.h — xEventMod(), xEventDel(), xEventWake(), xEventLoopTimerCancel(), xEventLoopSubmit(), xEventLoopSignalWatch()
- timer.h — xTimerCancel()
- task.h — xTaskWait(), xTaskGroupWait()
- socket.h — xSocketSetMask(), xSocketSetTimeout()
- heap.h — xHeapPush(), xHeapUpdate()
API Reference
Types
| Type | Description |
|---|---|
xErrno | int-based enum of error codes |
Enum Values
| Value | Description |
|---|---|
xErrno_Ok | Success |
xErrno_Unknown | Unspecified error (legacy / catch-all) |
xErrno_InvalidArg | NULL or invalid argument |
xErrno_NoMemory | Memory allocation failed |
xErrno_InvalidState | Object is in the wrong state for this call |
xErrno_SysError | Underlying syscall / OS error |
xErrno_NotFound | Requested item does not exist |
xErrno_AlreadyExists | Item already registered / bound |
xErrno_Cancelled | Operation was cancelled |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xstrerror | const char *xstrerror(xErrno err) | Return a human-readable error message. Never returns NULL. | Thread-safe (returns static strings) |
Usage Examples
Error Handling Pattern
#include <stdio.h>
#include <xbase/error.h>
#include <xbase/event.h>
int main(void) {
xEventLoop loop = xEventLoopCreate();
if (!loop) {
fprintf(stderr, "Failed to create event loop\n");
return 1;
}
xErrno err = xEventMod(loop, NULL, xEvent_Read);
if (err != xErrno_Ok) {
fprintf(stderr, "xEventMod failed: %s\n", xstrerror(err));
// Output: "xEventMod failed: NULL or invalid argument"
}
xEventLoopDestroy(loop);
return 0;
}
Propagating Errors
#include <xbase/error.h>
#include <xbase/socket.h>
xErrno setup_socket(xEventLoop loop, xSocket *out) {
    xSocket sock = xSocketCreate(loop, AF_INET, SOCK_STREAM, 0,
                                 xEvent_Read, my_callback, NULL);
    if (!sock) return xErrno_SysError;

    xErrno err = xSocketSetTimeout(sock, 5000, 0);
    if (err != xErrno_Ok) {
        xSocketDestroy(loop, sock);
        return err;
    }
    *out = sock;
    return xErrno_Ok;
}
Use Cases
- Uniform Error Propagation — Functions return xErrno and callers check against xErrno_Ok. This eliminates the need for module-specific error types.
- Logging and Diagnostics — xstrerror() provides instant human-readable messages for log output without maintaining separate message tables.
- Error Classification — Callers can switch on specific error codes to implement different recovery strategies (e.g., retry on xErrno_SysError, abort on xErrno_NoMemory).
Best Practices
- Always check return values. Functions that return xErrno should be checked. Functions that return handles (pointers) should be checked for NULL.
- Use xstrerror() in log messages. It's more informative than printing the raw integer.
- Don't compare against raw integers. Always use the enum constants (xErrno_Ok, xErrno_InvalidArg, etc.) for readability and forward compatibility.
- Prefer specific codes over xErrno_Unknown. When adding new error paths, choose the most specific applicable code.
Comparison with Other Libraries
| Feature | xbase error.h | POSIX errno | Windows HRESULT | GLib GError |
|---|---|---|---|---|
| Type | int enum | int (thread-local) | LONG | Struct (domain + code + message) |
| Scope | Library-wide | System-wide | System-wide | Per-domain |
| String Conversion | xstrerror() | strerror() | FormatMessage() | g_error->message |
| Thread Safety | Return value (inherently safe) | Thread-local global | Return value | Heap-allocated |
| Extensibility | Add to enum | Platform-defined | Facility codes | Custom domains |
| Overhead | Zero (int return) | Zero (thread-local) | Zero (int return) | Heap allocation per error |
Key Differentiator: xbase's error system is intentionally simple — a single enum with descriptive codes and a string conversion function. It avoids the complexity of domain-based systems (GError) and the thread-local pitfalls of POSIX errno, while providing enough granularity for library-level error handling.
heap.h — Min-Heap
Introduction
heap.h provides a generic binary min-heap that stores opaque pointers and orders them via a user-supplied comparison function. Each element carries its heap index (maintained via a callback), enabling O(log n) removal and priority updates by index. It is the core data structure behind xbase's timer subsystem.
Design Philosophy
- Generic via Function Pointers — The heap stores void * elements and uses an xHeapCmpFunc for ordering. This makes it reusable for any element type without code generation or macros.
- Index Tracking — An xHeapSetIdxFunc callback notifies elements of their current position in the heap array. This enables O(1) lookup for xHeapRemove() and xHeapUpdate(), which would otherwise require an O(n) search.
- Dynamic Array Backend — The heap uses a dynamically growing array (2x expansion) starting from a default capacity of 16. This provides cache-friendly access patterns and amortized O(1) growth.
- No Element Ownership — The heap does not own the elements it stores. xHeapDestroy() frees the heap structure but NOT the elements. This gives the caller full control over element lifecycle.
Architecture
graph TD
PUSH["xHeapPush(elem)"] --> APPEND["Append to data[size]"]
APPEND --> SIFTUP["Sift Up"]
SIFTUP --> NOTIFY["setidx(elem, new_idx)"]
POP["xHeapPop()"] --> SWAP["Swap data[0] with data[size-1]"]
SWAP --> SIFTDOWN["Sift Down from 0"]
SIFTDOWN --> NOTIFY
REMOVE["xHeapRemove(idx)"] --> SWAP2["Swap data[idx] with data[size-1]"]
SWAP2 --> BOTH["Sift Up + Sift Down"]
BOTH --> NOTIFY
style PUSH fill:#4a90d9,color:#fff
style POP fill:#f5a623,color:#fff
style REMOVE fill:#e74c3c,color:#fff
Implementation Details
Data Structure
struct xHeap_ {
    void **data;             // Dynamic array of element pointers
    size_t size;             // Current number of elements
    size_t cap;              // Allocated capacity
    xHeapCmpFunc cmp;        // Comparison function
    xHeapSetIdxFunc setidx;  // Index notification callback
};
Array Layout
Index: 0 1 2 3 4 5 6
[min] [ ] [ ] [ ] [ ] [ ] [ ]
│ │ │
│ ├────┤
│ children of 0
├─────┤
parent of 1,2
Parent of i: (i - 1) / 2
Left child of i: 2 * i + 1
Right child of i: 2 * i + 2
Operations and Complexity
| Operation | Function | Time Complexity | Description |
|---|---|---|---|
| Insert | xHeapPush | O(log n) | Append to end, sift up |
| Peek min | xHeapPeek | O(1) | Return data[0] |
| Extract min | xHeapPop | O(log n) | Swap with last, sift down |
| Remove by index | xHeapRemove | O(log n) | Swap with last, sift up + down |
| Update priority | xHeapUpdate | O(log n) | Sift up + down at index |
| Size | xHeapSize | O(1) | Return size field |
| Grow | ensure_cap | Amortized O(1) | 2x realloc |
Sift Operations
- Sift Up — Compare element with parent; swap if smaller. Repeat until heap property is restored or root is reached.
- Sift Down — Compare element with children; swap with the smallest child if it's smaller. Repeat until heap property is restored or a leaf is reached.
Remove by Index
xHeapRemove(h, idx) replaces the element at idx with the last element, then applies both sift-up and sift-down. This handles both cases: the replacement may be smaller (needs to go up) or larger (needs to go down) than its new neighbors.
API Reference
Types
| Type | Description |
|---|---|
xHeapCmpFunc | int (*)(const void *a, const void *b) — Returns negative if a < b, 0 if equal, positive if a > b |
xHeapSetIdxFunc | void (*)(void *elem, size_t idx) — Called when an element's index changes |
xHeap | Opaque handle to a min-heap |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xHeapCreate | xHeap xHeapCreate(xHeapCmpFunc cmp, xHeapSetIdxFunc setidx, size_t cap) | Create a heap. cap = 0 uses default (16). | Not thread-safe |
xHeapDestroy | void xHeapDestroy(xHeap h) | Free the heap. Does NOT free elements. | Not thread-safe |
xHeapPush | xErrno xHeapPush(xHeap h, void *elem) | Insert an element. O(log n). | Not thread-safe |
xHeapPeek | void *xHeapPeek(xHeap h) | Return the minimum element without removing. O(1). | Not thread-safe |
xHeapPop | void *xHeapPop(xHeap h) | Remove and return the minimum element. O(log n). | Not thread-safe |
xHeapRemove | void *xHeapRemove(xHeap h, size_t idx) | Remove element at index. O(log n). | Not thread-safe |
xHeapUpdate | xErrno xHeapUpdate(xHeap h, size_t idx) | Re-heapify after priority change. O(log n). | Not thread-safe |
xHeapSize | size_t xHeapSize(xHeap h) | Return element count. O(1). | Not thread-safe |
Usage Examples
Timer-Style Priority Queue
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <xbase/heap.h>

typedef struct {
    uint64_t deadline;
    size_t heap_idx;
    char name[32];
} TimerEntry;

static int cmp_entry(const void *a, const void *b) {
    const TimerEntry *ea = (const TimerEntry *)a;
    const TimerEntry *eb = (const TimerEntry *)b;
    if (ea->deadline < eb->deadline) return -1;
    if (ea->deadline > eb->deadline) return 1;
    return 0;
}

static void set_idx(void *elem, size_t idx) {
    ((TimerEntry *)elem)->heap_idx = idx;
}

int main(void) {
    xHeap heap = xHeapCreate(cmp_entry, set_idx, 0);
    TimerEntry entries[] = {
        { .deadline = 300, .name = "C" },
        { .deadline = 100, .name = "A" },
        { .deadline = 200, .name = "B" },
    };
    for (int i = 0; i < 3; i++)
        xHeapPush(heap, &entries[i]);

    // Pop in order: A (100), B (200), C (300)
    while (xHeapSize(heap) > 0) {
        TimerEntry *e = (TimerEntry *)xHeapPop(heap);
        printf("%s (deadline=%llu)\n", e->name, (unsigned long long)e->deadline);
    }
    xHeapDestroy(heap);
    return 0;
}
Use Cases
- Timer Subsystem — timer.h uses the min-heap to order timer entries by deadline. The timer thread peeks at the minimum to determine how long to sleep, then pops expired entries.
- Event Loop Timers — The event loop's built-in timer heap (event.h) uses the same pattern to integrate timer dispatch with I/O polling.
- Custom Priority Queues — Any scenario requiring efficient insert/extract-min with O(log n) removal by index.
Best Practices
- Always implement xHeapSetIdxFunc. Without index tracking, xHeapRemove() and xHeapUpdate() cannot locate elements efficiently.
- Store the index in your element struct. The setidx callback should write the index into a field of your element (e.g., elem->heap_idx = idx).
- Don't free elements while they're in the heap. Remove them first with xHeapRemove() or xHeapPop().
- Use xHeapUpdate() after changing an element's priority. The heap doesn't detect priority changes automatically.
Comparison with Other Libraries
| Feature | xbase heap.h | C++ std::priority_queue | Linux kernel prio_heap | Go container/heap |
|---|---|---|---|---|
| Element Type | void * (generic) | Template | Fixed struct | interface{} |
| Index Tracking | Built-in (setidx callback) | Not available | Not available | Manual (Fix method) |
| Remove by Index | O(log n) | Not supported | Not supported | O(log n) via Remove |
| Update Priority | O(log n) via xHeapUpdate | Not supported | Not supported | O(log n) via Fix |
| Ownership | No (caller owns elements) | Yes (copies/moves) | No | No |
| Thread Safety | Not thread-safe | Not thread-safe | Not thread-safe | Not thread-safe |
Key Differentiator: xbase's heap provides built-in index tracking via the setidx callback, enabling O(log n) removal and priority updates — features that std::priority_queue lacks entirely. This makes it ideal for timer implementations where cancellation is a common operation.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbase/heap_bench.cpp
| Benchmark | N | Time (ns) | CPU (ns) | Throughput |
|---|---|---|---|---|
BM_Heap_Push | 8 | 983 | 987 | 8.1 M items/s |
BM_Heap_Push | 64 | 1,694 | 1,699 | 37.7 M items/s |
BM_Heap_Push | 512 | 8,722 | 8,725 | 58.7 M items/s |
BM_Heap_Push | 4,096 | 56,854 | 56,853 | 72.0 M items/s |
BM_Heap_Pop | 8 | 1,020 | 1,024 | 7.8 M items/s |
BM_Heap_Pop | 64 | 2,807 | 2,809 | 22.8 M items/s |
BM_Heap_Pop | 512 | 26,334 | 26,337 | 19.4 M items/s |
BM_Heap_Pop | 4,096 | 297,382 | 297,325 | 13.8 M items/s |
BM_Heap_Remove | 8 | 1,015 | 1,020 | 7.8 M items/s |
BM_Heap_Remove | 64 | 1,808 | 1,811 | 35.3 M items/s |
BM_Heap_Remove | 512 | 8,914 | 8,903 | 57.5 M items/s |
BM_Heap_Remove | 4,096 | 68,017 | 68,016 | 60.2 M items/s |
Key Observations:
- Push throughput scales well with heap size — amortized cost per element decreases as batch size grows, reaching 72M items/s at N=4096.
- Pop is more expensive than push at large N due to the sift-down operation traversing more levels. At N=4096, pop throughput drops to ~14M items/s.
- Remove (random index removal) performs comparably to push, thanks to the O(log n) index-tracked removal. This validates the setidx callback design for timer cancellation workloads.
mpsc.h — Lock-Free MPSC Queue
Introduction
mpsc.h provides a lock-free, intrusive multi-producer single-consumer (MPSC) queue. Multiple threads can push nodes concurrently without locks, while a single consumer thread pops nodes. It is the backbone of xbase's poll-mode timer dispatch and the event loop's offload completion queue.
Design Philosophy
- Intrusive Design — Nodes embed an xMpsc struct directly, avoiding heap allocation per enqueue. This is critical for hot paths like timer expiry and offload completion, where allocation overhead would be unacceptable.
- Lock-Free Push — xMpscPush() uses a single atomic exchange (xAtomicXchg) on the tail pointer, making it wait-free for producers. No mutex, no CAS retry loop.
- Single-Consumer Pop — xMpscPop() is designed for exactly one consumer thread. It uses atomic loads and a single CAS for the edge case of popping the last element. This simplification avoids the ABA problem that plagues multi-consumer designs.
- Minimal Memory Ordering — The implementation uses xAtomicAcqRel for the exchange and xAtomicAcquire/xAtomicRelease for loads/stores, providing the minimum ordering needed for correctness without the overhead of sequential consistency.
Architecture
graph LR
P1["Producer 1"] -->|"xMpscPush"| TAIL["tail"]
P2["Producer 2"] -->|"xMpscPush"| TAIL
P3["Producer 3"] -->|"xMpscPush"| TAIL
HEAD["head"] -->|"xMpscPop"| C["Consumer"]
subgraph "Queue"
HEAD --> N1["Node 1"] --> N2["Node 2"] --> N3["Node 3"]
N3 --- TAIL
end
style P1 fill:#4a90d9,color:#fff
style P2 fill:#4a90d9,color:#fff
style P3 fill:#4a90d9,color:#fff
style C fill:#50b86c,color:#fff
Implementation Details
Data Structure
XDEF_STRUCT(xMpsc) {
    xMpsc *volatile next; // Pointer to next node
};
The queue is represented by two external pointers:
- head — Points to the oldest node (consumer reads from here)
- tail — Points to the newest node (producers append here)
Push Algorithm
void xMpscPush(xMpsc **head, xMpsc **tail, xMpsc *node) {
    node->next = NULL;
    xMpsc *prev_tail = xAtomicXchg(tail, node, xAtomicAcqRel);
    if (prev_tail)
        prev_tail->next = node;                    // Link to previous tail
    else
        xAtomicStore(head, node, xAtomicRelease);  // First node
}
The key insight: xAtomicXchg atomically replaces the tail and returns the old value. If the old tail was non-NULL, we link it to the new node. If it was NULL (empty queue), we also update the head.
Pop Algorithm
The pop operation handles three cases:
- Empty queue — head is NULL; return NULL.
- Multiple nodes — Advance head to head->next, return the old head.
- Single node — CAS tail to NULL. If the CAS succeeds, also CAS head to NULL. If the CAS fails (a concurrent push is in progress), spin until head->next becomes non-NULL.
flowchart TD
START["xMpscPop()"]
CHECK_HEAD{"head == NULL?"}
EMPTY["Return NULL"]
CHECK_NEXT{"head->next == NULL?"}
MULTI["Advance head<br/>Return old head"]
CAS_TAIL{"CAS tail → NULL?"}
CAS_HEAD["CAS head → NULL<br/>Return old head"]
SPIN["Spin until head->next != NULL"]
ADVANCE["Advance head<br/>Return old head"]
START --> CHECK_HEAD
CHECK_HEAD -->|Yes| EMPTY
CHECK_HEAD -->|No| CHECK_NEXT
CHECK_NEXT -->|No| MULTI
CHECK_NEXT -->|Yes| CAS_TAIL
CAS_TAIL -->|Success| CAS_HEAD
CAS_TAIL -->|Fail: concurrent push| SPIN
SPIN --> ADVANCE
style EMPTY fill:#e74c3c,color:#fff
style MULTI fill:#50b86c,color:#fff
style CAS_HEAD fill:#50b86c,color:#fff
style ADVANCE fill:#50b86c,color:#fff
Memory Ordering Analysis
| Operation | Ordering | Reason |
|---|---|---|
xAtomicXchg(tail, node) | AcqRel | Acquire: see previous tail's next field. Release: make node visible to consumer. |
xAtomicStore(head, node) | Release | Make the new head visible to the consumer. |
xAtomicLoad(head) | Acquire | See the node written by the producer. |
xAtomicLoad(&head->next) | Acquire | See the next pointer written by the producer. |
xAtomicCasStrong(tail, ...) | Release | Publish the NULL tail to concurrent pushers. |
Thread Safety
- xMpscPush() — Thread-safe (multiple producers).
- xMpscPop() — Single-consumer only. Must not be called concurrently.
- xMpscEmpty() — Thread-safe (atomic load).
API Reference
Types
| Type | Description |
|---|---|
xMpsc | Intrusive queue node. Embed in your struct and use xContainerOf() to recover the enclosing struct. |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xMpscPush | void xMpscPush(xMpsc **head, xMpsc **tail, xMpsc *node) | Push a node. Wait-free for producers. | Thread-safe (multi-producer) |
xMpscPop | xMpsc *xMpscPop(xMpsc **head, xMpsc **tail) | Pop the oldest node. Returns NULL if empty. | Single-consumer only |
xMpscEmpty | bool xMpscEmpty(xMpsc **head) | Check if the queue is empty. | Thread-safe |
Usage Examples
Basic Producer-Consumer
#include <stdio.h>
#include <pthread.h>
#include <xbase/mpsc.h>
#include <xbase/base.h>
typedef struct {
    xMpsc node; // Must embed xMpsc
    int value;
} Message;

static xMpsc *g_head = NULL;
static xMpsc *g_tail = NULL;

static void *producer(void *arg) {
    Message *msg = (Message *)arg;
    xMpscPush(&g_head, &g_tail, &msg->node);
    return NULL;
}

int main(void) {
    Message msgs[] = {
        { .value = 1 },
        { .value = 2 },
        { .value = 3 },
    };

    // Push from multiple threads
    pthread_t threads[3];
    for (int i = 0; i < 3; i++)
        pthread_create(&threads[i], NULL, producer, &msgs[i]);
    for (int i = 0; i < 3; i++)
        pthread_join(threads[i], NULL);

    // Pop from single consumer
    xMpsc *node;
    while ((node = xMpscPop(&g_head, &g_tail)) != NULL) {
        Message *msg = xContainerOf(node, Message, node);
        printf("Received: %d\n", msg->value);
    }
    return 0;
}
Use Cases
- Timer Poll Mode — timer.h uses the MPSC queue in poll mode to pass expired timer entries from the timer thread to the polling thread without locks.
- Event Loop Offload — The event loop's offload mechanism (event.h) uses an MPSC queue to deliver completed work items from worker threads to the event loop thread.
- xlog Async Logger — logger.h uses the MPSC queue to pass log messages from application threads to the logger's flush thread.
Best Practices
- Embed xMpsc in your struct. Don't allocate xMpsc nodes separately. Use xContainerOf() to recover the enclosing struct after popping.
- Initialize head and tail to NULL. An empty queue has both pointers set to NULL.
- Only one thread may call xMpscPop(). The single-consumer constraint is fundamental to the algorithm's correctness. Violating it causes data races.
- Don't access a node after pushing it. Once pushed, the node is owned by the queue until popped.
Comparison with Other Libraries
| Feature | xbase mpsc.h | Dmitry Vyukov MPSC | concurrentqueue (C++) | Linux llist |
|---|---|---|---|---|
| Design | Intrusive, lock-free | Intrusive, lock-free | Non-intrusive, lock-free | Intrusive, lock-free |
| Push | Wait-free (1 atomic xchg) | Wait-free (1 atomic xchg) | Lock-free (CAS loop) | Wait-free (1 atomic xchg) |
| Pop | Lock-free (single consumer) | Lock-free (single consumer) | Lock-free (multi-consumer) | Batch pop (splice) |
| Memory Ordering | AcqRel / Acquire / Release | SeqCst | Relaxed + fences | Varies |
| Allocation | None (intrusive) | None (intrusive) | Per-element (internal) | None (intrusive) |
| Multi-Consumer | No | No | Yes | No (batch only) |
| Language | C99 | C/C++ | C++11 | C (kernel) |
Key Differentiator: xbase's MPSC queue is minimal and intrusive — zero allocation overhead, wait-free push, and carefully chosen memory orderings. It's designed specifically for the single-consumer patterns found in event loops and timer systems.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbase/mpsc_bench.cpp
| Benchmark | Time (ns) | CPU (ns) | Iterations | Throughput |
|---|---|---|---|---|
BM_Mpsc_SingleProducer | 3,712 | 3,712 | 187,897 | 275.9 M items/s |
BM_Mpsc_MultiProducer/2 | 609,432 | 87,797 | 8,075 | 227.8 M items/s |
BM_Mpsc_MultiProducer/4 | 1,327,965 | 148,356 | 4,768 | 269.6 M items/s |
BM_Mpsc_MultiProducer/8 | 4,466,805 | 292,260 | 1,000 | 273.7 M items/s |
Key Observations:
- Single-producer push/pop achieves ~276M items/s, demonstrating the minimal overhead of the lock-free algorithm.
- Multi-producer scaling maintains ~270M items/s aggregate throughput even with 8 concurrent producers, showing excellent scalability. The wall-clock time increases due to thread synchronization overhead, but per-CPU throughput remains stable.
- The gap between wall-clock time and CPU time in multi-producer benchmarks reflects the cost of thread creation and barrier synchronization, not the queue operations themselves.
atomic.h — Atomic Operations
Introduction
atomic.h provides a set of macro wrappers over GCC/Clang __atomic builtins, offering portable atomic operations with explicit memory ordering. These macros are used throughout xbase for reference counting (memory.h), lock-free queues (mpsc.h), and event loop internals (event.h).
Design Philosophy
- Thin Macro Wrappers — Each macro maps directly to a compiler builtin with zero overhead. No abstraction layers, no runtime dispatch.
- Explicit Memory Ordering — Every atomic operation requires an explicit memory-order parameter (xAtomicAcquire, xAtomicRelease, etc.), forcing the programmer to think about ordering requirements rather than defaulting to the expensive SeqCst.
- GCC/Clang Builtins — The __atomic builtins are supported by GCC ≥ 4.7 and all versions of Clang. They generate optimal instructions for each target architecture (x86: lock prefix; ARM: ldrex/strex or LSE atomics).
Architecture
graph TD
subgraph "xbase Atomic Users"
MEMORY["memory.h<br/>xRetain / xRelease<br/>(SeqCst refcount)"]
MPSC["mpsc.h<br/>xMpscPush / xMpscPop<br/>(AcqRel / Acquire / Release)"]
EVENT["event_private.h<br/>inflight counter<br/>(Relaxed)"]
TASK["task.c<br/>pending / done_count<br/>(stdatomic)"]
end
subgraph "atomic.h Macros"
LOAD["xAtomicLoad"]
STORE["xAtomicStore"]
XCHG["xAtomicXchg"]
CAS["xAtomicCas*"]
ADD["xAtomicAdd/Sub"]
FETCH["xAtomicFetch*"]
end
MEMORY --> ADD
MPSC --> XCHG
MPSC --> LOAD
MPSC --> STORE
MPSC --> CAS
EVENT --> FETCH
style MEMORY fill:#4a90d9,color:#fff
style MPSC fill:#f5a623,color:#fff
style EVENT fill:#50b86c,color:#fff
Implementation Details
Memory Order Constants
| Macro | Value | Meaning |
|---|---|---|
xAtomicRelaxed | __ATOMIC_RELAXED | No ordering constraints. Only guarantees atomicity. |
xAtomicConsume | __ATOMIC_CONSUME | Data-dependent ordering (rarely used in practice). |
xAtomicAcquire | __ATOMIC_ACQUIRE | Prevents reads/writes from being reordered before this operation. |
xAtomicRelease | __ATOMIC_RELEASE | Prevents reads/writes from being reordered after this operation. |
xAtomicAcqRel | __ATOMIC_ACQ_REL | Combines Acquire and Release. |
xAtomicSeqCst | __ATOMIC_SEQ_CST | Full sequential consistency. Most expensive. |
Operation Macros
Load / Store
| Macro | Expansion | Description |
|---|---|---|
xAtomicLoad(p, o) | __atomic_load_n(p, o) | Atomically read *p |
xAtomicStore(p, v, o) | __atomic_store_n(p, v, o) | Atomically write v to *p |
Exchange / CAS
| Macro | Expansion | Description |
|---|---|---|
xAtomicXchg(p, v, o) | __atomic_exchange_n(p, v, o) | Atomically swap *p with v, return old value |
xAtomicCasWeak(p, e, d, o) | __atomic_compare_exchange_n(p, e, d, true, o, Relaxed) | Weak CAS (may spuriously fail) |
xAtomicCasStrong(p, e, d, o) | __atomic_compare_exchange_n(p, e, d, false, o, Relaxed) | Strong CAS (no spurious failure) |
Note: Both CAS macros use xAtomicRelaxed as the failure ordering. The success ordering is specified by the o parameter.
Arithmetic
| Macro | Expansion | Returns |
|---|---|---|
xAtomicAdd(p, v, o) | __atomic_add_fetch(p, v, o) | New value (*p + v) |
xAtomicSub(p, v, o) | __atomic_sub_fetch(p, v, o) | New value (*p - v) |
xAtomicFetchAdd(p, v, o) | __atomic_fetch_add(p, v, o) | Old value (before add) |
xAtomicFetchSub(p, v, o) | __atomic_fetch_sub(p, v, o) | Old value (before sub) |
Bitwise
| Macro | Expansion | Returns |
|---|---|---|
xAtomicAnd(p, v, o) | __atomic_and_fetch(p, v, o) | New value |
xAtomicOr(p, v, o) | __atomic_or_fetch(p, v, o) | New value |
xAtomicXor(p, v, o) | __atomic_xor_fetch(p, v, o) | New value |
xAtomicNand(p, v, o) | __atomic_nand_fetch(p, v, o) | New value |
xAtomicFetchAnd(p, v, o) | __atomic_fetch_and(p, v, o) | Old value |
xAtomicFetchOr(p, v, o) | __atomic_fetch_or(p, v, o) | Old value |
xAtomicFetchXor(p, v, o) | __atomic_fetch_xor(p, v, o) | Old value |
API Reference
See the Operation Macros section above for the complete list. All macros are defined in <xbase/atomic.h> and require no function calls — they expand directly to compiler builtins.
Usage Examples
Atomic Counter
#include <stdio.h>
#include <pthread.h>
#include <xbase/atomic.h>
static int g_counter = 0;

static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        xAtomicAdd(&g_counter, 1, xAtomicRelaxed);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, increment, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);

    printf("Counter: %d\n", xAtomicLoad(&g_counter, xAtomicRelaxed));
    // Output: Counter: 400000
    return 0;
}
Spinlock (Educational)
#include <xbase/atomic.h>
typedef struct { int locked; } Spinlock;

static inline void spin_lock(Spinlock *s) {
    while (xAtomicXchg(&s->locked, 1, xAtomicAcquire) != 0) {
        // Spin
    }
}

static inline void spin_unlock(Spinlock *s) {
    xAtomicStore(&s->locked, 0, xAtomicRelease);
}
Use Cases
- Reference Counting — memory.h uses xAtomicAdd/xAtomicSub with SeqCst ordering for thread-safe reference count management.
- Lock-Free Data Structures — mpsc.h uses xAtomicXchg for wait-free push and xAtomicCasStrong for the single-element pop edge case.
- Event Loop Internals — The event loop uses xAtomicFetchAdd/xAtomicFetchSub with Relaxed ordering to track in-flight offload workers.
Best Practices
- Use the weakest sufficient ordering. Relaxed for simple counters, Acquire/Release for producer-consumer patterns, SeqCst only when you need a total order visible to all threads.
- Prefer xAtomicCasStrong over xAtomicCasWeak unless you're in a retry loop where spurious failures are acceptable (e.g., lock-free stack push).
- Note the CAS failure ordering. Both CAS macros hardcode xAtomicRelaxed as the failure ordering. If you need stronger failure ordering, use the raw xAtomicCas macro directly.
- Don't mix with C11 <stdatomic.h>. While both use the same underlying compiler builtins, mixing the two styles in the same translation unit can be confusing. xbase uses <stdatomic.h> in task.c for atomic_size_t but atomic.h macros everywhere else.
Comparison with Other Libraries
| Feature | xbase atomic.h | C11 <stdatomic.h> | C++ <atomic> | Linux kernel atomics |
|---|---|---|---|---|
| Style | Macros over __atomic builtins | Language-level types | Template class | Inline functions + asm |
| Memory Order | Explicit parameter | Explicit parameter | Explicit parameter | Implicit (varies) |
| Types | Any scalar (via pointer) | _Atomic qualified types | std::atomic<T> | atomic_t, atomic64_t |
| CAS | xAtomicCasWeak/Strong | atomic_compare_exchange_* | compare_exchange_* | cmpxchg |
| Compiler | GCC ≥ 4.7, Clang | C11 | C++11 | GCC (kernel) |
| Portability | GCC/Clang only | Standard C11 | Standard C++11 | Linux kernel only |
Key Differentiator: xbase's atomic macros are the thinnest possible wrapper — they add naming consistency (xAtomic* prefix) and explicit ordering parameters without any abstraction overhead. They work with any scalar type via pointer, unlike C11's _Atomic qualifier which requires type annotations.
log.h — Thread-Local Log Callback
Introduction
log.h provides a per-thread, callback-based logging mechanism for xKit's internal error reporting. Each thread can register its own log callback via xLogSetCallback(); when xLog() is called, the formatted message is dispatched to that callback. If no callback is registered, messages fall back to stderr. On fatal errors, a stack backtrace is captured and abort() is called.
Design Philosophy
- Thread-Local Callbacks — Each thread has its own log callback and userdata, stored in __thread (thread-local storage). This avoids global locks and allows different threads to route log messages to different destinations (e.g., the xlog async logger, a test harness, or a custom handler).
- Minimal and Non-Allocating — xLog() formats into a fixed-size thread-local buffer (XLOG_BUF_SIZE, default 512 bytes). No heap allocation occurs during logging, making it safe to call from low-level code paths.
- Fatal with Backtrace — When fatal = true, xLog() captures a stack trace via xBacktrace() before calling abort(). This provides immediate diagnostic information for unrecoverable errors.
- Bridge to xlog — The callback mechanism is designed to integrate with the higher-level xlog module. The xlog logger registers itself as the thread's log callback, so internal xKit errors are automatically routed through the async logging pipeline.
Architecture
graph TD
subgraph "Thread 1"
LOG1["xLog()"] --> CB1["Custom Callback"]
end
subgraph "Thread 2"
LOG2["xLog()"] --> CB2["xlog Logger"]
end
subgraph "Thread 3 (no callback)"
LOG3["xLog()"] --> STDERR["stderr"]
end
CB1 --> FILE["Log File"]
CB2 --> XLOG["Async Logger Pipeline"]
style LOG1 fill:#4a90d9,color:#fff
style LOG2 fill:#4a90d9,color:#fff
style LOG3 fill:#4a90d9,color:#fff
Implementation Details
Thread-Local State
XDEF_STRUCT(xLogCtx) {
    xLogCallback cb;          // User callback (NULL = stderr fallback)
    void *userdata;           // Forwarded to callback
    char buf[XLOG_BUF_SIZE];  // Format buffer (512 bytes)
    char bt[XLOG_BT_SIZE];    // Backtrace buffer (2048 bytes)
};

static __thread xLogCtx tl_ctx;
Each thread gets ~2.5 KB of thread-local storage for logging. The buffers are reused across calls, so there's no allocation overhead.
xLog() Flow
flowchart TD
CALL["xLog(fatal, fmt, ...)"]
FMT["vsnprintf → tl_ctx.buf"]
CHECK_FATAL{"fatal?"}
BT["xBacktraceSkip(2, bt, size)"]
CHECK_CB{"callback set?"}
CB["cb(msg, backtrace, userdata)"]
STDERR["fprintf(stderr, msg)"]
ABORT["abort()"]
CALL --> FMT
FMT --> CHECK_FATAL
CHECK_FATAL -->|Yes| BT
CHECK_FATAL -->|No| CHECK_CB
BT --> CHECK_CB
CHECK_CB -->|Yes| CB
CHECK_CB -->|No| STDERR
CB --> CHECK_FATAL2{"fatal?"}
STDERR --> CHECK_FATAL2
CHECK_FATAL2 -->|Yes| ABORT
CHECK_FATAL2 -->|No| DONE["Return"]
style ABORT fill:#e74c3c,color:#fff
style DONE fill:#50b86c,color:#fff
Buffer Size Configuration
The format buffer size can be overridden at compile time:
#define XLOG_BUF_SIZE 1024 // Must be defined before #include <xbase/log.h>
#include <xbase/log.h>
API Reference
Macros
| Macro | Default | Description |
|---|---|---|
XLOG_BUF_SIZE | 512 | Format buffer size in bytes. Override before including the header. |
Types
| Type | Description |
|---|---|
xLogCallback | void (*)(const char *msg, const char *backtrace, void *userdata) — Log callback. backtrace is non-NULL only on fatal. |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xLogSetCallback | void xLogSetCallback(xLogCallback cb, void *userdata) | Register (or clear with NULL) the current thread's log callback. | Thread-local (each thread sets its own) |
xLog | void xLog(bool fatal, const char *fmt, ...) | Format and dispatch a log message. If fatal, captures backtrace and calls abort(). | Thread-local (uses calling thread's callback) |
Usage Examples
Basic Logging with Custom Callback
#include <stdio.h>
#include <xbase/log.h>
static void my_log_handler(const char *msg, const char *backtrace,
                           void *userdata) {
    FILE *f = (FILE *)userdata;
    fprintf(f, "[MyApp] %s\n", msg);
    if (backtrace) {
        fprintf(f, "Stack trace:\n%s", backtrace);
    }
}

int main(void) {
    // Route this thread's logs to a file
    FILE *logfile = fopen("app.log", "w");
    xLogSetCallback(my_log_handler, logfile);

    xLog(false, "Application started, version %d.%d", 1, 0);
    xLog(false, "Processing %d items", 42);

    // Clear callback (revert to stderr)
    xLogSetCallback(NULL, NULL);
    xLog(false, "This goes to stderr");

    fclose(logfile);
    return 0;
}
Fatal Error with Backtrace
#include <xbase/log.h>
void dangerous_operation(void) {
    // This will print the message, capture a backtrace, and abort()
    xLog(true, "Unrecoverable error: corrupted state detected");
    // Never reaches here
}
Use Cases
- xKit Internal Error Reporting — All xKit modules use xLog() to report internal errors (e.g., allocation failures, invalid states). By registering a callback, applications can capture these messages in their logging pipeline.
- xlog Integration — The xlog module registers its logger as the thread's callback via xLogSetCallback(), routing all internal xKit messages through the async logging system.
- Test Frameworks — Test harnesses can register a callback that captures log messages for assertion, rather than letting them go to stderr.
Best Practices
- Register callbacks early. Set up xLogSetCallback() before calling any xKit functions to ensure all messages are captured.
- Don't block in callbacks. The callback runs synchronously on the calling thread. Blocking delays the caller. For async logging, use the xlog module.
- Handle NULL backtrace. The backtrace parameter is NULL for non-fatal messages. Always check before using it.
- Be aware of buffer truncation. Messages longer than XLOG_BUF_SIZE are truncated. Increase the size at compile time if needed.
Comparison with Other Libraries
| Feature | xbase log.h | syslog | fprintf(stderr) | GLib g_log |
|---|---|---|---|---|
| Callback | Per-thread | Global handler | N/A | Global handler |
| Thread Safety | Thread-local (no locks) | Thread-safe (kernel) | Thread-safe (stdio lock) | Thread-safe (global lock) |
| Backtrace | Built-in on fatal | No | No | Optional (G_DEBUG) |
| Allocation | None (stack buffer) | None (kernel) | None (stdio buffer) | Heap (GString) |
| Fatal Handling | abort() with backtrace | N/A | N/A | abort() (G_LOG_FLAG_FATAL) |
| Customization | Per-thread callback | openlog() | Redirect fd | g_log_set_handler() |
Key Differentiator: xbase's log is designed as a lightweight internal error channel, not a full logging framework. Its per-thread callback design avoids global locks and integrates naturally with the xlog async logger for production use.
backtrace.h — Platform-Adaptive Stack Backtrace
Introduction
backtrace.h captures the current call stack and formats it into a human-readable multi-line string. The unwinding backend is selected at build time with the following priority: libunwind > execinfo (macOS/glibc) > stub (unsupported platforms). It is used internally by xLog() to provide stack traces on fatal errors.
Design Philosophy
- Build-Time Backend Selection — The backend is chosen via CMake-detected macros (XK_HAS_LIBUNWIND, XK_HAS_EXECINFO). This avoids runtime overhead and ensures the best available unwinder is used on each platform.
- Graceful Degradation — On platforms without libunwind or execinfo, a stub backend returns a "not supported" message rather than crashing. This ensures xBacktrace() is always safe to call.
- Automatic Frame Skipping — Internal frames (xBacktrace → xBacktraceSkip → bt_capture) are automatically skipped so the output starts from the caller's perspective. The skip parameter allows additional frames to be skipped (useful when called through wrapper functions like xLog).
- Buffer-Based Output — The caller provides a buffer; no heap allocation occurs. This makes it safe to call from signal handlers, fatal error paths, and low-memory situations.
Architecture
graph TD
API["xBacktrace() / xBacktraceSkip()"]
SELECT{"Build-time selection"}
LIBUNWIND["libunwind<br/>unw_step() loop"]
EXECINFO["execinfo<br/>backtrace() + backtrace_symbols()"]
STUB["stub<br/>'not supported' message"]
BUF["User buffer<br/>(formatted output)"]
API --> SELECT
SELECT -->|XK_HAS_LIBUNWIND| LIBUNWIND
SELECT -->|XK_HAS_EXECINFO| EXECINFO
SELECT -->|fallback| STUB
LIBUNWIND --> BUF
EXECINFO --> BUF
STUB --> BUF
style LIBUNWIND fill:#50b86c,color:#fff
style EXECINFO fill:#4a90d9,color:#fff
style STUB fill:#f5a623,color:#fff
Implementation Details
Backend Selection
| Backend | Macro | Platform | Quality |
|---|---|---|---|
| libunwind | XK_HAS_LIBUNWIND | Linux (with libunwind installed) | Best — accurate unwinding, symbol + offset |
| execinfo | XK_HAS_EXECINFO | macOS, Linux (glibc) | Good — requires -rdynamic on Linux for symbols |
| stub | (fallback) | Any | Minimal — returns "not supported" message |
Output Format
Each frame is formatted as:
#0 0x7fff8a1b2c3d symbol_name+0x1a
#1 0x7fff8a1b2c3d another_function+0x42
#2 0x7fff8a1b2c3d <unknown>
- #N — Frame number (0 = most recent)
- 0xADDR — Instruction pointer address
- symbol+offset — Function name and offset (if available)
- <unknown> — When symbol resolution fails
Frame Skipping
Call stack:
bt_capture() ← INTERNAL_SKIP (2 frames)
xBacktraceSkip() ← INTERNAL_SKIP
xLog() ← user skip = 2 (from xLog)
user_function() ← first visible frame
main()
xBacktrace() calls xBacktraceSkip(0, ...), which adds INTERNAL_SKIP = 2 to skip its own frames. xLog() calls xBacktraceSkip(2, ...) to also skip its own internal frames.
libunwind Backend
Uses unw_getcontext() → unw_init_local() → unw_step() loop. For each frame:
- unw_get_reg(UNW_REG_IP) — Get instruction pointer
- unw_get_proc_name() — Get symbol name and offset
execinfo Backend
Uses backtrace() to capture frame addresses, then backtrace_symbols() to resolve names. On Linux, link with -rdynamic to export symbols for resolution.
API Reference
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xBacktrace | int xBacktrace(char *buf, size_t size) | Capture the call stack into buf. Equivalent to xBacktraceSkip(0, buf, size). | Thread-safe (uses only local/stack state) |
xBacktraceSkip | int xBacktraceSkip(int skip, char *buf, size_t size) | Capture the call stack, skipping skip additional frames beyond internal frames. | Thread-safe |
Parameters
| Parameter | Description |
|---|---|
skip | Number of additional frames to skip (0 = no extra skipping) |
buf | Destination buffer. May be NULL (returns 0). |
size | Size of buf in bytes. |
Return Value
Number of bytes written (excluding trailing \0), or 0 if buf is NULL or size is 0.
Usage Examples
Capture and Print Stack Trace
#include <stdio.h>
#include <xbase/backtrace.h>
void foo(void) {
char buf[4096];
int n = xBacktrace(buf, sizeof(buf));
if (n > 0) {
printf("Stack trace:\n%s", buf);
}
}
void bar(void) { foo(); }
int main(void) {
bar();
return 0;
}
Output (with execinfo on macOS):
Stack trace:
#0 0x100003f20 foo+0x20
#1 0x100003f80 bar+0x10
#2 0x100003fa0 main+0x10
Skip Wrapper Frames
#include <stdio.h>
#include <xbase/backtrace.h>
// Custom error reporter that skips its own frame
void report_error(const char *msg) {
char bt[2048];
xBacktraceSkip(1, bt, sizeof(bt)); // Skip report_error itself
fprintf(stderr, "Error: %s\nBacktrace:\n%s", msg, bt);
}
Use Cases
- Fatal Error Diagnostics — xLog() captures a backtrace on fatal errors, providing immediate context for debugging crashes.
- Debug Assertions — Custom assertion macros can include xBacktrace() to show where the assertion failed.
- Memory Leak Detection — Record allocation backtraces to identify where leaked objects were created.
Best Practices
- Provide a large enough buffer. 4096 bytes is usually sufficient for 20-30 frames. The output is truncated (not corrupted) if the buffer is too small.
- Link with -rdynamic on Linux. Without it, the execinfo backend shows only addresses, not symbol names.
- Install libunwind for best results on Linux. It provides more accurate unwinding than execinfo, especially through optimized code and signal handlers.
- Don't call from signal handlers with execinfo. backtrace_symbols() calls malloc(), which is not async-signal-safe. libunwind is safer in this context.
Comparison with Other Libraries
| Feature | xbase backtrace.h | glibc backtrace() | libunwind | Boost.Stacktrace | Windows CaptureStackBackTrace |
|---|---|---|---|---|---|
| Platform | macOS + Linux + stub | Linux (glibc) | Linux + macOS | Cross-platform | Windows |
| Accuracy | Backend-dependent | Good (glibc) | Excellent | Backend-dependent | Good |
| Symbol Resolution | Built-in | backtrace_symbols() | unw_get_proc_name() | Backend-dependent | SymFromAddr() |
| Allocation | None (user buffer) | malloc() for symbols | None | Heap | None |
| Signal Safety | libunwind: yes, execinfo: no | No (malloc) | Yes | No | Yes |
| Frame Skipping | Built-in (skip param) | Manual | Manual | Manual | FramesToSkip param |
Key Differentiator: xbase's backtrace provides a simple, buffer-based API with automatic frame skipping and graceful degradation across platforms. It's designed for integration into error reporting paths where heap allocation is undesirable.
socket.h — Async Socket
Introduction
socket.h provides an async socket abstraction built on top of xEventLoop. It wraps the POSIX socket API with automatic non-blocking setup, event loop registration, and idle-timeout support. When a socket becomes readable, writable, or times out, a single unified callback is invoked with the appropriate event mask.
Design Philosophy
- Thin Wrapper, Not a Framework — xSocket adds just enough abstraction to eliminate boilerplate (non-blocking setup, FD_CLOEXEC, event registration) without hiding the underlying fd. You can always retrieve the raw fd via xSocketFd() for direct system calls.
- Idle-Timeout Semantics — Read and write timeouts are reset on every corresponding I/O event, implementing idle-timeout behavior. This is ideal for detecting dead connections: if no data arrives within the timeout period, the callback fires with xEvent_Timeout.
- Unified Callback — A single xSocketFunc callback handles all events (read, write, timeout). The mask parameter tells you what happened, and the xEvent_Timeout flag is OR'd with xEvent_Read or xEvent_Write to indicate which direction timed out.
- Lifecycle Tied to Event Loop — A socket is created and destroyed in the context of an event loop. xSocketDestroy() cancels timers, removes the event source, closes the fd, and frees the handle in one call.
Architecture
graph TD
APP["Application"] -->|"xSocketCreate()"| SOCKET["xSocket"]
SOCKET -->|"xEventAdd()"| LOOP["xEventLoop"]
LOOP -->|"I/O ready"| TRAMP["trampoline()"]
TRAMP -->|"reset timers"| TIMER["Timer Heap"]
TRAMP -->|"forward"| CB["callback(sock, mask, userp)"]
TIMER -->|"timeout"| TIMEOUT_CB["timeout_cb()"]
TIMEOUT_CB -->|"xEvent_Timeout"| CB
style SOCKET fill:#4a90d9,color:#fff
style LOOP fill:#f5a623,color:#fff
style CB fill:#50b86c,color:#fff
Implementation Details
Internal Structure
struct xSocket_ {
int fd; // Underlying file descriptor
xEventLoop loop; // Bound event loop
xEventSource source; // Registered event source
xEventMask mask; // Current event mask
xSocketFunc callback; // User callback
void *userp; // User data
xEventTimer read_timer; // Read idle timeout timer
xEventTimer write_timer; // Write idle timeout timer
int read_timeout_ms; // Read timeout setting (0 = disabled)
int write_timeout_ms; // Write timeout setting (0 = disabled)
};
Trampoline Pattern
The socket registers an internal trampoline() function as the event callback with the event loop. This trampoline:
- Resets idle timers — On xEvent_Read, cancels and re-arms the read timer. On xEvent_Write, cancels and re-arms the write timer.
- Forwards to user callback — Calls callback(sock, mask, userp) with the original event mask.
This ensures idle timers are always reset transparently, without requiring the user to manage them manually.
Socket Creation
xSocketCreate() performs these steps in a single call:
1. socket(family, type, protocol) — On Linux/BSD, SOCK_CLOEXEC | SOCK_NONBLOCK set both flags in one syscall. On other platforms, fcntl() is used as a fallback.
2. xEventAdd(loop, fd, mask, trampoline, socket) — Registers with the event loop.
3. Returns the opaque xSocket handle.
Timeout Mechanism
sequenceDiagram
participant App
participant Socket as xSocket
participant L as xEventLoop
participant Timer as Timer Heap
App->>Socket: xSocketSetTimeout(sock, 5000, 3000)
Socket->>Timer: arm read timer (5s)
Socket->>Timer: arm write timer (3s)
Note over L: Data arrives on fd
L->>Socket: trampoline(fd, xEvent_Read)
Socket->>Timer: cancel + re-arm read timer (5s)
Socket->>App: callback(sock, xEvent_Read)
Note over Timer: 5 seconds of silence...
Timer->>Socket: read_timeout_cb()
Socket->>App: callback(sock, xEvent_Timeout | xEvent_Read)
API Reference
Types
| Type | Description |
|---|---|
xSocket | Opaque handle to an async socket |
xSocketFunc | void (*)(xSocket sock, xEventMask mask, void *arg) — Socket event callback |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xSocketCreate | xSocket xSocketCreate(xEventLoop loop, int family, int type, int protocol, xEventMask mask, xSocketFunc callback, void *userp) | Create a non-blocking socket and register with the event loop. | Not thread-safe |
xSocketDestroy | void xSocketDestroy(xEventLoop loop, xSocket sock) | Cancel timers, remove from event loop, close fd, free handle. Safe with NULL. | Not thread-safe |
xSocketSetMask | xErrno xSocketSetMask(xEventLoop loop, xSocket sock, xEventMask mask) | Change the watched event mask. | Not thread-safe |
xSocketSetTimeout | xErrno xSocketSetTimeout(xSocket sock, int read_timeout_ms, int write_timeout_ms) | Set idle timeouts. Pass 0 to cancel. Replaces previous settings. | Not thread-safe |
xSocketFd | int xSocketFd(xSocket sock) | Return the underlying fd, or -1 if NULL. | Thread-safe (read-only) |
xSocketMask | xEventMask xSocketMask(xSocket sock) | Return the current event mask, or 0 if NULL. | Thread-safe (read-only) |
Callback Mask Values
| Mask | Meaning |
|---|---|
xEvent_Read | Socket is readable |
xEvent_Write | Socket is writable |
xEvent_Timeout \| xEvent_Read | Read idle timeout fired |
xEvent_Timeout \| xEvent_Write | Write idle timeout fired |
Usage Examples
TCP Echo Client with Timeout
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <xbase/socket.h>
static xEventLoop g_loop;
static void on_socket(xSocket sock, xEventMask mask, void *arg) {
(void)arg;
if (mask & xEvent_Timeout) {
printf("Timeout on %s\n",
(mask & xEvent_Read) ? "read" : "write");
xSocketDestroy(g_loop, sock);
xEventLoopStop(g_loop);
return;
}
if (mask & xEvent_Read) {
char buf[1024];
ssize_t n;
while ((n = read(xSocketFd(sock), buf, sizeof(buf))) > 0) {
printf("Received: %.*s\n", (int)n, buf);
}
}
if (mask & xEvent_Write) {
const char *msg = "Hello, server!";
write(xSocketFd(sock), msg, strlen(msg));
// Switch to read-only after sending
xSocketSetMask(g_loop, sock, xEvent_Read);
}
}
int main(void) {
g_loop = xEventLoopCreate();
xSocket sock = xSocketCreate(g_loop, AF_INET, SOCK_STREAM, 0,
xEvent_Write, on_socket, NULL);
if (!sock) return 1;
// Set 5-second read idle timeout
xSocketSetTimeout(sock, 5000, 0);
// Connect (non-blocking)
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(8080),
};
inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
connect(xSocketFd(sock), (struct sockaddr *)&addr, sizeof(addr));
xEventLoopRun(g_loop);
xEventLoopDestroy(g_loop);
return 0;
}
UDP Receiver with Idle Timeout
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <xbase/socket.h>
static void on_udp(xSocket sock, xEventMask mask, void *arg) {
xEventLoop loop = (xEventLoop)arg;
if (mask & xEvent_Timeout) {
printf("No data for 10 seconds, shutting down.\n");
xSocketDestroy(loop, sock);
xEventLoopStop(loop);
return;
}
if (mask & xEvent_Read) {
char buf[65536];
ssize_t n;
while ((n = read(xSocketFd(sock), buf, sizeof(buf))) > 0) {
printf("UDP: %.*s\n", (int)n, buf);
}
}
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xSocket sock = xSocketCreate(loop, AF_INET, SOCK_DGRAM, 0,
xEvent_Read, on_udp, loop);
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(9999),
.sin_addr.s_addr = INADDR_ANY,
};
bind(xSocketFd(sock), (struct sockaddr *)&addr, sizeof(addr));
// 10-second read idle timeout
xSocketSetTimeout(sock, 10000, 0);
xEventLoopRun(loop);
xEventLoopDestroy(loop);
return 0;
}
Use Cases
- Network Servers — Create listening sockets, accept connections, and manage each client with its own xSocket + idle timeout. Dead connections are automatically detected.
- Protocol Clients — Build async clients (HTTP, Redis, etc.) that connect, send requests, and wait for responses with timeout protection.
- Real-Time Data Feeds — Monitor UDP multicast sockets with idle timeouts to detect feed outages.
Best Practices
- Always drain in edge-triggered mode. Since the underlying event loop is edge-triggered, read/write until EAGAIN in every callback.
- Use idle timeouts for connection health. Set read_timeout_ms to detect dead peers. The timeout resets automatically on each read event.
- Destroy sockets before the event loop. xSocketDestroy() calls xEventDel() and xEventLoopTimerCancel(), which require a valid event loop.
- Check the timeout direction. When xEvent_Timeout fires, check mask & xEvent_Read vs. mask & xEvent_Write to know which direction timed out.
- Don't close the fd manually. xSocketDestroy() closes it for you. Closing it separately leads to double-close bugs.
Comparison with Other Libraries
| Feature | xbase socket.h | POSIX socket API | libuv uv_tcp_t | Boost.Asio |
|---|---|---|---|---|
| Non-blocking Setup | Automatic (SOCK_NONBLOCK + FD_CLOEXEC) | Manual (fcntl) | Automatic | Automatic |
| Event Registration | Automatic (via xEventLoop) | Manual (epoll_ctl / kevent) | Automatic | Automatic |
| Idle Timeout | Built-in (xSocketSetTimeout) | Manual (timer + bookkeeping) | Manual (uv_timer) | Manual (deadline_timer) |
| Callback Style | Single unified callback with mask | N/A (blocking or manual poll) | Separate read/write callbacks | Separate handlers |
| Raw fd Access | xSocketFd() | Direct | uv_fileno() | native_handle() |
| Buffered I/O | No (raw fd) | No | Yes (uv_read_start) | Yes (async_read) |
| Platform | macOS + Linux | POSIX | Cross-platform | Cross-platform |
Key Differentiator: xbase's socket abstraction is intentionally thin — it handles the boilerplate (non-blocking, event registration, idle timeout) but leaves data reading/writing to the caller via the raw fd. This gives maximum flexibility without imposing a buffering strategy.
io.h — Abstract I/O Interfaces
Introduction
io.h defines four lightweight I/O interfaces — xReader, xWriter, xSeeker, xCloser — inspired by Go's io.Reader / io.Writer / io.Seeker / io.Closer. Each interface is a small struct containing a function pointer and an opaque void *ctx, making it trivial to adapt any object that provides the matching function signature.
On top of these interfaces, io.h provides a set of convenience functions (xRead, xReadFull, xReadAll, xWrite, xWritev, xSeek, xClose) that operate generically on any implementation, enabling code reuse across TCP connections, TLS streams, file descriptors, in-memory buffers, and more.
Design Philosophy
- Value-Type Interfaces — Each interface is a plain struct (function pointer + context), not a heap-allocated object. They are cheap to copy, pass by value, and require no memory management.
- POSIX Semantics — Function signatures mirror their POSIX counterparts: read(2), writev(2), lseek(2), close(2). This makes the learning curve near-zero for C developers.
- Composable Helpers — Higher-level functions like xReadFull and xReadAll are built on top of xReader, so any object that provides a reader automatically gains these capabilities.
- Zero-Initialized = Invalid — A zero-initialized struct (all NULL) is treated as "not set". Convenience functions can detect this and return an error instead of crashing.
Architecture
graph TD
subgraph "Interfaces"
R["xReader<br/>ssize_t read(ctx, buf, len)"]
W["xWriter<br/>ssize_t writev(ctx, iov, iovcnt)"]
S["xSeeker<br/>off_t seek(ctx, offset, whence)"]
C["xCloser<br/>int close(ctx)"]
end
subgraph "Convenience Functions"
XR["xRead"]
XRF["xReadFull"]
XRA["xReadAll"]
XW["xWrite"]
XWV["xWritev"]
XS["xSeek"]
XC["xClose"]
end
subgraph "Implementations"
TCP["xTcpConn<br/>xTcpConnReader / xTcpConnWriter"]
IOB["xIOBuffer<br/>(read/writev funcs)"]
FD["File Descriptor<br/>(custom wrapper)"]
end
XR --> R
XRF --> R
XRA --> R
XW --> W
XWV --> W
XS --> S
XC --> C
TCP -.->|"adapts to"| R
TCP -.->|"adapts to"| W
IOB -.->|"adapts to"| R
IOB -.->|"adapts to"| W
FD -.->|"adapts to"| R
FD -.->|"adapts to"| W
style R fill:#4a90d9,color:#fff
style W fill:#4a90d9,color:#fff
style S fill:#4a90d9,color:#fff
style C fill:#4a90d9,color:#fff
style XRF fill:#50b86c,color:#fff
style XRA fill:#50b86c,color:#fff
Implementation Details
Interface Structs
Each interface is a two-field struct:
| Interface | Function Pointer | Semantics |
|---|---|---|
xReader | ssize_t (*read)(void *ctx, void *buf, size_t len) | Returns bytes read, 0 on EOF, -1 on error |
xWriter | ssize_t (*writev)(void *ctx, const struct iovec *iov, int iovcnt) | Returns bytes written, -1 on error |
xSeeker | off_t (*seek)(void *ctx, off_t offset, int whence) | Returns resulting offset, -1 on error |
xCloser | int (*close)(void *ctx) | Returns 0 on success, -1 on failure |
xReadFull — Retry Logic
xReadFull loops calling r.read until exactly len bytes are read or EOF is reached. It automatically retries on EAGAIN and EINTR, making it suitable for both blocking and non-blocking file descriptors:
while (total < len):
n = r.read(ctx, buf + total, len - total)
if n > 0: total += n
if n == 0: break // EOF
if n == -1:
if EAGAIN or EINTR: continue
else: return -1 // real error
return total
xReadAll — Dynamic Buffer Growth
xReadAll reads until EOF into a dynamically allocated buffer. It starts with a 4096-byte allocation and doubles the capacity each time the buffer fills up:
cap = 4096, buf = malloc(cap)
loop:
if total == cap: realloc(buf, cap * 2)
n = r.read(ctx, buf + total, cap - total)
if n > 0: total += n
if n == 0: *out = buf, *out_len = total, return 0
if n == -1:
if EAGAIN or EINTR: continue
else: free(buf), return -1
The caller is responsible for freeing the returned buffer with free().
xWrite — Single Buffer Convenience
xWrite wraps a contiguous buffer into a single struct iovec and delegates to w.writev, avoiding the need for callers to construct iovec arrays for simple writes:
ssize_t xWrite(xWriter w, const void *buf, size_t len) {
struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
return w.writev(w.ctx, &iov, 1);
}
API Reference
Types
| Type | Description |
|---|---|
xReader | Abstract reader — { ssize_t (*read)(void*, void*, size_t), void *ctx } |
xWriter | Abstract writer — { ssize_t (*writev)(void*, const struct iovec*, int), void *ctx } |
xSeeker | Abstract seeker — { off_t (*seek)(void*, off_t, int), void *ctx } |
xCloser | Abstract closer — { int (*close)(void*), void *ctx } |
Functions
| Function | Signature | Description |
|---|---|---|
xRead | ssize_t xRead(xReader r, void *buf, size_t len) | Single read; returns bytes read, 0 on EOF, -1 on error |
xWrite | ssize_t xWrite(xWriter w, const void *buf, size_t len) | Write a contiguous buffer (wraps into single iovec) |
xWritev | ssize_t xWritev(xWriter w, const struct iovec *iov, int iovcnt) | Scatter-gather write |
xSeek | off_t xSeek(xSeeker s, off_t offset, int whence) | Reposition offset (SEEK_SET / SEEK_CUR / SEEK_END) |
xClose | int xClose(xCloser c) | Close the underlying resource |
xReadFull | ssize_t xReadFull(xReader r, void *buf, size_t len) | Read exactly len bytes, retrying on partial reads and EAGAIN/EINTR |
xReadAll | int xReadAll(xReader r, void **out, size_t *out_len) | Read until EOF into a malloc'd buffer; caller must free(*out) |
Usage Examples
Creating a Custom Reader
#include <xbase/io.h>
#include <stdint.h>
#include <unistd.h>
// Adapt a file descriptor into an xReader
static ssize_t fd_read(void *ctx, void *buf, size_t len) {
int fd = (int)(intptr_t)ctx;
return read(fd, buf, len);
}
xReader make_fd_reader(int fd) {
xReader r;
r.read = fd_read;
r.ctx = (void *)(intptr_t)fd;
return r;
}
Reading Exactly N Bytes
#include <xbase/io.h>
void read_header(xReader r) {
char header[64];
ssize_t n = xReadFull(r, header, sizeof(header));
if (n < 0) {
// error
} else if ((size_t)n < sizeof(header)) {
// EOF before full header
} else {
// got all 64 bytes
}
}
Reading All Data Until EOF
#include <xbase/io.h>
#include <stdlib.h>
void read_body(xReader r) {
void *data;
size_t data_len;
if (xReadAll(r, &data, &data_len) == 0) {
// process data (data_len bytes at data)
free(data);
} else {
// error
}
}
Using with xTcpConn
xTcpConn (from <xnet/tcp.h>) provides adapter functions that return xReader and xWriter bound to the connection's transport layer. This allows TCP connections to be used with all generic I/O helpers:
#include <xbase/io.h>
#include <xnet/tcp.h>
void handle_connection(xTcpConn conn) {
// Get I/O adapters from the TCP connection
xReader r = xTcpConnReader(conn);
xWriter w = xTcpConnWriter(conn);
// Read a fixed-size header
char header[16];
ssize_t n = xReadFull(r, header, sizeof(header));
if (n < (ssize_t)sizeof(header)) return;
// Read the entire body until the peer closes
void *body;
size_t body_len;
if (xReadAll(r, &body, &body_len) != 0) return;
// Echo back through the generic writer
xWrite(w, body, body_len);
free(body);
}
Scatter-Gather Write
#include <string.h>
#include <xbase/io.h>
void send_http_response(xWriter w) {
const char *header = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n";
const char *body = "Hello";
struct iovec iov[2] = {
{ .iov_base = (void *)header, .iov_len = strlen(header) },
{ .iov_base = (void *)body, .iov_len = 5 },
};
xWritev(w, iov, 2);
}
Integration with xTcpConn
xTcpConn provides two adapter functions that bridge the TCP connection to the generic I/O interfaces:
| Function | Returns | Description |
|---|---|---|
xTcpConnReader(conn) | xReader | Reader bound to transport.read — equivalent to xTcpConnRecv |
xTcpConnWriter(conn) | xWriter | Writer bound to transport.writev — equivalent to xTcpConnSendIov |
These adapters are zero-allocation: they copy the function pointer and context from the connection's internal xTransport into a stack-allocated struct. The returned interfaces are valid as long as the connection (and its transport) remains alive.
Why no xCloser adapter? xTcpConnClose() requires an xEventLoop parameter to properly unregister the socket from the event loop, which does not fit the int (*close)(void *ctx) signature.
Best Practices
- Prefer xReadFull over manual loops when you need an exact number of bytes. It handles EAGAIN, EINTR, and partial reads correctly.
- Always free() the buffer from xReadAll on success. On error, the function cleans up internally.
- Use xWrite for simple writes, xWritev for multi-buffer writes. xWrite is a thin wrapper that constructs a single iovec — no performance penalty.
- Check for zero-initialized interfaces before passing them to helpers. If xTcpConnReader(NULL) returns a zero struct, calling xRead on it will dereference a NULL function pointer.
- Obtain adapters once, use many times. Since xTcpConnReader / xTcpConnWriter are value types, you can call them once at the start of a handler and reuse the result throughout.
Comparison with Other Libraries
| Feature | xbase io.h | Go io.Reader/Writer | POSIX read/write | C++ std::iostream |
|---|---|---|---|---|
| Abstraction | Struct (fn ptr + ctx) | Interface (vtable) | Raw syscall | Class hierarchy |
| Allocation | Zero (stack value) | Heap (interface value) | N/A | Heap (stream object) |
| Composability | Via helper functions | Via io.Copy, io.ReadAll, etc. | Manual loops | Via stream operators |
| Scatter-Gather | Built-in (xWritev) | No (use io.MultiWriter) | writev(2) | No |
| Read-Until-EOF | xReadAll (malloc'd buffer) | io.ReadAll ([]byte) | Manual loop | std::istreambuf_iterator |
| Error Model | Return value (-1 + errno) | (n, error) tuple | Return value (-1 + errno) | Stream state flags |
xbuf — Buffer Toolkit
Introduction
xbuf is xKit's buffer module, providing three distinct buffer types optimized for different use cases: a linear auto-growing buffer, a fixed-size ring buffer, and a reference-counted block-chain I/O buffer. Together they cover the full spectrum of buffering needs — from simple byte accumulation to zero-copy network I/O.
Design Philosophy
- One Buffer Does Not Fit All — Rather than a single "universal" buffer, xbuf offers three specialized types. Each makes different trade-offs between simplicity, performance, and memory efficiency.
- Flexible Array Member Layout — Both xBuffer and xRingBuffer allocate header + data in a single malloc() call using C99 flexible array members. This eliminates pointer indirection and improves cache locality.
- Reference-Counted Block Sharing — xIOBuffer uses reference-counted blocks that can be shared across multiple buffers. This enables zero-copy split and append operations critical for high-performance network protocols.
- I/O Integration — All three types provide ReadFd / WriteFd helpers that handle EINTR retries and scatter-gather I/O (readv/writev), making them ready for event-driven network programming.
Architecture
graph TD
subgraph "xbuf Module"
BUF["xBuffer<br/>Linear auto-growing<br/>Single contiguous allocation"]
RING["xRingBuffer<br/>Fixed-size circular<br/>Power-of-2 masking"]
IO["xIOBuffer<br/>Block-chain<br/>Reference-counted"]
end
subgraph "Shared Infrastructure"
POOL["Block Pool<br/>Treiber stack freelist"]
ATOMIC["xbase/atomic.h<br/>Lock-free operations"]
end
IO --> POOL
POOL --> ATOMIC
subgraph "I/O Layer"
READ["read() / readv()"]
WRITE["write() / writev()"]
end
BUF --> READ
BUF --> WRITE
RING --> READ
RING --> WRITE
IO --> READ
IO --> WRITE
style BUF fill:#4a90d9,color:#fff
style RING fill:#f5a623,color:#fff
style IO fill:#50b86c,color:#fff
Sub-Module Overview
| Header | Type | Description | Doc |
|---|---|---|---|
buf.h | xBuffer | Linear auto-growing byte buffer with flexible array member layout | buf.md |
ring.h | xRingBuffer | Fixed-size circular buffer with power-of-2 bitmask indexing | ring.md |
io.h | xIOBuffer | Reference-counted block-chain I/O buffer with zero-copy operations | io.md |
How to Choose
| Criterion | xBuffer | xRingBuffer | xIOBuffer |
|---|---|---|---|
| Memory layout | Contiguous | Contiguous (circular) | Non-contiguous (block chain) |
| Growth | Auto-growing (2x realloc) | Fixed size (never grows) | Auto-growing (new blocks) |
| Best for | Accumulating variable-length data | Fixed-capacity producer-consumer | High-throughput network I/O |
| Zero-copy split | No | No | Yes |
| Zero-copy append | No | No | Yes (between xIOBuffers) |
| Scatter-gather I/O | No (single buffer) | Yes (up to 2 iovecs) | Yes (N iovecs) |
| Memory overhead | Minimal (1 allocation) | Minimal (1 allocation) | Per-block overhead + ref array |
| Thread safety | Not thread-safe | Not thread-safe | Block pool is thread-safe |
Decision Guide
Need to accumulate data of unknown size?
→ xBuffer (simple, auto-growing)
Need a fixed-capacity FIFO between producer and consumer?
→ xRingBuffer (no allocation after creation)
Need zero-copy operations or scatter-gather I/O for networking?
→ xIOBuffer (block-chain with reference counting)
Quick Start
#include <stdio.h>
#include <xbuf/buf.h>
#include <xbuf/ring.h>
#include <xbuf/io.h>
int main(void) {
// 1. Linear buffer: accumulate data
xBuffer buf = xBufferCreate(256);
xBufferAppend(&buf, "Hello, ", 7);
xBufferAppend(&buf, "xbuf!", 5);
printf("buf: %.*s\n", (int)xBufferLen(buf), (const char *)xBufferData(buf));
xBufferDestroy(buf);
// 2. Ring buffer: fixed-capacity FIFO
xRingBuffer ring = xRingBufferCreate(1024);
xRingBufferWrite(ring, "circular", 8);
char out[16];
size_t n = xRingBufferRead(ring, out, sizeof(out));
printf("ring: %.*s\n", (int)n, out);
xRingBufferDestroy(ring);
// 3. IO buffer: block-chain with zero-copy
xIOBuffer io;
xIOBufferInit(&io);
xIOBufferAppend(&io, "block-chain I/O", 15);
char linear[64];
xIOBufferCopyTo(&io, linear);
printf("io: %.*s\n", (int)xIOBufferLen(&io), linear);
xIOBufferDeinit(&io);
return 0;
}
Relationship with Other Modules
- xbase — xIOBuffer uses atomic.h for lock-free block pool management and reference counting.
- xhttp — The HTTP client (client.h) uses xIOBuffer for response body accumulation and SSE stream parsing.
- xlog — The async logger (logger.h) may use xBuffer for log message formatting.
buf.h — Linear Auto-Growing Buffer
Introduction
buf.h provides xBuffer, a simple contiguous byte buffer that automatically grows when more space is needed. It maintains separate read and write positions, supporting efficient append-and-consume patterns. The buffer header and data area are allocated in a single malloc() call using a C99 flexible array member, avoiding an extra pointer indirection.
Design Philosophy
- Single Allocation — Header and data live in one contiguous block (struct + flexible array member). This means one malloc(), one free(), and excellent cache locality.
- Handle Indirection — Because realloc() may relocate the entire object, write APIs take xBuffer *bufp (pointer to handle) so the caller's handle stays valid after growth.
- Compact Before Grow — When the buffer needs more space, it first tries to compact (slide unread data to the front) before resorting to realloc(). This reclaims consumed space without allocation.
- 2x Growth — When reallocation is necessary, capacity doubles each time, providing amortized O(1) append.
Architecture
graph LR
subgraph "xBuffer Lifecycle"
CREATE["xBufferCreate(cap)"] --> USE["Append / Read / Consume"]
USE --> GROW{"Need more space?"}
GROW -->|Compact| USE
GROW -->|Realloc 2x| USE
USE --> DESTROY["xBufferDestroy()"]
end
style CREATE fill:#4a90d9,color:#fff
style DESTROY fill:#e74c3c,color:#fff
Implementation Details
Memory Layout
Single malloc() allocation:
┌──────────────────┬──────────────────────────────────────────┐
│ xBuffer_ header │ data[cap] (flexible array member) │
│ rpos, wpos, cap │ │
└──────────────────┴──────────────────────────────────────────┘
↑ ↑ ↑
data+rpos data+wpos data+cap
│←readable→│←────writable──────→│
Internal Structure
XDEF_STRUCT(xBuffer_) {
size_t rpos; // Read position (start of unread data)
size_t wpos; // Write position (end of unread data)
size_t cap; // Total data capacity
char data[]; // Flexible array member
};
Growth Strategy
flowchart TD
APPEND["xBufferAppend(bufp, data, len)"]
CHECK{"wpos + len <= cap?"}
WRITE["memcpy at wpos, advance wpos"]
COMPACT{"rpos > 0 AND<br/>unread + len <= cap?"}
MEMMOVE["memmove data to front<br/>rpos=0, wpos=unread"]
REALLOC["realloc(cap * 2)"]
UPDATE["Update *bufp"]
APPEND --> CHECK
CHECK -->|Yes| WRITE
CHECK -->|No| COMPACT
COMPACT -->|Yes| MEMMOVE --> WRITE
COMPACT -->|No| REALLOC --> UPDATE --> WRITE
style WRITE fill:#50b86c,color:#fff
style REALLOC fill:#f5a623,color:#fff
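The compact-before-grow path above can be sketched in portable C99. This is an illustrative reduction of the described strategy, not the xKit source; buf and buf_append are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Header and data share one allocation via a flexible array member. */
typedef struct buf {
    size_t rpos, wpos, cap;
    char   data[];
} buf;

static buf *buf_create(size_t cap) {
    buf *b = malloc(sizeof(buf) + cap);
    if (b) { b->rpos = 0; b->wpos = 0; b->cap = cap; }
    return b;
}

/* Append: write in place if room, else compact, else realloc at 2x.
 * Takes buf ** because realloc() may relocate the whole object. */
static int buf_append(buf **bp, const void *src, size_t len) {
    buf *b = *bp;
    if (b->wpos + len > b->cap) {
        size_t unread = b->wpos - b->rpos;
        if (b->rpos > 0 && unread + len <= b->cap) {
            /* Compact: slide unread bytes to the front, no allocation. */
            memmove(b->data, b->data + b->rpos, unread);
            b->rpos = 0;
            b->wpos = unread;
        } else {
            size_t ncap = b->cap;
            while (b->wpos + len > ncap) ncap *= 2;   /* 2x growth */
            buf *nb = realloc(b, sizeof(buf) + ncap); /* may move! */
            if (nb == NULL) return -1;
            nb->cap = ncap;
            *bp = b = nb;   /* update the caller's handle */
        }
    }
    memcpy(b->data + b->wpos, src, len);
    b->wpos += len;
    return 0;
}
```

Note how the compact branch makes room for the second append below without touching the allocator, which is exactly the append-consume pattern the real buffer optimizes for.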
Operations and Complexity
| Operation | Time Complexity | Notes |
|---|---|---|
xBufferAppend | Amortized O(1) per byte | May trigger compact or realloc |
xBufferConsume | O(1) | Advances read position |
xBufferCompact | O(n) | memmove of unread data |
xBufferData | O(1) | Returns data + rpos |
xBufferLen | O(1) | Returns wpos - rpos |
xBufferReadFd | O(1) | Single read() syscall |
xBufferWriteFd | O(1) | Single write() syscall |
API Reference
Lifecycle
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xBufferCreate | xBuffer xBufferCreate(size_t initial_cap) | Create a buffer. Min capacity is 64. | Not thread-safe |
xBufferDestroy | void xBufferDestroy(xBuffer buf) | Free the buffer. NULL is a no-op. | Not thread-safe |
xBufferReset | void xBufferReset(xBuffer buf) | Discard all data, keep memory. | Not thread-safe |
Write
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xBufferAppend | xErrno xBufferAppend(xBuffer *bufp, const void *data, size_t len) | Append bytes, growing if needed. | Not thread-safe |
xBufferAppendStr | xErrno xBufferAppendStr(xBuffer *bufp, const char *str) | Append a C string (excluding NUL). | Not thread-safe |
xBufferReserve | xErrno xBufferReserve(xBuffer *bufp, size_t additional) | Ensure at least additional writable bytes. | Not thread-safe |
Read
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xBufferData | const void *xBufferData(xBuffer buf) | Pointer to readable data. Valid until next mutation. | Not thread-safe |
xBufferLen | size_t xBufferLen(xBuffer buf) | Number of readable bytes. | Not thread-safe |
xBufferCap | size_t xBufferCap(xBuffer buf) | Total allocated capacity. | Not thread-safe |
xBufferWritable | size_t xBufferWritable(xBuffer buf) | Writable bytes (cap - wpos). | Not thread-safe |
xBufferConsume | void xBufferConsume(xBuffer buf, size_t n) | Advance read position by n bytes. | Not thread-safe |
xBufferCompact | void xBufferCompact(xBuffer buf) | Move unread data to front, maximize writable space. | Not thread-safe |
I/O Helpers
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xBufferReadFd | ssize_t xBufferReadFd(xBuffer *bufp, int fd) | Read from fd into buffer (ensures 4KB space). | Not thread-safe |
xBufferWriteFd | ssize_t xBufferWriteFd(xBuffer buf, int fd) | Write readable data to fd, consume written bytes. | Not thread-safe |
Usage Examples
Basic Append and Read
#include <stdio.h>
#include <xbuf/buf.h>
int main(void) {
xBuffer buf = xBufferCreate(256);
// Append data
xBufferAppend(&buf, "Hello, ", 7);
xBufferAppendStr(&buf, "World!");
// Read data
printf("Content: %.*s\n", (int)xBufferLen(buf),
(const char *)xBufferData(buf));
// Output: Content: Hello, World!
// Consume partial data
xBufferConsume(buf, 7);
printf("After consume: %.*s\n", (int)xBufferLen(buf),
(const char *)xBufferData(buf));
// Output: After consume: World!
// Compact to reclaim consumed space
xBufferCompact(buf);
xBufferDestroy(buf);
return 0;
}
Network I/O
#include <xbuf/buf.h>
#include <unistd.h>
void handle_connection(int sockfd) {
xBuffer buf = xBufferCreate(4096);
// Read from socket
ssize_t n = xBufferReadFd(&buf, sockfd);
if (n > 0) {
// Process data...
// Write response back
xBufferAppendStr(&buf, "HTTP/1.1 200 OK\r\n\r\n");
xBufferWriteFd(buf, sockfd);
}
xBufferDestroy(buf);
}
Use Cases
- HTTP Response Accumulation — Accumulate response body chunks of unknown total size. The auto-growing behavior handles variable-length responses.
- Protocol Parsing — Append incoming data, parse complete messages from the front, consume parsed bytes. The compact operation reclaims space without reallocation.
- Log Message Formatting — Build log messages incrementally with multiple append calls before flushing.
Best Practices
- Always pass &buf to write APIs. Functions that may grow the buffer take xBuffer *bufp because realloc() may relocate the object.
- Call xBufferCompact() periodically if you consume data incrementally. This avoids unnecessary reallocation by reclaiming consumed space.
- Check return values. xBufferAppend() and xBufferReserve() return xErrno_NoMemory on allocation failure.
- Don't cache xBufferData() pointers across mutating calls. Any append/reserve/compact may invalidate the pointer.
Comparison with Other Libraries
| Feature | xbuf buf.h | Go bytes.Buffer | Rust Vec<u8> | C++ std::vector<char> |
|---|---|---|---|---|
| Layout | Header + data in one allocation (FAM) | Separate header + slice | Heap-allocated array | Heap-allocated array |
| Growth | 2x realloc + compact | 2x (with copy) | 2x (with copy) | Implementation-defined |
| Read/Write cursors | Yes (rpos/wpos) | Yes (read offset) | No (manual tracking) | No (manual tracking) |
| Compact | Built-in (xBufferCompact) | Built-in (implicit) | Manual | Manual |
| I/O helpers | ReadFd/WriteFd | ReadFrom/WriteTo | Via Read/Write traits | No |
| Handle invalidation | Caller updates via *bufp | GC handles | Borrow checker | Iterator invalidation |
Key Differentiator: xBuffer's single-allocation layout (flexible array member) eliminates one level of pointer indirection compared to typical buffer implementations. The compact-before-grow strategy minimizes reallocation frequency for append-consume workloads.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbuf/buf_bench.cpp
| Benchmark | Chunk Size | Time (ns) | CPU (ns) | Throughput |
|---|---|---|---|---|
BM_Buffer_Append | 16 | 4,776 | 4,776 | 3.1 GiB/s |
BM_Buffer_Append | 64 | 4,400 | 4,400 | 13.5 GiB/s |
BM_Buffer_Append | 256 | 7,892 | 7,892 | 30.2 GiB/s |
BM_Buffer_Append | 1,024 | 21,834 | 21,811 | 43.7 GiB/s |
BM_Buffer_Append | 4,096 | 91,029 | 90,958 | 41.9 GiB/s |
BM_Buffer_AppendConsume | 64 | 4,999 | 4,999 | 11.9 GiB/s |
BM_Buffer_AppendConsume | 256 | 8,241 | 8,240 | 28.9 GiB/s |
BM_Buffer_AppendConsume | 1,024 | 22,859 | 22,859 | 41.7 GiB/s |
Key Observations:
- Append throughput peaks at ~44 GiB/s for 1KB chunks, limited by memcpy bandwidth and reallocation overhead.
- AppendConsume (interleaved append + consume) achieves comparable throughput to pure append, validating the compact-before-grow strategy — consumed space is reclaimed without reallocation.
- Small chunks (16B) show lower throughput due to per-call overhead dominating the memcpy cost.
ring.h — Fixed-Size Ring Buffer
Introduction
ring.h provides xRingBuffer, a fixed-capacity circular buffer that never reallocates. It is ideal for bounded producer-consumer scenarios where a fixed memory budget is required. The capacity is rounded up to the next power of two internally, enabling bitmask indexing instead of expensive modulo operations.
Design Philosophy
- Fixed Capacity, Zero Reallocation — Once created, the ring buffer never grows. Writes that exceed capacity return xErrno_NoMemory. This makes memory usage predictable and avoids allocation latency spikes.
- Power-of-Two Masking — The internal capacity is always a power of two. Index computation uses head & mask instead of head % cap, which is significantly faster on most architectures.
- Monotonic Cursors — head (write) and tail (read) grow monotonically and never wrap. The actual array index is computed via bitmask. This simplifies the full/empty distinction: head - tail gives the exact readable byte count.
- Single Allocation — Like xBuffer, the header and data area are allocated together using a flexible array member.
- Scatter-Gather I/O — The ring buffer provides ReadIov/WriteIov helpers that fill iovec arrays for efficient readv()/writev() syscalls, handling the wrap-around transparently.
Architecture
graph LR
PRODUCER["Producer"] -->|"xRingBufferWrite"| RB["xRingBuffer<br/>(fixed capacity)"]
RB -->|"xRingBufferRead"| CONSUMER["Consumer"]
RB -->|"xRingBufferReadIov"| IOV1["iovec[2]"] -->|"writev()"| FD1["fd"]
FD2["fd"] -->|"readv()"| IOV2["iovec[2]"] -->|"xRingBufferWriteIov"| RB
style RB fill:#f5a623,color:#fff
Implementation Details
Memory Layout
Single malloc() allocation:
┌───────────────────────┬──────────────────────────────────────┐
│ xRingBuffer_ header │ data[cap] (flexible array member) │
│ cap, mask, head, tail│ │
└───────────────────────┴──────────────────────────────────────┘
Circular data layout (cap=8, mask=7):
tail & mask head & mask
↓ ↓
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ │ │ R │ R │ R │ W │ │ │
└───┴───┴───┴───┴───┴───┴───┴───┘
0 1 2 3 4 5 6 7
R = readable data (tail..head)
W = next write position
Internal Structure
XDEF_STRUCT(xRingBuffer_) {
size_t cap; // Capacity (power of two)
size_t mask; // cap - 1 (for bitmask indexing)
size_t head; // Write cursor (monotonic)
size_t tail; // Read cursor (monotonic)
char data[];// Flexible array member
};
Power-of-Two Rounding
static size_t next_pow2(size_t v) {
if (v < 16) v = 16;
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
// v |= v >> 32; (on 64-bit)
return v + 1;
}
This ensures cap is always a power of two, so mask = cap - 1 produces a valid bitmask. For example, cap = 8 → mask = 0b111.
Bitmask Indexing
Instead of:
size_t idx = head % cap; // Expensive division
The ring buffer uses:
size_t idx = head & mask; // Single AND instruction
This works because cap is a power of two: x % (2^n) == x & (2^n - 1).
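Both the rounding helper and the masking identity can be checked with a self-contained 64-bit variant (illustrative, not the library's internal code):

```c
#include <assert.h>
#include <stdint.h>

/* Bit-smearing round-up to the next power of two, written for a
 * fixed 64-bit type so the final ">> 32" shift is always valid. */
static uint64_t next_pow2_u64(uint64_t v) {
    if (v < 16) v = 16;   /* enforce a minimum capacity */
    v--;                  /* so exact powers of two map to themselves */
    v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
    v |= v >> 8;  v |= v >> 16; v |= v >> 32;
    return v + 1;         /* all low bits set, +1 gives the power */
}
```

With cap = next_pow2_u64(n) and mask = cap - 1, the index h & mask equals h % cap for every monotonic cursor value h, which is what makes the bitmask substitution safe.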
Wrap-Around Write
flowchart TD
WRITE["xRingBufferWrite(rb, data, len)"]
CHECK{"len <= writable?"}
FAIL["Return xErrno_NoMemory"]
POS["pos = head & mask"]
FIRST["first = cap - pos"]
WRAP{"len <= first?"}
SINGLE["memcpy(data+pos, src, len)"]
SPLIT["memcpy(data+pos, src, first)<br/>memcpy(data, src+first, len-first)"]
ADVANCE["head += len"]
WRITE --> CHECK
CHECK -->|No| FAIL
CHECK -->|Yes| POS --> FIRST --> WRAP
WRAP -->|Yes| SINGLE --> ADVANCE
WRAP -->|No| SPLIT --> ADVANCE
style FAIL fill:#e74c3c,color:#fff
style ADVANCE fill:#50b86c,color:#fff
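The split write in the flowchart can be sketched as follows, assuming monotonic head/tail cursors and a power-of-two CAP (hypothetical names, not the xKit API):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAP  8u            /* power of two */
#define MASK (CAP - 1u)

/* Monotonic cursors: head/tail never wrap; index = cursor & MASK. */
static char     ring[CAP];
static uint64_t head, tail;

/* Write len bytes, splitting into at most two memcpy calls at the
 * physical end of the array. Returns 0, or -1 if it would overflow. */
static int ring_write(const char *src, size_t len) {
    if (len > CAP - (size_t)(head - tail)) return -1; /* no room */
    size_t pos   = (size_t)(head & MASK);
    size_t first = CAP - pos;                 /* bytes before wrap */
    if (len <= first) {
        memcpy(ring + pos, src, len);         /* single copy */
    } else {
        memcpy(ring + pos, src, first);       /* fill to the end */
        memcpy(ring, src + first, len - first); /* wrapped remainder */
    }
    head += len;
    return 0;
}
```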
Operations and Complexity
| Operation | Time Complexity | Notes |
|---|---|---|
xRingBufferWrite | O(n) | Up to 2 memcpy calls |
xRingBufferRead | O(n) | Up to 2 memcpy calls |
xRingBufferPeek | O(n) | Like Read but doesn't advance tail |
xRingBufferDiscard | O(1) | Just advances tail |
xRingBufferLen | O(1) | head - tail |
xRingBufferReadFd | O(1) | Single readv() syscall |
xRingBufferWriteFd | O(1) | Single writev() syscall |
API Reference
Lifecycle
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xRingBufferCreate | xRingBuffer xRingBufferCreate(size_t min_cap) | Create a ring buffer. Capacity rounded up to power of 2. | Not thread-safe |
xRingBufferDestroy | void xRingBufferDestroy(xRingBuffer rb) | Free the ring buffer. NULL is a no-op. | Not thread-safe |
xRingBufferReset | void xRingBufferReset(xRingBuffer rb) | Discard all data, keep memory. | Not thread-safe |
Query
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xRingBufferLen | size_t xRingBufferLen(xRingBuffer rb) | Readable bytes. | Not thread-safe |
xRingBufferCap | size_t xRingBufferCap(xRingBuffer rb) | Total capacity. | Not thread-safe |
xRingBufferWritable | size_t xRingBufferWritable(xRingBuffer rb) | Writable bytes. | Not thread-safe |
xRingBufferEmpty | bool xRingBufferEmpty(xRingBuffer rb) | True if no readable data. | Not thread-safe |
xRingBufferFull | bool xRingBufferFull(xRingBuffer rb) | True if no writable space. | Not thread-safe |
Write
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xRingBufferWrite | xErrno xRingBufferWrite(xRingBuffer rb, const void *data, size_t len) | Write bytes. Returns xErrno_NoMemory if full. | Not thread-safe |
Read
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xRingBufferRead | size_t xRingBufferRead(xRingBuffer rb, void *out, size_t len) | Read and consume bytes. Returns actual count. | Not thread-safe |
xRingBufferPeek | size_t xRingBufferPeek(xRingBuffer rb, void *out, size_t len) | Read without consuming. | Not thread-safe |
xRingBufferDiscard | size_t xRingBufferDiscard(xRingBuffer rb, size_t n) | Discard bytes without copying. | Not thread-safe |
I/O Helpers
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xRingBufferReadIov | int xRingBufferReadIov(xRingBuffer rb, struct iovec iov[2]) | Fill iovecs with readable regions (for writev). | Not thread-safe |
xRingBufferWriteIov | int xRingBufferWriteIov(xRingBuffer rb, struct iovec iov[2]) | Fill iovecs with writable regions (for readv). | Not thread-safe |
xRingBufferReadFd | ssize_t xRingBufferReadFd(xRingBuffer rb, int fd) | Read from fd using readv(). | Not thread-safe |
xRingBufferWriteFd | ssize_t xRingBufferWriteFd(xRingBuffer rb, int fd) | Write to fd using writev(). | Not thread-safe |
Usage Examples
Basic FIFO
#include <stdio.h>
#include <xbuf/ring.h>
int main(void) {
// Request 1000 bytes; actual capacity will be 1024 (next power of 2)
xRingBuffer rb = xRingBufferCreate(1000);
printf("Capacity: %zu\n", xRingBufferCap(rb)); // 1024
// Write data
const char *msg = "Hello, Ring!";
xRingBufferWrite(rb, msg, 12);
// Read data
char out[32];
size_t n = xRingBufferRead(rb, out, sizeof(out));
printf("Read %zu bytes: %.*s\n", n, (int)n, out);
xRingBufferDestroy(rb);
return 0;
}
Network Socket Buffer
#include <xbuf/ring.h>
void event_loop_handler(int sockfd) {
xRingBuffer rb = xRingBufferCreate(65536); // 64KB ring
// Read from socket into ring buffer
ssize_t n = xRingBufferReadFd(rb, sockfd);
if (n > 0) {
// Process data...
// Write processed data back
xRingBufferWriteFd(rb, sockfd);
}
xRingBufferDestroy(rb);
}
Use Cases
- Fixed-Budget Network Buffers — When you need predictable memory usage per connection (e.g., 64KB per socket), the ring buffer provides a hard capacity limit.
- Logging Ring Buffer — Capture the last N bytes of log output. Because writes fail rather than overwrite when the buffer is full, discard old data first (e.g., via xRingBufferDiscard) to make room for new entries.
- Inter-Thread Communication — With external synchronization, a ring buffer can serve as a bounded channel between producer and consumer threads.
Best Practices
- Choose capacity carefully. The ring buffer never grows. If you write more than the capacity, the write fails. Size it for your worst-case scenario.
- Use scatter-gather I/O. xRingBufferReadFd/WriteFd use readv()/writev() to handle wrap-around in a single syscall, avoiding the need to linearize data.
- Be aware of power-of-two rounding. Requesting 1000 bytes gives you 1024. Requesting 1025 gives you 2048. Plan accordingly.
- Check xRingBufferWritable() before writing if you want to handle partial writes gracefully.
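The iovec-filling step behind the scatter-gather helpers can be sketched like this; ring_read_iov is a hypothetical reduction of the described behavior, not the actual xKit API:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>

/* Fill up to two iovecs with the readable regions of a ring that uses
 * monotonic cursors. A subsequent writev() then flushes the data in
 * one syscall even when it wraps. Returns the number of iovecs used. */
static int ring_read_iov(char *data, size_t cap, uint64_t head,
                         uint64_t tail, struct iovec iov[2]) {
    size_t mask  = cap - 1;                 /* cap is a power of two */
    size_t len   = (size_t)(head - tail);   /* readable bytes */
    size_t pos   = (size_t)(tail & mask);
    size_t first = cap - pos;               /* bytes before the wrap */
    if (len == 0) return 0;
    if (len <= first) {                     /* contiguous region */
        iov[0].iov_base = data + pos;
        iov[0].iov_len  = len;
        return 1;
    }
    iov[0].iov_base = data + pos;  iov[0].iov_len = first;
    iov[1].iov_base = data;        iov[1].iov_len = len - first;
    return 2;
}
```

The write-side helper is symmetric: it exposes the free regions so readv() can fill both sides of the wrap in one call.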
Comparison with Other Libraries
| Feature | xbuf ring.h | Linux kfifo | Boost circular_buffer | DPDK rte_ring |
|---|---|---|---|---|
| Capacity | Fixed, power-of-2 | Fixed, power-of-2 | Fixed, any size | Fixed, power-of-2 |
| Indexing | Bitmask | Bitmask | Modulo | Bitmask |
| Layout | FAM (single alloc) | Separate alloc | Heap array | Huge pages |
| Thread Safety | Not thread-safe | Single-producer/single-consumer | Not thread-safe | Multi-producer/multi-consumer |
| I/O Helpers | readv/writev | kfifo_to_user/kfifo_from_user | No | No (packet-oriented) |
| Language | C99 | C (kernel) | C++ | C |
Key Differentiator: xbuf's ring buffer combines the power-of-two bitmask optimization (like kfifo) with scatter-gather I/O helpers (readv/writev) in a single-allocation design. It's purpose-built for event-driven network programming where fixed memory budgets and efficient syscalls are essential.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbuf/ring_bench.cpp
| Benchmark | Size | Time (ns) | CPU (ns) | Throughput |
|---|---|---|---|---|
BM_Ring_WriteRead | 64 | 6.05 | 6.05 | 19.7 GiB/s |
BM_Ring_WriteRead | 256 | 16.8 | 16.8 | 28.4 GiB/s |
BM_Ring_WriteRead | 1,024 | 27.4 | 27.4 | 69.6 GiB/s |
BM_Ring_WriteRead | 4,096 | 99.2 | 99.2 | 76.9 GiB/s |
BM_Ring_Throughput | 4,096 | 225 | 225 | 17.0 GiB/s |
BM_Ring_Throughput | 16,384 | 806 | 806 | 18.9 GiB/s |
BM_Ring_Throughput | 65,536 | 3,198 | 3,198 | 19.1 GiB/s |
Key Observations:
- WriteRead (single write + read cycle) achieves up to ~77 GiB/s at 4KB chunks, demonstrating the efficiency of the bitmask-based wrap-around and memcpy for larger transfers.
- Throughput (sustained writes until full) stabilizes at ~19 GiB/s regardless of capacity, showing consistent performance as the ring scales.
- The ring buffer's low-overhead indexing (bitmask instead of modulo) keeps per-operation cost extremely low — just 6 ns for a 64-byte write+read cycle.
io.h — Reference-Counted Block-Chain I/O Buffer
Introduction
io.h provides xIOBuffer, a non-contiguous byte buffer composed of a chain of reference-counted memory blocks. It supports zero-copy split, append, and scatter-gather I/O (readv/writev). Inspired by brpc's IOBuf, it is designed for high-throughput network I/O where avoiding memory copies is critical.
Design Philosophy
- Block-Chain Architecture — Data is stored across multiple fixed-size blocks (default 8KB each), linked through a reference array. This avoids large contiguous allocations and enables zero-copy operations.
- Reference Counting — Each xIOBlock is reference-counted. Multiple xIOBuffer instances can share the same block (e.g., after a Cut operation). Blocks are freed (returned to the pool) when the last reference is released.
- Zero-Copy Operations — xIOBufferAppendIOBuffer() transfers block references without copying data. xIOBufferCut() splits a buffer by adjusting offsets and sharing blocks at the boundary.
- Lock-Free Block Pool — Released blocks are returned to a global Treiber stack (lock-free) for reuse, avoiding malloc/free overhead in steady state.
- Inline Ref Array — Small buffers (≤ 8 refs) use an inline array, avoiding heap allocation for the ref array itself. Larger buffers transition to a heap-allocated array.
Architecture
graph TD
subgraph "xIOBuffer API"
APPEND["Append / AppendStr"]
APPEND_IO["AppendIOBuffer<br/>(zero-copy)"]
READ["Read / CopyTo"]
CUT["Cut<br/>(zero-copy split)"]
CONSUME["Consume"]
IO_READ["ReadFd"]
IO_WRITE["WriteFd<br/>(writev)"]
end
subgraph "Block Management"
ACQUIRE["xIOBlockAcquire"]
RETAIN["xIOBlockRetain"]
RELEASE["xIOBlockRelease"]
end
subgraph "Block Pool (Treiber Stack)"
POOL["g_pool_head"]
WARMUP["xIOBlockPoolWarmup"]
DRAIN["xIOBlockPoolDrain"]
end
APPEND --> ACQUIRE
IO_READ --> ACQUIRE
CUT --> RETAIN
CONSUME --> RELEASE
READ --> RELEASE
ACQUIRE --> POOL
RELEASE --> POOL
WARMUP --> POOL
DRAIN --> POOL
style POOL fill:#f5a623,color:#fff
Implementation Details
Block Structure
XDEF_STRUCT(xIOBlock) {
size_t refs; // Reference count (atomic)
size_t size; // Usable data size
char data[XIOBUFFER_BLOCK_SIZE]; // 8KB inline data
};
Reference Structure
XDEF_STRUCT(xIOBufferRef) {
xIOBlock *block; // Pointer to the underlying block
size_t offset; // Start offset within block->data
size_t length; // Number of valid bytes from offset
};
IOBuffer Structure
XDEF_STRUCT(xIOBuffer) {
xIOBufferRef inlined[XIOBUFFER_INLINE_REFS]; // Inline ref storage (8)
xIOBufferRef *refs; // Pointer to ref array (inlined or heap)
size_t nrefs; // Number of active refs
size_t cap; // Capacity of refs array
size_t nbytes; // Total logical byte count (cached)
};
Block-Chain Architecture
graph TD
subgraph "xIOBuffer"
REF1["Ref 0<br/>block=A, off=0, len=8192"]
REF2["Ref 1<br/>block=B, off=0, len=8192"]
REF3["Ref 2<br/>block=C, off=0, len=3000"]
end
subgraph "Shared Blocks"
A["xIOBlock A<br/>refs=1, 8KB"]
B["xIOBlock B<br/>refs=2, 8KB"]
C["xIOBlock C<br/>refs=1, 8KB"]
end
REF1 --> A
REF2 --> B
REF3 --> C
subgraph "Another xIOBuffer (after Cut)"
REF4["Ref 0<br/>block=B, off=4096, len=4096"]
end
REF4 --> B
style A fill:#4a90d9,color:#fff
style B fill:#f5a623,color:#fff
style C fill:#50b86c,color:#fff
Treiber Stack Block Pool
The global block pool uses a lock-free Treiber stack:
// Pool node overlays xIOBlock memory
XDEF_STRUCT(PoolNode_) {
PoolNode_ *next;
};
static PoolNode_ *volatile g_pool_head = NULL;
Push (return to pool):
do {
head = atomic_load(g_pool_head)
node->next = head
} while (!CAS(g_pool_head, head, node))
Pop (acquire from pool):
do {
head = atomic_load(g_pool_head)
if (!head) return malloc(new block)
next = head->next
} while (!CAS(g_pool_head, head, next))
return head
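The push/pop pseudocode above maps directly onto C11 atomics. This sketch shows only the CAS-loop shape; a production pool must also consider the ABA problem (e.g., via tagged pointers), which a single global stack of identical nodes is exposed to:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Intrusive node: in the real pool this overlays the block memory. */
typedef struct node { struct node *next; } node;

static _Atomic(node *) pool_head;   /* zero-initialized to NULL */

/* Push: link the node in front of the current head, retrying if
 * another thread moved the head between our load and the CAS. */
static void pool_push(node *n) {
    node *head = atomic_load(&pool_head);
    do {
        n->next = head;
    } while (!atomic_compare_exchange_weak(&pool_head, &head, n));
    /* on failure, `head` is reloaded with the current value */
}

/* Pop: detach the current head; NULL means the caller should fall
 * back to malloc, as the pseudocode above does. */
static node *pool_pop(void) {
    node *head = atomic_load(&pool_head);
    while (head != NULL &&
           !atomic_compare_exchange_weak(&pool_head, &head, head->next))
        ;   /* failed CAS reloads `head`; loop re-reads head->next */
    return head;
}
```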
Zero-Copy Cut
xIOBufferCut(io, dst, n) moves the first n bytes from io to dst:
- Fully consumed refs — Ownership transfers directly (no refcount change).
- Boundary ref — The block is shared: xIOBlockRetain() increments the refcount, and both buffers hold a ref with different offset/length.
flowchart TD
CUT["xIOBufferCut(io, dst, n)"]
LOOP{"More bytes to cut?"}
FULL{"ref.length <= remaining?"}
TRANSFER["Transfer entire ref to dst<br/>(no refcount change)"]
SPLIT["Share block: Retain + split ref<br/>dst gets [offset, chunk]<br/>io keeps [offset+chunk, rest]"]
SHIFT["Shift consumed refs out of io"]
DONE["Update nbytes for both"]
CUT --> LOOP
LOOP -->|Yes| FULL
FULL -->|Yes| TRANSFER --> LOOP
FULL -->|No| SPLIT --> SHIFT --> DONE
LOOP -->|No| SHIFT
style TRANSFER fill:#50b86c,color:#fff
style SPLIT fill:#f5a623,color:#fff
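The boundary-ref case can be sketched with a toy refcounted block. block, ref, and ref_cut below are illustrative stand-ins for xIOBlock, xIOBufferRef, and the Cut logic, not the real API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct block { size_t refs; char data[16]; } block;
typedef struct ref   { block *blk; size_t off, len; } ref;

static block *block_new(const char *src, size_t n) {
    block *b = calloc(1, sizeof *b);
    b->refs = 1;
    memcpy(b->data, src, n);
    return b;
}

static void block_release(block *b) {
    if (--b->refs == 0) free(b);  /* atomic in the real pool */
}

/* Split the first n bytes of *src into *dst: retain the block once
 * and give each ref a disjoint [offset, length) window. No bytes of
 * payload are copied or moved. */
static void ref_cut(ref *src, ref *dst, size_t n) {
    src->blk->refs++;             /* both refs now own the block */
    dst->blk = src->blk;
    dst->off = src->off;  dst->len = n;
    src->off += n;        src->len -= n;
}
```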
Append Strategy
xIOBufferAppend(io, data, len):
- First tries to fill the tail block's remaining space (avoids allocating a new block for small appends).
- Allocates new blocks for remaining data, each up to XIOBUFFER_BLOCK_SIZE bytes.
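The block-count arithmetic implied by this strategy can be expressed as a tiny helper (blocks_needed is a hypothetical illustration, not part of the API):

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 8192  /* mirrors XIOBUFFER_BLOCK_SIZE's default */

/* How many new blocks does an append of `len` bytes allocate, given
 * `tail_room` free bytes remaining in the current tail block? The
 * tail is filled first, then the rest is split into full blocks. */
static size_t blocks_needed(size_t len, size_t tail_room) {
    size_t rest = len > tail_room ? len - tail_room : 0;
    return (rest + BLOCK_SIZE - 1) / BLOCK_SIZE;  /* ceiling division */
}
```

Small appends that fit in the tail block therefore allocate nothing, which is why repeated short appends stay cheap.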
API Reference
Configuration
| Macro | Default | Description |
|---|---|---|
XIOBUFFER_BLOCK_SIZE | 8192 | Block data size in bytes |
XIOBUFFER_INLINE_REFS | 8 | Inline ref array capacity |
Block API
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xIOBlockAcquire | xIOBlock *xIOBlockAcquire(void) | Get a block from pool (or malloc). refs=1. | Thread-safe (lock-free pool) |
xIOBlockRetain | void xIOBlockRetain(xIOBlock *blk) | Increment refcount. | Thread-safe (atomic) |
xIOBlockRelease | void xIOBlockRelease(xIOBlock *blk) | Decrement refcount; return to pool at 0. | Thread-safe (atomic + lock-free pool) |
xIOBlockPoolWarmup | xErrno xIOBlockPoolWarmup(size_t n) | Pre-allocate n blocks into pool. | Thread-safe |
xIOBlockPoolDrain | void xIOBlockPoolDrain(void) | Free all pooled blocks. Call at shutdown. | Not thread-safe (no concurrent use) |
IOBuffer Lifecycle
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xIOBufferInit | void xIOBufferInit(xIOBuffer *io) | Initialize an empty IOBuffer. | Not thread-safe |
xIOBufferDeinit | void xIOBufferDeinit(xIOBuffer *io) | Release all refs and free ref array. | Not thread-safe |
xIOBufferReset | void xIOBufferReset(xIOBuffer *io) | Release all refs, keep ref array. | Not thread-safe |
IOBuffer Query
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xIOBufferLen | size_t xIOBufferLen(const xIOBuffer *io) | Total readable bytes. | Not thread-safe |
xIOBufferEmpty | bool xIOBufferEmpty(const xIOBuffer *io) | True if no data. | Not thread-safe |
xIOBufferRefCount | size_t xIOBufferRefCount(const xIOBuffer *io) | Number of block refs. | Not thread-safe |
IOBuffer Write
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xIOBufferAppend | xErrno xIOBufferAppend(xIOBuffer *io, const void *data, size_t len) | Append bytes (allocates blocks as needed). | Not thread-safe |
xIOBufferAppendStr | xErrno xIOBufferAppendStr(xIOBuffer *io, const char *str) | Append C string. | Not thread-safe |
xIOBufferAppendIOBuffer | xErrno xIOBufferAppendIOBuffer(xIOBuffer *io, xIOBuffer *other) | Zero-copy: move all refs from other. | Not thread-safe |
IOBuffer Read
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xIOBufferRead | size_t xIOBufferRead(xIOBuffer *io, void *out, size_t len) | Copy and consume bytes. | Not thread-safe |
xIOBufferCut | size_t xIOBufferCut(xIOBuffer *io, xIOBuffer *dst, size_t n) | Zero-copy split: move first n bytes to dst. | Not thread-safe |
xIOBufferConsume | size_t xIOBufferConsume(xIOBuffer *io, size_t n) | Discard first n bytes. | Not thread-safe |
xIOBufferCopyTo | size_t xIOBufferCopyTo(const xIOBuffer *io, void *out) | Linearize: copy all data to contiguous buffer. | Not thread-safe |
IOBuffer I/O
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xIOBufferReadIov | int xIOBufferReadIov(const xIOBuffer *io, struct iovec *iov, int max_iov) | Fill iovecs for writev(). | Not thread-safe |
xIOBufferReadFd | ssize_t xIOBufferReadFd(xIOBuffer *io, int fd) | Read from fd into IOBuffer. | Not thread-safe |
xIOBufferWriteFd | ssize_t xIOBufferWriteFd(xIOBuffer *io, int fd) | Write to fd using writev(). | Not thread-safe |
Usage Examples
Basic Usage
#include <stdio.h>
#include <xbuf/io.h>
int main(void) {
xIOBuffer io;
xIOBufferInit(&io);
// Append data (may span multiple blocks)
xIOBufferAppend(&io, "Hello, ", 7);
xIOBufferAppend(&io, "IOBuffer!", 9);
printf("Length: %zu, Refs: %zu\n",
xIOBufferLen(&io), xIOBufferRefCount(&io));
// Linearize for processing
char buf[64];
xIOBufferCopyTo(&io, buf);
printf("Content: %.*s\n", (int)xIOBufferLen(&io), buf);
xIOBufferDeinit(&io);
return 0;
}
Zero-Copy Split (Protocol Parsing)
#include <xbuf/io.h>
void parse_protocol(xIOBuffer *io) {
// Cut the 4-byte header from the front
xIOBuffer header;
xIOBufferInit(&header);
size_t cut = xIOBufferCut(io, &header, 4);
if (cut == 4) {
char hdr[4];
xIOBufferRead(&header, hdr, 4);
// Parse header...
// io now contains only the body (zero-copy!)
}
xIOBufferDeinit(&header);
}
High-Throughput Network I/O
#include <xbuf/io.h>
void handle_data(int sockfd) {
// Pre-warm the block pool at startup
xIOBlockPoolWarmup(64);
xIOBuffer io;
xIOBufferInit(&io);
// Read from socket (allocates blocks from pool)
ssize_t n = xIOBufferReadFd(&io, sockfd);
if (n > 0) {
// Write back using scatter-gather I/O
xIOBufferWriteFd(&io, sockfd);
}
xIOBufferDeinit(&io);
// At shutdown
xIOBlockPoolDrain();
}
Use Cases
- HTTP Response Body — The xhttp module uses xIOBuffer to accumulate response chunks from libcurl without copying between buffers.
- Protocol Framing — Use xIOBufferCut() to split headers from body in a zero-copy fashion, then process each part independently.
- Data Pipeline — Chain multiple processing stages that each append to or cut from xIOBuffer instances, sharing blocks to minimize copies.
Best Practices
- Call xIOBlockPoolWarmup() at startup to pre-allocate blocks and avoid allocation spikes during initial traffic.
- Call xIOBlockPoolDrain() at shutdown for clean valgrind reports.
- Use xIOBufferAppendIOBuffer() instead of copying when combining buffers. It transfers ownership without data copies.
- Use xIOBufferCut() for protocol parsing. It's more efficient than xIOBufferRead() when you need to pass the cut data to another component.
- Monitor xIOBufferRefCount() to understand memory fragmentation. Many small refs may indicate suboptimal block utilization.
Comparison with Other Libraries
| Feature | xbuf io.h | brpc IOBuf | Netty ByteBuf | Go bytes.Buffer |
|---|---|---|---|---|
| Architecture | Block-chain (ref array) | Block-chain (linked list) | Composite buffer | Contiguous slice |
| Block Size | 8KB (configurable) | 8KB | Configurable | N/A |
| Reference Counting | Atomic (per block) | Atomic (per block) | Atomic (per buffer) | GC |
| Zero-Copy Split | xIOBufferCut | cutn | slice | No |
| Zero-Copy Append | xIOBufferAppendIOBuffer | append(IOBuf) | addComponent | No |
| Block Pool | Treiber stack (lock-free) | Thread-local + global | Arena allocator | N/A |
| Scatter-Gather I/O | writev via ReadIov | writev via pappend | nioBuffers | No |
| Inline Optimization | 8 inline refs | No | No | N/A |
| Language | C99 | C++ | Java | Go |
Key Differentiator: xbuf's xIOBuffer combines brpc-style block-chain architecture with a lock-free Treiber stack block pool and inline ref optimization. The zero-copy Cut and AppendIOBuffer operations make it ideal for protocol parsing and data pipeline scenarios in C.
Benchmark
Environment: Apple M3 Pro, 36 GB RAM, macOS 26.4, Release build (-O2). Source: xbuf/io_bench.cpp
| Benchmark | Size | Time (ns) | CPU (ns) | Throughput |
|---|---|---|---|---|
BM_IOBuffer_Append | 64 | 3,720 | 3,720 | 16.0 GiB/s |
BM_IOBuffer_Append | 256 | 7,569 | 7,568 | 31.5 GiB/s |
BM_IOBuffer_Append | 1,024 | 22,341 | 22,340 | 42.7 GiB/s |
BM_IOBuffer_Append | 4,096 | 79,796 | 79,794 | 47.8 GiB/s |
BM_IOBuffer_Append | 8,192 | 187,167 | 187,165 | 40.8 GiB/s |
BM_IOBuffer_AppendConsume | 64 | 5,230 | 5,230 | 11.4 GiB/s |
BM_IOBuffer_AppendConsume | 256 | 8,232 | 8,232 | 29.0 GiB/s |
BM_IOBuffer_AppendConsume | 1,024 | 23,040 | 23,040 | 41.4 GiB/s |
BM_IOBuffer_Cut | 8,192 | 167 | 167 | 45.6 GiB/s |
BM_IOBuffer_Cut | 65,536 | 1,651 | 1,651 | 37.0 GiB/s |
BM_IOBuffer_Cut | 262,144 | 8,122 | 8,122 | 30.1 GiB/s |
BM_IOBuffer_AppendIOBuffer | 1,024 | 3,196 | 3,196 | 29.8 GiB/s |
BM_IOBuffer_AppendIOBuffer | 4,096 | 9,307 | 9,307 | 41.0 GiB/s |
BM_IOBuffer_AppendIOBuffer | 8,192 | 17,604 | 17,602 | 43.3 GiB/s |
BM_IOBuffer_BlockPool | — | 8.91 | 8.89 | — |
Key Observations:
- Append peaks at ~48 GiB/s for 4KB chunks. The slight drop at 8KB reflects block boundary crossing overhead.
- Cut (zero-copy split) is extremely fast — 167ns for 8KB — because it only manipulates reference metadata, not data. This validates the block-chain architecture for protocol parsing.
- AppendIOBuffer (zero-copy concatenation) achieves ~43 GiB/s, confirming that block ownership transfer avoids data copies.
- BlockPool acquire/release cycle takes ~9ns, showing the lock-free Treiber stack's efficiency for block recycling.
xnet — Networking Primitives
Introduction
xnet is xKit's networking utility module, providing three foundational components for network programming: a lightweight URL parser, an asynchronous DNS resolver, and shared TLS configuration types. These building blocks are used internally by higher-level modules like xhttp, and are also available for direct use in application code.
Design Philosophy
- Zero-Copy URL Parsing — xUrlParse() makes a single internal copy of the input string. All component fields (scheme, host, port, etc.) are pointer+length pairs referencing this copy, avoiding per-field allocations.
- Async DNS via Thread-Pool Offload — DNS resolution uses getaddrinfo() offloaded to the event loop's thread pool. The callback is always invoked on the event loop thread, keeping the async programming model consistent with the rest of xKit.
- Shared TLS Types — xTlsConf is a plain data structure shared across modules. It decouples TLS configuration from any specific TLS backend (OpenSSL, mbedTLS).
- Async TCP with Transport Abstraction — xTcpConnect chains DNS → connect → optional TLS handshake into a single async operation. xTcpConn wraps an xSocket + xTransport vtable, providing Recv/Send/SendIov helpers that work transparently over plain TCP or TLS.
Architecture
graph TD
subgraph "xnet Module"
URL["xUrl<br/>URL Parser<br/>url.h"]
DNS["xDnsResolve<br/>Async DNS<br/>dns.h"]
TLS["xTlsConf<br/>TLS Config Types<br/>tls.h"]
TCP["xTcpConn / xTcpConnect / xTcpListener<br/>Async TCP<br/>tcp.h"]
end
subgraph "xbase Infrastructure"
EV["xEventLoop<br/>event.h"]
POOL["Thread Pool<br/>xEventLoopSubmit()"]
ATOMIC["Atomic Ops<br/>atomic.h"]
end
subgraph "Consumers"
HTTP_C["xhttp Client"]
HTTP_S["xhttp Server"]
WS["WebSocket"]
end
DNS --> EV
DNS --> POOL
DNS --> ATOMIC
TCP --> EV
TCP --> DNS
TCP --> TLS
HTTP_C --> URL
HTTP_C --> TCP
HTTP_S --> TCP
WS --> URL
WS --> TCP
style URL fill:#4a90d9,color:#fff
style DNS fill:#50b86c,color:#fff
style TLS fill:#f5a623,color:#fff
style TCP fill:#e74c3c,color:#fff
Sub-Module Overview
| Header | Component | Description | Doc |
|---|---|---|---|
url.h | xUrl | Lightweight URL parser | url.md |
dns.h | xDnsResolve | Async DNS resolution | dns.md |
tls.h | xTlsConf | Shared TLS config types | tls.md |
tcp.h | xTcpConn / xTcpConnect / xTcpListener | Async TCP connection, connector & listener | tcp.md |
Quick Start
#include <stdio.h>
#include <xbase/event.h>
#include <xnet/url.h>
#include <xnet/dns.h>
#include <xnet/tls.h>
// 1. Parse a URL
static void url_example(void) {
xUrl url;
xErrno err = xUrlParse(
"wss://example.com:8443/ws?token=abc", &url);
if (err == xErrno_Ok) {
printf("scheme: %.*s\n",
(int)url.scheme_len, url.scheme);
printf("host: %.*s\n",
(int)url.host_len, url.host);
printf("port: %u\n", xUrlPort(&url));
printf("path: %.*s\n",
(int)url.path_len, url.path);
xUrlFree(&url);
}
}
// 2. Async DNS resolution
static void on_resolved(xDnsResult *result, void *arg) {
(void)arg;
if (result->error == xErrno_Ok) {
int count = 0;
for (xDnsAddr *a = result->addrs; a; a = a->next)
count++;
printf("Resolved %d address(es)\n", count);
}
xDnsResultFree(result);
// stop the loop after resolution
}
static void dns_example(xEventLoop loop) {
xDnsResolve(loop, "example.com", "443",
NULL, on_resolved, NULL);
}
// 3. TLS configuration
static void tls_example(void) {
xTlsConf client_tls = {0};
client_tls.ca = "ca.pem";
xTlsConf server_tls = {
.cert = "server.pem",
.key = "server-key.pem",
};
(void)client_tls;
(void)server_tls;
}
Relationship with Other Modules
- **xbase** — The DNS resolver depends on `xEventLoop` for thread-pool offload and uses `atomic.h` for the cancellation flag.
- **xhttp** — The HTTP client uses `xUrl` for URL parsing, `xDnsResolve` for hostname resolution, and `xTlsConf` for TLS configuration. The WebSocket client supports both `xTlsConf` and a shared `xTlsCtx` for `wss://` connections. See the TLS Deployment Guide for end-to-end examples.
- **WebSocket** — The WebSocket client uses `xUrl` to parse `ws://` and `wss://` URLs, and optionally accepts a shared `xTlsCtx` to avoid per-connection TLS context creation.
url.h — Lightweight URL Parser
Introduction
url.h provides xUrl, a lightweight URL parser that decomposes a URL string into its RFC 3986 components: scheme, userinfo, host, port, path, query, and fragment. The parser makes a single internal copy of the input; all component fields are pointer+length pairs referencing this copy, so the caller may discard the original string immediately after parsing.
Design Philosophy
- **Single Copy, Zero Per-Field Allocation** — `xUrlParse()` calls `strdup()` once. All output fields point into this copy, avoiding per-component heap allocations.
- **Pointer+Length Pairs** — Fields use `const char *` + `size_t` pairs rather than NUL-terminated strings. This avoids mutating the internal copy and supports efficient substring access.
- **Scheme-Aware Default Ports** — `xUrlPort()` returns well-known default ports (80 for http/ws, 443 for https/wss) when no explicit port is present, simplifying connection logic.
- **IPv6 Literal Support** — The parser correctly handles bracketed IPv6 addresses (`[::1]:8080`), extracting the bare address without brackets.
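The scheme-to-default-port rule can be sketched as a standalone helper. This is an illustrative re-implementation of the documented behavior, not the library source; returning 0 for unknown schemes is an assumption made here.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative equivalent of xUrlPort()'s documented defaults:
 * 80 for http/ws, 443 for https/wss. Takes a pointer+length
 * scheme as produced by the parser. */
static uint16_t default_port(const char *scheme, size_t len) {
    if ((len == 4 && memcmp(scheme, "http", 4) == 0) ||
        (len == 2 && memcmp(scheme, "ws", 2) == 0))
        return 80;
    if ((len == 5 && memcmp(scheme, "https", 5) == 0) ||
        (len == 3 && memcmp(scheme, "wss", 3) == 0))
        return 443;
    return 0; /* unknown scheme — caller must supply a port (assumption) */
}
```

Note that length-aware comparison matters: `"http"` (4 bytes) must not match the first four bytes of `"https"`.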
Architecture
flowchart LR
INPUT["Raw URL string"]
PARSE["xUrlParse()"]
COPY["strdup() internal copy"]
FIELDS["Pointer+Length fields"]
PORT["xUrlPort()"]
FREE["xUrlFree()"]
INPUT --> PARSE
PARSE --> COPY
COPY --> FIELDS
FIELDS --> PORT
FIELDS --> FREE
style PARSE fill:#4a90d9,color:#fff
style FREE fill:#e74c3c,color:#fff
Implementation Details
URL Format
scheme://[userinfo@]host[:port][/path][?query][#fragment]
Parsing Steps
flowchart TD
START["Input: raw URL string"]
SCHEME["Find '://' → extract scheme"]
AUTH["Parse authority section"]
USERINFO{"Contains '@'?"}
UI_YES["Extract userinfo"]
HOST{"Starts with '['?"}
IPV6["Parse IPv6 bracket literal"]
IPV4["Scan backwards for ':'"]
PORT["Extract port (if present)"]
PATH{"Starts with '/'?"}
PATH_YES["Extract path"]
QUERY{"Starts with '?'?"}
QUERY_YES["Extract query"]
FRAG{"Starts with '#'?"}
FRAG_YES["Extract fragment"]
DONE["Return xErrno_Ok"]
START --> SCHEME --> AUTH
AUTH --> USERINFO
USERINFO -->|Yes| UI_YES --> HOST
USERINFO -->|No| HOST
HOST -->|Yes| IPV6 --> PORT
HOST -->|No| IPV4 --> PORT
PORT --> PATH
PATH -->|Yes| PATH_YES --> QUERY
PATH -->|No| QUERY
QUERY -->|Yes| QUERY_YES --> FRAG
QUERY -->|No| FRAG
FRAG -->|Yes| FRAG_YES --> DONE
FRAG -->|No| DONE
style DONE fill:#50b86c,color:#fff
Memory Layout
xUrl struct (stack or heap):
┌──────────┬──────────────────────────────────┐
│ raw_ │→ strdup("https://host:443/path") │
│ scheme │→ ───────┘ │
│ host │→ ──────────────┘ │
│ port │→ ───────────────────┘ │
│ path │→ ────────────────────────┘ │
│ ... │ │
└──────────┴──────────────────────────────────┘
All pointers reference the single raw_ copy.
Operations and Complexity
| Operation | Complexity | Notes |
|---|---|---|
xUrlParse | O(n) | Single pass over the URL string |
xUrlPort | O(1) | Converts port string or returns default |
xUrlFree | O(1) | Frees the internal copy, zeroes struct |
API Reference
Lifecycle
| Function | Signature | Description |
|---|---|---|
xUrlParse | xErrno xUrlParse(const char *raw, xUrl *url) | Parse a URL into components |
xUrlFree | void xUrlFree(xUrl *url) | Free internal copy, zero all fields |
Query
| Function | Signature | Description |
|---|---|---|
xUrlPort | uint16_t xUrlPort(const xUrl *url) | Numeric port (explicit or default by scheme) |
xUrl Fields
| Field | Type | Description |
|---|---|---|
scheme / scheme_len | const char * / size_t | e.g. "https" |
userinfo / userinfo_len | const char * / size_t | e.g. "user:pass" (optional) |
host / host_len | const char * / size_t | e.g. "example.com" or "::1" |
port / port_len | const char * / size_t | e.g. "8443" (optional) |
path / path_len | const char * / size_t | e.g. "/ws/chat" (optional) |
query / query_len | const char * / size_t | e.g. "key=val" (optional) |
fragment / fragment_len | const char * / size_t | e.g. "section1" (optional) |
Note: Optional fields have `ptr = NULL, len = 0` when absent. The `raw_` field is internal — do not access it.
Usage Examples
Basic URL Parsing
#include <stdio.h>
#include <xnet/url.h>
int main(void) {
xUrl url;
xErrno err = xUrlParse("https://user:[email protected]:8443/ws/chat?token=abc#top", &url);
if (err != xErrno_Ok) {
fprintf(stderr, "parse failed\n");
return 1;
}
printf("scheme: %.*s\n", (int)url.scheme_len, url.scheme);
printf("userinfo: %.*s\n", (int)url.userinfo_len, url.userinfo);
printf("host: %.*s\n", (int)url.host_len, url.host);
printf("port: %.*s (numeric: %u)\n", (int)url.port_len, url.port, xUrlPort(&url));
printf("path: %.*s\n", (int)url.path_len, url.path);
printf("query: %.*s\n", (int)url.query_len, url.query);
printf("fragment: %.*s\n", (int)url.fragment_len, url.fragment);
xUrlFree(&url);
return 0;
}
Output:
scheme: https
userinfo: user:pass
host: example.com
port: 8443 (numeric: 8443)
path: /ws/chat
query: token=abc
fragment: top
IPv6 Address
xUrl url;
xUrlParse("http://[::1]:8080/test", &url);
printf("host: %.*s\n", (int)url.host_len, url.host);
// Output: host: ::1 (brackets stripped)
printf("port: %u\n", xUrlPort(&url));
// Output: port: 8080
xUrlFree(&url);
Default Port by Scheme
xUrl url;
xUrlParse("wss://echo.example.com/sock", &url);
// No explicit port in URL
printf("port field: %s\n", url.port ? "present" : "absent");
// Output: port field: absent
// xUrlPort() returns 443 for wss://
printf("effective port: %u\n", xUrlPort(&url));
// Output: effective port: 443
xUrlFree(&url);
Ownership Semantics
// xUrl owns its data — the original string can be freed
char *heap = strdup("ws://example.com:9090/ws");
xUrl url;
xUrlParse(heap, &url);
free(heap); // safe: xUrl has its own copy
// url fields are still valid here
printf("host: %.*s\n", (int)url.host_len, url.host);
xUrlFree(&url);
// After free, all fields are zeroed (NULL)
Error Handling
| Input | Result |
|---|---|
NULL raw or url pointer | xErrno_InvalidArg |
Missing :// separator | xErrno_InvalidArg |
Empty host (e.g. http:///path) | xErrno_InvalidArg |
| Unclosed IPv6 bracket | xErrno_InvalidArg |
malloc failure | xErrno_NoMemory |
On error, the xUrl struct is zeroed — no cleanup needed.
Best Practices
- Always check the return value of `xUrlParse()`. On error the struct is zeroed, so accessing fields is safe but yields empty values.
- Use `xUrlPort()` instead of parsing the port string yourself. It handles default ports and validates the numeric range (0–65535).
- Call `xUrlFree()` when done. Forgetting to free leaks the internal string copy.
- Don't cache field pointers past `xUrlFree()`. All pointers become invalid after the free call.
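Because fields are pointer+length pairs, APIs that require NUL-terminated strings (e.g. `getaddrinfo()`) need a copy. A minimal helper for that might look like this — `copy_component` is not part of xKit, just an illustration:

```c
#include <stddef.h>
#include <string.h>

/* Copy a pointer+length URL component into a NUL-terminated
 * buffer. Returns 0 on success, -1 if the component is absent
 * (ptr == NULL) or the buffer is too small. */
static int copy_component(const char *ptr, size_t len,
                          char *dst, size_t cap) {
    if (!ptr || len + 1 > cap)
        return -1;
    memcpy(dst, ptr, len);
    dst[len] = '\0';
    return 0;
}
```

Typical use would be `copy_component(url.host, url.host_len, hostbuf, sizeof(hostbuf))` before handing `hostbuf` to a resolver — done before `xUrlFree()`, since the field pointers die with the struct.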
dns.h — Asynchronous DNS Resolution
Introduction
dns.h provides asynchronous DNS resolution by offloading getaddrinfo() to the event loop's thread pool. The completion callback is always invoked on the event loop thread, maintaining xKit's single-threaded callback model. Queries can be cancelled before the callback fires.
Design Philosophy
- **Thread-Pool Offload** — `getaddrinfo()` is a blocking POSIX call. Rather than introducing a dedicated DNS thread, xnet reuses the event loop's existing thread pool via `xEventLoopSubmit()`.
- **Event-Loop-Thread Callbacks** — The done callback runs on the event loop thread, so user code never needs synchronization. This is consistent with every other callback in xKit.
- **Linked-List Result** — Resolved addresses are returned as a linked list of `xDnsAddr` nodes, preserving the full `getaddrinfo()` result (family, socktype, protocol) for each address.
- **Cancellation Support** — `xDnsCancel()` sets an atomic flag. If the worker has already finished, the done callback silently discards the result instead of invoking the user callback.
- **IP Literal Fast Path** — If the hostname is an IPv4 or IPv6 literal, `AI_NUMERICHOST` is set automatically, skipping the actual DNS lookup.
Architecture
sequenceDiagram
participant App as Application
participant EL as Event Loop Thread
participant TP as Thread Pool Worker
App->>EL: xDnsResolve(loop, "example.com", ...)
EL->>TP: xEventLoopSubmit(dns_work_fn)
Note over TP: getaddrinfo() (blocking)
TP-->>EL: dns_done_fn(result)
alt Not cancelled
EL->>App: callback(result, arg)
else Cancelled
EL->>EL: xDnsResultFree(result)
end
Implementation Details
Internal Request Lifecycle
stateDiagram-v2
[*] --> Created: xDnsResolve()
Created --> Queued: xEventLoopSubmit()
Queued --> Working: Thread pool picks up
Working --> Done: getaddrinfo() returns
Done --> Delivered: callback invoked
Done --> Discarded: cancelled flag set
Queued --> Cancelled: xDnsCancel()
Working --> Cancelled: xDnsCancel()
Cancelled --> Discarded: done_fn checks flag
Delivered --> [*]: request freed
Discarded --> [*]: request freed
Error Mapping
getaddrinfo() returns EAI_* codes. These are mapped to xKit error codes:
| EAI Code | xErrno | Meaning |
|---|---|---|
0 (success) | xErrno_Ok | Resolution succeeded |
EAI_NONAME | xErrno_DnsNotFound | Host not found |
EAI_AGAIN | xErrno_DnsTempFail | Temporary failure |
EAI_MEMORY | xErrno_NoMemory | Out of memory |
| Other | xErrno_DnsError | Generic DNS error |
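The table translates to a straightforward switch. The `map_eai()` helper below is illustrative, and the local enum is a stand-in — the real `xErrno` constants live in xbase's error.h:

```c
#include <netdb.h>

/* Illustrative stand-in for xbase's error codes (values assumed;
 * see <xbase/error.h> for the real definitions). */
typedef enum {
    xErrno_Ok,
    xErrno_DnsNotFound,
    xErrno_DnsTempFail,
    xErrno_NoMemory,
    xErrno_DnsError
} xErrno;

/* Map a getaddrinfo() status to an xKit error code. */
static xErrno map_eai(int eai) {
    switch (eai) {
    case 0:          return xErrno_Ok;
    case EAI_NONAME: return xErrno_DnsNotFound;
    case EAI_AGAIN:  return xErrno_DnsTempFail;
    case EAI_MEMORY: return xErrno_NoMemory;
    default:         return xErrno_DnsError;
    }
}
```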
IP Literal Detection
Before calling getaddrinfo(), the worker checks if the hostname is an IP literal using inet_pton(). If it is, AI_NUMERICHOST is added to the hints, which tells getaddrinfo() to skip DNS lookup entirely.
// Pseudocode
if (inet_pton(AF_INET, hostname, buf) == 1 ||
inet_pton(AF_INET6, hostname, buf) == 1) {
hints.ai_flags |= AI_NUMERICHOST;
}
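The same check, made self-contained as a small helper (illustrative — the resolver's internal function may differ):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return 1 if `host` is an IPv4 or IPv6 literal, else 0.
 * inet_pton() returns 1 only on a fully valid address. */
static int is_ip_literal(const char *host) {
    unsigned char buf[sizeof(struct in6_addr)];
    return inet_pton(AF_INET, host, buf) == 1 ||
           inet_pton(AF_INET6, host, buf) == 1;
}
```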
API Reference
Core Functions
| Function | Signature | Description |
|---|---|---|
xDnsResolve | xDnsQuery xDnsResolve(xEventLoop loop, const char *hostname, const char *service, const struct addrinfo *hints, xDnsCallback callback, void *arg) | Start async DNS resolution |
xDnsCancel | void xDnsCancel(xEventLoop loop, xDnsQuery query) | Cancel a pending query |
xDnsResultFree | void xDnsResultFree(xDnsResult *result) | Free a resolution result |
Types
| Type | Description |
|---|---|
xDnsQuery | Opaque handle to a pending query |
xDnsResult | Resolution result: error + addrs linked list |
xDnsAddr | Single resolved address node |
xDnsCallback | void (*)(xDnsResult *result, void *arg) |
xDnsResult Fields
| Field | Type | Description |
|---|---|---|
error | xErrno | xErrno_Ok on success |
addrs | xDnsAddr * | Linked list of addresses, or NULL |
xDnsAddr Fields
| Field | Type | Description |
|---|---|---|
addr | struct sockaddr_storage | Resolved socket address |
addrlen | socklen_t | Length of the address |
family | int | AF_INET or AF_INET6 |
socktype | int | SOCK_STREAM or SOCK_DGRAM |
protocol | int | IPPROTO_TCP or IPPROTO_UDP |
next | xDnsAddr * | Next address, or NULL |
Parameter Details for xDnsResolve
| Parameter | Required | Description |
|---|---|---|
loop | Yes | Event loop (must not be NULL) |
hostname | Yes | Hostname or IP literal (non-empty) |
service | No | Port string (e.g. "443") or NULL |
hints | No | addrinfo hints; NULL defaults to AF_UNSPEC + SOCK_STREAM |
callback | Yes | Completion callback (must not be NULL) |
arg | No | User argument forwarded to callback |
Returns an `xDnsQuery` handle, or `NULL` on invalid arguments.
Usage Examples
Basic Resolution
#include <stdio.h>
#include <arpa/inet.h>
#include <xbase/event.h>
#include <xnet/dns.h>
static void on_resolved(xDnsResult *result, void *arg) {
xEventLoop loop = (xEventLoop)arg;
if (result->error != xErrno_Ok) {
fprintf(stderr, "DNS failed: %d\n", result->error);
xDnsResultFree(result);
xEventLoopStop(loop);
return;
}
for (xDnsAddr *a = result->addrs; a; a = a->next) {
char buf[INET6_ADDRSTRLEN];
if (a->family == AF_INET) {
struct sockaddr_in *sin = (struct sockaddr_in *)&a->addr;
inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
} else {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&a->addr;
inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf));
}
printf(" %s (family=%d)\n", buf, a->family);
}
xDnsResultFree(result);
xEventLoopStop(loop);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xDnsResolve(loop, "example.com", "443", NULL, on_resolved, loop);
xEventLoopRun(loop);
xEventLoopDestroy(loop);
return 0;
}
IPv4-Only Resolution
struct addrinfo hints = {0};
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
xDnsResolve(loop, "example.com", "80", &hints, on_resolved, loop);
Cancelling a Query
xDnsQuery q = xDnsResolve(loop, "slow.example.com", NULL, NULL, on_resolved, NULL);
// Cancel immediately — callback will NOT fire
xDnsCancel(loop, q);
IP Literal (No DNS Lookup)
// Resolves instantly via AI_NUMERICHOST
xDnsResolve(loop, "127.0.0.1", "8080", NULL, on_resolved, loop);
xDnsResolve(loop, "::1", "8080", NULL, on_resolved, loop);
Thread Safety
| Operation | Thread Safety |
|---|---|
xDnsResolve() | Call from event loop thread only |
xDnsCancel() | Call from event loop thread only |
xDnsResultFree() | Call from any thread (result is owned) |
xDnsCallback | Always invoked on event loop thread |
Error Handling
| Scenario | Behavior |
|---|---|
NULL loop, hostname, or callback | Returns NULL (no query created) |
| Empty hostname | Returns NULL |
malloc failure | Returns NULL |
getaddrinfo() failure | Callback receives result->error != xErrno_Ok |
| Cancelled query | Callback is not invoked; result is freed internally |
Best Practices
- Always call `xDnsResultFree()` in your callback. The callback owns the result.
- Check `result->error` before iterating `addrs`. On failure, `addrs` is `NULL`.
- Use `xDnsCancel()` for cleanup. If you destroy the object that owns the callback context, cancel the query first to prevent a use-after-free.
- Pass `NULL` hints for typical use. The defaults (`AF_UNSPEC` + `SOCK_STREAM`) cover most HTTP/WebSocket connection scenarios.
- `xDnsCancel(loop, NULL)` is safe — it's a no-op, so you don't need to guard against NULL handles.
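The cancel-before-destroy rule typically takes the shape below. This is a sketch against the API documented above; the `session` struct and its functions are invented for illustration.

```c
#include <stdlib.h>
#include <xbase/event.h>
#include <xnet/dns.h>

/* Sketch: an object that owns a pending DNS query cancels it
 * before freeing itself, so the callback can never touch a
 * freed session. */
typedef struct {
    xEventLoop loop;
    xDnsQuery  query;   /* NULL once delivered or never started */
} session;

static void session_on_dns(xDnsResult *result, void *arg) {
    session *s = arg;
    s->query = NULL;            /* handle is consumed */
    /* ... use result->addrs ... */
    xDnsResultFree(result);     /* callback owns the result */
}

static void session_destroy(session *s) {
    /* Safe even if s->query is NULL (documented no-op). After
     * this, session_on_dns is guaranteed not to fire. */
    xDnsCancel(s->loop, s->query);
    free(s);
}
```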
tcp.h — Async TCP Connection, Connector & Listener
Introduction
tcp.h provides three async TCP building blocks on top of xKit's event loop:
- **xTcpConn** — a thin resource wrapper that pairs an `xSocket` with an `xTransport`, plus convenience `Recv`/`Send`/`SendIov` helpers.
- **xTcpConnect** — an async connector that performs DNS → socket → non-blocking connect → optional TLS handshake, delivering a ready-to-use `xTcpConn` via callback.
- **xTcpListener** — an async listener that accepts connections (with optional TLS) and delivers each as an `xTcpConn`.
All callbacks run on the event loop thread, consistent with the rest of xKit.
Design Philosophy
- **Resource Wrapper, Not Callback Framework** — Unlike `xWsCallbacks`, we intentionally do not provide `on_data`/`on_close` callbacks at the TCP layer. WebSocket callbacks work well because the protocol defines message boundaries, close handshakes, and ping/pong — the library does real work before invoking user code. Raw TCP is a byte stream with no framing; an `on_data` callback would still deliver arbitrary fragments, leaving the user to reassemble and parse — no better than calling `xTcpConnRecv` directly. Instead, users register their own `xSocketFunc` callback via `xSocketSetCallback()` and drive I/O with `xTcpConnRecv`/`xTcpConnSend`.
- **Transport Transparency** — `xTcpConn` wraps an `xTransport` vtable. For plain TCP, `read`/`writev` map to `read(2)`/`writev(2)`. For TLS, they map to `SSL_read`/`SSL_write`. The `Recv`/`Send`/`SendIov` helpers hide this detail so users never need to reach into `xTransport` internals.
- **Full Async Connector Pipeline** — `xTcpConnect` chains DNS resolution → socket creation → non-blocking `connect()` → optional TLS handshake into a single async operation with a timeout. Each phase is driven by event loop callbacks.
- **Ownership Transfer** — `xTcpConnTakeSocket` and `xTcpConnTakeTransport` allow higher-level protocols (e.g. WebSocket upgrade) to extract the underlying resources without closing them.
Architecture
Connector State Machine
stateDiagram-v2
[*] --> DNS: xTcpConnect()
DNS --> TcpConnect: resolved
DNS --> Failed: DNS error
TcpConnect --> TlsHandshake: connected + TLS configured
TcpConnect --> Succeed: connected (plain TCP)
TcpConnect --> Failed: connect error
TlsHandshake --> Succeed: handshake done
TlsHandshake --> Failed: handshake error
Succeed --> [*]: callback(conn, Ok)
Failed --> [*]: callback(NULL, err)
note right of DNS: Async via xDnsResolve
note right of TcpConnect: Non-blocking connect()
note right of TlsHandshake: Async SSL_do_handshake
Listener Accept Flow
sequenceDiagram
participant EL as Event Loop
participant L as xTcpListener
participant PC as PendingConn (TLS only)
participant App as User Callback
EL->>L: xEvent_Read (new connection)
L->>L: accept()
alt Plain TCP
L->>App: callback(listener, conn, addr)
else TLS
L->>PC: create PendingConn
loop Handshake rounds
EL->>PC: xEvent_Read / xEvent_Write
PC->>PC: SSL_do_handshake()
end
PC->>App: callback(listener, conn, addr)
end
xTcpConn Resource Ownership
graph LR
CONN["xTcpConn"]
SOCK["xSocket<br/>(event loop registration)"]
TP["xTransport<br/>(plain / TLS vtable)"]
FD["fd"]
CONN --> SOCK
CONN --> TP
SOCK --> FD
style CONN fill:#4a90d9,color:#fff
style SOCK fill:#50b86c,color:#fff
style TP fill:#f5a623,color:#fff
xTcpConnClose() destroys in order: transport → socket → conn shell. Use xTcpConnTakeSocket() / xTcpConnTakeTransport() to extract resources before closing.
API Reference
xTcpConn — Connection
| Function | Signature | Description |
|---|---|---|
xTcpConnRecv | ssize_t xTcpConnRecv(xTcpConn conn, void *buf, size_t len) | Read up to len bytes; returns bytes read, 0 on EOF, -1 on error |
xTcpConnSend | ssize_t xTcpConnSend(xTcpConn conn, const char *buf, size_t len) | Write len bytes; returns bytes written, -1 on error |
xTcpConnSendIov | ssize_t xTcpConnSendIov(xTcpConn conn, const struct iovec *iov, int iovcnt) | Scatter-gather write; returns total bytes written, -1 on error |
xTcpConnTransport | xTransport *xTcpConnTransport(xTcpConn conn) | Get the internal transport vtable |
xTcpConnSocket | xSocket xTcpConnSocket(xTcpConn conn) | Get the underlying socket handle |
xTcpConnTakeSocket | xSocket xTcpConnTakeSocket(xTcpConn conn) | Extract socket ownership (conn no longer owns it) |
xTcpConnTakeTransport | xTransport xTcpConnTakeTransport(xTcpConn conn) | Extract transport ownership (conn no longer owns it) |
xTcpConnReader | xReader xTcpConnReader(xTcpConn conn) | Get an xReader adapter bound to the connection's transport (see io.h) |
xTcpConnWriter | xWriter xTcpConnWriter(xTcpConn conn) | Get an xWriter adapter bound to the connection's transport (see io.h) |
xTcpConnClose | void xTcpConnClose(xEventLoop loop, xTcpConn conn) | Close connection and free all resources |
xTcpConnect — Async Connector
| Function | Signature | Description |
|---|---|---|
xTcpConnect | xErrno xTcpConnect(xEventLoop loop, const char *host, uint16_t port, const xTcpConnectConf *conf, xTcpConnectFunc callback, void *arg) | Initiate async TCP connection |
xTcpConnectConf Fields
| Field | Type | Default | Description |
|---|---|---|---|
tls_ctx | xTlsCtx | NULL | Pre-created shared TLS context (preferred); NULL for plain TCP or auto-create from tls |
tls | const xTlsConf * | NULL | TLS config for auto-created ctx; ignored when tls_ctx is set; NULL for plain TCP |
timeout_ms | int | 10000 | Connect timeout in milliseconds |
nodelay | int | 0 | Set TCP_NODELAY if non-zero |
keepalive | int | 0 | Set SO_KEEPALIVE if non-zero |
TLS context resolution order: tls_ctx (shared, not owned) → auto-create from tls → defaults (system CA, verify enabled). When tls_ctx is provided, the connector does not create or destroy the context — the caller retains ownership.
xTcpConnectFunc
typedef void (*xTcpConnectFunc)(xTcpConn conn, xErrno err, void *arg);
On success: conn is valid, err is xErrno_Ok. On failure: conn is NULL, err indicates the error.
xTcpListener — Async Listener
| Function | Signature | Description |
|---|---|---|
xTcpListenerCreate | xTcpListener xTcpListenerCreate(xEventLoop loop, const char *host, uint16_t port, const xTcpListenerConf *conf, xTcpListenerFunc callback, void *arg) | Create and start a TCP listener |
xTcpListenerDestroy | void xTcpListenerDestroy(xTcpListener listener) | Stop listening and free resources |
xTcpListenerConf Fields
| Field | Type | Default | Description |
|---|---|---|---|
tls_ctx | xTlsCtx | NULL | TLS context from xTlsCtxCreate(); NULL for plain TCP |
backlog | int | 128 | listen() backlog |
reuseport | int | 0 | Set SO_REUSEPORT if non-zero |
xTcpListenerFunc
typedef void (*xTcpListenerFunc)(xTcpListener listener, xTcpConn conn,
const struct sockaddr *addr, socklen_t addrlen,
void *arg);
Invoked for each accepted connection. The callback takes ownership of `conn`.
Usage Examples
Echo Server
#include <string.h>
#include <xbase/event.h>
#include <xbase/socket.h>
#include <xnet/tcp.h>
static void on_conn_event(xSocket sock, xEventMask mask, void *arg) {
xTcpConn conn = (xTcpConn)arg;
(void)sock;
if (mask & xEvent_Read) {
char buf[4096];
ssize_t n = xTcpConnRecv(conn, buf, sizeof(buf));
if (n > 0) {
xTcpConnSend(conn, buf, (size_t)n);
} else {
/* EOF or error: close */
xTcpConnClose(xSocketLoop(sock), conn);
}
}
}
static void on_accept(xTcpListener listener, xTcpConn conn,
const struct sockaddr *addr, socklen_t addrlen,
void *arg) {
(void)listener; (void)addr; (void)addrlen; (void)arg;
/* Register our own event callback on the connection's socket */
xSocket sock = xTcpConnSocket(conn);
xSocketSetCallback(sock, on_conn_event, conn);
/* Socket is already registered for xEvent_Read by default */
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xTcpListener listener =
xTcpListenerCreate(loop, "0.0.0.0", 8080, NULL, on_accept, NULL);
if (!listener) return 1;
xEventLoopRun(loop);
xTcpListenerDestroy(listener);
xEventLoopDestroy(loop);
return 0;
}
Async Client
#include <stdio.h>
#include <string.h>
#include <xbase/event.h>
#include <xbase/socket.h>
#include <xnet/tcp.h>
static void on_response(xSocket sock, xEventMask mask, void *arg) {
xTcpConn conn = (xTcpConn)arg;
xEventLoop loop = (xEventLoop)xSocketLoop(sock);
(void)mask;
char buf[4096];
ssize_t n = xTcpConnRecv(conn, buf, sizeof(buf));
if (n > 0) {
printf("Received: %.*s\n", (int)n, buf);
}
xTcpConnClose(loop, conn);
xEventLoopStop(loop);
}
static void on_connected(xTcpConn conn, xErrno err, void *arg) {
xEventLoop loop = (xEventLoop)arg;
if (err != xErrno_Ok) {
fprintf(stderr, "Connect failed: %d\n", err);
xEventLoopStop(loop);
return;
}
/* Send a request */
const char *msg = "Hello, server!";
xTcpConnSend(conn, msg, strlen(msg));
/* Wait for response */
xSocket sock = xTcpConnSocket(conn);
xSocketSetCallback(sock, on_response, conn);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xTcpConnectConf conf = {0};
conf.nodelay = 1;
xTcpConnect(loop, "127.0.0.1", 8080, &conf, on_connected, loop);
xEventLoopRun(loop);
xEventLoopDestroy(loop);
return 0;
}
TLS Client (auto-create context)
#include <xnet/tcp.h>
#include <xnet/tls.h>
static void on_tls_connected(xTcpConn conn, xErrno err, void *arg) {
if (err != xErrno_Ok) { /* handle error */ return; }
/* TLS is already established — Recv/Send are transparently encrypted */
const char *msg = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n";
xTcpConnSend(conn, msg, strlen(msg));
/* ... register read callback ... */
}
void connect_tls(xEventLoop loop) {
xTlsConf tls = {0};
tls.ca = "/etc/ssl/certs/ca-certificates.crt";
xTcpConnectConf conf = {0};
conf.tls = &tls;
xTcpConnect(loop, "example.com", 443, &conf, on_tls_connected, loop);
}
TLS Client (shared context)
When making many connections to the same server, share an `xTlsCtx` to avoid reloading certificates each time:
#include <xnet/tcp.h>
#include <xnet/tls.h>
static void on_connected(xTcpConn conn, xErrno err, void *arg) {
if (err != xErrno_Ok) { /* handle error */ return; }
/* ... use conn ... */
}
void connect_with_shared_ctx(xEventLoop loop) {
// Create once, reuse for all connections
xTlsConf tls = {0};
tls.ca = "ca.pem";
xTlsCtx ctx = xTlsCtxCreate(&tls);
xTcpConnectConf conf = {0};
conf.tls_ctx = ctx; // shared, not owned by connector
xTcpConnect(loop, "example.com", 443, &conf, on_connected, loop);
xTcpConnect(loop, "example.com", 443, &conf, on_connected, loop);
// ... later, after all connections are closed ...
xTlsCtxDestroy(ctx);
}
TLS Server
#include <xnet/tcp.h>
#include <xnet/transport.h>
void start_tls_server(xEventLoop loop) {
xTlsConf tls_conf = {
.cert = "server.pem",
.key = "server-key.pem",
};
xTlsCtx tls_ctx = xTlsCtxCreate(&tls_conf);
xTcpListenerConf conf = {0};
conf.tls_ctx = tls_ctx;
xTcpListener listener =
xTcpListenerCreate(loop, "0.0.0.0", 8443, &conf, on_accept, NULL);
/* ... run event loop ... */
xTcpListenerDestroy(listener);
xTlsCtxDestroy(tls_ctx);
}
Ownership Transfer (Protocol Upgrade)
/* After receiving an HTTP upgrade response on a TCP connection,
* extract the socket and transport for the new protocol layer. */
xSocket sock = xTcpConnTakeSocket(conn);
xTransport tp = xTcpConnTakeTransport(conn);
/* Close the empty conn shell (no-op on resources) */
xTcpConnClose(loop, conn);
/* sock and tp are now owned by the new protocol handler */
Thread Safety
| Operation | Thread Safety |
|---|---|
xTcpConnect() | Call from event loop thread only |
xTcpListenerCreate() | Call from event loop thread only |
xTcpListenerDestroy() | Call from event loop thread only |
xTcpConnRecv/Send/SendIov() | Call from event loop thread only |
xTcpConnClose() | Call from event loop thread only |
xTcpConnectFunc callback | Always invoked on event loop thread |
xTcpListenerFunc callback | Always invoked on event loop thread |
Error Handling
| Scenario | Behavior |
|---|---|
NULL loop, host, or callback in xTcpConnect | Returns xErrno_InvalidArg |
| DNS resolution failure | Callback receives xErrno_DnsError or xErrno_DnsNotFound |
connect() failure | Callback receives xErrno_SysError |
| TLS handshake failure | Callback receives xErrno_SysError |
| Connect timeout | Callback receives xErrno_Timeout |
xTcpListenerCreate bind/listen failure | Returns NULL |
xTcpConnRecv/Send on NULL conn | Returns -1 |
xTcpConnClose(loop, NULL) | No-op (safe) |
xTcpListenerDestroy(NULL) | No-op (safe) |
Best Practices
- Always close connections with `xTcpConnClose()` — it destroys the transport (TLS cleanup), removes the socket from the event loop, closes the fd, and frees the conn.
- Register your own `xSocketFunc` on the connection's socket via `xSocketSetCallback()` to receive read/write events, then use `xTcpConnRecv`/`xTcpConnSend` inside the callback.
- Use `xTcpConnSendIov` for multi-buffer writes (e.g. header + body) to avoid copying into a single buffer.
- Set `nodelay = 1` in `xTcpConnectConf` for latency-sensitive protocols (HTTP, WebSocket).
- Use `xTcpConnTakeSocket`/`xTcpConnTakeTransport` when upgrading protocols (e.g. HTTP → WebSocket) to avoid double-free.
- Cancel or close before freeing context — if you destroy the object that owns the connect callback context, ensure the connection attempt has completed or timed out first.
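Since the plain-TCP transport maps `SendIov` to `writev(2)` (see Transport Transparency above), the header+body pattern can be shown with raw POSIX `writev()`. The `write_header_body` helper is a standalone illustration; with an `xTcpConn` you would pass the same iovec array to `xTcpConnSendIov`:

```c
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write a header and body in one syscall instead of copying
 * them into a single buffer. Returns bytes written, -1 on error.
 * (A robust sender must still handle short writes on sockets.) */
static ssize_t write_header_body(int fd,
                                 const char *hdr, size_t hlen,
                                 const char *body, size_t blen) {
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,  .iov_len = hlen },
        { .iov_base = (void *)body, .iov_len = blen },
    };
    return writev(fd, iov, 2);
}
```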
tls.h — TLS Configuration Types
Introduction
tls.h defines xTlsConf, the unified TLS configuration structure shared across xKit modules, and xTlsCtx, the opaque handle to a server-level TLS context. It controls certificate loading, peer verification, and optional ALPN negotiation for both client-side and server-side TLS. These are the central TLS abstractions — the actual TLS handshake is handled by the TLS backend (OpenSSL or mbedTLS) in the transport layer.
Design Philosophy
- **Backend-Agnostic** — The config struct contains only file paths and flags. It works identically whether the TLS backend is OpenSSL or mbedTLS.
- **Zero-Initialize for Defaults** — A zero-initialized `xTlsConf` uses the system CA bundle with full peer and host verification enabled. This is the secure default for both client and server.
- **Unified Client/Server** — A single `xTlsConf` struct serves both roles. Client-only fields (`key_password`) and server-only fields (`alpn`) are simply left as `NULL`/zero when unused.
- **Separation of Concerns** — TLS configuration is defined in xnet (the networking primitives layer) and consumed by xhttp (the HTTP layer). This avoids circular dependencies and allows future modules to reuse the same types.
API Reference
xTlsConf
Unified TLS configuration for both client and server.
| Field | Type | Default | Description |
|---|---|---|---|
cert | const char * | NULL (none) | Path to PEM certificate file |
key | const char * | NULL (none) | Path to PEM private key file |
ca | const char * | NULL (system CA) | Path to CA certificate file |
key_password | const char * | NULL (none) | Private key password (client-side) |
alpn | const char ** | NULL (none) | NULL-terminated ALPN protocol list (server-side) |
skip_verify | int | 0 (verify) | Non-zero to skip peer & host verification |
Backward-compatible aliases: xTlsClientConf and xTlsServerConf are typedef'd to xTlsConf.
xTlsCtx
Opaque handle to a shared TLS context. Created by xTlsCtxCreate(), used by both server-side listeners (xTcpListenerConf.tls_ctx) and client-side connectors (xTcpConnectConf.tls_ctx, xWsConnectConf.tls_ctx). Shared across all connections that use the same context. Destroyed by xTlsCtxDestroy(). Supports certificate hot-reload via xTlsCtxReload().
xTlsCtxCreate
xTlsCtx xTlsCtxCreate(const xTlsConf *conf);
Create a shared TLS context. Loads the certificate (if provided), private key (if provided), optional CA, and optional ALPN list. The returned context can be shared across all connections that use the same TLS configuration.
- conf — TLS configuration (must not be NULL). For server-side use, cert and key are required. For client-side use, only ca (or defaults) is needed.
- Returns a TLS context handle, or NULL on failure.
xTlsCtxDestroy
void xTlsCtxDestroy(xTlsCtx ctx);
Destroy a shared TLS context and release all resources. Safe to call with NULL (no-op). Must only be called after all connections using this context have been closed.
xTlsCtxReload
int xTlsCtxReload(xTlsCtx ctx, const xTlsConf *conf);
Hot-reload certificates for an existing TLS context. Atomically replaces the certificate, private key, and optional CA. Existing connections are not affected; only new connections will use the updated certificates.
- ctx — TLS context to reload (must not be NULL).
- conf — New TLS configuration (must not be NULL; cert and key must not be NULL).
- Returns 0 on success, -1 on failure (context unchanged).
Example: Certificate hot-reload
// Initial setup
xTlsConf tls = {
    .cert = "server.pem",
    .key = "server-key.pem",
    .alpn = (const char *[]){"h2", "http/1.1", NULL},
};
xTlsCtx ctx = xTlsCtxCreate(&tls);

// ... later, when certificates are renewed ...
xTlsConf new_tls = {
    .cert = "server-new.pem",
    .key = "server-key-new.pem",
    .alpn = (const char *[]){"h2", "http/1.1", NULL},
};
if (xTlsCtxReload(ctx, &new_tls) == 0) {
    // New connections will use the updated certificates
}
One-Way TLS (Client Verifies Server)
#include <xnet/tls.h>
#include <xhttp/client.h>
// Use system CA bundle (zero-init)
xTlsConf tls = {0};
xHttpClientConf conf = {.tls = &tls};
xHttpClient client = xHttpClientCreate(loop, &conf);
// Or specify a CA file
xTlsConf tls_ca = {0};
tls_ca.ca = "ca.pem";
xHttpClientConf conf_ca = {.tls = &tls_ca};
xHttpClient client2 = xHttpClientCreate(loop, &conf_ca);
Skip Verification (Development Only)
xTlsConf tls = {0};
tls.skip_verify = 1; // DANGER: disables all checks
xHttpClientConf conf = {.tls = &tls};
xHttpClient client = xHttpClientCreate(loop, &conf);
Mutual TLS (mTLS)
// Server: require client certificate (default: verify enabled)
xTlsConf server_tls = {
    .cert = "server.pem",
    .key = "server-key.pem",
    .ca = "ca.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &server_tls);
// Client: present certificate
xTlsConf client_tls = {0};
client_tls.ca = "ca.pem";
client_tls.cert = "client.pem";
client_tls.key = "client-key.pem";
xHttpClientConf client_conf = {
    .tls = &client_tls,
};
xHttpClient client = xHttpClientCreate(loop, &client_conf);
Password-Protected Private Key
xTlsConf tls = {0};
tls.ca = "ca.pem";
tls.cert = "client.pem";
tls.key = "client-key-enc.pem";
tls.key_password = "my-secret";
xHttpClientConf conf = {.tls = &tls};
xHttpClient client = xHttpClientCreate(loop, &conf);
Relationship with Other Modules
- xnet — xTlsCtxCreate() / xTlsCtxDestroy() / xTlsCtxReload() are declared in tls.h and implemented in the TLS backend files (transport_openssl.c, transport_mbedtls.c). The TCP listener uses xTlsCtx via xTcpListenerConf.tls_ctx, and the TCP connector uses it via xTcpConnectConf.tls_ctx.
- xhttp — The HTTP server calls xTlsCtxCreate() internally when xHttpServerListenTls() is invoked, automatically setting ALPN to {"h2", "http/1.1"}. The HTTP client uses libcurl for TLS management and consumes xTlsConf directly. The WebSocket client supports both xTlsConf (auto-creates a context) and a pre-created xTlsCtx (shared across connections) via xWsConnectConf.tls_ctx. See the TLS Deployment Guide for end-to-end examples.
Security Notes
- Never use skip_verify = 1 in production. It disables all certificate validation.
- Keep private keys secure. Use restrictive file permissions (chmod 600).
- For mTLS, set ca to the signing CA on the server side. Zero-initialized skip_verify means verification is enabled by default.
- The config struct does not copy strings. The caller must ensure that file path strings remain valid until xHttpClientCreate() or xHttpServerListenTls() returns (the library deep-copies them internally).
xhttp — Asynchronous HTTP
Introduction
xhttp is xKit's HTTP module, providing a fully asynchronous HTTP client and server, both powered by xbase's event loop.
- The client uses libcurl's multi-socket API for non-blocking HTTP requests and SSE streaming — ideal for integrating with REST APIs and LLM streaming endpoints. Supports TLS configuration including custom CA certificates, mutual TLS (mTLS), and certificate verification control via xTlsConf.
- The server uses an xHttpProto vtable interface for protocol-abstracted parsing, supporting both HTTP/1.1 (llhttp) and HTTP/2 (nghttp2, h2c Prior Knowledge) on the same port. TLS listeners are supported via xHttpServerListenTls with xTlsConf. Single-threaded, event-driven connection handling — ideal for building lightweight HTTP services and APIs.
- WebSocket support includes both server and client. On the server side, call xWsUpgrade() inside a regular HTTP handler to perform the RFC 6455 upgrade handshake. On the client side, use xWsConnect() to establish an async WebSocket connection to a remote endpoint. The library handles frame codec, ping/pong, fragment reassembly, and close negotiation automatically for both sides.
Design Philosophy
- Event Loop Integration — Instead of blocking threads, xhttp registers libcurl's sockets with xEventLoop and uses event-driven I/O. All callbacks are dispatched on the event loop thread, eliminating the need for synchronization.
- Vtable-Based Request Polymorphism — Internally, different request types (oneshot HTTP, SSE streaming) share the same curl multi handle but use different vtables for completion and cleanup. This avoids code duplication while supporting diverse response handling patterns.
- Zero-Copy Response Delivery — Response headers and body are accumulated in xBuffer instances and delivered to the callback as pointers. No extra copies are made.
- Automatic Resource Management — Request contexts, curl easy handles, and buffers are automatically cleaned up after the completion callback returns. In-flight requests are cancelled with error callbacks when the client is destroyed.
Architecture
graph TD
subgraph "Application"
APP["User Code"]
end
subgraph "xhttp"
CLIENT["xHttpClient"]
TLS_CLI["TLS Config<br/>(xTlsConf)"]
ONESHOT["Oneshot Request<br/>(GET/POST/Do)"]
SSE["SSE Request<br/>(GetSse/DoSse)"]
PARSER["SSE Parser<br/>(W3C spec)"]
end
subgraph "libcurl"
MULTI["curl_multi"]
EASY1["curl_easy (req 1)"]
EASY2["curl_easy (req 2)"]
end
subgraph "xbase"
LOOP["xEventLoop"]
TIMER["Timer<br/>(curl timeout)"]
FD["FD Events<br/>(socket I/O)"]
end
APP -->|"xHttpClientGet/Post/Do"| ONESHOT
APP -->|"xHttpClientGetSse/DoSse"| SSE
APP -->|"xHttpClientConf.tls"| TLS_CLI
SSE --> PARSER
ONESHOT --> CLIENT
SSE --> CLIENT
TLS_CLI --> CLIENT
CLIENT --> MULTI
MULTI --> EASY1
MULTI --> EASY2
MULTI -->|"CURLMOPT_SOCKETFUNCTION"| FD
MULTI -->|"CURLMOPT_TIMERFUNCTION"| TIMER
FD --> LOOP
TIMER --> LOOP
style CLIENT fill:#4a90d9,color:#fff
style LOOP fill:#50b86c,color:#fff
style MULTI fill:#f5a623,color:#fff
Sub-Module Overview
| File | Description | Doc |
|---|---|---|
server.h | Async HTTP/1.1 & HTTP/2 server (routing, request/response, protocol-abstracted parsing) | server.md |
client.h | Async HTTP client API (GET, POST, Do, SSE, TLS configuration) | client.md |
sse.c | SSE stream parser and request handler | sse.md |
ws.h (server) | WebSocket server API (upgrade, send, close, callbacks) | ws_server.md |
ws.h (client) | WebSocket client API (connect, send, close, callbacks) | ws_client.md |
| (guide) | TLS deployment guide (certificate generation, one-way TLS, mTLS, troubleshooting) | tls.md |
Quick Start
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/client.h>

static void on_response(const xHttpResponse *resp, void *arg) {
    (void)arg;
    if (resp->curl_code == 0) {
        printf("Status: %ld\n", resp->status_code);
        printf("Body: %.*s\n", (int)resp->body_len, resp->body);
    } else {
        printf("Error: %s\n", resp->curl_error);
    }
}

int main(void) {
    xEventLoop loop = xEventLoopCreate();
    xHttpClient client = xHttpClientCreate(loop, NULL);
    xHttpClientGet(client, "https://httpbin.org/get", on_response, NULL);
    xEventLoopRun(loop);
    xHttpClientDestroy(client);
    xEventLoopDestroy(loop);
    return 0;
}
Relationship with Other Modules
- xbase — Uses xEventLoop for I/O multiplexing and xEventLoopTimerAfter for curl timeout management.
- xbuf — Uses xBuffer for response header and body accumulation.
- libcurl — External dependency (client). Uses the multi-socket API (curl_multi_socket_action) for non-blocking HTTP.
- llhttp — External dependency (server). Provides incremental HTTP/1.1 request parsing, isolated behind the xHttpProto vtable in proto_h1.c.
- nghttp2 — External dependency (server). Provides HTTP/2 frame processing and HPACK header compression, isolated behind the xHttpProto vtable in proto_h2.c.
client.h — Asynchronous HTTP Client
Introduction
client.h provides xHttpClient, an asynchronous HTTP client that integrates libcurl's multi-socket API with xbase's event loop. All network I/O is non-blocking and driven by the event loop; completion callbacks are dispatched on the event loop thread. The client supports GET, POST, PUT, DELETE, PATCH, HEAD methods and Server-Sent Events (SSE) streaming.
Design Philosophy
- libcurl Multi-Socket Integration — Rather than using libcurl's easy (blocking) API or multi-perform (polling) API, xhttp uses the multi-socket API (CURLMOPT_SOCKETFUNCTION + CURLMOPT_TIMERFUNCTION). This allows libcurl to delegate socket monitoring to xEventLoop, achieving true event-driven I/O without dedicated threads.
- Single-Threaded Callback Model — All callbacks (response, SSE events, done) are invoked on the event loop thread. No locks are needed in callback code.
- Vtable-Based Polymorphism — Internally, each request carries a vtable (xHttpReqVtable) with on_done and on_cleanup function pointers. Oneshot requests and SSE requests use different vtables, sharing the same curl multi handle and completion infrastructure.
- Automatic Body Copy — POST/PUT request bodies are copied internally (malloc + memcpy), so the caller doesn't need to keep the body alive after submitting the request.
Architecture
graph TD
subgraph xHttpClientInternal[xHttpClient Internal]
MULTI[curl multi handle]
TIMER_CB[timer callback - CURLMOPT TIMERFUNCTION]
SOCKET_CB[socket callback - CURLMOPT SOCKETFUNCTION]
CHECK[check multi info]
end
subgraph PerRequest[Per Request]
REQ[xHttpReq]
EASY[curl easy handle]
BODY[xBuffer body]
HDR[xBuffer headers]
VT[vtable - oneshot or SSE]
end
subgraph xbaseEventLoop[xbase Event Loop]
LOOP[xEventLoop]
FD_EVT[FD events]
TIMER_EVT[Timer events]
end
SOCKET_CB --> FD_EVT
TIMER_CB --> TIMER_EVT
FD_EVT --> LOOP
TIMER_EVT --> LOOP
LOOP -->|fd ready| CHECK
LOOP -->|timeout| CHECK
CHECK --> VT
VT -->|on done| APP[User Callback]
REQ --> EASY
REQ --> BODY
REQ --> HDR
REQ --> VT
style MULTI fill:#f5a623,color:#fff
style LOOP fill:#50b86c,color:#fff
Implementation Details
libcurl + xEventLoop Integration
sequenceDiagram
participant App as Application
participant Client as xHttpClient
participant Curl as CurlMulti
participant L as xEventLoop
App->>Client: xHttpClientGet url cb
Client->>Curl: curl multi add handle
Curl->>Client: socket callback fd POLL IN
Client->>L: xEventAdd fd Read
Note over L: Event loop polls
L->>Client: fd ready callback
Client->>Curl: curl multi socket action
Curl->>Client: write callback data
Client->>Client: xBufferAppend body buf data
Note over Curl: Transfer complete
Client->>Client: check multi info
Client->>App: on response resp
Socket Callback Flow
When libcurl needs to monitor a socket, it calls socket_callback:
- CURL_POLL_REMOVE — Unregister the fd from the event loop (xEventDel).
- CURL_POLL_IN / OUT / INOUT — Register or update the fd with the event loop (xEventAdd / xEventMod).
Each socket gets an xHttpSocketCtx_ that maps the fd to the client and event source.
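The mapping above can be sketched as a pure decision function. This is a minimal illustration, not xhttp's actual implementation: the CURL_POLL_* values are mirrored locally so the sketch compiles without libcurl, and the EvAction names are invented for the example.

```c
/* Mirror of libcurl's CURL_POLL_* action values (see curl/curl.h). */
enum { POLL_IN = 1, POLL_OUT = 2, POLL_INOUT = 3, POLL_REMOVE = 4 };

/* What the event loop should be asked to do for this fd (invented names). */
typedef enum { EV_NONE, EV_WATCH_READ, EV_WATCH_WRITE, EV_WATCH_RW, EV_UNWATCH } EvAction;

/* Decide the event-loop operation for a libcurl socket action.
 * In xhttp this decision drives xEventAdd/xEventMod (register/update)
 * or xEventDel (unregister); here it just returns the choice. */
EvAction socket_action_to_event(int what) {
    switch (what) {
    case POLL_IN:     return EV_WATCH_READ;
    case POLL_OUT:    return EV_WATCH_WRITE;
    case POLL_INOUT:  return EV_WATCH_RW;
    case POLL_REMOVE: return EV_UNWATCH;
    default:          return EV_NONE;
    }
}
```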
Timer Callback Flow
When libcurl needs a timeout:
- timeout_ms == -1 — Cancel any existing timer.
- timeout_ms == 0 — Schedule a 1 ms timer (deferred to avoid reentrant curl_multi_socket_action).
- timeout_ms > 0 — Schedule a timer via xEventLoopTimerAfter.
When the timer fires, curl_multi_socket_action(CURL_SOCKET_TIMEOUT) is called.
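The three cases reduce to a small decision function. A minimal sketch follows; the TimerOp type and function name are invented for illustration (xhttp's real callback then schedules via xEventLoopTimerAfter).

```c
/* Outcome of libcurl's CURLMOPT_TIMERFUNCTION request (invented names). */
typedef enum { TIMER_CANCEL, TIMER_SCHEDULE } TimerOp;

/* Map libcurl's requested timeout to an event-loop timer operation.
 * timeout_ms == -1 cancels any pending timer; timeout_ms == 0 is
 * deferred by 1 ms so curl_multi_socket_action is never re-entered
 * from inside a libcurl callback. */
TimerOp curl_timeout_to_op(long timeout_ms, long *delay_ms) {
    if (timeout_ms < 0)
        return TIMER_CANCEL;
    *delay_ms = (timeout_ms == 0) ? 1 : timeout_ms;
    return TIMER_SCHEDULE;
}
```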
Request Lifecycle
stateDiagram-v2
[*] --> Created: xHttpClientGet/Post/Do
Created --> Submitted: curl_multi_add_handle
Submitted --> InFlight: Event loop drives I/O
InFlight --> Completed: curl reports CURLMSG_DONE
Completed --> CallbackInvoked: on_response(resp)
CallbackInvoked --> CleanedUp: free buffers + easy handle
CleanedUp --> [*]
InFlight --> Aborted: xHttpClientDestroy
Aborted --> CallbackInvoked: on_response(error)
Response Structure
XDEF_STRUCT(xHttpResponse) {
    long status_code;        // HTTP status (200, 404, etc.), 0 on failure
    const char *headers;     // Raw headers (NUL-terminated)
    size_t headers_len;
    const char *body;        // Response body (NUL-terminated)
    size_t body_len;
    int curl_code;           // CURLcode (0 = success)
    const char *curl_error;  // Human-readable error, or NULL
};
All pointers are valid only during the callback. The library manages their lifetime.
API Reference
Types
| Type | Description |
|---|---|
xHttpClient | Opaque handle to an HTTP client bound to an event loop |
xHttpClientConf | Configuration struct for creating a client (TLS, HTTP version) |
xHttpResponse | Response data delivered to the completion callback |
xHttpResponseFunc | void (*)(const xHttpResponse *resp, void *arg) |
xHttpMethod | Enum: GET, POST, PUT, DELETE, PATCH, HEAD |
xHttpRequestConf | Configuration struct for generic requests |
xSseEvent | SSE event data delivered to the event callback |
xSseEventFunc | int (*)(const xSseEvent *ev, void *arg) — return 0 to continue, non-zero to close |
xSseDoneFunc | void (*)(int curl_code, void *arg) |
xTlsConf | TLS configuration for the client (CA path, client cert/key, skip verify) |
Lifecycle
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xHttpClientCreate | xHttpClient xHttpClientCreate(xEventLoop loop, const xHttpClientConf *conf) | Create a client bound to an event loop. Pass NULL for defaults. | Not thread-safe |
xHttpClientDestroy | void xHttpClientDestroy(xHttpClient client) | Destroy client. In-flight requests get error callbacks. | Not thread-safe |
TLS Configuration
TLS is configured at client creation time via xHttpClientConf. The xTlsConf fields are deep-copied internally; the caller does not need to keep them alive after creation.
xTlsConf Fields (Client)
| Field | Type | Description |
|---|---|---|
ca | const char * | Path to a CA certificate file for server verification. When set, the system CA bundle is bypassed. |
cert | const char * | Path to a client certificate file (PEM) for mutual TLS (mTLS). |
key | const char * | Path to the client private key file (PEM) for mTLS. |
key_password | const char * | Passphrase for an encrypted client private key. |
skip_verify | int | If non-zero, skip server certificate verification (useful for self-signed certs in development). |
All string fields are deep-copied internally; the caller does not need to keep them alive after the call.
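The deep-copy semantics can be illustrated with a standalone sketch. The TlsConfCopy type and helper names below are hypothetical, not part of the xKit API; they only demonstrate why the caller's strings may be freed as soon as the create call returns.

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the string fields of xTlsConf (invented type). */
typedef struct {
    char *ca, *cert, *key, *key_password;
    int skip_verify;
} TlsConfCopy;

/* Duplicate a string, or propagate NULL (the "field unused" default). */
char *dup_or_null(const char *s) {
    if (!s) return NULL;
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Deep-copy the caller's configuration: after this returns, the
 * originals may be modified or freed without affecting the copy. */
TlsConfCopy tls_conf_copy(const char *ca, const char *cert,
                          const char *key, const char *password,
                          int skip_verify) {
    TlsConfCopy c;
    c.ca = dup_or_null(ca);
    c.cert = dup_or_null(cert);
    c.key = dup_or_null(key);
    c.key_password = dup_or_null(password);
    c.skip_verify = skip_verify;
    return c;
}

void tls_conf_free(TlsConfCopy *c) {
    free(c->ca); free(c->cert); free(c->key); free(c->key_password);
}
```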
Convenience Requests
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xHttpClientGet | xErrno xHttpClientGet(xHttpClient client, const char *url, xHttpResponseFunc on_response, void *arg) | Async GET request. | Not thread-safe |
xHttpClientPost | xErrno xHttpClientPost(xHttpClient client, const char *url, const char *body, size_t body_len, xHttpResponseFunc on_response, void *arg) | Async POST request. Body is copied internally. | Not thread-safe |
Generic Request
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xHttpClientDo | xErrno xHttpClientDo(xHttpClient client, const xHttpRequestConf *config, xHttpResponseFunc on_response, void *arg) | Fully-configured async request. | Not thread-safe |
SSE Requests
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
xHttpClientGetSse | xErrno xHttpClientGetSse(xHttpClient client, const char *url, xSseEventFunc on_event, xSseDoneFunc on_done, void *arg) | Subscribe to SSE endpoint (GET). | Not thread-safe |
xHttpClientDoSse | xErrno xHttpClientDoSse(xHttpClient client, const xHttpRequestConf *config, xSseEventFunc on_event, xSseDoneFunc on_done, void *arg) | Fully-configured SSE request (e.g., POST for LLM APIs). | Not thread-safe |
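An SSE stream is a sequence of "field: value" lines, with events separated by blank lines. As a rough illustration of the kind of work the built-in parser does, here is a minimal helper that pulls the first data field out of one event block; the helper is invented for this example and follows the W3C rule that a single space after the colon is stripped.

```c
#include <string.h>

/* Extract the value of the first "data:" field from one SSE event
 * block (a run of "field: value\n" lines). Copies at most cap-1
 * bytes into out, NUL-terminates, and returns the value length. */
size_t sse_first_data(const char *event, char *out, size_t cap) {
    const char *line = event;
    while (line && *line) {
        if (strncmp(line, "data:", 5) == 0) {
            const char *v = line + 5;
            if (*v == ' ') v++;              /* spec: one optional space */
            const char *end = strchr(v, '\n');
            size_t n = end ? (size_t)(end - v) : strlen(v);
            if (n >= cap) n = cap - 1;
            memcpy(out, v, n);
            out[n] = '\0';
            return n;
        }
        const char *nl = strchr(line, '\n'); /* skip non-data field */
        line = nl ? nl + 1 : NULL;
    }
    if (cap) out[0] = '\0';
    return 0;
}
```

The real parser also handles multi-line data fields, event/id/retry fields, and chunked delivery across network reads; see sse.md.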
Usage Examples
Simple GET Request
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/client.h>

static void on_response(const xHttpResponse *resp, void *arg) {
    (void)arg;
    if (resp->curl_code == 0) {
        printf("HTTP %ld\n", resp->status_code);
        printf("%.*s\n", (int)resp->body_len, resp->body);
    } else {
        printf("Error: %s\n", resp->curl_error);
    }
}

int main(void) {
    xEventLoop loop = xEventLoopCreate();
    xHttpClient client = xHttpClientCreate(loop, NULL);
    xHttpClientGet(client, "https://httpbin.org/get", on_response, NULL);
    xEventLoopRun(loop);
    xHttpClientDestroy(client);
    xEventLoopDestroy(loop);
    return 0;
}
HTTPS with TLS Configuration
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/client.h>

static void on_response(const xHttpResponse *resp, void *arg) {
    (void)arg;
    printf("Status: %ld\n", resp->status_code);
}

int main(void) {
    xEventLoop loop = xEventLoopCreate();

    // Skip certificate verification (dev only)
    xTlsConf tls = {0};
    tls.skip_verify = 1;
    xHttpClientConf conf = {.tls = &tls};
    xHttpClient client = xHttpClientCreate(loop, &conf);

    xHttpClientGet(client, "https://secure.example.com/api", on_response, NULL);
    xEventLoopRun(loop);
    xHttpClientDestroy(client);
    xEventLoopDestroy(loop);
    return 0;
}
POST with Custom Headers
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/client.h>

static void on_response(const xHttpResponse *resp, void *arg) {
    (void)arg;
    printf("Status: %ld, Body: %.*s\n",
           resp->status_code, (int)resp->body_len, resp->body);
}

int main(void) {
    xEventLoop loop = xEventLoopCreate();
    xHttpClient client = xHttpClientCreate(loop, NULL);
    const char *headers[] = {
        "Content-Type: application/json",
        "Authorization: Bearer token123",
        NULL
    };
    xHttpRequestConf config = {
        .url = "https://api.example.com/data",
        .method = xHttpMethod_POST,
        .body = "{\"key\": \"value\"}",
        .body_len = 16,
        .headers = headers,
        .timeout_ms = 5000,
    };
    xHttpClientDo(client, &config, on_response, NULL);
    xEventLoopRun(loop);
    xHttpClientDestroy(client);
    xEventLoopDestroy(loop);
    return 0;
}
Use Cases
- REST API Integration — Make async HTTP calls to microservices, cloud APIs, or webhooks from an event-driven C application.
- Secure Communication — Pass TLS config via xHttpClientConf at creation time to configure custom CA certificates, client certificates for mTLS, or skip verification for development environments with self-signed certs.
- LLM API Calls — Use xHttpClientDoSse() with POST method and JSON body to stream responses from OpenAI, Anthropic, or other LLM APIs. See sse.md for a complete example.
- Health Checks / Monitoring — Periodically poll HTTP endpoints using timer-driven GET requests within the event loop.
Best Practices
- Don't block in callbacks. Callbacks run on the event loop thread. Blocking delays all other I/O.
- Copy data you need to keep. Response pointers (body, headers) are only valid during the callback.
- Use xHttpClientDo() for complex requests. The convenience helpers (Get/Post) are for simple cases; Do gives full control over method, headers, body, and timeout.
- Destroy the client before the event loop. xHttpClientDestroy() cancels in-flight requests and invokes their callbacks with error status.
- Check curl_code first. A curl_code of 0 means the HTTP transfer succeeded; then check status_code for the HTTP-level result.
- Never use skip_verify in production. It disables all certificate validation. Use a proper CA path or the system CA bundle instead.
- TLS config is set at creation time. Pass xHttpClientConf with TLS settings when creating the client; it affects both oneshot and SSE requests. To change TLS config, destroy and recreate the client.
Comparison with Other Libraries
| Feature | xhttp client.h | libcurl easy API | cpp-httplib | Python requests |
|---|---|---|---|---|
| I/O Model | Async (event loop) | Blocking | Blocking | Blocking |
| Event Loop | xEventLoop integration | None (or manual multi) | None | None (asyncio separate) |
| SSE Support | Built-in (GetSse/DoSse) | Manual parsing | No | No (needs sseclient) |
| TLS Config | xHttpClientConf.tls at creation | curl_easy_setopt (manual) | Built-in | verify/cert params |
| Thread Model | Single-threaded callbacks | One thread per request | One thread per request | One thread per request |
| Memory | Automatic (xBuffer) | Manual (WRITEFUNCTION) | Automatic (std::string) | Automatic (Python GC) |
| Language | C99 | C | C++ | Python |
Key Differentiator: xhttp provides true event-loop-integrated async HTTP with built-in SSE support. Unlike libcurl's easy API (which blocks) or multi-perform API (which requires polling), xhttp uses the multi-socket API for zero-overhead integration with xEventLoop. The built-in SSE parser makes it uniquely suited for LLM API integration from C.
server.h — Asynchronous HTTP/1.1 & HTTP/2 Server
Introduction
server.h provides xHttpServer, an asynchronous, non-blocking HTTP server powered by xbase's event loop. The server supports both HTTP/1.1 and HTTP/2 (h2c, cleartext) on the same port, with automatic protocol detection via Prior Knowledge. The protocol parsing layer is abstracted behind an xHttpProto vtable interface — HTTP/1.1 uses llhttp, HTTP/2 uses nghttp2. All connection handling, request parsing, and response sending are driven by the event loop on a single thread — no locks or thread pools required. The server supports routing, keep-alive, configurable limits, automatic error responses, and TLS/HTTPS via xHttpServerListenTls() with pluggable TLS backends (OpenSSL or Mbed TLS).
Design Philosophy
- Single-Threaded Event-Driven I/O — The server registers listening and client sockets with xEventLoop. Accept, read, parse, dispatch, and write all happen on the event loop thread, eliminating synchronization overhead.
- Protocol-Abstracted Parsing — Request parsing is delegated to a protocol handler behind the xHttpProto vtable interface. HTTP/1.1 (proto_h1.c) uses llhttp; HTTP/2 (proto_h2.c) uses nghttp2. Incremental callbacks accumulate URL, headers, and body into xBuffer instances. This abstraction allows both protocols to share the same connection management, routing, and response serialization layers.
- Automatic Protocol Detection — On each new connection, the server inspects the first bytes of incoming data. If the 24-byte HTTP/2 connection preface (PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n) is detected, the connection is upgraded to HTTP/2; otherwise, HTTP/1.1 is used. This enables h2c (cleartext HTTP/2) via Prior Knowledge — ideal for internal service-to-service communication.
- First-Match Routing — Routes are registered as pattern strings (e.g. "GET /users/:id" or "/any") and matched in registration order. If the pattern starts with /, it matches any HTTP method; otherwise the first token is the method. Path patterns support both exact segments and :param segments.
- Writer-Based Response API — Handlers receive an xHttpResponseWriter handle to set status, headers, and body. The response is serialized into an xIOBuffer and flushed asynchronously, with backpressure handled automatically.
- Defensive Limits — Configurable limits on header size (default 8 KiB), body size (default 1 MiB), and idle timeout (default 60 s) protect against slow clients and oversized payloads. Violations produce appropriate 4xx error responses.
- Pluggable TLS — TLS support is provided via xHttpServerListenTls() with xTlsConf. The TLS backend (OpenSSL or Mbed TLS) is selected at compile time via XK_TLS_BACKEND. ALPN negotiation automatically selects HTTP/1.1 or HTTP/2 over TLS. Mutual TLS (mTLS) is supported when ca is set (verification is enabled by default).
Architecture
graph TD
subgraph "Application"
APP["User Code"]
HANDLER["Handler Callback"]
end
subgraph "xhttp Server"
SERVER["xHttpServer"]
TLS["TLS Layer<br/>(OpenSSL / Mbed TLS)"]
ROUTER["Route Table<br/>(linked list)"]
CONN["xHttpConn_<br/>(per connection)"]
DETECT["Protocol Detection<br/>(Prior Knowledge / ALPN)"]
PROTO["xHttpProto (vtable)"]
PARSER_H1["proto_h1 (llhttp)"]
PARSER_H2["proto_h2 (nghttp2)"]
STREAM["xHttpStream_<br/>(per request)"]
WRITER["xHttpResponseWriter"]
end
subgraph "xbase"
LOOP["xEventLoop"]
SOCK["xSocket"]
TIMER["Idle Timeout"]
end
APP -->|"xHttpServerRoute"| ROUTER
APP -->|"xHttpServerListen<br/>xHttpServerListenTls"| SERVER
SERVER -->|"accept()"| CONN
SERVER -.->|"TLS handshake"| TLS
TLS -.-> CONN
CONN --> DETECT
DETECT -->|"H1"| PARSER_H1
DETECT -->|"H2 preface"| PARSER_H2
PARSER_H1 --> PROTO
PARSER_H2 --> PROTO
PROTO -->|"request complete"| STREAM
STREAM --> ROUTER
ROUTER -->|"first match"| HANDLER
HANDLER -->|"xHttpResponseSend"| WRITER
WRITER --> STREAM
STREAM -->|"H1: xIOBuffer / H2: nghttp2 frames"| CONN
CONN --> SOCK
SOCK --> LOOP
TIMER --> LOOP
style SERVER fill:#4a90d9,color:#fff
style LOOP fill:#50b86c,color:#fff
style PROTO fill:#9b59b6,color:#fff
style PARSER_H1 fill:#f5a623,color:#fff
style PARSER_H2 fill:#e74c3c,color:#fff
style DETECT fill:#1abc9c,color:#fff
style TLS fill:#2ecc71,color:#fff
Implementation Details
Connection Lifecycle
stateDiagram-v2
[*] --> Accepted: accept() on listen fd
Accepted --> Reading: xSocket registered (Read)
Reading --> Parsing: Data received
Parsing --> Dispatching: on_message_complete
Dispatching --> HandlerRunning: Route matched
Dispatching --> ErrorSent: No match (404/405)
HandlerRunning --> ResponseQueued: xHttpResponseSend()
ResponseQueued --> Flushing: conn_try_flush()
Flushing --> KeepAlive: All written + keep-alive
Flushing --> Backpressure: EAGAIN (register Write)
Backpressure --> Flushing: Write event fires
KeepAlive --> Reading: Reset parser state
Flushing --> Closed: All written + !keep-alive
ErrorSent --> Closed: Error responses close connection
Reading --> Closed: Idle timeout
Reading --> Closed: Client disconnect
Reading --> Closed: Parse error (400)
Parsing --> ErrorSent: Header too large (431)
Parsing --> ErrorSent: Body too large (413)
Request Parsing Flow
sequenceDiagram
participant Client
participant Conn as xHttpConn_
participant Proto as xHttpProto (vtable)
participant Parser as proto_h1 (llhttp)
participant Bufs as xBuffer (url/headers/body)
participant Router as Route Table
participant Handler as User Handler
Client->>Conn: TCP data
Conn->>Conn: xIOBufferReadFd()
Conn->>Proto: proto.on_data(data)
Proto->>Parser: llhttp_execute(data)
Parser->>Bufs: on_url → xBufferAppend(url)
Parser->>Bufs: on_header_field → xBufferAppend(headers_raw)
Parser->>Bufs: on_header_value → xBufferAppend(headers_raw)
Parser->>Bufs: on_body → xBufferAppend(body)
Parser->>Proto: on_message_complete → return 1
Proto->>Conn: return 1 (request complete)
Conn->>Router: conn_dispatch_request()
Router->>Handler: handler(writer, req, arg)
Handler->>Conn: xHttpResponseSend(body)
Conn->>Client: HTTP response (async flush)
Routing
Routes are stored in a singly-linked list and matched in registration order (first match wins):
- Path match — Segment-by-segment comparison. Static segments require exact match; :param segments match any non-empty string and capture the value.
- Method match — Case-insensitive comparison (strcasecmp). A pattern without a method prefix (e.g. "/any") matches any HTTP method.
- Fallback — If the path matches but no method matches → 405 Method Not Allowed. If no path matches → 404 Not Found.
- Parameter access — Inside a handler, call xHttpRequestParam(req, "id", &len) to retrieve the captured value.
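The segment-by-segment path rule can be sketched as a standalone matcher. This is an illustration of the algorithm only, not the server's actual code; method matching and parameter capture are omitted for brevity.

```c
#include <string.h>

/* Match a path against a route pattern, segment by segment.
 * ":name" segments match any non-empty path segment; static
 * segments require an exact match. Returns 1 on match, 0 otherwise. */
int route_match(const char *pattern, const char *path) {
    while (*pattern && *path) {
        if (*pattern == '/' && *path == '/') { pattern++; path++; continue; }
        /* Find the end of the current segment on each side. */
        const char *pe = strchr(pattern, '/');
        const char *se = strchr(path, '/');
        size_t plen = pe ? (size_t)(pe - pattern) : strlen(pattern);
        size_t slen = se ? (size_t)(se - path) : strlen(path);
        if (plen == 0 || slen == 0) return 0;
        if (pattern[0] == ':') {
            /* :param segment: any non-empty path segment matches. */
        } else if (plen != slen || strncmp(pattern, path, plen) != 0) {
            return 0;
        }
        pattern += plen;
        path += slen;
    }
    /* Both strings must be fully consumed for a match. */
    return *pattern == '\0' && *path == '\0';
}
```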
Response Serialization
When xHttpResponseSend() is called:
- Status line (
HTTP/1.1 <code> <reason>\r\n) is written to thexIOBuffer. Content-Lengthheader is added automatically.Connection: keep-aliveorConnection: closeis added based on the parser's determination.- User-set headers are appended.
- Header section is terminated with
\r\n. - Body is appended.
conn_try_flush()attempts an immediatewritev(). IfEAGAIN, the socket is registered for write events and flushing continues asynchronously.
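The automatic portion of the head can be sketched as a plain formatting function. The helper name is invented for illustration; it only shows the shape of the serialized bytes, not the server's buffer management.

```c
#include <stdio.h>

/* Format an HTTP/1.1 response head: status line, automatic
 * Content-Length and Connection headers, then the blank line that
 * terminates the header section. Returns bytes written (snprintf). */
int build_response_head(char *out, size_t cap, int code, const char *reason,
                        size_t body_len, int keep_alive) {
    return snprintf(out, cap,
                    "HTTP/1.1 %d %s\r\n"
                    "Content-Length: %zu\r\n"
                    "Connection: %s\r\n"
                    "\r\n",
                    code, reason, body_len,
                    keep_alive ? "keep-alive" : "close");
}
```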
Keep-Alive & Pipelining
- HTTP/1.1 connections default to keep-alive. After a response is fully flushed, proto.reset() is called and the connection waits for the next request.
- The parser is paused in on_message_complete to prevent parsing the next pipelined request before the current response is sent.
- Error responses always set Connection: close.
HTTP/2 Support (h2c Prior Knowledge)
The server supports cleartext HTTP/2 (h2c) via the Prior Knowledge mechanism. HTTP/1.1 and HTTP/2 coexist on the same port — no TLS or Upgrade header required.
Protocol Detection
When a new connection is accepted, protocol detection is deferred until the first bytes arrive:
- If the first 24 bytes match the HTTP/2 connection preface (PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n), xHttpProtoH2Init() is called.
- If the prefix doesn't match, xHttpProtoH1Init() is called.
- If fewer than 24 bytes have arrived but the prefix still matches so far, the server waits for more data before deciding.
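The three-way decision can be expressed as a small pure function. This is an illustrative sketch: the DetectResult names are invented, and in the server the H2/H1 outcomes correspond to calling xHttpProtoH2Init() or xHttpProtoH1Init().

```c
#include <string.h>

#define H2_PREFACE "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
#define H2_PREFACE_LEN 24

/* Invented names for the three detection outcomes. */
typedef enum { DETECT_NEED_MORE, DETECT_H1, DETECT_H2 } DetectResult;

/* Inspect the first bytes of a new connection. A full 24-byte match
 * selects HTTP/2; an early mismatch selects HTTP/1.1; a shorter
 * prefix that still matches asks the caller to wait for more data. */
DetectResult detect_protocol(const char *buf, size_t len) {
    size_t n = len < H2_PREFACE_LEN ? len : H2_PREFACE_LEN;
    if (memcmp(buf, H2_PREFACE, n) != 0)
        return DETECT_H1;
    return len >= H2_PREFACE_LEN ? DETECT_H2 : DETECT_NEED_MORE;
}
```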
Stream Multiplexing
Under HTTP/2, a single TCP connection carries multiple concurrent streams, each representing an independent request/response exchange:
- xHttpStream_ — Per-request state (URL, headers, body, response writer). HTTP/1.1 uses a single implicit stream (stream_id = 0); HTTP/2 creates a new stream for each request.
- Deferred dispatch — Completed streams are queued during nghttp2_session_mem_recv() and dispatched after it returns, avoiding re-entrancy issues.
- Response framing — Responses are submitted via nghttp2_submit_response() with HPACK-compressed headers and DATA frames, then flushed through the connection's write buffer.
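The deferred-dispatch pattern can be sketched with a plain queue. The PendingQueue type and helpers are invented for illustration; in the server, draining the queue after nghttp2_session_mem_recv() returns is where the router is invoked.

```c
#include <stddef.h>

#define MAX_PENDING 16

/* Stream ids that completed inside the recv call (invented type). */
typedef struct {
    int ids[MAX_PENDING];
    size_t count;
} PendingQueue;

/* Called from frame callbacks: remember the completed stream instead
 * of dispatching immediately, avoiding re-entrant session access. */
int queue_completed(PendingQueue *q, int stream_id) {
    if (q->count >= MAX_PENDING) return -1;
    q->ids[q->count++] = stream_id;
    return 0;
}

/* Called after recv returns: drain ids in completion order and clear
 * the queue. (In the server, each id would be routed to its handler.) */
size_t dispatch_pending(PendingQueue *q, int *out) {
    size_t n = q->count;
    for (size_t i = 0; i < n; i++) out[i] = q->ids[i];
    q->count = 0;
    return n;
}
```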
H2 Connection Lifecycle
sequenceDiagram
participant Client
participant Conn as xHttpConn_
participant Detect as Protocol Detection
participant H2 as proto_h2 (nghttp2)
participant Stream as xHttpStream_
participant Router as Route Table
participant Handler as User Handler
Client->>Conn: TCP connect
Client->>Conn: H2 connection preface + SETTINGS
Conn->>Detect: First bytes inspection
Detect->>H2: xHttpProtoH2Init()
H2->>Client: SETTINGS frame (server preface)
Client->>Conn: HEADERS frame (stream 1, :method=GET, :path=/hello)
Conn->>H2: h2_on_data()
H2->>Stream: Create stream (id=1)
H2->>Stream: Accumulate headers
H2->>Router: Dispatch (END_STREAM received)
Router->>Handler: handler(writer, req, arg)
Handler->>Stream: xHttpResponseSend(body)
Stream->>H2: nghttp2_submit_response()
H2->>Client: HEADERS + DATA frames
Key Differences: H1 vs H2
| Feature | HTTP/1.1 (proto_h1) | HTTP/2 (proto_h2) |
|---|---|---|
| Parser | llhttp (byte stream → request) | nghttp2 (byte stream → frame → stream) |
| Multiplexing | None (pipelining at best) | Native, multiple concurrent streams |
| Headers | Plain text Key: Value | HPACK compressed pseudo-headers + regular headers |
| Keep-alive | Connection: keep-alive header | Always persistent (multiplexed) |
| Reset | Per-request proto.reset() | No-op (streams are independent) |
| Response framing | Raw HTTP/1.1 status line + headers + body | nghttp2_submit_response() → HEADERS + DATA frames |
| Flow control | None | Built-in per-stream flow control |
Limitations
- h2 over TLS — TLS-based HTTP/2 (h2 with ALPN) is supported via xHttpServerListenTls(). Cleartext h2c uses Prior Knowledge.
- No server push — HTTP/2 server push is not implemented.
- Streaming responses — xHttpResponseWrite() / xHttpResponseEnd() for HTTP/2 streaming DATA frames is not yet fully implemented.
Idle Timeout
Each connection has an idle timeout (default 60 s). If no data is received within this period, the connection is closed automatically via xEvent_Timeout. The timeout is reset after each response is sent on a keep-alive connection.
API Reference
Types
| Type | Description |
|---|---|
xHttpServer | Opaque handle to an HTTP server bound to an event loop |
xHttpResponseWriter | Opaque handle to a response writer (valid only during handler) |
xHttpRequest | Request data delivered to the handler callback |
xHttpHandlerFunc | void (*)(xHttpResponseWriter writer, const xHttpRequest *req, void *arg) |
xTlsConf | TLS configuration for HTTPS listeners (cert, key, CA, skip_verify) |
xHttpRequest Fields
| Field | Type | Description |
|---|---|---|
method | const char * | HTTP method string (e.g. "GET", "POST") |
url | const char * | Request URL / path (NUL-terminated) |
headers | const char * | Raw request headers (NUL-terminated) |
headers_len | size_t | Length of headers in bytes |
body | const char * | Request body, or NULL if no body |
body_len | size_t | Length of body in bytes |
All pointers are valid only for the duration of the handler callback.
Lifecycle
| Function | Signature | Description |
|---|---|---|
xHttpServerCreate | xHttpServer xHttpServerCreate(xEventLoop loop) | Create a server bound to an event loop. |
xHttpServerListen | xErrno xHttpServerListen(xHttpServer server, const char *host, uint16_t port) | Start listening on the given address and port. |
xHttpServerListenTls | xErrno xHttpServerListenTls(xHttpServer server, const char *host, uint16_t port, const xTlsConf *config) | Start listening for HTTPS connections with TLS. ALPN selects H1/H2. Can coexist with Listen on a different port. Returns xErrno_NotSupported if no TLS backend was compiled. |
xHttpServerDestroy | void xHttpServerDestroy(xHttpServer server) | Destroy server, close all connections, free all routes. |
Route Registration
| Function | Signature | Description |
|---|---|---|
xHttpServerRoute | xErrno xHttpServerRoute(xHttpServer server, const char *pattern, xHttpHandlerFunc handler, void *arg) | Register a route. pattern combines method and path: "GET /users/:id" matches only GET; "/users/:id" matches all methods. Path supports :param segments. First match wins. |
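Because matching is first-match-wins, register literal paths before parameterized ones. A minimal sketch of registration order (the handler names here are hypothetical, assumed to be defined elsewhere):

```c
/* First match wins: register the literal path before the :param route,
 * and the method-specific pattern before the method-agnostic one. */
xHttpServerRoute(server, "GET /users/me", on_current_user, NULL);  /* literal path, GET only */
xHttpServerRoute(server, "GET /users/:id", on_get_user, NULL);     /* :id parameter, GET only */
xHttpServerRoute(server, "/users/:id", on_any_user_method, NULL);  /* all methods */
```

If the parameterized route were registered first, "GET /users/me" would never be reached, because ":id" would match "me".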
Request Parameters
| Function | Signature | Description |
|---|---|---|
xHttpRequestParam | const char *xHttpRequestParam(const xHttpRequest *req, const char *name, size_t *len) | Look up a path parameter by name. Returns a pointer to the value (NOT NUL-terminated) and sets *len, or returns NULL if not found. |
Response
| Function | Signature | Description |
|---|---|---|
xHttpResponseSetStatus | void xHttpResponseSetStatus(xHttpResponseWriter writer, int code) | Set HTTP status code (default 200). |
xHttpResponseSetHeader | xErrno xHttpResponseSetHeader(xHttpResponseWriter writer, const char *key, const char *value) | Add a response header. Call before Send or the first Write. |
xHttpResponseSend | xErrno xHttpResponseSend(xHttpResponseWriter writer, const char *body, size_t body_len) | Send a complete response. May only be called once. Mutually exclusive with Write. |
xHttpResponseWrite | xErrno xHttpResponseWrite(xHttpResponseWriter writer, const char *data, size_t len) | Write data to a streaming response. First call flushes headers (no Content-Length). Mutually exclusive with Send. |
xHttpResponseEnd | void xHttpResponseEnd(xHttpResponseWriter writer) | End a streaming response. Optional — auto-called when the handler returns. |
Configuration
| Function | Signature | Description | Default |
|---|---|---|---|
xHttpServerSetIdleTimeout | xErrno xHttpServerSetIdleTimeout(xHttpServer server, int timeout_ms) | Set idle timeout for connections. | 60000 ms |
xHttpServerSetMaxHeaderSize | xErrno xHttpServerSetMaxHeaderSize(xHttpServer server, size_t max_size) | Set max header size. Exceeding → 431. | 8192 bytes |
xHttpServerSetMaxBodySize | xErrno xHttpServerSetMaxBodySize(xHttpServer server, size_t max_size) | Set max body size. Exceeding → 413. | 1048576 bytes |
All configuration functions must be called before xHttpServerListen() / xHttpServerListenTls().
TLS Configuration
xTlsConf Fields (Server)
| Field | Type | Description |
|---|---|---|
cert | const char * | Path to PEM certificate file (required). |
key | const char * | Path to PEM private key file (required). |
ca | const char * | Path to CA certificate file for client verification (optional). |
skip_verify | int | If non-zero, skip peer verification. Default 0 (verify enabled). |
When ca is set and skip_verify is 0 (default), the server performs mutual TLS (mTLS) — clients must present a valid certificate signed by the specified CA.
Usage Examples
Minimal Server
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_hello(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSetHeader(w, "Content-Type", "text/plain");
xHttpResponseSend(w, "Hello, World!\n", 14);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /hello", on_hello, NULL);
xHttpServerListen(server, "0.0.0.0", 8080);
printf("Listening on :8080\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
JSON API with POST
#include <stdio.h>
#include <string.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_echo(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)arg;
xHttpResponseSetHeader(w, "Content-Type", "application/json");
xHttpResponseSend(w, req->body, req->body_len);
}
static void on_not_found(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
const char *body = "{\"error\": \"not found\"}";
xHttpResponseSetStatus(w, 404);
xHttpResponseSetHeader(w, "Content-Type", "application/json");
xHttpResponseSend(w, body, strlen(body));
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerSetMaxBodySize(server, 4 * 1024 * 1024); /* 4 MiB */
xHttpServerRoute(server, "POST /echo", on_echo, NULL);
xHttpServerListen(server, NULL, 9090);
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
Server-Sent Events (SSE)
#include <stdio.h>
#include <string.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_events(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSetHeader(w, "Content-Type", "text/event-stream");
xHttpResponseSetHeader(w, "Cache-Control", "no-cache");
xHttpResponseWrite(w, "data: hello\n\n", 13);
xHttpResponseWrite(w, "data: world\n\n", 13);
/* xHttpResponseEnd(w) is optional; auto-called on return */
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /events", on_events, NULL);
xHttpServerListen(server, NULL, 8080);
printf("SSE server on :8080/events\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
RESTful API with Path Parameters
#include <stdio.h>
#include <string.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_get_user(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)arg;
size_t id_len = 0;
    const char *id = xHttpRequestParam(req, "id", &id_len);
    if (!id) { /* defensive: a matched :id route always has the param */
        xHttpResponseSetStatus(w, 500);
        xHttpResponseSend(w, "", 0);
        return;
    }
    char body[128];
int len = snprintf(body, sizeof(body),
"{\"user_id\": \"%.*s\"}\n", (int)id_len, id);
xHttpResponseSetHeader(w, "Content-Type", "application/json");
xHttpResponseSend(w, body, (size_t)len);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /users/:id", on_get_user, NULL);
xHttpServerListen(server, NULL, 8080);
printf("REST API on :8080\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
HTTPS Server
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_hello(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSetHeader(w, "Content-Type", "text/plain");
xHttpResponseSend(w, "Hello, HTTPS!\n", 14);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /hello", on_hello, NULL);
// TLS configuration
xTlsConf tls = {
.cert = "/path/to/server.pem",
.key = "/path/to/server-key.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
printf("HTTPS server on :8443\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
HTTPS Server with Mutual TLS (mTLS)
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_secure(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSetHeader(w, "Content-Type", "text/plain");
xHttpResponseSend(w, "mTLS verified!\n", 15);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /secure", on_secure, NULL);
// Require client certificates
xTlsConf tls = {
.cert = "/path/to/server.pem",
.key = "/path/to/server-key.pem",
.ca = "/path/to/ca.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
printf("mTLS server on :8443\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
HTTP + HTTPS on Different Ports
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_hello(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSend(w, "Hello!\n", 7);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /hello", on_hello, NULL);
// Serve HTTP on port 8080
xHttpServerListen(server, "0.0.0.0", 8080);
// Serve HTTPS on port 8443
xTlsConf tls = {
.cert = "/path/to/server.pem",
.key = "/path/to/server-key.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
printf("HTTP on :8080, HTTPS on :8443\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
Multiple Routes with Shared State
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/server.h>
typedef struct {
int counter;
} AppState;
static void on_count(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req;
AppState *state = (AppState *)arg;
state->counter++;
char body[64];
int len = snprintf(body, sizeof(body), "{\"count\": %d}\n", state->counter);
xHttpResponseSetHeader(w, "Content-Type", "application/json");
xHttpResponseSend(w, body, (size_t)len);
}
static void on_health(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSend(w, "ok\n", 3);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
AppState state = { .counter = 0 };
xHttpServerRoute(server, "POST /count", on_count, &state);
xHttpServerRoute(server, "GET /health", on_health, NULL);
xHttpServerListen(server, NULL, 8080);
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
Best Practices
- Don't block in handlers. Handlers run on the event loop thread. Blocking delays all other connections.
- Always call xHttpResponseSend() or xHttpResponseWrite(). If the handler returns without sending, a default 200 OK with an empty body is sent automatically — but it's better to be explicit.
- Don't mix Send and Write. xHttpResponseSend() is for one-shot responses; xHttpResponseWrite() is for streaming. They are mutually exclusive — calling one after the other returns xErrno_InvalidState.
- Configure limits before listening. SetIdleTimeout, SetMaxHeaderSize, and SetMaxBodySize must be called before xHttpServerListen() / xHttpServerListenTls().
- Register routes before listening. Routes should be set up before the server starts accepting connections.
- Use xHttpServerListenTls() for HTTPS. Provide valid PEM certificate and key files. For mTLS, set ca (verification is enabled by default).
- Serve HTTP and HTTPS on different ports. Call both xHttpServerListen() and xHttpServerListenTls() on the same server instance to support both protocols simultaneously.
- Destroy the server before the event loop. xHttpServerDestroy() closes all connections and frees all resources.
- Copy data you need to keep. xHttpRequest pointers (url, headers, body) are only valid during the handler callback.
Comparison with Other Libraries
| Feature | xhttp server.h | libuv + http-parser | libmicrohttpd | Go net/http | Node.js http |
|---|---|---|---|---|---|
| I/O Model | Async (event loop) | Async (event loop) | Threaded / select | Goroutines | Async (event loop) |
| Event Loop | xEventLoop integration | libuv | Internal | Go runtime | libuv (V8) |
| HTTP Parser | llhttp (H1) + nghttp2 (H2) | http-parser / llhttp | Internal | Internal | llhttp |
| Streaming Response | Built-in (Write/End) | Manual | Manual | Built-in (Flusher) | Built-in (write/end) |
| Routing | Built-in (first match) | None (manual) | None (manual) | Built-in (ServeMux) | None (manual) |
| Keep-Alive | Automatic | Manual | Automatic | Automatic | Automatic |
| Thread Model | Single-threaded | Single-threaded | Multi-threaded | Multi-goroutine | Single-threaded |
| TLS/HTTPS | Built-in (ListenTLS, mTLS) | Manual (libuv + OpenSSL) | Built-in | Built-in (ListenAndServeTLS) | Built-in (https.createServer) |
| Language | C99 | C | C | Go | JavaScript |
Key Differentiator: xhttp server provides a complete, single-threaded HTTP/1.1 & HTTP/2 server with built-in routing, streaming responses, TLS/HTTPS, and automatic keep-alive — all integrated with xEventLoop. HTTP/1.1 and HTTP/2 coexist on the same port via automatic protocol detection (Prior Knowledge for cleartext, ALPN for TLS). Unlike libuv + http-parser (which requires manual response assembly and TLS integration) or libmicrohttpd (which uses threads), xhttp keeps everything on one thread with zero synchronization overhead. The TLS layer supports mutual TLS (mTLS) with client certificate verification, and the streaming API (xHttpResponseWrite/xHttpResponseEnd) makes it straightforward to implement SSE or chunked streaming without external dependencies.
Relationship with Other Modules
- xbase — Uses xEventLoop for I/O multiplexing, xSocket for non-blocking socket management, and socket timeouts for idle connection detection.
- xbuf — Uses xBuffer for request parsing accumulation (URL, headers, body) and xIOBuffer for read/write buffering with scatter-gather I/O.
- llhttp — External dependency. Provides incremental HTTP/1.1 request parsing via callbacks, isolated behind the xHttpProto vtable in proto_h1.c.
- nghttp2 — External dependency. Provides HTTP/2 frame processing, HPACK header compression, and stream management, isolated behind the xHttpProto vtable in proto_h2.c.
- OpenSSL / Mbed TLS — External dependency (TLS backend, compile-time selection via XK_TLS_BACKEND). Provides TLS handshake, encryption, certificate verification, and ALPN negotiation for xHttpServerListenTls().
ws.h — WebSocket Server
Introduction
ws.h provides a callback-driven WebSocket interface integrated with the xhttp server. For pure WebSocket services, call xWsServe() to create a server in one line. For mixed HTTP + WebSocket endpoints, call xWsUpgrade() inside a regular HTTP handler to perform the RFC 6455 upgrade handshake. The library handles frame codec, ping/pong, fragment reassembly, and close negotiation automatically.
All callbacks are dispatched on the event loop thread — no locks or thread pools required.
Design Philosophy
- Handler-Initiated Upgrade — WebSocket connections start as regular HTTP requests. The user calls xWsUpgrade() inside an xHttpHandlerFunc to perform the upgrade. This keeps routing unified: WebSocket endpoints are just HTTP routes.
- Callback-Driven I/O — Three optional callbacks (on_open, on_message, on_close) cover the full connection lifecycle. The library handles all framing, masking, and control frames internally.
- Automatic Protocol Handling — Ping/pong is answered automatically. Fragmented messages are reassembled before delivery. The close handshake follows RFC 6455 §5.5.1 with a 5-second timeout for the peer's response.
- Connection Hijacking — On successful upgrade, the HTTP connection's socket and transport layer are transferred to a new xWsConn object. The HTTP connection is destroyed; the WebSocket connection takes full ownership of the file descriptor.
- Pluggable Crypto Backend — The handshake requires SHA-1 and Base64 for the Sec-WebSocket-Accept computation. The crypto backend is selected at compile time: OpenSSL, Mbed TLS, or a built-in implementation.
Architecture
graph TD
subgraph "Application"
APP["User Code"]
HANDLER["HTTP Handler"]
WS_CBS["xWsCallbacks"]
end
subgraph "xhttp WebSocket"
UPGRADE["xWsUpgrade()"]
HANDSHAKE["Handshake<br/>(RFC 6455 §4)"]
CRYPTO["SHA-1 + Base64<br/>(pluggable backend)"]
WSCONN["xWsConn"]
PARSER["Frame Parser<br/>(incremental)"]
ENCODER["Frame Encoder"]
FRAG["Fragment<br/>Reassembly"]
CTRL["Control Frames<br/>(Ping/Pong/Close)"]
end
subgraph "xbase"
LOOP["xEventLoop"]
SOCK["xSocket"]
TIMER["Idle Timer"]
end
APP -->|"xHttpServerRoute"| HANDLER
HANDLER -->|"xWsUpgrade(w, req, cbs)"| UPGRADE
UPGRADE --> HANDSHAKE
HANDSHAKE --> CRYPTO
HANDSHAKE -->|"101 Switching Protocols"| WSCONN
WSCONN --> PARSER
WSCONN --> ENCODER
PARSER --> FRAG
PARSER --> CTRL
FRAG -->|"on_message"| WS_CBS
CTRL -->|"auto pong"| ENCODER
WSCONN --> SOCK
SOCK --> LOOP
TIMER --> LOOP
style WSCONN fill:#4a90d9,color:#fff
style LOOP fill:#50b86c,color:#fff
style PARSER fill:#9b59b6,color:#fff
style HANDSHAKE fill:#f5a623,color:#fff
Implementation Details
Upgrade Handshake Flow
sequenceDiagram
participant Client as Browser
participant Handler as HTTP Handler
participant Upgrade as xWsUpgrade()
participant Conn as xHttpConn_
participant WS as xWsConn
Client->>Handler: GET /ws (Upgrade: websocket)
Handler->>Upgrade: xWsUpgrade(w, req, &cbs, arg)
Upgrade->>Upgrade: Validate headers
Note over Upgrade: Method=GET<br/>Upgrade: websocket<br/>Connection: Upgrade<br/>Sec-WebSocket-Version: 13<br/>Sec-WebSocket-Key: ...
Upgrade->>Upgrade: SHA1(Key + GUID) → Base64
Upgrade->>Client: 101 Switching Protocols
Upgrade->>Conn: Hijack socket + transport
Upgrade->>WS: xWsConnCreate()
WS->>Client: on_open callback fires
Connection Lifecycle
stateDiagram-v2
[*] --> Open: xWsUpgrade() succeeds
Open --> Open: Data frames (text/binary)
Open --> Open: Ping → auto Pong
Open --> CloseSent: xWsClose() called
Open --> CloseReceived: Peer sends Close
CloseSent --> Closed: Peer Close received
CloseSent --> Closed: 5s timeout
CloseReceived --> Closed: Echo Close flushed
Open --> Closed: I/O error
Open --> CloseSent: Idle timeout (1001)
Closed --> [*]: on_close + destroy
Frame Processing
When data arrives on the socket, the incremental frame parser (xWsFrameParser) extracts complete frames from the xIOBuffer. Each frame is processed based on its opcode:
| Opcode | Handling |
|---|---|
| Text (0x1) | Deliver via on_message |
| Binary (0x2) | Deliver via on_message |
| Continuation (0x0) | Append to fragment buffer |
| Ping (0x9) | Auto-reply with Pong |
| Pong (0xA) | Ignored |
| Close (0x8) | Close handshake |
Fragment Reassembly
Fragmented messages are reassembled transparently:
- First fragment (FIN=0, opcode=Text/Binary) starts accumulation in frag_buf.
- Continuation frames (opcode=0x0) append to frag_buf.
- Final fragment (FIN=1, opcode=0x0) triggers reassembly and delivers the complete message via on_message.
Protocol violations (e.g., new message mid-fragment) result in a Close frame with status 1002.
Close State Machine
XDEF_ENUM(xWsCloseState){
xWsCloseState_Open, // Normal operating state
xWsCloseState_CloseSent, // We sent Close, waiting for peer
xWsCloseState_CloseReceived, // Peer sent Close, we replied
xWsCloseState_Closed, // Connection fully closed
};
- Server-initiated close: xWsClose() sends a Close frame and transitions to CLOSE_SENT. A 5-second timer waits for the peer's Close response.
- Peer-initiated close: The peer's Close frame is echoed back, transitioning to CLOSE_RECEIVED. After the echo is flushed, on_close fires and the connection is destroyed.
- Idle timeout: After the configured idle period with no data, a Close frame with code 1001 (Going Away) is sent.
Internal File Structure
| File | Role |
|---|---|
ws.h | Public API (types, callbacks, functions) |
ws.c | Connection lifecycle, I/O, frame dispatch |
ws_handshake_server.c | Server upgrade handshake (RFC 6455 §4.2) |
ws_frame.h/c | Frame codec (parse + encode) |
ws_crypto.h | SHA-1 + Base64 interface |
ws_crypto_openssl.c | OpenSSL backend |
ws_crypto_mbedtls.c | Mbed TLS backend |
ws_crypto_builtin.c | Built-in (no TLS dep) |
ws_serve.c | xWsServe() convenience wrapper |
ws_private.h | Internal data structures |
API Reference
Types
| Type | Description |
|---|---|
xWsConn | Opaque WebSocket connection handle |
xWsOpcode | Message type: Text (0x1), Binary (0x2) |
xWsCallbacks | Struct of 3 optional callback pointers |
Callback Signatures
xWsOnOpenFunc
typedef void (*xWsOnOpenFunc)(xWsConn conn, void *arg);
Called when the WebSocket connection is established. conn is valid until on_close returns.
xWsOnMessageFunc
typedef void (*xWsOnMessageFunc)(
xWsConn conn, xWsOpcode opcode,
const void *payload, size_t len,
void *arg);
Called when a complete message is received. Fragmented messages are reassembled before delivery. payload is valid only during the callback.
xWsOnCloseFunc
typedef void (*xWsOnCloseFunc)(
xWsConn conn, uint16_t code,
const char *reason, size_t len,
void *arg);
Called when the connection is closed (clean or abnormal). After this callback returns, conn is invalid.
xWsCallbacks
typedef struct {
xWsOnOpenFunc on_open; // optional
xWsOnMessageFunc on_message; // optional
xWsOnCloseFunc on_close; // optional
} xWsCallbacks;
Functions
| Function | Description |
|---|---|
xWsServe | One-call WebSocket-only server |
xWsUpgrade | Upgrade HTTP → WebSocket |
xWsSend | Send a text or binary message |
xWsClose | Initiate graceful close |
xWsServe
xHttpServer xWsServe(
xEventLoop loop,
const char *host,
uint16_t port,
const xWsCallbacks *callbacks,
void *arg);
Convenience function that creates an HTTP server, registers a catch-all route that upgrades every incoming request to WebSocket, and starts listening. Returns the server handle for later cleanup via xHttpServerDestroy(), or NULL on failure.
Parameters:
- loop — Event loop (must not be NULL).
- host — Bind address (e.g. "0.0.0.0"), or NULL.
- port — Port number to listen on.
- callbacks — WebSocket event callbacks (must not be NULL).
- arg — User argument forwarded to all callbacks.
Returns: Server handle, or NULL on failure.
xWsUpgrade
xErrno xWsUpgrade(
xHttpResponseWriter writer,
const xHttpRequest *req,
const xWsCallbacks *callbacks,
void *arg);
Call inside an xHttpHandlerFunc to upgrade the HTTP connection to WebSocket. On success, the handler must return immediately — the HTTP connection has been hijacked.
On failure (bad headers, wrong method), an HTTP error response (400/405) is sent automatically and a non-Ok error code is returned.
Parameters:
- writer — Response writer from the handler.
- req — HTTP request from the handler.
- callbacks — WebSocket event callbacks (must not be NULL).
- arg — User argument forwarded to all callbacks.
Returns: xErrno_Ok on success.
xWsSend
xErrno xWsSend(
xWsConn conn, xWsOpcode opcode,
const void *payload, size_t len);
Send a message over the WebSocket connection. The payload is framed and queued for asynchronous transmission.
Parameters:
- conn — WebSocket connection handle.
- opcode — xWsOpcode_Text or xWsOpcode_Binary.
- payload — Message data.
- len — Payload length in bytes.
Returns: xErrno_Ok on success, xErrno_InvalidState if the connection is closing.
xWsClose
xErrno xWsClose(xWsConn conn, uint16_t code);
Initiate a graceful close. Sends a Close frame with the given status code. The connection remains open until the peer responds or a 5-second timeout expires.
Parameters:
- conn — WebSocket connection handle.
- code — Close status code (e.g., 1000 for normal).
Returns: xErrno_Ok on success.
Close Status Codes
| Code | Constant | Meaning |
|---|---|---|
| 1000 | XWS_CLOSE_NORMAL | Normal closure |
| 1001 | XWS_CLOSE_GOING_AWAY | Server shutting down |
| 1002 | XWS_CLOSE_PROTOCOL_ERR | Protocol error |
| 1003 | XWS_CLOSE_UNSUPPORTED | Unsupported data |
| 1005 | XWS_CLOSE_NO_STATUS | No status received |
| 1006 | XWS_CLOSE_ABNORMAL | Abnormal closure |
Usage Examples
Echo Server (with xWsServe)
#include <xbase/event.h>
#include <xhttp/ws.h>
#include <stdio.h>
#include <string.h>
static void on_open(xWsConn conn, void *arg) {
(void)arg;
const char *hi = "Welcome!";
xWsSend(conn, xWsOpcode_Text, hi, strlen(hi));
}
static void on_message(xWsConn conn, xWsOpcode op, const void *data, size_t len, void *arg) {
(void)arg;
xWsSend(conn, op, data, len);
}
static void on_close(xWsConn conn, uint16_t code, const char *reason, size_t len, void *arg) {
(void)conn; (void)reason; (void)len; (void)arg;
printf("closed: %u\n", code);
}
static const xWsCallbacks ws_cbs = {
.on_open = on_open,
.on_message = on_message,
.on_close = on_close,
};
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer srv = xWsServe(loop, "0.0.0.0", 8080, &ws_cbs, NULL);
if (!srv) return 1;
printf("ws://localhost:8080/\n");
xEventLoopRun(loop);
xHttpServerDestroy(srv);
xEventLoopDestroy(loop);
return 0;
}
Echo Server (with xWsUpgrade)
#include <xbase/event.h>
#include <xhttp/server.h>
#include <xhttp/ws.h>
#include <stdio.h>
#include <string.h>
static const xWsCallbacks ws_cbs = { ... };
static void ws_handler(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)arg;
xWsUpgrade(w, req, &ws_cbs, NULL);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer srv = xHttpServerCreate(loop);
xHttpServerRoute(srv, "GET /ws", ws_handler, NULL);
xHttpServerListen(srv, "0.0.0.0", 8080);
printf("ws://localhost:8080/ws\n");
xEventLoopRun(loop);
xHttpServerDestroy(srv);
xEventLoopDestroy(loop);
return 0;
}
Per-Connection User Data
#include <stdio.h>
#include <stdlib.h>
#include <xhttp/ws.h>
typedef struct {
    char username[64];
    int msg_count;
} Session;
static void on_open(xWsConn conn, void *arg) {
    Session *s = (Session *)arg;
    snprintf(s->username, sizeof(s->username), "user_%p", (void *)conn);
    s->msg_count = 0;
}
static void on_message(xWsConn conn, xWsOpcode op, const void *data, size_t len, void *arg) {
    Session *s = (Session *)arg;
    s->msg_count++;
    printf("[%s] msg #%d: %.*s\n", s->username, s->msg_count, (int)len, (const char *)data);
    xWsSend(conn, op, data, len);
}
static void on_close_free_session(xWsConn conn, uint16_t code, const char *reason, size_t len, void *arg) {
    (void)conn; (void)code; (void)reason; (void)len;
    free(arg); /* release the per-connection Session */
}
static void ws_handler(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
    (void)arg;
    Session *s = calloc(1, sizeof(Session));
    if (!s) return; /* allocation failure: default response is sent */
    xWsCallbacks cbs = {
        .on_open = on_open,
        .on_message = on_message,
        .on_close = on_close_free_session,
    };
    if (xWsUpgrade(w, req, &cbs, s) != xErrno_Ok)
        free(s); /* upgrade failed; the connection stays HTTP */
}
Graceful Server-Initiated Close
static void on_message(xWsConn conn, xWsOpcode op, const void *data, size_t len, void *arg) {
    (void)arg;
if (len == 4 && memcmp(data, "quit", 4) == 0) {
xWsClose(conn, 1000); // normal close
return;
}
xWsSend(conn, op, data, len);
}
JavaScript Client
<script>
const ws = new WebSocket('ws://localhost:8080/ws');
ws.onopen = () => {
  console.log('connected');
  ws.send('Hello, server!'); // send only after the connection is open
};
ws.onmessage = (e) => console.log('< ' + e.data);
ws.onclose = (e) => console.log('closed: ' + e.code);
</script>
Best Practices
- Return immediately after xWsUpgrade(). On success, the HTTP connection is hijacked. Do not call any xHttpResponse* functions afterward.
- Don't block in callbacks. All callbacks run on the event loop thread. Blocking delays all other I/O.
- Copy payload if needed. The payload pointer in on_message is valid only during the callback. Copy the data if you need it later.
- Use xWsClose() for graceful shutdown. Avoid dropping connections without a Close handshake.
- Handle on_close for cleanup. Free per-connection resources in on_close, as the xWsConn handle becomes invalid after the callback returns.
- Idle timeout is inherited. The WebSocket connection inherits the HTTP server's idle_timeout_ms setting. Adjust it via xHttpServerSetIdleTimeout() if needed.
Comparison with Other Libraries
| Feature | xhttp WS | libwebsockets | uWebSockets |
|---|---|---|---|
| Integration | xEventLoop | Own loop | Own loop |
| Upgrade | In HTTP handler | Separate | Separate |
| Fragment reassembly | Automatic | Automatic | Automatic |
| Ping/Pong | Automatic | Automatic | Automatic |
| Close handshake | RFC 6455 | RFC 6455 | RFC 6455 |
| TLS | Via xhttp | Built-in | Built-in |
| Language | C99 | C | C++ |
| Dependencies | xbase only | OpenSSL | None |
Key Differentiator: xhttp's WebSocket server is unique in its handler-initiated upgrade pattern. Instead of a separate WebSocket server, you register a normal HTTP route and call xWsUpgrade() inside the handler. This keeps routing, middleware, and mixed HTTP+WS endpoints unified under a single server instance.
ws.h — WebSocket Client
Introduction
ws.h provides xWsConnect(), an asynchronous WebSocket client that integrates with xbase's event loop. The entire connection process — DNS resolution, TCP connect, optional TLS handshake, and HTTP Upgrade — runs fully asynchronously. Once connected, the same callback-driven model (on_open, on_message, on_close) and the same xWsConn handle are used for both client and server connections.
Design Philosophy
- Fully Asynchronous Connection — xWsConnect() returns immediately. The multi-phase connection process (DNS → TCP → TLS → HTTP Upgrade) is driven entirely by the event loop. No threads or blocking calls.
- Shared Connection Model — Once the handshake completes, a client xWsConn is identical to a server xWsConn. The same xWsSend(), xWsClose(), and callback interfaces apply. Code that operates on xWsConn doesn't need to know which side initiated the connection.
- Failure via on_close — If the connection fails at any stage (DNS, TCP, TLS, or HTTP Upgrade), on_close is invoked with an error code. on_open is never called for failed connections. This simplifies error handling: cleanup always happens in one place.
- Client-Side Masking — Per RFC 6455, client-to-server frames must be masked. The library handles this automatically when the connection is created in client mode.
Architecture
graph TD
subgraph "Application"
APP["User Code"]
CBS["xWsCallbacks"]
CONF["xWsConnectConf"]
end
subgraph "xWsConnect State Machine"
CONNECT["xWsConnect()"]
DNS["DNS Resolution"]
TCP["TCP Connect"]
TLS["TLS Handshake<br/>(wss:// only)"]
UPGRADE["HTTP Upgrade<br/>Request/Response"]
VALIDATE["Validate 101<br/>+ Sec-WebSocket-Accept"]
end
subgraph "Established Connection"
WSCONN["xWsConn<br/>(client mode)"]
SEND["xWsSend()"]
CLOSE["xWsClose()"]
end
subgraph "xbase"
LOOP["xEventLoop"]
SOCK["xSocket"]
TIMER["Timeout Timer"]
end
APP --> CONF
APP --> CBS
CONF --> CONNECT
CBS --> CONNECT
CONNECT --> DNS
DNS --> TCP
TCP --> TLS
TLS --> UPGRADE
UPGRADE --> VALIDATE
VALIDATE -->|"Success"| WSCONN
VALIDATE -->|"Failure"| CBS
WSCONN --> SEND
WSCONN --> CLOSE
WSCONN --> SOCK
SOCK --> LOOP
TIMER --> LOOP
style WSCONN fill:#4a90d9,color:#fff
style LOOP fill:#50b86c,color:#fff
style CONNECT fill:#f5a623,color:#fff
style VALIDATE fill:#9b59b6,color:#fff
Implementation Details
Connection State Machine
The xWsConnector drives the connection through five phases, all on the event loop thread:
stateDiagram-v2
[*] --> DNS: xWsConnect() called
DNS --> TCP_CONNECT: Address resolved
TCP_CONNECT --> TLS_HANDSHAKE: Connected [wss]
TCP_CONNECT --> HTTP_UPGRADE_WRITE: Connected [ws]
TLS_HANDSHAKE --> HTTP_UPGRADE_WRITE: Handshake complete
HTTP_UPGRADE_WRITE --> HTTP_UPGRADE_READ: Request sent
HTTP_UPGRADE_READ --> DONE: 101 validated
DONE --> [*]: on_open fires
DNS --> [*]: Failure → on_close
TCP_CONNECT --> [*]: Failure → on_close
TLS_HANDSHAKE --> [*]: Failure → on_close
HTTP_UPGRADE_READ --> [*]: Bad response → on_close
DNS --> [*]: Timeout → on_close
TCP_CONNECT --> [*]: Timeout → on_close
Phase Details
| Phase | What Happens |
|---|---|
| DNS | xDnsResolve() resolves the hostname asynchronously. On success, proceeds to TCP. |
| TCP Connect | Creates an xSocket, calls connect(). Waits for the writable event (EINPROGRESS). |
| TLS Handshake | For wss:// URLs only. Initializes the TLS transport and drives the handshake via read/write events. |
| HTTP Upgrade Write | Builds the Upgrade request (with random Sec-WebSocket-Key) and flushes it to the server. |
| HTTP Upgrade Read | Reads the server's response, validates HTTP/1.1 101, Upgrade: websocket, Connection: Upgrade, and Sec-WebSocket-Accept. |
Handshake Flow
sequenceDiagram
participant App as Application
participant Conn as xWsConnector
participant DNS as xDnsResolve
participant Server as Remote Server
App->>Conn: xWsConnect(loop, conf, cbs, arg)
Conn->>DNS: Resolve hostname
DNS-->>Conn: Address resolved
Conn->>Server: TCP connect()
Server-->>Conn: Connected
Note over Conn,Server: (wss:// only) TLS handshake
Conn->>Server: GET /path HTTP/1.1<br/>Upgrade: websocket<br/>Sec-WebSocket-Key: ...
Server-->>Conn: HTTP/1.1 101 Switching Protocols<br/>Sec-WebSocket-Accept: ...
Conn->>Conn: Validate response
Conn->>App: on_open(conn, arg)
Timeout Handling
A configurable timeout (default 10 seconds) covers the entire connection process. If any phase takes too long, the timer fires, the connector is destroyed, and on_close is invoked with code 1006 (Abnormal Closure).
Internal File Structure
| File | Role |
|---|---|
| ws.h | Public API (xWsConnect, xWsConnectConf) |
| ws_connect.c | Async connection state machine |
| ws_handshake_client.h/c | Build Upgrade request, validate 101 response |
| ws_crypto.h | SHA-1 + Base64 for Sec-WebSocket-Accept |
| transport_tls_client.h | TLS client transport init (shared xTlsCtx → per-connection SSL) |
| transport_tls_client_openssl.c | OpenSSL client transport implementation |
| transport_tls_client_mbedtls.c | mbedTLS client transport implementation |
API Reference
Types
| Type | Description |
|---|---|
| xWsConn | Opaque WebSocket connection handle (shared with server) |
| xWsOpcode | Message type: Text (0x1), Binary (0x2) |
| xWsCallbacks | Struct of 3 optional callback pointers (shared with server) |
| xWsConnectConf | Configuration for xWsConnect() |
xWsConnectConf
struct xWsConnectConf {
const char *url; // ws:// or wss:// URL (required)
const xTlsConf *tls; // TLS config for wss:// (NULL = defaults)
xTlsCtx tls_ctx; // Pre-created shared TLS context (priority over tls)
const char *headers; // Extra HTTP headers (NULL = none)
int timeout_ms; // Connect timeout (0 = 10000 ms)
};
| Field | Description |
|---|---|
| url | WebSocket URL. Must start with ws:// or wss://. Required. |
| tls | TLS configuration for wss:// connections. NULL uses the system CA with verification enabled. Ignored for ws:// and when tls_ctx is set. |
| tls_ctx | Pre-created shared TLS context from xTlsCtxCreate(). Takes priority over tls. The caller retains ownership and must keep it alive for the lifetime of the connection. NULL = create from tls (or use defaults). |
| headers | Extra HTTP headers appended to the Upgrade request. Format: "Key: Value\r\nKey2: Value2\r\n". NULL for none. |
| timeout_ms | Timeout for the entire connection process in milliseconds. 0 uses the default (10000 ms). |
Callbacks
The same xWsCallbacks struct is used for both client and server connections. See WebSocket Server for callback signature details.
Client-specific behavior:
- `on_open` — Called when the connection is fully established (101 validated). Not called on failure.
- `on_close` — Called on connection failure (DNS, TCP, TLS, or Upgrade error) or after a normal close. For failed connections, `conn` is `NULL`.
Functions
xWsConnect
xErrno xWsConnect(
xEventLoop loop,
const xWsConnectConf *conf,
const xWsCallbacks *callbacks,
void *arg);
Initiate an asynchronous WebSocket client connection. Returns immediately; the connection process runs on the event loop.
Parameters:
- `loop` — Event loop (must not be NULL).
- `conf` — Connection configuration (must not be NULL; `conf->url` required).
- `callbacks` — WebSocket event callbacks (must not be NULL).
- `arg` — User argument forwarded to all callbacks.
Returns: xErrno_Ok if the async connection started, xErrno_InvalidArg for bad parameters (NULL pointers, invalid URL scheme).
xWsSend
xErrno xWsSend(
xWsConn conn, xWsOpcode opcode,
const void *payload, size_t len);
Send a message. Identical to the server-side API. Client frames are automatically masked per RFC 6455.
xWsClose
xErrno xWsClose(xWsConn conn, uint16_t code);
Initiate a graceful close. Identical to the server-side API.
Usage Examples
Connect and Echo
#include <xbase/event.h>
#include <xhttp/ws.h>
#include <stdio.h>
#include <string.h>
static void on_open(xWsConn conn, void *arg) {
(void)arg;
const char *msg = "Hello, server!";
xWsSend(conn, xWsOpcode_Text, msg, strlen(msg));
}
static void on_message(xWsConn conn, xWsOpcode op, const void *data, size_t len, void *arg) {
(void)op; (void)arg;
printf("Received: %.*s\n", (int)len, (const char *)data);
xWsClose(conn, 1000);
}
static void on_close(xWsConn conn, uint16_t code, const char *reason, size_t len, void *arg) {
(void)conn; (void)reason; (void)len; (void)arg;
printf("Closed: %u\n", code);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xWsConnectConf conf = {0};
conf.url = "ws://localhost:8080/ws";
xWsCallbacks cbs = {
.on_open = on_open,
.on_message = on_message,
.on_close = on_close,
};
xWsConnect(loop, &conf, &cbs, NULL);
xEventLoopRun(loop);
xEventLoopDestroy(loop);
return 0;
}
Secure Connection (wss://)
#include <xbase/event.h>
#include <xhttp/ws.h>
#include <xnet/tls.h>
static void on_open(xWsConn conn, void *arg) { /* ... */ }
static void on_message(xWsConn conn, xWsOpcode op, const void *data, size_t len, void *arg) { /* ... */ }
static void on_close(xWsConn conn, uint16_t code, const char *reason, size_t len, void *arg) { /* ... */ }
int main(void) {
xEventLoop loop = xEventLoopCreate();
// Skip certificate verification (dev only)
xTlsConf tls = {0};
tls.skip_verify = 1;
xWsConnectConf conf = {0};
conf.url = "wss://echo.example.com/ws";
conf.tls = &tls;
conf.timeout_ms = 5000;
xWsCallbacks cbs = {
.on_open = on_open,
.on_message = on_message,
.on_close = on_close,
};
xWsConnect(loop, &conf, &cbs, NULL);
xEventLoopRun(loop);
xEventLoopDestroy(loop);
return 0;
}
Shared TLS Context (Multiple Connections)
When creating many wss:// connections (e.g. reconnect loops or connection pools), use a shared xTlsCtx to avoid reloading certificates on every connection:
#include <xbase/event.h>
#include <xhttp/ws.h>
#include <xnet/tls.h>
static void on_open(xWsConn conn, void *arg) { /* ... */ }
static void on_message(xWsConn conn, xWsOpcode op, const void *data, size_t len, void *arg) { /* ... */ }
static void on_close(xWsConn conn, uint16_t code, const char *reason, size_t len, void *arg) { /* ... */ }
int main(void) {
xEventLoop loop = xEventLoopCreate();
// Create a shared TLS context once
xTlsConf tls = {0};
tls.ca = "ca.pem";
xTlsCtx ctx = xTlsCtxCreate(&tls);
// All connections share the same ctx
xWsConnectConf conf = {0};
conf.url = "wss://echo.example.com/ws";
conf.tls_ctx = ctx; // shared, not copied
xWsCallbacks cbs = {
.on_open = on_open,
.on_message = on_message,
.on_close = on_close,
};
xWsConnect(loop, &conf, &cbs, NULL);
xEventLoopRun(loop);
// Destroy ctx after all connections are closed
xTlsCtxDestroy(ctx);
xEventLoopDestroy(loop);
return 0;
}
Custom Headers (Authentication)
xWsConnectConf conf = {0};
conf.url = "ws://api.example.com/stream";
conf.headers = "Authorization: Bearer token123\r\n"
"X-Client-Version: 1.0\r\n";
xWsConnect(loop, &conf, &cbs, NULL);
Connection Failure Handling
static void on_close(xWsConn conn, uint16_t code, const char *reason, size_t len, void *arg) {
if (conn == NULL) {
// Connection failed before establishing WebSocket
printf("Connection failed (code %u)\n", code);
// Optionally retry after a delay
return;
}
// Normal close after successful connection
printf("Disconnected: %u\n", code);
}
Binary Data
static void on_open(xWsConn conn, void *arg) {
uint8_t data[] = {0x00, 0x01, 0x02, 0xFF, 0xFE};
xWsSend(conn, xWsOpcode_Binary, data, sizeof(data));
}
Best Practices
- Check the return value of `xWsConnect()`. It returns `xErrno_InvalidArg` for obviously bad parameters (NULL pointers, unsupported URL scheme). Network errors are reported asynchronously via `on_close`.
- Handle `conn == NULL` in `on_close`. This indicates a connection failure before the WebSocket was established. Use this to implement retry logic.
- Don't block in callbacks. All callbacks run on the event loop thread.
- Copy the payload if needed. The `payload` pointer in `on_message` is valid only during the callback.
- Use `xWsClose()` for graceful shutdown. The client sends a Close frame and waits for the server's response.
- Set a reasonable timeout. The default 10-second timeout covers DNS + TCP + TLS + Upgrade. Adjust via `conf.timeout_ms` for high-latency networks.
- Never use `skip_verify` in production. It disables all certificate validation. Use a proper CA path or system CA bundle instead.
Comparison with Other Libraries
| Feature | xhttp WS Client | libwebsockets | wslay | civetweb |
|---|---|---|---|---|
| I/O Model | Async (event loop) | Async (own loop) | Sync (user drives) | Threaded |
| Event Loop | xEventLoop | Own loop | None | pthreads |
| DNS | Async (xDnsResolve) | Async (built-in) | Manual | Blocking |
| TLS | Via xnet | Built-in | Manual | Built-in |
| Client Masking | Automatic | Automatic | Automatic | Automatic |
| Connection Timeout | Configurable | Configurable | Manual | Configurable |
| Language | C99 | C | C | C |
| Dependencies | xbase + xnet | OpenSSL | None | None |
Key Differentiator: xhttp's WebSocket client runs entirely on the xbase event loop with zero blocking calls. The multi-phase connection (DNS → TCP → TLS → Upgrade) is a single async state machine. Combined with the shared xWsConn model, client and server code use identical APIs for sending, receiving, and closing — making bidirectional WebSocket applications straightforward.
TLS Context Sharing: For wss:// connections, the client supports a shared xTlsCtx (via conf.tls_ctx) that avoids reloading certificates and re-creating the SSL context on every connection. This is the same pattern used by xTcpConnect and xTcpListener, providing consistent TLS context management across all xKit networking APIs.
sse.c — SSE Stream Client
Introduction
sse.c implements Server-Sent Events (SSE) support for xHttpClient. It provides xHttpClientGetSse() and xHttpClientDoSse() which subscribe to SSE endpoints and parse the event stream according to the W3C SSE specification. Each parsed event is delivered to a callback as it arrives, enabling real-time streaming — ideal for LLM API integration.
Design Philosophy
- W3C Spec Compliance — The parser follows the W3C Server-Sent Events specification: field parsing (`event`, `data`, `id`, `retry`), comment handling, multi-line data joining with `\n`, and the default event type `"message"`.
- Streaming Parse — Data is parsed incrementally as it arrives from libcurl's write callback. Complete lines are processed immediately; incomplete lines are buffered until more data arrives.
- Shared Infrastructure — SSE requests reuse the same `curl_multi` handle and event loop integration as regular HTTP requests. The `xHttpReqVtable` mechanism allows SSE to plug in its own write callback and completion handler.
- User-Controlled Cancellation — The `xSseEventFunc` callback returns an `int`: 0 to continue, non-zero to close the connection. This gives the user fine-grained control over when to stop streaming.
Architecture
graph TD
subgraph "SSE Request Flow"
SUBMIT["xHttpClientDoSse()"]
EASY["curl_easy + SSE headers"]
WRITE["sse_write_callback"]
PARSER["xSseParser_"]
EVENT["on_event(ev)"]
DONE["on_done(curl_code)"]
end
subgraph "Shared with Oneshot"
MULTI["curl_multi"]
LOOP["xEventLoop"]
CHECK["check_multi_info()"]
end
SUBMIT --> EASY
EASY --> MULTI
MULTI --> LOOP
LOOP -->|"fd ready"| WRITE
WRITE --> PARSER
PARSER -->|"event boundary"| EVENT
CHECK -->|"transfer done"| DONE
style PARSER fill:#4a90d9,color:#fff
style EVENT fill:#50b86c,color:#fff
Implementation Details
SSE Parser State Machine
stateDiagram-v2
[*] --> Buffering: Data arrives from curl
Buffering --> ParseLine: Complete line found (\\n or \\r\\n)
ParseLine --> FieldParse: Non-empty line
ParseLine --> DispatchEvent: Empty line (event boundary)
FieldParse --> Buffering: Continue parsing
DispatchEvent --> CallUser: data field exists
DispatchEvent --> Buffering: No data (skip)
CallUser --> Buffering: User returns 0 (continue)
CallUser --> [*]: User returns non-zero (close)
SSE Field Parsing
Each non-empty line is parsed as a field:
| Line Format | Field | Value |
|---|---|---|
| :comment | (ignored) | — |
| event:type | event_type | "type" |
| data:payload | data | "payload" (accumulated with \n) |
| id:123 | id | "123" (persists across events) |
| retry:5000 | retry | 5000 (ms, must be all digits) |
| unknown:foo | (ignored) | — |
Multi-line data: Multiple data: lines are joined with \n:
data:line1
data:line2
data:line3
→ ev.data = "line1\nline2\nline3"
Parser Internal Structure
struct xSseParser_ {
xBuffer buf; // Raw incoming data buffer
size_t pos; // Parse position within buf
int error; // Allocation failure flag
char *event_type; // Current event type (NULL = "message")
char *data; // Accumulated data lines
char *id; // Last event ID (persists across events)
int retry; // Retry delay in ms (-1 = not set)
};
Data Flow
sequenceDiagram
participant Server as SSE Server
participant Curl as libcurl
participant Writer as sse_write_callback
participant Parser as xSseParser_
participant User as User Callback
Server->>Curl: HTTP 200 text/event-stream
loop For each chunk
Curl->>Writer: sse_write_callback(chunk)
Writer->>Parser: sse_parser_feed(chunk)
Parser->>Parser: Buffer + parse lines
alt Empty line (event boundary)
Parser->>User: on_event(ev)
alt User returns 0
User->>Parser: Continue
else User returns non-zero
User->>Writer: Close connection
Writer->>Curl: Return 0 (abort)
end
end
end
Curl->>User: on_done(curl_code)
SSE Request Structure
struct xSseReq_ {
struct xHttpReq_ base; // Base request (shared with oneshot)
xSseEventFunc on_event; // Per-event callback
xSseDoneFunc on_done; // Stream-end callback
struct xSseParser_ parser; // SSE parser state
struct curl_slist *sse_headers; // Accept: text/event-stream + user headers
};
The SSE request uses a dedicated vtable:
- `sse_on_done` — Invokes the user's `on_done` callback.
- `sse_on_cleanup` — Frees SSE-specific resources (parser, headers).
Automatic Headers
xHttpClientDoSse() automatically adds:
- `Accept: text/event-stream`
- `Cache-Control: no-cache`
User-provided headers are merged after these defaults.
API Reference
Types
| Type | Description |
|---|---|
| xSseEvent | SSE event: event (type), data, id, retry |
| xSseEventFunc | int (*)(const xSseEvent *ev, void *arg) — return 0 to continue, non-zero to close |
| xSseDoneFunc | void (*)(int curl_code, void *arg) — called when stream ends |
xSseEvent Fields
| Field | Type | Description |
|---|---|---|
| event | const char * | Event type. "message" if omitted by server. |
| data | const char * | Event data. Multi-line data joined by \n. |
| id | const char * | Last event ID, or NULL. |
| retry | int | Retry delay in ms, or -1 if not set. |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| xHttpClientGetSse | xErrno xHttpClientGetSse(xHttpClient client, const char *url, xSseEventFunc on_event, xSseDoneFunc on_done, void *arg) | Subscribe to SSE endpoint (GET). | Not thread-safe |
| xHttpClientDoSse | xErrno xHttpClientDoSse(xHttpClient client, const xHttpRequestConf *config, xSseEventFunc on_event, xSseDoneFunc on_done, void *arg) | Fully-configured SSE request. | Not thread-safe |
Usage Examples
Simple SSE Subscription
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/client.h>
static int on_event(const xSseEvent *ev, void *arg) {
(void)arg;
printf("[%s] %s\n", ev->event, ev->data);
return 0; // Continue receiving
}
static void on_done(int curl_code, void *arg) {
(void)arg;
printf("Stream ended (code=%d)\n", curl_code);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpClient client = xHttpClientCreate(loop, NULL);
xHttpClientGetSse(client, "https://example.com/events",
on_event, on_done, NULL);
xEventLoopRun(loop);
xHttpClientDestroy(client);
xEventLoopDestroy(loop);
return 0;
}
LLM API Streaming (OpenAI-Compatible)
#include <stdio.h>
#include <string.h>
#include <xbase/event.h>
#include <xhttp/client.h>
static int on_event(const xSseEvent *ev, void *arg) {
(void)arg;
// OpenAI sends "[DONE]" as the final data
if (strcmp(ev->data, "[DONE]") == 0) {
printf("\n--- Stream complete ---\n");
return 1; // Close connection
}
// Parse JSON and extract content delta...
printf("%s", ev->data);
fflush(stdout);
return 0;
}
static void on_done(int curl_code, void *arg) {
(void)arg;
if (curl_code != 0)
printf("\nStream error (code=%d)\n", curl_code);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpClient client = xHttpClientCreate(loop, NULL);
const char *body =
"{"
" \"model\": \"gpt-4\","
" \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}],"
" \"stream\": true"
"}";
const char *headers[] = {
"Content-Type: application/json",
"Authorization: Bearer sk-your-api-key",
NULL
};
xHttpRequestConf config = {
.url = "https://api.openai.com/v1/chat/completions",
.method = xHttpMethod_POST,
.body = body,
.body_len = strlen(body),
.headers = headers,
.timeout_ms = 60000, // 60s timeout for streaming
};
xHttpClientDoSse(client, &config, on_event, on_done, NULL);
xEventLoopRun(loop);
xHttpClientDestroy(client);
xEventLoopDestroy(loop);
return 0;
}
Early Cancellation
static int on_event(const xSseEvent *ev, void *arg) {
int *count = (int *)arg;
(*count)++;
printf("Event #%d: %s\n", *count, ev->data);
// Stop after 10 events
if (*count >= 10) {
printf("Received enough events, closing.\n");
return 1; // Non-zero = close connection
}
return 0;
}
Use Cases
- LLM API Integration — Stream responses from OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible API. Use `xHttpClientDoSse()` with a POST method and JSON body.
- Real-Time Notifications — Subscribe to server push notifications (chat messages, stock prices, IoT sensor data) via SSE endpoints.
- Log Streaming — Tail remote log streams delivered as SSE events.
Best Practices
- Use `xHttpClientDoSse()` for LLM APIs. Most LLM APIs require POST with a JSON body and custom headers. `GetSse` is only for simple GET endpoints.
- Handle `[DONE]` signals. Many LLM APIs send a special `[DONE]` data payload to signal the end of the stream. Return non-zero from `on_event` to close cleanly.
- Set appropriate timeouts. Streaming responses can take a long time. Set `timeout_ms` high enough (e.g., 60000 ms) to avoid premature timeouts.
- Don't block in `on_event`. The callback runs on the event loop thread; blocking delays all other I/O.
- Copy event data if needed. `xSseEvent` pointers are valid only during the callback.
Comparison with Other Libraries
| Feature | xhttp SSE | eventsource (JS) | sseclient-py | libcurl (manual) |
|---|---|---|---|---|
| Spec Compliance | W3C SSE | W3C SSE | W3C SSE | Manual parsing |
| Integration | xEventLoop (async) | Browser event loop | Blocking iterator | Manual |
| POST Support | Yes (DoSse) | No (GET only) | No (GET only) | Manual |
| Cancellation | Callback return value | close() | Break loop | curl_easy_pause |
| Multi-line Data | Auto-joined with \n | Auto-joined | Auto-joined | Manual |
| Language | C99 | JavaScript | Python | C |
Key Differentiator: xhttp's SSE implementation is unique in supporting POST-based SSE (via xHttpClientDoSse), which is essential for LLM API integration. Most SSE libraries only support GET. The incremental parser integrates seamlessly with the event loop, delivering events as they arrive without buffering the entire stream.
TLS Deployment Guide
This guide covers end-to-end TLS deployment for xhttp, including certificate generation, server and client configuration, and mutual TLS (mTLS). For API reference, see server.md and client.md.
Prerequisites
- OpenSSL CLI — Used for certificate generation (the `openssl` command).
- TLS backend compiled — xKit must be built with `XK_TLS_BACKEND=openssl` (or `mbedtls`). Without a TLS backend, `xHttpServerListenTls()` returns `xErrno_NotSupported`.
Check your build:
# If XK_HAS_OPENSSL is defined, TLS is available
grep -r "XK_HAS_OPENSSL" xhttp/
Certificate Generation
Self-Signed Certificate (Development)
For quick local development and testing:
openssl req -x509 -newkey rsa:2048 \
-keyout server-key.pem \
-out server.pem \
-days 365 -nodes \
-subj '/CN=localhost'
This produces:
- server.pem — Self-signed certificate
- server-key.pem — Unencrypted private key

Note: Self-signed certificates are not trusted by default. Clients must either set `skip_verify = 1` or provide the certificate as a CA via `ca`.
CA-Signed Certificates (Production / mTLS)
For mutual TLS or production-like setups, create a private CA and sign both server and client certificates.
Step 1: Create a CA
# Generate CA private key and self-signed certificate
openssl req -x509 -newkey rsa:2048 \
-keyout ca-key.pem \
-out ca.pem \
-days 365 -nodes \
-subj '/CN=MyCA'
Step 2: Generate Server Certificate
# Generate server key + CSR
openssl req -newkey rsa:2048 \
-keyout server-key.pem \
-out server.csr \
-nodes \
-subj '/CN=localhost'
# Sign with CA
openssl x509 -req \
-in server.csr \
-CA ca.pem -CAkey ca-key.pem -CAcreateserial \
-out server.pem \
-days 365
# Clean up CSR
rm server.csr
Step 3: Generate Client Certificate (for mTLS)
# Generate client key + CSR
openssl req -newkey rsa:2048 \
-keyout client-key.pem \
-out client.csr \
-nodes \
-subj '/CN=MyClient'
# Sign with the same CA
openssl x509 -req \
-in client.csr \
-CA ca.pem -CAkey ca-key.pem -CAcreateserial \
-out client.pem \
-days 365
# Clean up CSR
rm client.csr
After these steps you have:
| File | Description |
|---|---|
| ca.pem | CA certificate (trusted by both sides) |
| ca-key.pem | CA private key (keep secure, not deployed) |
| server.pem | Server certificate (signed by CA) |
| server-key.pem | Server private key |
| client.pem | Client certificate (signed by CA) |
| client-key.pem | Client private key |
Deployment Scenarios
1. One-Way TLS (Server Authentication Only)
The most common setup: the client verifies the server's identity, but the server does not verify the client.
sequenceDiagram
participant Client
participant Server
Client->>Server: TLS ClientHello
Server->>Client: Certificate (server.pem)
Client->>Client: Verify server cert against CA
Client->>Server: Finished
Server->>Client: Finished
Note over Client,Server: Encrypted HTTP traffic
Server:
xTlsConf tls = {
.cert = "server.pem",
.key = "server-key.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
Client (with CA verification):
xTlsConf tls = {0};
tls.ca = "ca.pem";
xHttpClientConf conf = {.tls = &tls};
xHttpClient client =
xHttpClientCreate(loop, &conf);
xHttpClientGet(
client,
"https://localhost:8443/hello",
on_response, NULL);
Client (skip verification — development only):
xTlsConf tls = {0};
tls.skip_verify = 1;
xHttpClientConf conf = {.tls = &tls};
xHttpClient client =
xHttpClientCreate(loop, &conf);
2. Mutual TLS (mTLS)
Both sides authenticate each other. The server requires a valid client certificate signed by a trusted CA.
sequenceDiagram
participant Client
participant Server
Client->>Server: TLS ClientHello
Server->>Client: Certificate (server.pem) + CertificateRequest
Client->>Client: Verify server cert against CA
Client->>Server: Certificate (client.pem)
Server->>Server: Verify client cert against CA
Client->>Server: Finished
Server->>Client: Finished
Note over Client,Server: Mutually authenticated encrypted traffic
Server:
xTlsConf tls = {
.cert = "server.pem",
.key = "server-key.pem",
.ca = "ca.pem", // CA to verify client certs
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
Client:
xTlsConf tls = {0};
tls.ca = "ca.pem";
tls.cert = "client.pem";
tls.key = "client-key.pem";
xHttpClientConf conf = {.tls = &tls};
xHttpClient client =
xHttpClientCreate(loop, &conf);
xHttpClientGet(
client,
"https://localhost:8443/secure",
on_response, NULL);
3. HTTP + HTTPS on Different Ports
A single xHttpServer can serve both cleartext HTTP and HTTPS simultaneously:
// HTTP on port 8080
xHttpServerListen(server, "0.0.0.0", 8080);
// HTTPS on port 8443
xTlsConf tls = {
.cert = "server.pem",
.key = "server-key.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
Routes are shared — the same handlers serve both HTTP and HTTPS traffic.
Complete End-to-End Example
A full working example: CA-signed mTLS with server and client.
Generate Certificates
#!/bin/bash
set -e
# CA
openssl req -x509 -newkey rsa:2048 \
-keyout ca-key.pem -out ca.pem \
-days 365 -nodes -subj '/CN=TestCA'
# Server
openssl req -newkey rsa:2048 \
-keyout server-key.pem -out server.csr \
-nodes -subj '/CN=localhost'
openssl x509 -req -in server.csr \
-CA ca.pem -CAkey ca-key.pem -CAcreateserial \
-out server.pem -days 365
rm server.csr
# Client
openssl req -newkey rsa:2048 \
-keyout client-key.pem -out client.csr \
-nodes -subj '/CN=MyClient'
openssl x509 -req -in client.csr \
-CA ca.pem -CAkey ca-key.pem -CAcreateserial \
-out client.pem -days 365
rm client.csr
echo "Generated: ca.pem, server.pem, server-key.pem, client.pem, client-key.pem"
Server Code
#include <stdio.h>
#include <string.h>
#include <xbase/event.h>
#include <xhttp/server.h>
static void on_secure(xHttpResponseWriter w, const xHttpRequest *req, void *arg) {
(void)req; (void)arg;
xHttpResponseSetHeader(w, "Content-Type", "text/plain");
xHttpResponseSend(w, "mTLS OK!\n", 9);
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xHttpServer server = xHttpServerCreate(loop);
xHttpServerRoute(server, "GET /secure", on_secure, NULL);
xTlsConf tls = {
.cert = "server.pem",
.key = "server-key.pem",
.ca = "ca.pem",
};
xHttpServerListenTls(server, "0.0.0.0", 8443, &tls);
printf("mTLS server listening on :8443\n");
xEventLoopRun(loop);
xHttpServerDestroy(server);
xEventLoopDestroy(loop);
return 0;
}
Client Code
#include <stdio.h>
#include <xbase/event.h>
#include <xhttp/client.h>
static void on_response(const xHttpResponse *resp, void *arg) {
(void)arg;
if (resp->curl_code == 0) {
printf("HTTP %ld: %.*s\n", resp->status_code,
(int)resp->body_len, resp->body);
} else {
printf("TLS error: %s\n", resp->curl_error);
}
}
int main(void) {
xEventLoop loop = xEventLoopCreate();
xTlsConf tls = {0};
tls.ca = "ca.pem";
tls.cert = "client.pem";
tls.key = "client-key.pem";
xHttpClientConf conf = {.tls = &tls};
xHttpClient client =
xHttpClientCreate(loop, &conf);
xHttpClientGet(client, "https://localhost:8443/secure",
on_response, NULL);
xEventLoopRun(loop);
xHttpClientDestroy(client);
xEventLoopDestroy(loop);
return 0;
}
Verify with curl
# One-way TLS (skip verify)
curl -k https://localhost:8443/secure
# One-way TLS (with CA)
curl --cacert ca.pem https://localhost:8443/secure
# mTLS
curl --cacert ca.pem \
--cert client.pem \
--key client-key.pem \
https://localhost:8443/secure
skip_verify Behavior
| Value | Behavior |
|---|---|
| 0 (default) | Peer verification enabled. Server verifies client cert (if ca is set); client verifies server cert. |
| non-zero | All peer verification disabled. Development only. |
ALPN and HTTP/2 over TLS
When TLS is enabled, ALPN (Application-Layer Protocol Negotiation) automatically selects the HTTP protocol:
- If the client supports HTTP/2, ALPN negotiates `h2` and the connection uses HTTP/2 framing.
- Otherwise, ALPN falls back to `http/1.1`.
This is transparent to application code — the same routes and handlers work regardless of the negotiated protocol.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| xErrno_NotSupported from ListenTls | No TLS backend compiled | Rebuild with XK_TLS_BACKEND=openssl |
| Client gets curl_code != 0, status_code == 0 | TLS handshake failed | Check cert paths, CA trust, and skip_verify settings |
| Self-signed cert rejected | Client verifies against system CA bundle | Set ca to the self-signed cert, or use skip_verify = 1 for dev |
| mTLS handshake fails | Client didn't provide cert, or cert not signed by server's ca | Ensure client cert is signed by the same CA specified in server's ca |
| "wrong CA path" error | ca points to non-existent file | Verify the file path exists and is readable |
| Connection works with skip_verify but not without | Server cert CN doesn't match hostname, or CA not trusted | Use ca pointing to the signing CA; ensure CN matches the hostname |
Security Best Practices
- Never use `skip_verify` in production. It disables all certificate validation, making the connection vulnerable to MITM attacks.
- Keep private keys secure. `ca-key.pem`, `server-key.pem`, and `client-key.pem` should have restricted file permissions (`chmod 600`).
- Use short-lived certificates. Set a reasonable expiry (`-days`) and rotate certificates before they expire.
- For mTLS, set `ca` on the server side. Verification is enabled by default (`skip_verify = 0`), so the server requires a valid client certificate when `ca` is set.
- Don't deploy the CA private key. Only `ca.pem` (the public certificate) needs to be distributed. Keep `ca-key.pem` offline or in a secure vault.
- Match CN/SAN to hostname. The server certificate's Common Name (or Subject Alternative Name) must match the hostname clients use to connect.
API Quick Reference
Server Side
| Item | Description |
|---|---|
| xTlsConf | Struct: cert, key, ca, key_password, alpn, skip_verify |
| xHttpServerListenTls() | Start HTTPS listener with TLS config |
Client Side
| Item | Description |
|---|---|
| xTlsConf | Struct: cert, key, ca, key_password, alpn, skip_verify |
| xHttpClientConf | Struct: tls (pointer to xTlsConf), http_version |
| xHttpClientCreate() | Create client with TLS config via xHttpClientConf |
WebSocket Client Side
| Item | Description |
|---|---|
| xTlsConf | Struct: cert, key, ca, key_password, alpn, skip_verify |
| xTlsCtx | Opaque shared TLS context from xTlsCtxCreate() |
| xWsConnectConf | Struct: tls (pointer to xTlsConf), tls_ctx (shared context, priority over tls) |
| xWsConnect() | Initiate async WebSocket connection with optional TLS |
For full API details, see server.md and client.md.
xlog — Async Logging
Introduction
xlog is xKit's high-performance asynchronous logging module. It formats log entries on the calling thread and flushes them to a file (or stderr) on the event loop thread, decoupling I/O latency from application logic. Three operating modes — Timer, Notify, and Mixed — offer different trade-offs between flush latency and overhead.
Design Philosophy
- Async by Default — Log messages are formatted on the calling thread and enqueued via a lock-free MPSC queue. The event loop thread drains the queue and writes to disk, ensuring that logging never blocks the caller (except at the Fatal level).
- Three Modes for Different Needs — Timer mode batches writes for throughput; Notify mode uses a pipe for low-latency delivery; Mixed mode combines both, using the timer for normal messages and the pipe for high-severity entries.
- Event Loop Integration — The logger is bound to an `xEventLoop` and uses its timer and I/O facilities. There is no dedicated logging thread — the event loop thread handles both I/O and log flushing.
- Thread-Local Context — `xLoggerEnter()` sets the current thread's logger, enabling the `XLOG_*()` macros and bridging xbase's internal `xLog()` calls to the async pipeline.
Architecture
graph TD
subgraph "Application Threads"
T1["Thread 1<br/>xLoggerLog()"]
T2["Thread 2<br/>XLOG_INFO()"]
T3["Thread 3<br/>xLog() (xbase internal)"]
end
subgraph "Lock-Free Queue"
MPSC["MPSC Queue<br/>(xbase/mpsc.h)"]
end
subgraph "Event Loop Thread"
TIMER["Timer Callback<br/>(periodic flush)"]
PIPE["Pipe Callback<br/>(immediate flush)"]
FLUSH["logger_flush_entries()"]
WRITE["fwrite() + fflush()"]
ROTATE["File Rotation"]
end
subgraph "Output"
FILE["Log File"]
STDERR["stderr"]
end
T1 -->|"format + enqueue"| MPSC
T2 -->|"format + enqueue"| MPSC
T3 -->|"bridge_callback"| MPSC
MPSC --> FLUSH
TIMER --> FLUSH
PIPE --> FLUSH
FLUSH --> WRITE
WRITE --> FILE
WRITE --> STDERR
WRITE -->|"max_size exceeded"| ROTATE
style MPSC fill:#f5a623,color:#fff
style FLUSH fill:#50b86c,color:#fff
Sub-Module Overview
| File | Description | Doc |
|---|---|---|
| logger.h | Async logger API, macros, and configuration | logger.md |
Quick Start
#include <xbase/event.h>
#include <xlog/logger.h>
int main(void) {
xEventLoop loop = xEventLoopCreate();
xLoggerConf conf = {
.loop = loop,
.path = "app.log",
.mode = xLogMode_Mixed,
.level = xLogLevel_Info,
.max_size = 10 * 1024 * 1024, // 10MB
.max_files = 5,
.flush_interval_ms = 100,
};
xLogger logger = xLoggerCreate(conf);
xLoggerEnter(logger); // Set as thread-local logger
XLOG_INFO("Application started, version %d.%d", 1, 0);
XLOG_WARN("Low memory: %zu bytes remaining", (size_t)1024);
// Run event loop (processes log flushes)
xEventLoopRun(loop);
xLoggerLeave();
xLoggerDestroy(logger);
xEventLoopDestroy(loop);
return 0;
}
Relationship with Other Modules
- xbase/event.h — The logger is bound to an `xEventLoop` for timer-driven and pipe-driven flush.
- xbase/mpsc.h — Uses the lock-free MPSC queue to pass log entries from producer threads to the event loop thread.
- xbase/log.h — `xLoggerEnter()` bridges xbase's internal `xLog()` calls to the async logger via the thread-local callback mechanism.
- xbase/atomic.h — Uses atomic operations for the lock-free entry freelist.
logger.h — High-Performance Async Logger
Introduction
logger.h provides xLogger, a high-performance asynchronous logger that formats log entries on the calling thread and flushes them to a file (or stderr) on the event loop thread. It supports three operating modes (Timer, Notify, Mixed), five severity levels, file rotation, synchronous flush, and seamless bridging with xbase's internal xLog() mechanism.
Design Philosophy
- **Format on Caller, Write on Loop** — Log messages are formatted (`snprintf`) on the calling thread into a pre-allocated entry buffer, then enqueued via the lock-free MPSC queue. The event loop thread dequeues and writes to disk. This decouples I/O latency from application logic.
- **Three Operating Modes** — Different applications have different latency/throughput requirements:
  - **Timer** — Periodic flush (default 100ms). Best throughput, highest latency.
  - **Notify** — Pipe-based immediate notification. Lowest latency, highest overhead.
  - **Mixed** — Timer for normal messages, pipe for Error/Fatal. Best balance.
- **Lock-Free Entry Pool** — A global Treiber-stack freelist recycles log entry structs across all threads, avoiding `malloc`/`free` on the hot path.
- **Fatal = Synchronous + Abort** — Fatal-level messages bypass the async queue entirely: they are written directly to the file and followed by `abort()`. This ensures the fatal message is never lost.
- **xbase Bridge** — `xLoggerEnter()` registers a callback with xbase's `xLogSetCallback()`, routing all internal xKit error messages through the async logger.
Architecture
graph TD
subgraph "xLogger Internal"
MPSC["MPSC Queue<br/>(head, tail)"]
TIMER["xEventLoopTimer<br/>(periodic flush)"]
PIPE["Pipe<br/>(notify flush)"]
FLUSH_PIPE["Flush Request Pipe<br/>(sync flush)"]
FREELIST["Entry Freelist<br/>(Treiber stack)"]
FP["FILE *fp<br/>(log file or stderr)"]
end
subgraph "xbase Dependencies"
EVENT["xEventLoop"]
MPSC_LIB["xbase/mpsc.h"]
ATOMIC_LIB["xbase/atomic.h"]
LOG_LIB["xbase/log.h"]
end
TIMER --> EVENT
PIPE --> EVENT
FLUSH_PIPE --> EVENT
MPSC --> MPSC_LIB
FREELIST --> ATOMIC_LIB
style MPSC fill:#f5a623,color:#fff
style FREELIST fill:#4a90d9,color:#fff
Implementation Details
Three Operating Modes
graph LR
subgraph "Timer Mode"
T_ENQUEUE["Enqueue"] --> T_TIMER["Timer fires<br/>(every 100ms)"]
T_TIMER --> T_FLUSH["Flush all entries"]
end
subgraph "Notify Mode"
N_ENQUEUE["Enqueue"] --> N_PIPE["Write 1 byte to pipe"]
N_PIPE --> N_LOOP["Pipe readable event"]
N_LOOP --> N_FLUSH["Flush all entries"]
end
subgraph "Mixed Mode"
M_ENQUEUE["Enqueue"]
M_ENQUEUE -->|"Debug/Info/Warn"| M_TIMER["Timer fires"]
M_ENQUEUE -->|"Error/Fatal"| M_PIPE["Write to pipe"]
M_TIMER --> M_FLUSH["Flush all entries"]
M_PIPE --> M_FLUSH
end
style T_FLUSH fill:#50b86c,color:#fff
style N_FLUSH fill:#50b86c,color:#fff
style M_FLUSH fill:#50b86c,color:#fff
| Mode | Flush Trigger | Latency | Overhead | Best For |
|---|---|---|---|---|
| Timer | Periodic timer (default 100ms) | Up to flush_interval_ms | Lowest (no per-message syscall) | High-throughput logging |
| Notify | Pipe write per message | ~Immediate | Highest (1 write() per message) | Low-latency debugging |
| Mixed | Timer + pipe for Error/Fatal | Low for errors, batched for info | Moderate | Production applications |
Log Entry Lifecycle
sequenceDiagram
participant App as Application Thread
participant Pool as Entry Freelist
participant Queue as MPSC Queue
participant L as Event Loop Thread
participant File as Log File
App->>Pool: entry_alloc()
Pool-->>App: "xLogEntry_ (recycled or malloc'd)"
App->>App: "snprintf(entry->buf, timestamp + level + message)"
App->>Queue: xMpscPush(entry)
Note over App: "Optional: write(pipe_wfd, 1) for Notify/Mixed"
L->>Queue: "xMpscPop() (timer or pipe callback)"
Queue-->>L: xLogEntry_
L->>File: "fwrite(entry->buf)"
L->>Pool: entry_free(entry)
L->>File: fflush()
Log Entry Structure
struct xLogEntry_ {
xMpsc node; // MPSC queue node
xLogLevel level; // Severity level
int len; // Formatted message length
char buf[XLOG_ENTRY_BUF_SIZE]; // Formatted message (512 bytes)
struct xLogEntry_ *free_next; // Freelist link
};
Lock-Free Entry Freelist
The freelist uses a Treiber stack with atomic CAS:
- Alloc: Pop from the freelist head (CAS loop). Fall back to `malloc()` if empty.
- Free: Push to the freelist head (CAS loop). If the count exceeds `XLOG_FREELIST_SIZE`, call `free()` instead.
The count check is intentionally racy (soft cap) to keep the fast path lean.
File Rotation
When bytes written >= `max_size` and `max_files` > 1:
1. Delete `path.{max_files-1}` (oldest)
2. Cascade rename: `path.{i-1}` → `path.{i}` for i = max_files−1 down to 2
3. Rename `path` → `path.1`
4. Reopen `path` in append mode
app.log → app.log.1
app.log.1 → app.log.2
app.log.2 → app.log.3
app.log.3 → (deleted if max_files=4)
Synchronous Flush
`xLoggerFlush()` writes a byte to a dedicated flush-request pipe, triggering `logger_flush_req_cb` on the event loop thread. The caller then busy-waits (polling `xMpscEmpty()` every 1ms, up to 1 second) until the queue is drained.
Log Format
2025-04-04 16:30:00.123 INFO Application started
2025-04-04 16:30:00.456 WARN Low memory: 1024 bytes remaining
2025-04-04 16:30:01.789 ERROR Connection refused
Format: YYYY-MM-DD HH:MM:SS.mmm LEVEL message\n
API Reference
Types
| Type | Description |
|---|---|
| `xLogger` | Opaque handle to an async logger |
| `xLogLevel` | Enum: Debug, Info, Warn, Error, Fatal |
| `xLogMode` | Enum: Timer, Notify, Mixed |
| `xLoggerConf` | Configuration struct for creating a logger |
xLoggerConf Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `loop` | `xEventLoop` | (required) | Event loop for timer/pipe callbacks |
| `path` | `const char *` | `NULL` (stderr) | Log file path |
| `mode` | `xLogMode` | Timer | Operating mode |
| `level` | `xLogLevel` | Info | Minimum log level |
| `max_size` | `size_t` | 0 (no rotation) | Max file size before rotation |
| `max_files` | `int` | 0 (no rotation) | Total files to keep (including current) |
| `flush_interval_ms` | `uint64_t` | 100 | Timer/Mixed flush interval |
Functions
| Function | Signature | Description | Thread Safety |
|---|---|---|---|
| `xLoggerCreate` | `xLogger xLoggerCreate(xLoggerConf conf)` | Create a logger. | Not thread-safe |
| `xLoggerDestroy` | `void xLoggerDestroy(xLogger logger)` | Flush remaining entries and destroy. | Not thread-safe |
| `xLoggerLog` | `void xLoggerLog(xLogger logger, xLogLevel level, const char *fmt, ...)` | Write a log entry. Fatal is synchronous + abort. | Thread-safe |
| `xLoggerFlush` | `void xLoggerFlush(xLogger logger)` | Synchronously flush all pending entries. | Thread-safe |
| `xLoggerEnter` | `void xLoggerEnter(xLogger logger)` | Set as thread-local logger + bridge xbase log. | Thread-local |
| `xLoggerLeave` | `void xLoggerLeave(void)` | Clear thread-local logger. | Thread-local |
| `xLoggerCurrent` | `xLogger xLoggerCurrent(void)` | Get current thread's logger. | Thread-local |
Convenience Macros
Using the thread-local logger (set via `xLoggerEnter()`):
| Macro | Expands To |
|---|---|
| `XLOG_DEBUG(fmt, ...)` | `xLoggerLog(xLoggerCurrent(), xLogLevel_Debug, fmt, ...)` |
| `XLOG_INFO(fmt, ...)` | `xLoggerLog(xLoggerCurrent(), xLogLevel_Info, fmt, ...)` |
| `XLOG_WARN(fmt, ...)` | `xLoggerLog(xLoggerCurrent(), xLogLevel_Warn, fmt, ...)` |
| `XLOG_ERROR(fmt, ...)` | `xLoggerLog(xLoggerCurrent(), xLogLevel_Error, fmt, ...)` |
| `XLOG_FATAL(fmt, ...)` | `xLoggerLog(xLoggerCurrent(), xLogLevel_Fatal, fmt, ...)` |
Explicit-logger variants: `XLOG_DEBUG_L(logger, fmt, ...)`, etc.
Usage Examples
Basic File Logging
#include <xbase/event.h>
#include <xlog/logger.h>
int main(void) {
xEventLoop loop = xEventLoopCreate();
xLoggerConf conf = {
.loop = loop,
.path = "app.log",
.mode = xLogMode_Timer,
.level = xLogLevel_Info,
};
xLogger logger = xLoggerCreate(conf);
xLoggerEnter(logger);
XLOG_INFO("Server started on port %d", 8080);
XLOG_DEBUG("This is filtered out (level < Info)");
XLOG_WARN("Connection pool at %d%% capacity", 85);
xEventLoopRun(loop);
xLoggerLeave();
xLoggerDestroy(logger);
xEventLoopDestroy(loop);
return 0;
}
File Rotation Example
xLoggerConf conf = {
.loop = loop,
.path = "/var/log/myapp.log",
.mode = xLogMode_Mixed,
.level = xLogLevel_Info,
.max_size = 50 * 1024 * 1024, // 50MB per file
.max_files = 10, // Keep 10 files (500MB total)
};
Multi-Threaded Logging
#include <pthread.h>
#include <xlog/logger.h>
static xLogger g_logger;
static void *worker(void *arg) {
int id = *(int *)arg;
xLoggerEnter(g_logger); // Each thread must enter
for (int i = 0; i < 1000; i++) {
XLOG_INFO("Worker %d: iteration %d", id, i);
}
xLoggerLeave();
return NULL;
}
// In main():
// g_logger = xLoggerCreate(conf);
// pthread_create(&threads[i], NULL, worker, &ids[i]);
Synchronous Flush Before Exit
void graceful_shutdown(xLogger logger) {
XLOG_INFO("Shutting down...");
xLoggerFlush(logger); // Block until all entries are written
xLoggerDestroy(logger);
}
Use Cases
- **Application Logging** — Primary use case: structured, async logging for server applications with file rotation and level filtering.
- **xKit Internal Error Capture** — Via `xLoggerEnter()`, all xKit internal errors (from `xLog()`) are automatically routed through the async logger.
- **Debug Logging** — Use `xLogMode_Notify` during development for immediate log output without timer delay.
Best Practices
- Call `xLoggerEnter()` on every thread that uses the `XLOG_*()` macros. Each thread needs its own thread-local context.
- Use Mixed mode for production. It provides the best balance: batched writes for normal messages, immediate notification for errors.
- Set appropriate rotation limits. Without rotation (`max_size = 0`), log files grow unbounded.
- Call `xLoggerFlush()` before shutdown to ensure all pending messages are written.
- Don't log in tight loops at Debug level without checking the level first. While the level filter is cheap, formatting still costs CPU.
- Fatal messages are synchronous. `XLOG_FATAL()` writes directly and calls `abort()`. Don't rely on async delivery for fatal messages.
Comparison with Other Libraries
| Feature | xlog logger.h | spdlog | zlog | log4c |
|---|---|---|---|---|
| Language | C99 | C++11 | C | C |
| Async Model | MPSC queue + event loop | Dedicated thread + queue | Dedicated thread | Synchronous |
| Modes | Timer / Notify / Mixed | Async (thread pool) | Async (thread) | Sync only |
| Lock-Free | Yes (MPSC + Treiber stack) | Yes (MPMC queue) | No (mutex) | No (mutex) |
| Event Loop | Integrated (xEventLoop) | None (own thread) | None (own thread) | None |
| File Rotation | Size-based (cascade rename) | Size-based | Size/time-based | Size-based |
| Format | printf-style | fmt-style / printf | printf-style | printf-style |
| Thread-Local Context | Yes (xLoggerEnter) | No | Yes (MDC) | Yes (NDC) |
| Fatal Handling | Sync write + abort | Flush + abort | Configurable | Configurable |
Key Differentiator: xlog is unique in integrating with an event loop rather than spawning a dedicated logging thread. This means the same thread that handles network I/O also handles log flushing, reducing context switches and thread count. The three-mode design (Timer/Notify/Mixed) gives fine-grained control over the latency/throughput trade-off that most logging libraries don't offer.
Benchmark
End-to-end benchmarks for xKit, measuring real-world performance across complete scenarios.
All benchmarks run on Apple M3 Pro (12 cores, 36 GB), macOS 26.4, Clang 17, Release (-O2).
For micro-benchmark results, see the Benchmark section at the bottom of each module's documentation page.
Available Benchmarks
| Benchmark | Description |
|---|---|
| HTTP Server | xKit single-threaded HTTP/1.1 server vs Go net/http — 152 K req/s, +15–60% faster across all scenarios |
| HTTP/2 Server | xKit single-threaded h2c server vs Go net/http + x/net/http2 — 576 K req/s, +15–405% faster across all scenarios |
| HTTPS Server | xKit single-threaded HTTPS server vs Go net/http + crypto/tls — 512 K req/s (HTTPS/2), TLS-bound parity on HTTPS/1.1 |
HTTP Server Benchmark
End-to-end HTTP/1.1 server benchmark comparing xKit (single-threaded event-loop) against Go net/http (goroutine-per-connection).
Test Environment
| Item | Value |
|---|---|
| CPU | Apple M3 Pro (12 cores) |
| Memory | 36 GB |
| OS | macOS 26.4 (Darwin) |
| Compiler | Apple Clang 17.0.0 |
| Build | Release (-O2) |
| Load Generator | wrk — 4 threads, 10s duration |
Server Implementations
xKit (bench/http_bench_server.cpp)
Single-threaded event-loop HTTP/1.1 server built on xbase/event.h + xhttp/server.h. Uses kqueue on macOS, epoll on Linux. All I/O is handled in one thread — no thread pool, no goroutines.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
./build/bench/http_bench_server 8080
Go (bench/http_bench_server.go)
Standard net/http server with default settings. Go's runtime spawns one goroutine per connection and uses its own epoll/kqueue poller internally.
go build -o build/bench/go_http_bench bench/http_bench_server.go
./build/bench/go_http_bench 8081
Routes
Both servers implement identical routes:
| Route | Method | Description |
|---|---|---|
/ping | GET | Returns "pong" (4 bytes) — minimal response latency test |
/echo?size=N | GET | Returns N bytes of 'x' — variable response size test |
/echo | POST | Echoes request body — request body throughput test |
Benchmark Methodology
All benchmarks use wrk with the following defaults unless noted:
- 4 threads (`-t4`)
- 100 connections (`-c100`)
- 10 seconds (`-d10s`)
POST benchmarks use Lua scripts to set the request body:
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/octet-stream"
wrk.body = string.rep("x", BODY_SIZE)
Results
GET /ping — Minimal Response Latency
Tests raw request/response overhead with a 4-byte "pong" response. Varies connection count to measure scalability.
| Connections | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 50 | 151,935 | 128,639 | 315 μs | 365 μs | xKit +18% |
| 100 | 152,316 | 128,915 | 658 μs | 761 μs | xKit +18% |
| 200 | 151,007 | 128,162 | 1.33 ms | 1.55 ms | xKit +18% |
| 500 | 155,486 | 125,471 | 3.20 ms | 3.96 ms | xKit +24% |
Analysis:
- xKit maintains ~152K req/s regardless of connection count, showing excellent scalability of the single-threaded event loop.
- Go's throughput slightly degrades at 500 connections due to goroutine scheduling overhead.
- xKit's advantage grows from +18% to +24% as connection count increases — the event loop's O(1) dispatch scales better than goroutine context switching.
GET /echo — Variable Response Size
Tests response serialization throughput with different payload sizes. Fixed at 100 connections.
| Response Size | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 64 B | 150,592 | 127,432 | 666 μs | 771 μs | xKit +18% |
| 256 B | 146,487 | 126,907 | 682 μs | 774 μs | xKit +15% |
| 1 KiB | 144,831 | 125,729 | 689 μs | 785 μs | xKit +15% |
| 4 KiB | 141,511 | 91,886 | 707 μs | 1.08 ms | xKit +54% |
Analysis:
- xKit throughput degrades gracefully from 151K to 142K req/s as response size grows from 64B to 4KB — only a 6% drop.
- Go drops sharply at 4KB (92K req/s, −27% from 64B), likely due to `bytes.Repeat` allocation pressure and GC overhead.
- xKit's largest advantage (+54%) appears at 4KB, where Go's per-request heap allocation becomes the bottleneck.
POST /echo — Request Body Throughput
Tests request body parsing and echo throughput. Fixed at 100 connections.
| Body Size | xKit Req/s | Go Req/s | xKit Transfer/s | Go Transfer/s | Δ |
|---|---|---|---|---|---|
| 1 KiB | 141,495 | 122,584 | 152.35 MB/s | 133.51 MB/s | xKit +15% |
| 4 KiB | 133,935 | 83,512 | 536.60 MB/s | 337.13 MB/s | xKit +60% |
| 16 KiB | 82,231 | 53,828 | 1.26 GB/s | 848.10 MB/s | xKit +53% |
| 64 KiB | 35,908 | 31,124 | 2.20 GB/s | 1.90 GB/s | xKit +15% |
Analysis:
- xKit achieves 2.20 GB/s transfer rate at 64KB body size — impressive for a single-threaded server.
- The largest advantage (+60%) appears at 4KB, consistent with the GET /echo pattern — Go's allocation overhead dominates at medium payload sizes.
- At 64KB, the gap narrows to +15% as both servers become I/O bound (kernel socket buffer management dominates).
Summary
xKit vs Go net/http (Release build)
====================================
GET /ping: xKit +18% ~ +24% (consistent across all concurrency levels)
GET /echo: xKit +15% ~ +54% (advantage grows with response size)
POST /echo: xKit +15% ~ +60% (advantage peaks at medium body sizes)
Peak throughput: xKit 155K req/s (GET /ping, 500 connections)
Peak transfer: xKit 2.20 GB/s (POST /echo, 64KB body)
Key Takeaways:
- xKit wins every scenario. A single-threaded C event loop outperforms Go's multi-goroutine runtime across all request types and payload sizes.
- Scalability. xKit's throughput is nearly flat from 50 to 500 connections. Go degrades under high connection counts due to goroutine scheduling overhead.
- Payload efficiency. xKit's advantage is most pronounced at medium payloads (1–4 KiB) where Go's per-request heap allocation and GC pressure become significant.
- Architecture matters. xKit's single-threaded design eliminates all synchronization overhead. Go pays for goroutine creation, scheduling, and garbage collection on every request.
Reproducing
# Build xKit server
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
# Build Go server
go build -o build/bench/go_http_bench bench/http_bench_server.go
# Run xKit benchmark
./build/bench/http_bench_server 8080 &
wrk -t4 -c100 -d10s http://127.0.0.1:8080/ping
wrk -t4 -c100 -d10s "http://127.0.0.1:8080/echo?size=64"
wrk -t4 -c100 -d10s "http://127.0.0.1:8080/echo?size=4096"
# POST with lua script
cat > /tmp/post.lua << 'EOF'
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/octet-stream"
wrk.body = string.rep("x", 4096)
EOF
wrk -t4 -c100 -d10s -s /tmp/post.lua http://127.0.0.1:8080/echo
# Run Go benchmark (same wrk commands, different port)
./build/bench/go_http_bench 8081 &
wrk -t4 -c100 -d10s http://127.0.0.1:8081/ping
HTTP/2 Server Benchmark
End-to-end HTTP/2 (h2c, cleartext) server benchmark comparing xKit (single-threaded event-loop) against Go net/http + x/net/http2/h2c (goroutine-per-connection).
Test Environment
| Item | Value |
|---|---|
| CPU | Apple M3 Pro (12 cores) |
| Memory | 36 GB |
| OS | macOS 26.4 (Darwin) |
| Compiler | Apple Clang 17.0.0 |
| Build | Release (-O2) |
| Load Generator | h2load (nghttp2 1.68.1) — 4 threads, 10s duration, 10 max concurrent streams per connection |
Server Implementations
xKit (bench/http_bench_server.cpp)
Single-threaded event-loop HTTP/2 server built on xbase/event.h + xhttp/server.h. Supports h2c (cleartext HTTP/2) via Prior Knowledge — the same binary as the HTTP/1.1 benchmark, since xKit auto-detects the protocol on the first bytes of each connection.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
./build/bench/http_bench_server 8080
Go (bench/h2c_bench_server.go)
Standard net/http server wrapped with golang.org/x/net/http2/h2c.NewHandler() to support cleartext HTTP/2 via Prior Knowledge. Go's runtime spawns one goroutine per connection and uses its own epoll/kqueue poller internally.
cd bench && go build -o ../build/bench/go_h2c_bench h2c_bench_server.go
./build/bench/go_h2c_bench 8081
Routes
Both servers implement identical routes:
| Route | Method | Description |
|---|---|---|
/ping | GET | Returns "pong" (4 bytes) — minimal response latency test |
/echo?size=N | GET | Returns N bytes of 'x' — variable response size test |
/echo | POST | Echoes request body — request body throughput test |
Benchmark Methodology
All benchmarks use h2load with the following defaults unless noted:
- 4 threads (`-t4`)
- 100 connections (`-c100`)
- 10 max concurrent streams per connection (`-m10`)
- 10 seconds (`-D 10`)

POST benchmarks use `-d <file>` to specify the request body.

Why h2load? Unlike wrk (HTTP/1.1 only), h2load is purpose-built for HTTP/2 benchmarking. It supports stream multiplexing (`-m`), h2c Prior Knowledge, and reports per-stream latency.
Results
GET /ping — Minimal Response Latency
Tests raw request/response overhead with a 4-byte "pong" response. Varies connection count to measure scalability under HTTP/2 multiplexing.
| Connections | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 50 | 576,249 | 141,655 | 863 μs | 3.51 ms | xKit +307% |
| 100 | 561,825 | 120,732 | 1.78 ms | 8.27 ms | xKit +365% |
| 200 | 555,800 | 110,143 | 3.59 ms | 18.10 ms | xKit +405% |
| 500 | 538,905 | 136,719 | 9.22 ms | 36.21 ms | xKit +294% |
Analysis:
- xKit sustains ~560K req/s across all connection counts — a massive improvement over its HTTP/1.1 numbers (~152K) thanks to HTTP/2 stream multiplexing on fewer TCP connections.
- Go's h2c throughput (~110–142K) is comparable to its HTTP/1.1 numbers, suggesting Go's HTTP/2 implementation doesn't benefit as much from multiplexing.
- xKit's advantage ranges from +294% to +405% — far larger than the +18–24% gap seen in HTTP/1.1. The single-threaded event loop excels at handling multiplexed streams without context-switching overhead.
- At 200 connections, xKit's advantage peaks at +405%. Go's throughput degrades more steeply under high connection counts due to goroutine scheduling and HTTP/2 flow control overhead.
GET /echo — Variable Response Size
Tests response serialization throughput with different payload sizes under HTTP/2 framing. Fixed at 100 connections.
| Response Size | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 64 B | 518,176 | 123,386 | 1.92 ms | 8.08 ms | xKit +320% |
| 256 B | 511,276 | 116,267 | 1.95 ms | 8.60 ms | xKit +340% |
| 1 KiB | 493,405 | 115,267 | 2.03 ms | 8.64 ms | xKit +328% |
| 4 KiB | 383,507 | 107,457 | 2.59 ms | 9.23 ms | xKit +257% |
Analysis:
- xKit throughput degrades gracefully from 518K to 384K req/s as response size grows from 64B to 4KB — a 26% drop, mostly due to HTTP/2 DATA frame serialization overhead.
- Go stays relatively flat (~107–123K) but at a much lower baseline. The `bytes.Repeat` allocation + GC pressure is compounded by HTTP/2 framing overhead.
- xKit's advantage is consistently +257% to +340% — HTTP/2's HPACK header compression and binary framing amplify xKit's architectural advantage over Go.
POST /echo — Request Body Throughput
Tests request body parsing and echo throughput under HTTP/2. Fixed at 100 connections.
| Body Size | xKit Req/s | Go Req/s | xKit Transfer/s | Go Transfer/s | Δ |
|---|---|---|---|---|---|
| 1 KiB | 401,047 | 119,739 | 399.45 MB/s | 119.82 MB/s | xKit +235% |
| 4 KiB | 195,221 | 90,585 | 766.61 MB/s | 356.84 MB/s | xKit +115% |
| 16 KiB | 57,304 | 41,313 | 896.83 MB/s | 648.24 MB/s | xKit +39% |
| 64 KiB | 19,040 | 16,557 | 1.16 GB/s | 1.01 GB/s | xKit +15% |
Analysis:
- xKit achieves 1.16 GB/s transfer rate at 64KB body size — comparable to its HTTP/1.1 performance (2.20 GB/s), with the difference attributable to HTTP/2 flow control and framing overhead.
- The advantage narrows from +235% (1KB) to +15% (64KB) as both servers become I/O bound. HTTP/2 flow control (default 64KB window) becomes the bottleneck at large payloads.
- At small payloads (1KB), xKit's +235% advantage shows the efficiency of its nghttp2-based H2 implementation vs Go's `x/net/http2`.
HTTP/2 vs HTTP/1.1 Comparison
How does HTTP/2 compare to HTTP/1.1 for each server? (GET /ping, 100 connections)
| Server | HTTP/1.1 Req/s | HTTP/2 Req/s | Δ |
|---|---|---|---|
| xKit | 152,316 | 561,825 | +269% |
| Go | 128,915 | 120,732 | −6% |
Key Insight: xKit's single-threaded event loop benefits enormously from HTTP/2 multiplexing — handling multiple streams on fewer connections eliminates per-connection overhead. Go's goroutine-per-connection model doesn't gain from multiplexing because it already handles concurrency at the goroutine level; the added HTTP/2 framing overhead actually causes a slight regression.
Summary
xKit vs Go h2c (Release build, h2load -m10)
=============================================
GET /ping: xKit +294% ~ +405% (massive advantage across all concurrency)
GET /echo: xKit +257% ~ +340% (consistent across all response sizes)
POST /echo: xKit +15% ~ +235% (advantage narrows as payloads grow)
Peak throughput: xKit 576K req/s (GET /ping, 50 connections)
Peak transfer: xKit 1.16 GB/s (POST /echo, 64KB body)
Key Takeaways:
- HTTP/2 amplifies xKit's advantage. The gap widens from +18–24% (HTTP/1.1) to +294–405% (HTTP/2) on GET /ping. Stream multiplexing plays to the strengths of a single-threaded event loop.
- xKit scales with multiplexing. xKit's throughput jumps from 152K (HTTP/1.1) to 576K (HTTP/2) req/s — a 3.8× improvement. Go's throughput stays flat or slightly regresses.
- Payload efficiency. At small-to-medium payloads, xKit's nghttp2-based H2 implementation is dramatically faster. At large payloads (64KB), both servers converge as I/O and flow control dominate.
- Architecture matters even more for H2. HTTP/2's stream multiplexing, HPACK compression, and flow control add complexity that a lean C event loop handles more efficiently than Go's runtime.
Reproducing
# Build xKit server
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
# Build Go h2c server
cd bench && go build -o ../build/bench/go_h2c_bench h2c_bench_server.go && cd ..
# Install h2load (macOS)
brew install nghttp2
# Start servers
./build/bench/http_bench_server 8080 &
./build/bench/go_h2c_bench 8081 &
# GET /ping benchmark
h2load -t4 -c100 -m10 -D 10 http://127.0.0.1:8080/ping
h2load -t4 -c100 -m10 -D 10 http://127.0.0.1:8081/ping
# GET /echo benchmark
h2load -t4 -c100 -m10 -D 10 "http://127.0.0.1:8080/echo?size=1024"
h2load -t4 -c100 -m10 -D 10 "http://127.0.0.1:8081/echo?size=1024"
# POST /echo benchmark (create body file first)
dd if=/dev/zero bs=4096 count=1 | tr '\0' 'x' > /tmp/body_4k.bin
h2load -t4 -c100 -m10 -D 10 -d /tmp/body_4k.bin http://127.0.0.1:8080/echo
h2load -t4 -c100 -m10 -D 10 -d /tmp/body_4k.bin http://127.0.0.1:8081/echo
# Cleanup
pkill -f http_bench_server
pkill -f go_h2c_bench
HTTPS Server Benchmark
End-to-end HTTPS server benchmark comparing xKit (single-threaded event-loop, OpenSSL) against Go net/http + crypto/tls (goroutine-per-connection). Tests both HTTPS/1.1 (wrk) and HTTPS/2 (h2load with ALPN).
Test Environment
| Item | Value |
|---|---|
| CPU | Apple M3 Pro (12 cores) |
| Memory | 36 GB |
| OS | macOS 26.4 (Darwin) |
| Compiler | Apple Clang 17.0.0 |
| Build | Release (-O2) |
| TLS Backend | OpenSSL 3.6.1 (xKit), Go crypto/tls (Go) |
| Certificate | RSA 2048-bit self-signed, TLS 1.3 |
| Load Generator | wrk (HTTP/1.1 over TLS), h2load (HTTP/2 over TLS with ALPN) |
Server Implementations
xKit (bench/https_bench_server.cpp)
Single-threaded event-loop HTTPS server built on xbase/event.h + xhttp/server.h + OpenSSL. Uses xHttpServerListenTls() which automatically sets ALPN to {"h2", "http/1.1"}, so the same server handles both HTTPS/1.1 and HTTPS/2 depending on client negotiation.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
openssl req -x509 -newkey rsa:2048 -keyout bench_key.pem -out bench_cert.pem \
-days 365 -nodes -subj '/CN=localhost'
./build/bench/https_bench_server 8443 bench_cert.pem bench_key.pem
Go (bench/https_bench_server.go)
Standard net/http server with crypto/tls and x/net/http2.ConfigureServer(). Go's TLS implementation is in pure Go (crypto/tls), while xKit uses OpenSSL's C implementation. Both servers configure ALPN for h2 and http/1.1.
cd bench && go build -o ../build/bench/go_https_bench https_bench_server.go
./build/bench/go_https_bench 8444 bench_cert.pem bench_key.pem
Routes
Both servers implement identical routes:
| Route | Method | Description |
|---|---|---|
/ping | GET | Returns "pong" (4 bytes) — minimal response latency test |
/echo?size=N | GET | Returns N bytes of 'x' — variable response size test |
/echo | POST | Echoes request body — request body throughput test |
Results
HTTPS/1.1 — GET /ping (wrk, varying connections)
Tests HTTPS/1.1 performance where each connection maintains its own TLS session. wrk reuses connections (no per-request handshake), so this measures encrypted request/response throughput.
| Connections | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 50 | 125,147 | 125,076 | 395 μs | 372 μs | ≈ 0% |
| 100 | 124,593 | 128,277 | 0.86 ms | 764 μs | Go +3% |
| 200 | 122,837 | 127,075 | 1.88 ms | 1.57 ms | Go +3% |
| 500 | 111,397 | 122,498 | 5.25 ms | 4.06 ms | Go +10% |
Analysis:
- Under HTTPS/1.1, xKit and Go are nearly identical at low connection counts (~125K req/s each). This is a dramatic contrast to plaintext HTTP/1.1 where xKit was +18–24% faster.
- TLS encryption is the bottleneck, not the HTTP layer. OpenSSL's AES-GCM encryption on a single thread saturates at ~125K req/s regardless of the HTTP framework above it.
- At 500 connections, Go pulls ahead by ~10% because Go's multi-threaded runtime can parallelize TLS encryption across all CPU cores, while xKit's single-threaded event loop is limited to one core for both TLS and HTTP processing.
- xKit's latency is slightly higher at high connection counts (5.25 ms vs 4.06 ms at 500 connections) — the single thread must serialize all TLS encrypt/decrypt operations.
HTTPS/2 — GET /ping (h2load, varying connections)
Tests HTTPS/2 performance with TLS + ALPN negotiation. HTTP/2 multiplexing reduces the number of TLS sessions needed, which should benefit the single-threaded xKit.
| Connections | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 50 | 511,586 | 165,341 | 975 μs | 2.99 ms | xKit +209% |
| 100 | 508,685 | 144,024 | 1.96 ms | 6.88 ms | xKit +253% |
| 200 | 497,775 | 131,749 | 4.01 ms | 15.00 ms | xKit +278% |
Analysis:
- With HTTPS/2, xKit regains its massive advantage: +209% to +278% over Go. HTTP/2 multiplexing means fewer TLS sessions are needed — multiple streams share one encrypted connection, so the TLS overhead is amortized.
- xKit achieves ~510K req/s over HTTPS/2 — only ~10% less than its h2c (cleartext HTTP/2) performance of 562K. The TLS overhead is minimal when amortized across multiplexed streams.
- Go's HTTPS/2 throughput (~131–165K) is comparable to its h2c numbers (~121–142K), suggesting Go's TLS overhead is also well-amortized but the HTTP/2 processing itself is the bottleneck.
HTTPS/2 — GET /echo (h2load, varying response size)
Tests response serialization + TLS encryption throughput with different payload sizes. Fixed at 100 connections.
| Response Size | xKit Req/s | Go Req/s | xKit Latency | Go Latency | Δ |
|---|---|---|---|---|---|
| 64 B | 470,607 | 146,727 | 2.11 ms | 6.74 ms | xKit +221% |
| 1 KiB | 388,828 | 140,926 | 2.56 ms | 6.99 ms | xKit +176% |
| 4 KiB | 227,414 | 118,595 | 4.38 ms | 8.22 ms | xKit +92% |
Analysis:
- xKit's advantage narrows as response size grows (from +221% at 64B to +92% at 4KB) because TLS encryption of larger payloads becomes a bigger fraction of total work.
- At 4KB responses, xKit still achieves 893 MB/s encrypted throughput vs Go's 466 MB/s.
HTTPS/2 — POST /echo (h2load, varying body size)
Tests request body parsing + TLS decryption/encryption throughput. Fixed at 100 connections.
| Body Size | xKit Req/s | Go Req/s | xKit Transfer/s | Go Transfer/s | Δ |
|---|---|---|---|---|---|
| 1 KiB | 291,086 | 146,916 | 289.93 MB/s | 147.01 MB/s | xKit +98% |
| 4 KiB | 128,229 | 104,892 | 503.54 MB/s | 413.20 MB/s | xKit +22% |
| 16 KiB | 38,975 | 37,391 | 609.97 MB/s | 586.70 MB/s | xKit +4% |
| 64 KiB | 10,278 | 14,994 | 643.30 MB/s | 939.77 MB/s | Go +46% |
Analysis:
- At small payloads (1KB), xKit is +98% faster. At medium payloads (4KB), the gap narrows to +22%.
- At 16KB, the two are nearly tied (+4%). At 64KB, Go wins by +46% — this is the first scenario where Go decisively beats xKit.
- The 64KB crossover happens because: (1) TLS encryption of 64KB payloads is CPU-intensive and benefits from Go's multi-core parallelism, (2) HTTP/2 flow control window (default 64KB) creates back-pressure that the single-threaded event loop handles less efficiently than Go's goroutine scheduler.
Protocol Comparison
How does TLS affect performance for each protocol? (GET /ping, 100 connections)
| Server | HTTP/1.1 | HTTPS/1.1 | Δ (TLS cost) |
|---|---|---|---|
| xKit | 152,316 | 124,593 | −18% |
| Go | 128,915 | 128,277 | −0.5% |
| Server | h2c | HTTPS/2 | Δ (TLS cost) |
|---|---|---|---|
| xKit | 561,825 | 508,685 | −9% |
| Go | 120,732 | 144,024 | +19% |
Key Insights:
- TLS costs xKit 18% on HTTP/1.1 because every connection requires its own TLS session, and all encryption runs on a single thread. Go's multi-core TLS is essentially free (−0.5%).
- TLS costs xKit only 9% on HTTP/2 because multiplexed streams share TLS sessions. This is why HTTPS/2 is xKit's sweet spot.
- Go actually gets faster with HTTPS/2 than with h2c (+19%), likely because TLS session caching and ALPN negotiation provide a more optimized code path in Go's `crypto/tls` + `x/net/http2` stack.
Summary
xKit vs Go HTTPS (Release build, OpenSSL 3.6.1)
=================================================
HTTPS/1.1 (wrk):
GET /ping: Go ≈ xKit (−0% to +10% Go advantage at high connections)
GET /echo 1KB: Go +10%
HTTPS/2 (h2load -m10):
GET /ping: xKit +209% ~ +278%
GET /echo: xKit +92% ~ +221%
POST /echo: xKit +98% (1KB) → Go +46% (64KB)
Peak throughput: xKit 512K req/s (HTTPS/2 GET /ping, 50 connections)
Peak transfer: Go 940 MB/s (HTTPS/2 POST /echo, 64KB body)
Key Takeaways:
- HTTPS/1.1 is TLS-bound. Single-threaded OpenSSL encryption caps xKit at ~125K req/s — the same as Go. The HTTP framework advantage disappears when TLS dominates.
- HTTPS/2 restores xKit's advantage. Stream multiplexing amortizes TLS overhead across streams, letting xKit's efficient event loop shine again (+209–278% on GET /ping).
- Large payloads favor Go. At 64KB POST bodies, Go's multi-core TLS parallelism wins by +46%. This is the only scenario where Go decisively beats xKit.
- Choose your protocol wisely. For latency-sensitive APIs with small payloads, HTTPS/2 + xKit is optimal. For bulk data transfer, Go's multi-core TLS is more efficient.
Reproducing
# Build xKit server
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
# Build Go HTTPS server
cd bench && go build -o ../build/bench/go_https_bench https_bench_server.go && cd ..
# Generate self-signed certificate
openssl req -x509 -newkey rsa:2048 -keyout /tmp/bench_key.pem \
-out /tmp/bench_cert.pem -days 365 -nodes -subj '/CN=localhost'
# Install tools (macOS)
brew install wrk nghttp2
# Start servers
./build/bench/https_bench_server 8443 /tmp/bench_cert.pem /tmp/bench_key.pem &
./build/bench/go_https_bench 8444 /tmp/bench_cert.pem /tmp/bench_key.pem &
# HTTPS/1.1 benchmark (wrk)
wrk -t4 -c100 -d10s https://127.0.0.1:8443/ping
wrk -t4 -c100 -d10s https://127.0.0.1:8444/ping
# HTTPS/2 benchmark (h2load)
h2load -t4 -c100 -m10 -D 10 https://127.0.0.1:8443/ping
h2load -t4 -c100 -m10 -D 10 https://127.0.0.1:8444/ping
# POST benchmark
dd if=/dev/zero bs=4096 count=1 | tr '\0' 'x' > /tmp/body_4k.bin
h2load -t4 -c100 -m10 -D 10 -d /tmp/body_4k.bin https://127.0.0.1:8443/echo
h2load -t4 -c100 -m10 -D 10 -d /tmp/body_4k.bin https://127.0.0.1:8444/echo
# Cleanup
pkill -f https_bench_server
pkill -f go_https_bench
TODO
Planning and feasibility analysis for future improvements.
- Remove the libcurl dependency: analyze the feasibility, benefits, and compromise options for dropping xhttp's dependency on libcurl
Removing the libcurl Dependency: Feasibility and Benefits
1. Current Scope of libcurl Usage
libcurl is used only by the HTTP Client; the following files are involved:
| File | Dependency Level | Notes |
|---|---|---|
| client.c | Core | The entire file is built around curl_multi / curl_easy |
| client.h | API layer | xHttpResponse exposes curl_code / curl_error |
| client_private.h | Core | CURL *easy, CURLM *multi, CURLcode, CURL_ERROR_SIZE |
| sse.c | Core | SSE streaming is built entirely on the curl write callback |
| xhttp/CMakeLists.txt | Build | Links Libcurl::Libcurl |
| CMakeLists.txt (top level) | Build | Compilation of the whole xhttp module is gated on Libcurl_FOUND |
Parts that do not depend on curl (the majority of the xhttp module):
- HTTP Server (server.c, proto_h1.c, proto_h2.c) → llhttp + nghttp2
- WebSocket Server (ws.c, ws_serve.c, ws_handshake_server.c)
- WebSocket Client (ws_connect.c, ws_handshake_client.c) → plain sockets + xEventLoop
- Transport layer (transport_*.c) → plain OpenSSL / mbedTLS
- WS Frame / Deflate / Crypto
2. What libcurl Provides
Within the xhttp client, libcurl is responsible for:
graph TD
A[Capabilities provided by libcurl] --> B[HTTP/1.1 protocol handling<br/>request serialization + response parsing]
A --> C[HTTP/2 support<br/>HPACK, stream multiplexing, frame handling]
A --> D[TLS handshake management<br/>certificate verification, ALPN negotiation]
A --> E[Multi-Socket API<br/>non-blocking I/O integration]
A --> F[Connection pool / Keep-Alive<br/>DNS caching]
A --> G[Chunked transfer<br/>Content-Encoding decompression]
A --> H[Redirect following<br/>cookie management]
A --> I[Proxy support<br/>SOCKS / HTTP proxy]
3. Replacement Analysis
Removing libcurl means building an HTTP client protocol stack in-house:
| Component to Build | Complexity | Notes |
|---|---|---|
| HTTP/1.1 request serialization | ⭐ Low | Hand-assemble GET /path HTTP/1.1\r\n... |
| HTTP/1.1 response parsing | ⭐⭐ Medium | Can reuse the existing llhttp (already used by the server) |
| Chunked transfer decoding | ⭐⭐ Medium | Handled by llhttp |
| TLS client handshake | ⭐⭐ Medium | WS Client already has transport_tls_client_openssl/mbedtls; reusable |
| HTTP/2 client | ⭐⭐⭐⭐ High | Needs nghttp2's client session API (the server already uses nghttp2, but client mode differs) |
| Connection pool / Keep-Alive | ⭐⭐⭐ High | Connection reuse and idle timeouts must be managed in-house |
| Multi-socket event integration | ⭐⭐ Medium | xEventLoop exists, but the connection state machine must be managed in-house |
| Async DNS resolution | ⭐⭐⭐ High | curl bundles c-ares integration; building it ourselves needs an extra dependency or blocks |
| Redirects / Cookies / Proxy | ⭐⭐ Medium | Implement on demand |
4. Benefit Analysis
✅ Benefits
1. Fewer external dependencies
   - The xhttp module currently requires libcurl (~600 KB shared library); removal drops one system-level dependency
   - Friendlier for embedded and cross-compilation scenarios (libcurl's cross-compile configuration is fiddly)
2. Unified TLS management
   - The HTTP Client's TLS is currently managed inside curl (CURLOPT_CAINFO and friends), disconnected from the xTlsCtx system used by the rest of xnet/xhttp
   - After removal, the shared xTlsCtx model can be used uniformly, matching TCP, WS Client, and HTTP Server
3. No more API leakage
   - curl_code / curl_error in xHttpResponse are curl-specific concepts; exposing them to users leaks the implementation
   - After removal, errors can be unified under xErrno
4. Smaller binaries
   - Deployments that only use the server or WebSocket no longer link curl
5. Finer-grained control
   - Connection-pool policy, timeout behavior, and buffer management become fully customizable
❌ Costs
1. Significant effort (estimated 2,000-3,000 new lines of code)
   - HTTP/1.1 client protocol stack: ~500 lines
   - HTTP/2 client (nghttp2 client session): ~800 lines
   - Connection pool + Keep-Alive management: ~500 lines
   - SSE re-integration: ~300 lines
   - DNS resolution: ~200 lines (or pulling in c-ares)
   - Test rewrite: ~500 lines
2. The HTTP/2 client is the hardest part
   - nghttp2's client API differs substantially from its server API; SETTINGS, WINDOW_UPDATE, and stream priorities all need handling
   - curl does a large amount of edge-case handling around its nghttp2 client internally
3. Losing curl's maturity
   - libcurl has been hardened over 25+ years and handles countless HTTP edge cases (malformed responses, exotic Transfer-Encoding variants, proxy authentication, and more)
   - A home-grown implementation is unlikely to match that robustness in the short term
4. Higher maintenance burden
   - HTTP has many protocol edge cases; building in-house means carrying the maintenance cost long-term
5. Compromise Options
If the goal is reducing the dependency without a full rewrite, there are several incremental paths:
graph LR
A[Current state<br/>curl required] --> B[Option 1: curl optional<br/>use curl if present<br/>built-in H1 otherwise]
A --> C[Option 2: built-in H1 client<br/>H2 still via curl]
A --> D[Option 3: full removal<br/>built-in H1 + H2 client]
B --> E[Effort: ~800 lines<br/>Risk: low]
C --> F[Effort: ~600 lines<br/>Risk: low]
D --> G[Effort: ~2500 lines<br/>Risk: high]
Recommended: Option 1, making curl an optional dependency
- Add a lightweight built-in HTTP/1.1 client (based on the existing llhttp + transport_tls_client + xEventLoop)
- When curl is present, use curl (H2, connection pooling, and other advanced features)
- Without curl, fall back to the built-in H1 client (covers ~80% of use cases)
- HTTP Server and WS Server/Client are entirely unaffected (they never depended on curl)
This achieves:
- The xhttp module builds in curl-free environments (server + ws + basic client)
- curl remains as an enhancement option (H2 client, connection pool, proxy, and so on)
- Unified TLS management (the built-in client uses xTlsCtx)
- Incremental migration with controlled risk
6. Conclusion
| Dimension | Full Removal | Optional Dependency (recommended) |
|---|---|---|
| Effort | ~2,500 lines + test rewrite | ~800 lines |
| Risk | High (H2 client is complex) | Low (H1 only, reuses existing components) |
| Benefit | Zero external dependencies | Works without curl, stronger with it |
| API changes | Response needs a redesign | Can be abstracted behind a layer and migrated incrementally |
| Time | 2-3 weeks | 3-5 days |
Recommendation: start with Option 1 (curl optional) and decouple the HTTP Server / WS builds from the curl dependency (in code they are already decoupled; only the CMake layer gates the entire xhttp module on curl). Then decide, based on actual demand, whether to remove curl entirely.