HTTPS Server Benchmark

End-to-end HTTPS server benchmark comparing xKit (single-threaded event-loop, OpenSSL) against Go net/http + crypto/tls (goroutine-per-connection). Tests both HTTPS/1.1 (wrk) and HTTPS/2 (h2load with ALPN).

Test Environment

Item            Value
--------------  --------------------------------------------------------------
CPU             Apple M3 Pro (12 cores)
Memory          36 GB
OS              macOS 26.4 (Darwin)
Compiler        Apple Clang 17.0.0
Build           Release (-O2)
TLS Backend     OpenSSL 3.6.1 (xKit), Go crypto/tls (Go)
Certificate     RSA 2048-bit self-signed, TLS 1.3
Load Generator  wrk (HTTP/1.1 over TLS), h2load (HTTP/2 over TLS with ALPN)

Server Implementations

xKit (bench/https_bench_server.cpp)

Single-threaded event-loop HTTPS server built on xbase/event.h + xhttp/server.h + OpenSSL. Uses xHttpServerListenTls() which automatically sets ALPN to {"h2", "http/1.1"}, so the same server handles both HTTPS/1.1 and HTTPS/2 depending on client negotiation.

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel
openssl req -x509 -newkey rsa:2048 -keyout bench_key.pem -out bench_cert.pem \
  -days 365 -nodes -subj '/CN=localhost'
./build/bench/https_bench_server 8443 bench_cert.pem bench_key.pem

Go (bench/https_bench_server.go)

Standard net/http server with crypto/tls and x/net/http2.ConfigureServer(). Go's TLS implementation is in pure Go (crypto/tls), while xKit uses OpenSSL's C implementation. Both servers configure ALPN for h2 and http/1.1.

cd bench && go build -o ../build/bench/go_https_bench https_bench_server.go
./build/bench/go_https_bench 8444 bench_cert.pem bench_key.pem

Routes

Both servers implement identical routes:

Route         Method  Description
------------  ------  --------------------------------------------------------
/ping         GET     Returns "pong" (4 bytes) — minimal response latency test
/echo?size=N  GET     Returns N bytes of 'x' — variable response size test
/echo         POST    Echoes request body — request body throughput test

Results

HTTPS/1.1 — GET /ping (wrk, varying connections)

Tests HTTPS/1.1 performance where each connection maintains its own TLS session. wrk reuses connections (no per-request handshake), so this measures encrypted request/response throughput.

Connections  xKit Req/s  Go Req/s  xKit Latency  Go Latency  Δ
-----------  ----------  --------  ------------  ----------  -------
50           125,147     125,076   395 μs        372 μs      ≈ 0%
100          124,593     128,277   0.86 ms       764 μs      Go +3%
200          122,837     127,075   1.88 ms       1.57 ms     Go +3%
500          111,397     122,498   5.25 ms       4.06 ms     Go +10%

Analysis:

  • Under HTTPS/1.1, xKit and Go are nearly identical at low connection counts (~125K req/s each). This is a dramatic contrast to plaintext HTTP/1.1 where xKit was +18–24% faster.
  • TLS encryption is the bottleneck, not the HTTP layer. OpenSSL's AES-GCM encryption on a single thread saturates at ~125K req/s regardless of the HTTP framework above it.
  • At 500 connections, Go pulls ahead by ~10% because Go's multi-threaded runtime can parallelize TLS encryption across all CPU cores, while xKit's single-threaded event loop is limited to one core for both TLS and HTTP processing.
  • xKit's latency is slightly higher at high connection counts (5.25 ms vs 4.06 ms at 500 connections) — the single thread must serialize all TLS encrypt/decrypt operations.

HTTPS/2 — GET /ping (h2load, varying connections)

Tests HTTPS/2 performance with TLS + ALPN negotiation. HTTP/2 multiplexing reduces the number of TLS sessions needed, which should benefit the single-threaded xKit.

Connections  xKit Req/s  Go Req/s  xKit Latency  Go Latency  Δ
-----------  ----------  --------  ------------  ----------  ----------
50           511,586     165,341   975 μs        2.99 ms     xKit +209%
100          508,685     144,024   1.96 ms       6.88 ms     xKit +253%
200          497,775     131,749   4.01 ms       15.00 ms    xKit +278%

Analysis:

  • With HTTPS/2, xKit regains its massive advantage: +209% to +278% over Go. HTTP/2 multiplexing means fewer TLS sessions are needed — multiple streams share one encrypted connection, so the TLS overhead is amortized.
  • xKit achieves ~510K req/s over HTTPS/2 — only ~10% less than its h2c (cleartext HTTP/2) performance of 562K. The TLS overhead is minimal when amortized across multiplexed streams.
  • Go's HTTPS/2 throughput (~131–165K) is comparable to its h2c numbers (~121–142K), suggesting Go's TLS overhead is also well-amortized but the HTTP/2 processing itself is the bottleneck.

HTTPS/2 — GET /echo (h2load, varying response size)

Tests response serialization + TLS encryption throughput with different payload sizes. Fixed at 100 connections.

Response Size  xKit Req/s  Go Req/s  xKit Latency  Go Latency  Δ
-------------  ----------  --------  ------------  ----------  ----------
64 B           470,607     146,727   2.11 ms       6.74 ms     xKit +221%
1 KiB          388,828     140,926   2.56 ms       6.99 ms     xKit +176%
4 KiB          227,414     118,595   4.38 ms       8.22 ms     xKit +92%

Analysis:

  • xKit's advantage narrows as response size grows (from +221% at 64 B to +92% at 4 KiB) because TLS encryption of larger payloads becomes a bigger fraction of total work.
  • At 4KB responses, xKit still achieves 893 MB/s encrypted throughput vs Go's 466 MB/s.

HTTPS/2 — POST /echo (h2load, varying body size)

Tests request body parsing + TLS decryption/encryption throughput. Fixed at 100 connections.

Body Size  xKit Req/s  Go Req/s  xKit Transfer/s  Go Transfer/s  Δ
---------  ----------  --------  ---------------  -------------  ---------
1 KiB      291,086     146,916   289.93 MB/s      147.01 MB/s    xKit +98%
4 KiB      128,229     104,892   503.54 MB/s      413.20 MB/s    xKit +22%
16 KiB     38,975      37,391    609.97 MB/s      586.70 MB/s    xKit +4%
64 KiB     10,278      14,994    643.30 MB/s      939.77 MB/s    Go +46%

Analysis:

  • At small payloads (1KB), xKit is +98% faster. At medium payloads (4KB), the gap narrows to +22%.
  • At 16KB, the two are nearly tied (+4%). At 64KB, Go wins by +46% — this is the first scenario where Go decisively beats xKit.
  • The 64KB crossover happens because: (1) TLS encryption of 64KB payloads is CPU-intensive and benefits from Go's multi-core parallelism, and (2) the default HTTP/2 flow-control window (65,535 bytes per stream) creates back-pressure that the single-threaded event loop handles less efficiently than Go's goroutine scheduler.

Protocol Comparison

How does TLS affect performance for each protocol? (GET /ping, 100 connections)

Server  HTTP/1.1  HTTPS/1.1  Δ (TLS cost)
------  --------  ---------  ------------
xKit    152,316   124,593    −18%
Go      128,915   128,277    −0.5%

Server  h2c      HTTPS/2  Δ (TLS cost)
------  -------  -------  ------------
xKit    561,825  508,685  −9%
Go      120,732  144,024  +19%

Key Insights:

  1. TLS costs xKit 18% on HTTP/1.1 because every connection requires its own TLS session, and all encryption runs on a single thread. Go's multi-core TLS is essentially free (−0.5%).
  2. TLS costs xKit only 9% on HTTP/2 because multiplexed streams share TLS sessions. This is why HTTPS/2 is xKit's sweet spot.
  3. Go actually gets faster with HTTPS/2 vs h2c (+19%) — likely because TLS session caching and ALPN negotiation provide a more optimized code path in Go's crypto/tls + x/net/http2 stack.

Summary

                    xKit vs Go HTTPS (Release build, OpenSSL 3.6.1)
                    =================================================

  HTTPS/1.1 (wrk):
    GET /ping:     Go ≈ xKit at low connection counts, Go +10% at 500 connections
    GET /echo 1KB: Go +10%

  HTTPS/2 (h2load -m10):
    GET /ping:     xKit +209% ~ +278%
    GET /echo:     xKit +92%  ~ +221%
    POST /echo:    xKit +98%  (1KB) → Go +46% (64KB)

  Peak throughput:  xKit 512K req/s  (HTTPS/2 GET /ping, 50 connections)
  Peak transfer:    Go 940 MB/s      (HTTPS/2 POST /echo, 64KB body)

Key Takeaways:

  1. HTTPS/1.1 is TLS-bound. Single-threaded OpenSSL encryption caps xKit at ~125K req/s — the same as Go. The HTTP framework advantage disappears when TLS dominates.
  2. HTTPS/2 restores xKit's advantage. Stream multiplexing amortizes TLS overhead across streams, letting xKit's efficient event loop shine again (+209–278% on GET /ping).
  3. Large payloads favor Go. At 64KB POST bodies, Go's multi-core TLS parallelism wins by +46%. This is the only scenario where Go decisively beats xKit.
  4. Choose your protocol wisely. For latency-sensitive APIs with small payloads, HTTPS/2 + xKit is optimal. For bulk data transfer, Go's multi-core TLS is more efficient.

Reproducing

# Build xKit server
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DXK_BUILD_BENCHMARKS=ON
cmake --build build --parallel

# Build Go HTTPS server
cd bench && go build -o ../build/bench/go_https_bench https_bench_server.go && cd ..

# Generate self-signed certificate
openssl req -x509 -newkey rsa:2048 -keyout /tmp/bench_key.pem \
  -out /tmp/bench_cert.pem -days 365 -nodes -subj '/CN=localhost'

# Install tools (macOS)
brew install wrk nghttp2

# Start servers
./build/bench/https_bench_server 8443 /tmp/bench_cert.pem /tmp/bench_key.pem &
./build/bench/go_https_bench 8444 /tmp/bench_cert.pem /tmp/bench_key.pem &

# HTTPS/1.1 benchmark (wrk)
wrk -t4 -c100 -d10s https://127.0.0.1:8443/ping
wrk -t4 -c100 -d10s https://127.0.0.1:8444/ping

# HTTPS/2 benchmark (h2load)
h2load -t4 -c100 -m10 -D 10 https://127.0.0.1:8443/ping
h2load -t4 -c100 -m10 -D 10 https://127.0.0.1:8444/ping

# POST benchmark
dd if=/dev/zero bs=4096 count=1 | tr '\0' 'x' > /tmp/body_4k.bin
h2load -t4 -c100 -m10 -D 10 -d /tmp/body_4k.bin https://127.0.0.1:8443/echo
h2load -t4 -c100 -m10 -D 10 -d /tmp/body_4k.bin https://127.0.0.1:8444/echo

# Cleanup
pkill -f https_bench_server
pkill -f go_https_bench