WebSocket Server Benchmark

End-to-end WebSocket echo server benchmark comparing moo (single-threaded event-loop) against three popular Go WebSocket libraries:

  • gorilla/websocket — The most widely used Go WebSocket library
  • nhooyr/websocket (coder/websocket) — Modern API with context support
  • gobwas/ws — Zero-allocation, low-level WebSocket library

Test Environment

Item             Value
---------------  --------------------------------------------------------------
CPU              Apple M3 Pro (12 cores)
Memory           36 GB
OS               macOS 26.4 (Darwin)
Compiler         Apple Clang 17.0.0
Build            Release (-O2)
Load Generator   Custom Go client (ws_bench_client.go) using gorilla/websocket

Server Implementations

All servers implement the same behavior: accept WebSocket connections and echo every received message back to the sender.

moo (bench/ws_bench_server.cpp)

Single-threaded event-loop WebSocket server built on xbase/event.h + xhttp/ws.h. Uses xWsServe() for a one-line WebSocket-only server. All frame parsing, masking, ping/pong, and close handshake are handled automatically.

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DMOO_BUILD_BENCHMARKS=ON
cmake --build build --parallel
./build/bench/ws_bench_server 9090

gorilla/websocket (bench/ws_bench_server_gorilla.go)

Standard net/http server with gorilla/websocket.Upgrader. One goroutine per connection with a simple ReadMessage / WriteMessage loop. Buffer sizes set to 4KB.

cd bench && go build -o ../build/bench/ws_bench_gorilla ws_bench_server_gorilla.go
./build/bench/ws_bench_gorilla 9091

nhooyr/websocket (bench/ws_bench_server_nhooyr.go)

Standard net/http server with nhooyr.io/websocket.Accept. Uses the streaming Reader / Writer API with io.Copy for zero-copy echo.

cd bench && go build -o ../build/bench/ws_bench_nhooyr ws_bench_server_nhooyr.go
./build/bench/ws_bench_nhooyr 9092

gobwas/ws (bench/ws_bench_server_gobwas.go)

Raw TCP listener with gobwas/ws.Upgrader for zero-allocation upgrade. Uses wsutil.ReadClientData / wsutil.WriteServerMessage for frame I/O. One goroutine per connection.

cd bench && go build -o ../build/bench/ws_bench_gobwas ws_bench_server_gobwas.go
./build/bench/ws_bench_gobwas 9093

Benchmark Methodology

The benchmark client (ws_bench_client.go) establishes N concurrent WebSocket connections to the server. Each connection runs a synchronous echo loop: send a message → wait for the echo → measure round-trip latency → repeat. The test runs for 10 seconds.

Key parameters:

  • Connections: 50, 100, 200, 500
  • Message sizes: 64B, 256B, 1KB, 4KB
  • Message type: Binary
  • Duration: 10 seconds per test

Note: The benchmark client uses gorilla/websocket for all tests. This means the client-side overhead is identical across all server tests, ensuring a fair comparison of server-side performance.

Results

Echo 64B — Varying Connection Count

Tests raw message throughput with minimal 64-byte payloads. Varies connection count to measure scalability.

Connections   moo Msg/s   gorilla Msg/s   nhooyr Msg/s   gobwas Msg/s
-----------   ---------   -------------   ------------   ------------
50            219,850     173,133         107,570        138,360
100           219,813     180,373         125,386        140,522
200           218,997     184,335         140,378        141,859
500           218,078     184,820         155,729        141,970

moo vs best Go library (gorilla):

Connections   moo       gorilla   Δ
-----------   -------   -------   -----------
50            219,850   173,133   moo +27%
100           219,813   180,373   moo +22%
200           218,997   184,335   moo +19%
500           218,078   184,820   moo +18%

Latency (64B, varying connections):

Connections   moo       gorilla   nhooyr    gobwas
-----------   -------   -------   -------   -------
50            227 μs    289 μs    465 μs    361 μs
100           455 μs    554 μs    797 μs    711 μs
200           913 μs    1.08 ms   1.42 ms   1.41 ms
500           2.29 ms   2.70 ms   3.21 ms   3.52 ms

Analysis:

  • moo sustains ~219K msg/s across all connection counts — virtually no throughput degradation from 50 to 500 connections. The single-threaded event loop handles all connections without context-switching overhead.
  • gorilla/websocket is the fastest Go library at ~173–185K msg/s, benefiting from its mature, optimized implementation.
  • gobwas/ws — despite being marketed as "zero-allocation" — is slower than gorilla in this echo benchmark (~138–142K). Its advantage is in memory efficiency for massive connection counts, not raw throughput.
  • nhooyr/websocket is the slowest at low connection counts (~108–125K msg/s), though it scales best with concurrency and edges past gobwas at 500 connections (156K vs 142K). Its streaming Reader/Writer API adds per-message overhead compared to gorilla's simpler ReadMessage/WriteMessage.
  • moo's latency advantage is most pronounced at low connection counts (227 μs vs 289 μs at 50 connections) and narrows at high counts as all servers become scheduling-bound.

Echo — Varying Message Size (100 connections)

Tests message throughput and transfer rate with different payload sizes. Fixed at 100 connections.

Message Size   moo Msg/s   gorilla Msg/s   nhooyr Msg/s   gobwas Msg/s
------------   ---------   -------------   ------------   ------------
64 B           219,813     180,373         125,386        140,522
256 B          216,760     179,909         122,661        140,677
1 KiB          197,890     173,142         120,963        133,002
4 KiB          133,553     125,313         100,829        92,203

Transfer Rate (send + recv):

Message Size   moo           gorilla       nhooyr        gobwas
------------   -----------   -----------   -----------   -----------
64 B           26.84 MB/s    22.02 MB/s    15.31 MB/s    17.15 MB/s
256 B          105.84 MB/s   87.85 MB/s    59.89 MB/s    68.69 MB/s
1 KiB          386.50 MB/s   338.17 MB/s   236.26 MB/s   259.77 MB/s
4 KiB          1.02 GB/s     979 MB/s      788 MB/s      720 MB/s

Latency (100 connections, varying message size):

Message Size   moo      gorilla   nhooyr   gobwas
------------   ------   -------   ------   -------
64 B           455 μs   554 μs    797 μs   711 μs
256 B          461 μs   556 μs    815 μs   711 μs
1 KiB          505 μs   577 μs    826 μs   752 μs
4 KiB          749 μs   798 μs    992 μs   1.08 ms

Analysis:

  • moo achieves 1.02 GB/s transfer rate at 4KB messages — the only server to break the 1 GB/s barrier.
  • At 4KB, the ranking shifts: moo > gorilla > nhooyr > gobwas. gobwas drops to last place because its ReadClientData / WriteServerMessage API allocates a new byte slice per message, negating its "zero-allocation upgrade" advantage.
  • moo's advantage over gorilla narrows from +22% (64B) to +7% (4KB) as both servers become I/O bound at larger payloads.
  • All servers show graceful throughput degradation as message size grows, with moo maintaining the lowest latency across all sizes.

Go Library Comparison (WS)

How do the three Go libraries compare against each other? (100 connections, 64B)

Library             Msg/s     Latency   Relative
-----------------   -------   -------   --------
gorilla/websocket   180,373   554 μs    baseline
gobwas/ws           140,522   711 μs    −22%
nhooyr/websocket    125,386   797 μs    −30%

Key Insight: In a pure echo benchmark, gorilla/websocket is the fastest Go library. gobwas/ws's advantage lies in memory efficiency for 100K+ idle connections (not measured here), while nhooyr/websocket prioritizes API ergonomics over raw performance.

WSS (WebSocket over TLS) Benchmark

The same echo benchmark repeated over TLS (wss://) to measure the impact of encryption on throughput and latency. All servers use the same self-signed certificate (bench_cert.pem / bench_key.pem, RSA 2048-bit, TLSv1.3).

WSS Server Implementations

  • moo (bench/wss_bench_server.cpp) — Uses xHttpServerCreate() + xWsUpgrade() + xHttpServerListenTls(). ALPN set to http/1.1 only (WebSocket requires HTTP/1.1 upgrade). Single-threaded event loop handles both TLS and WebSocket I/O.
  • Go servers (bench/wss_bench_server_{gorilla,nhooyr,gobwas}.go) — Same logic as WS versions but with ListenAndServeTLS (gorilla/nhooyr) or tls.Listen (gobwas). Go's crypto/tls runs TLS per-goroutine, parallelizing encryption across connections.

WSS Echo 64B — Varying Connection Count

Connections   moo Msg/s   gorilla Msg/s   nhooyr Msg/s   gobwas Msg/s
-----------   ---------   -------------   ------------   ------------
50            186,513     173,125         107,589        138,317
100           186,068     180,426         133,218        142,187
200           184,066     185,792         148,475        144,361
500           167,019     184,532         156,695        143,220

moo vs gorilla (WSS):

Connections   moo       gorilla   Δ
-----------   -------   -------   -----------
50            186,513   173,125   moo +8%
100           186,068   180,426   moo +3%
200           184,066   185,792   gorilla +1%
500           167,019   184,532   gorilla +10%

Latency (WSS 64B, varying connections):

Connections   moo       gorilla   nhooyr    gobwas
-----------   -------   -------   -------   -------
50            268 μs    289 μs    465 μs    361 μs
100           537 μs    554 μs    750 μs    703 μs
200           1.09 ms   1.08 ms   1.35 ms   1.38 ms
500           2.99 ms   2.71 ms   3.19 ms   3.49 ms

Analysis:

  • At low connection counts (50–100), moo still leads by 3–8% over gorilla. The single-threaded event loop's efficiency offsets the TLS overhead.
  • At 200+ connections, gorilla overtakes moo. Go's per-goroutine crypto/tls parallelizes encryption across all CPU cores, while moo's single-threaded OpenSSL must serialize all TLS operations on one core.
  • The TLS overhead reduces moo's throughput by ~15% compared to plain WS (186K vs 220K at 100 conns). Go libraries show minimal TLS impact because Go's TLS is already goroutine-parallel.
  • moo's throughput degrades more steeply at 500 connections (167K, −10% from 50 conns) compared to plain WS (218K, −1%). This confirms TLS as the bottleneck for the single-threaded model.

WSS Echo — Varying Message Size (100 connections)

Message Size   moo Msg/s   gorilla Msg/s   nhooyr Msg/s   gobwas Msg/s
------------   ---------   -------------   ------------   ------------
64 B           165,952     180,923         128,983        141,951
256 B          174,475     178,725         131,257        141,520
1 KiB          149,246     172,198         127,026        135,534
4 KiB          92,686      137,560         105,289        107,550

Transfer Rate (WSS, send + recv):

Message Size   moo           gorilla       nhooyr        gobwas
------------   -----------   -----------   -----------   -----------
64 B           20.26 MB/s    22.09 MB/s    15.75 MB/s    17.33 MB/s
256 B          85.19 MB/s    87.27 MB/s    64.09 MB/s    69.10 MB/s
1 KiB          291.50 MB/s   336.32 MB/s   248.10 MB/s   264.71 MB/s
4 KiB          723.95 MB/s   1.05 GB/s     822.88 MB/s   840.23 MB/s

Analysis:

  • At 64B, gorilla leads slightly (181K vs 166K). Go's per-goroutine crypto/tls parallelizes encryption across all CPU cores, giving it an advantage even at small payloads.
  • At 256B+, gorilla maintains its lead because Go parallelizes TLS encryption across goroutines while moo serializes it on one thread.
  • At 4KB, moo achieves 92,686 msg/s — competitive with nhooyr (105K) and gobwas (108K), though gorilla leads at 138K. The single-threaded TLS model is the main bottleneck, but moo remains within the same order of magnitude as the Go libraries.
  • Future work could add a TLS write thread pool or io_uring-based async TLS to close the gap at larger payloads.

WS vs WSS Performance Impact

How much does TLS reduce throughput? (100 connections, 64B)

Server    WS Msg/s   WSS Msg/s   TLS Overhead
-------   --------   ---------   ------------
moo       219,813    165,952     −25%
gorilla   180,373    180,923     ~0%
nhooyr    125,386    128,983     +3% ¹
gobwas    140,522    141,951     +1% ¹

¹ Slight WSS improvement over WS is within measurement noise and likely due to system load variance between test runs.

Key Insight: Go's crypto/tls adds virtually zero overhead in this benchmark because TLS operations run in parallel across goroutines. moo pays a 25% penalty because all TLS encryption/decryption happens on the single event loop thread.

Summary

                    WebSocket Echo Benchmark (Release build)
                    ========================================

  WS — 64B echo (100 conns):
    moo:      219,813 msg/s   455 μs
    gorilla:  180,373 msg/s   554 μs   (moo +22%)
    gobwas:   140,522 msg/s   711 μs   (moo +56%)
    nhooyr:   125,386 msg/s   797 μs   (moo +75%)

  WS — 4KB echo (100 conns):
    moo:      133,553 msg/s   749 μs   1.02 GB/s
    gorilla:  125,313 msg/s   798 μs   979 MB/s   (moo +7%)
    nhooyr:   100,829 msg/s   992 μs   788 MB/s   (moo +32%)
    gobwas:    92,203 msg/s   1.08 ms  720 MB/s   (moo +45%)

  WSS — 64B echo (100 conns):
    gorilla:  180,923 msg/s   553 μs
    moo:      165,952 msg/s   603 μs   (gorilla +9%)
    gobwas:   141,951 msg/s   704 μs
    nhooyr:   128,983 msg/s   775 μs

  WSS — 4KB echo (100 conns):
    gorilla:  137,560 msg/s   728 μs   1.05 GB/s
    gobwas:   107,550 msg/s   930 μs   840 MB/s
    nhooyr:   105,289 msg/s   950 μs   823 MB/s
    moo:       92,686 msg/s   1.08 ms  724 MB/s   (gorilla +48%)

  Peak WS throughput:   moo 219,850 msg/s   (64B, 50 connections)
  Peak WS transfer:     moo 1.02 GB/s       (4KB, 100 connections)
  Peak WSS throughput:  moo 186,513 msg/s   (64B, 50 connections)
  Peak WSS transfer:    gorilla 1.05 GB/s   (4KB, 100 connections)

Key Takeaways:

  1. moo is 18–27% faster than gorilla on plain WS (small messages), and 3–8% faster on WSS at low connection counts. The single-threaded event loop avoids goroutine scheduling overhead.
  2. TLS changes the picture at scale. At 200+ connections or 1KB+ messages over WSS, gorilla overtakes moo because Go parallelizes TLS across goroutines while moo serializes it on one thread.
  3. moo's WS throughput is remarkably stable across connection counts (219K at 50 conns vs 218K at 500 conns — less than 1% variation). WSS shows more degradation (186K → 167K) due to single-threaded TLS.
  4. gorilla/websocket is the fastest Go library for both WS and WSS echo workloads.
  5. Single-threaded TLS is the main bottleneck for large payloads. At WSS 4KB, moo (93K msg/s) trails gorilla (138K msg/s) by ~48%. Future work could add a TLS write thread pool or io_uring-based async TLS to close the gap.

Reproducing

# Build moo servers
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DMOO_BUILD_BENCHMARKS=ON
cmake --build build --parallel

# Build Go servers and client
cd bench
go build -o ../build/bench/ws_bench_client ws_bench_client.go
go build -o ../build/bench/ws_bench_gorilla ws_bench_server_gorilla.go
go build -o ../build/bench/ws_bench_nhooyr ws_bench_server_nhooyr.go
go build -o ../build/bench/ws_bench_gobwas ws_bench_server_gobwas.go
go build -o ../build/bench/wss_bench_gorilla wss_bench_server_gorilla.go
go build -o ../build/bench/wss_bench_nhooyr wss_bench_server_nhooyr.go
go build -o ../build/bench/wss_bench_gobwas wss_bench_server_gobwas.go
cd ..

# Generate self-signed certificate for WSS benchmarks
openssl req -x509 -newkey rsa:2048 \
  -keyout build/bench/bench_key.pem \
  -out build/bench/bench_cert.pem \
  -days 365 -nodes -subj '/CN=localhost'

# Run WS benchmarks (one server at a time)
./build/bench/ws_bench_server 9090 &
./build/bench/ws_bench_client -url ws://127.0.0.1:9090/ -c 100 -d 10s -size 64
kill %1

./build/bench/ws_bench_gorilla 9091 &
./build/bench/ws_bench_client -url ws://127.0.0.1:9091/ -c 100 -d 10s -size 64
kill %1

./build/bench/ws_bench_nhooyr 9092 &
./build/bench/ws_bench_client -url ws://127.0.0.1:9092/ -c 100 -d 10s -size 64
kill %1

./build/bench/ws_bench_gobwas 9093 &
./build/bench/ws_bench_client -url ws://127.0.0.1:9093/ -c 100 -d 10s -size 64
kill %1

# Run WSS benchmarks (from build/bench directory for cert paths)
cd build/bench

./wss_bench_server 9090 bench_cert.pem bench_key.pem &
./ws_bench_client -url wss://127.0.0.1:9090/ -c 100 -d 10s -size 64
kill %1

./wss_bench_gorilla 9091 bench_cert.pem bench_key.pem &
./ws_bench_client -url wss://127.0.0.1:9091/ -c 100 -d 10s -size 64
kill %1

./wss_bench_nhooyr 9092 bench_cert.pem bench_key.pem &
./ws_bench_client -url wss://127.0.0.1:9092/ -c 100 -d 10s -size 64
kill %1

./wss_bench_gobwas 9093 bench_cert.pem bench_key.pem &
./ws_bench_client -url wss://127.0.0.1:9093/ -c 100 -d 10s -size 64
kill %1