Binary Analysis: Network Protocol Capture
Deduplicate binary network captures to identify unique protocol messages and reduce storage for packet analysis workflows.
The Problem
Network packet captures contain massive amounts of repeated data:
- Protocol handshakes repeat - Same TCP/TLS setup sequences
- Keepalive messages - Repeated heartbeat packets
- Duplicate requests - Load balancers sending identical probes
- Large capture files - Gigabytes of pcap data with high redundancy
Binary deduplication can significantly reduce capture file sizes for analysis and storage.
Input Data
network-capture.bin
Binary file containing 6 network packets (445 bytes):
- 3× HTTP GET requests (identical)
- 2× HTTP POST requests (identical)
- 1× HTTP Response (unique)
Packets delimited by null bytes (0x00).
Hex dump (first 20 lines):
00000000: 3200 0147 4554 202f 6170 692f 7573 6572 2..GET /api/user
00000010: 7320 4854 5450 2f31 2e31 0d0a 486f 7374 s HTTP/1.1..Host
00000020: 3a20 6170 692e 6578 616d 706c 652e 636f : api.example.co
00000030: 6d0d 0a0d 0a00 6400 0250 4f53 5420 2f61 m.....d..POST /a
...
Packet structure:
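One way to inspect the layout is to decode the dump lines shown above and split them on 0x00, the same split the Python example on this page performs. This is only a sketch over the 64 bytes visible in the excerpt; the name `EXCERPT` is introduced here for illustration. In those bytes, 0x00 also appears inside what looks like a short header before each request, so the split yields short fragments as well as the request text.

```python
# The 64 bytes visible in the hex dump excerpt (four 16-byte rows).
EXCERPT = bytes.fromhex(
    "320001474554202f6170692f75736572"
    "7320485454502f312e310d0a486f7374"
    "3a206170692e6578616d706c652e636f"
    "6d0d0a0d0a00640002504f5354202f61"
)

# Split on the null-byte delimiter and show each resulting chunk.
chunks = EXCERPT.split(b"\x00")

for index, chunk in enumerate(chunks):
    print(index, len(chunk), chunk)
```

Because the excerpt stops mid-packet, the last chunk is truncated.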
Output Data
expected-output.bin
Deduplicated binary file with 3 unique packets (235 bytes):
- 1× HTTP GET (2 duplicates removed)
- 1× HTTP POST (1 duplicate removed)
- 1× HTTP Response (kept)
Result: 47% size reduction (445 → 235 bytes)
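The percentage follows directly from the two file sizes. A small sketch of the arithmetic (the helper name `reduction_pct` is illustrative, not part of uniqseq):

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage of bytes removed, relative to the original size."""
    return (before - after) / before * 100


pct = reduction_pct(445, 235)
print(f"{pct:.1f}%")  # 47.2%
```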
Solution
Options:
- --byte-mode: Process binary data (not text lines)
- --delimiter-hex 00: Split on null byte (0x00) instead of newline
- --window-size 1: Deduplicate individual packets
- --quiet: Suppress statistics output
from uniqseq import UniqSeq

uniqseq = UniqSeq(
    delimiter=b"\x00",  # Use bytes delimiter for binary mode
    window_size=1,  # Deduplicate individual packets
)

with open("network-capture.bin", "rb") as f:
    with open("output.bin", "wb") as out:
        data = f.read()
        # Split on delimiter, keeping empty chunks (consecutive delimiters)
        chunks = data.split(b"\x00")
        # Process all but last chunk (last is after trailing delimiter)
        for chunk in chunks[:-1]:
            uniqseq.process_line(chunk, out)
        # Process last chunk if non-empty
        if chunks[-1]:
            uniqseq.process_line(chunks[-1], out)
        uniqseq.flush_to_stream(out)
How It Works
Byte mode processes binary data using custom delimiters instead of text newlines:
Before (6 packets, 445 bytes):
[GET packet #1] <-- Keep
[POST packet #1] <-- Keep
[GET packet #2] <-- Duplicate, remove
[Response #1] <-- Keep (unique)
[POST packet #2] <-- Duplicate, remove
[GET packet #3] <-- Duplicate, remove
After (3 packets, 235 bytes):
[GET packet]
[POST packet]
[Response]
Each packet is hashed and compared, with duplicates removed.
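The hash-and-compare step can be sketched in a few lines of plain Python. This is an illustration of the idea for single-chunk windows, not uniqseq's actual implementation; the function name `dedupe_chunks` and the choice of BLAKE2b are assumptions, and unlike the example above it skips empty chunks.

```python
import hashlib


def dedupe_chunks(data: bytes, delimiter: bytes = b"\x00") -> bytes:
    """Keep the first occurrence of each delimited chunk, drop later repeats."""
    seen = set()
    kept = []
    for chunk in data.split(delimiter):
        if not chunk:
            continue  # empty chunk (trailing or doubled delimiter)
        digest = hashlib.blake2b(chunk, digest_size=16).digest()
        if digest in seen:
            continue  # same bytes seen earlier: duplicate
        seen.add(digest)
        kept.append(chunk)
    # Re-attach the delimiter after every kept chunk.
    return b"".join(chunk + delimiter for chunk in kept)


capture = b"GET\x00POST\x00GET\x00RESP\x00POST\x00GET\x00"
print(dedupe_chunks(capture))  # b'GET\x00POST\x00RESP\x00'
```

Storing fixed-size digests instead of the chunks themselves keeps memory use proportional to the number of unique packets rather than their total size.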
Real-World Workflows
Deduplicate tcpdump Output
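Reduce pcap file size for analysis. pcap records are length-prefixed rather than delimiter-separated, so confirm that the result still parses before relying on it: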
# Convert pcap to binary stream, deduplicate
tcpdump -r capture.pcap -w - | \
uniqseq --byte-mode --delimiter-hex 0a --quiet | \
tcpdump -r - -w deduplicated.pcap
Protocol Message Analysis
Extract unique protocol messages from network logs:
# Deduplicate binary protocol logs
uniqseq protocol.bin \
--byte-mode \
--delimiter-hex 00 \
--window-size 1 \
--quiet > unique-messages.bin
# Analyze unique messages
hexdump -C unique-messages.bin | less
Find Repeated Heartbeats
Identify and remove keepalive packets:
# Show how many duplicates were removed
# (statistics go to stderr; the binary stdout is discarded so jq sees only JSON)
uniqseq capture.bin \
--byte-mode \
--delimiter-hex 00 \
--stats-format json \
--quiet 2>&1 >/dev/null | \
jq '.statistics.lines.skipped'
Output: 150 (150 duplicate packets removed)
DNS Query Deduplication
Deduplicate DNS query logs:
# Assumes one DNS message per 0x0a-delimited record (raw DNS-over-TCP is length-prefixed, not delimited)
uniqseq dns-queries.bin \
--byte-mode \
--delimiter-hex 0a \
--window-size 1 \
--quiet > unique-dns-queries.bin
HTTP/2 Frame Analysis
Analyze HTTP/2 binary frames:
# Assumes frames were exported with a 0x00 separator (raw HTTP/2 frames carry a length field instead)
uniqseq http2-frames.bin \
--byte-mode \
--delimiter-hex 00 \
--window-size 3 \
--quiet > unique-http2-sequences.bin
Use --window-size 3 to capture frame sequences (request → response → ack).
Advanced Patterns
Multi-Packet Sequences
Deduplicate conversation patterns (multi-packet windows):
# Find unique 3-packet sequences
uniqseq network-capture.bin \
--byte-mode \
--delimiter-hex 00 \
--window-size 3 \
--quiet
Identifies unique request/response/acknowledgment patterns.
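To get a feel for which multi-packet patterns repeat, the windows can be counted directly. This sketch only counts overlapping 3-chunk windows that occur more than once; it does not reproduce how uniqseq decides what to remove, and the name `repeated_windows` is illustrative.

```python
from collections import Counter


def repeated_windows(data: bytes, delimiter: bytes = b"\x00", window: int = 3):
    """Return each window of consecutive chunks that occurs more than once."""
    chunks = [c for c in data.split(delimiter) if c]
    counts = Counter(
        tuple(chunks[i:i + window]) for i in range(len(chunks) - window + 1)
    )
    return {seq: n for seq, n in counts.items() if n > 1}


stream = b"REQ\x00RESP\x00ACK\x00PING\x00REQ\x00RESP\x00ACK\x00"
for seq, n in repeated_windows(stream).items():
    print(n, seq)  # 2 (b'REQ', b'RESP', b'ACK')
```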
Protocol Normalization
Normalize variable fields before deduplication:
# Remove timestamps from packets before comparing
uniqseq capture.bin \
--byte-mode \
--delimiter-hex 00 \
--hash-transform 'sed "s/timestamp=[0-9]*/timestamp=XXX/g"' \
--quiet
Groups packets with different timestamps but identical structure.
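The same idea in plain Python: normalize a copy of each packet for comparison, but keep the original bytes in the output. This is a sketch under the assumption that the transform affects only the comparison key; `dedupe_normalized` and `TIMESTAMP` are illustrative names, not uniqseq API.

```python
import hashlib
import re

# Byte-level equivalent of the sed expression above.
TIMESTAMP = re.compile(rb"timestamp=[0-9]*")


def dedupe_normalized(chunks):
    """Drop chunks that are identical once timestamps are masked."""
    seen = set()
    kept = []
    for chunk in chunks:
        key = TIMESTAMP.sub(b"timestamp=XXX", chunk)
        digest = hashlib.sha256(key).digest()
        if digest not in seen:
            seen.add(digest)
            kept.append(chunk)  # emit the original, unmasked bytes
    return kept


packets = [b"ping timestamp=100", b"ping timestamp=101", b"pong timestamp=102"]
print(dedupe_normalized(packets))
# [b'ping timestamp=100', b'pong timestamp=102']
```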
Hex Delimiter Discovery
Find the right delimiter for your binary format:
# Try different delimiters
for delim in 00 0a 0d 1a ff; do
echo "Delimiter: 0x$delim"
uniqseq data.bin --byte-mode --delimiter-hex $delim --quiet | wc -c
done
The loop prints the deduplicated output size for each candidate. Choose the delimiter that splits the data into logical chunks for your format.
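Output size alone says little about chunk boundaries, so it can help to look at chunk counts and sizes per candidate. A minimal sketch, independent of uniqseq (the name `chunk_profile` is illustrative):

```python
from statistics import median

CANDIDATES = (b"\x00", b"\x0a", b"\x0d", b"\x1a", b"\xff")


def chunk_profile(data: bytes, candidates=CANDIDATES):
    """Map each delimiter (as hex) to (count, min, median, max) chunk sizes."""
    profile = {}
    for delim in candidates:
        sizes = [len(c) for c in data.split(delim) if c]
        if sizes:
            profile[delim.hex()] = (len(sizes), min(sizes), median(sizes), max(sizes))
    return profile


sample = b"alpha\x00beta\x00alpha\x00gamma\x00"
for delim, (count, low, mid, high) in chunk_profile(sample).items():
    print(f"0x{delim}: {count} chunks, sizes {low}/{mid}/{high} (min/median/max)")
```

A delimiter that never occurs leaves the data as one large chunk, while one that occurs inside fields produces many tiny fragments; both show up clearly in the profile.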
Save Binary Statistics
Track deduplication metrics for binary data:
# Process binary capture with stats
uniqseq network.pcap \
--byte-mode \
--delimiter-hex 00 \
--stats-format json \
--quiet \
> deduplicated.pcap \
2> capture-stats.json
# Check redundancy percentage
jq '.statistics.redundancy_pct' capture-stats.json
Performance Benefits
Storage Reduction
# Before deduplication
$ wc -c < network-capture.bin
445
# After deduplication
$ wc -c < deduplicated.bin
235 # 47% reduction
For real packet captures dominated by repeated handshakes and keepalives, the reduction can be larger, in the 60-80% range, though it depends on how repetitive the traffic is.
Faster Analysis
# Time to search through full capture
$ time grep -a "GET" network-capture.bin
real 0m0.050s
# Time to search through deduplicated capture
$ time grep -a "GET" deduplicated.bin
real 0m0.025s # 50% faster
Binary Data Formats
Common Use Cases
| Protocol | Delimiter | Window Size | Use Case |
|---|---|---|---|
| HTTP packets | 0x00 or 0x0a | 1 | Deduplicate requests |
| DNS queries | 0x0a | 1 | Unique query patterns |
| TLS handshakes | 0x00 | 3-5 | Handshake sequences |
| Custom protocols | Variable | 1-3 | Protocol-specific |
| Binary logs | 0x0a or 0x00 | 1 | Application logs |
Delimiter Selection
Common binary delimiters:
- 0x00 (null byte) - C-style string termination
- 0x0a (newline) - Line-oriented binary
- 0x0d0a (CRLF) - Network protocols
- 0x1a (EOF marker) - Some file formats
- 0xff (all bits set) - Custom protocols
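When moving between the command line and the Python API, the hex form used with --delimiter-hex has to become a bytes value. A small helper sketch (the name `delimiter_from_hex` is hypothetical; `str.removeprefix` needs Python 3.9+):

```python
def delimiter_from_hex(hex_value: str) -> bytes:
    """Convert a hex string such as '00', '0d0a' or '0xff' to bytes."""
    cleaned = hex_value.lower().removeprefix("0x")
    return bytes.fromhex(cleaned)


print(delimiter_from_hex("00"))    # b'\x00'
print(delimiter_from_hex("0d0a"))  # b'\r\n'
print(delimiter_from_hex("0xff"))  # b'\xff'
```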
Working with pcap Files
For pcap files, you may need preprocessing:
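# Print each packet (minus its link-level header) as hex
tcpdump -r capture.pcap -x | \
# Keep only the hex dump lines and strip the offset column
sed -n 's/^[[:space:]]*0x[0-9a-f]*:[[:space:]]*//p' | \
# Convert to binary
xxd -r -p | \
# Deduplicate
uniqseq --byte-mode --delimiter-hex 00 --quiet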
Integration Examples
Wireshark Workflow
# Export packets from Wireshark
tshark -r capture.pcap -T fields -e frame.protocols > packets.txt
# Deduplicate protocol sequences
uniqseq packets.txt --window-size 3 --quiet > unique-conversations.txt
Zeek (Bro) Logs
# Deduplicate Zeek logs (conn.log is newline-delimited text by default)
uniqseq conn.log \
--byte-mode \
--delimiter-hex 0a \
--quiet > conn-deduped.log
Custom Protocol Analysis
from uniqseq import UniqSeq

# Analyze custom binary protocol
uniqseq = UniqSeq(delimiter=b"\xff", window_size=1)

with open("protocol.bin", "rb") as f:
    with open("unique-messages.bin", "wb") as out:
        data = f.read()
        for packet in data.split(b"\xff"):
            if len(packet) > 0:
                uniqseq.process_line(packet, out)
                uniqseq.process_line(b"\xff", out)
        uniqseq.flush_to_stream(out)

# Print statistics
stats = uniqseq.get_stats()
print(f"Unique packets: {stats['emitted']}")
print(f"Duplicate packets removed: {stats['skipped']}")
When to Use This
Good candidates:
- ✅ Network captures with repeated handshakes
- ✅ Protocol dumps with keepalive messages
- ✅ Binary log files with structured data
- ✅ IoT device communication logs
- ✅ Game network traffic analysis

Not recommended:
- ❌ Encrypted traffic (all bytes appear random)
- ❌ Compressed binary data (high entropy, few repeated chunks)
- ❌ Random data / cryptographic material
- ❌ Small binary files (<1KB)
See Also
- Byte Mode - Binary data processing
- Custom Delimiters - Delimiter configuration
- Window Size - Multi-packet sequences
- Memory Forensics - Binary memory dump analysis