What question did this study set out to answer?

The study aims to provide a flow-level dataset that pairs pre-tunnel application labels with encrypted-side features of WireGuard tunnel traffic.

March 21, 2026 | Open Access

A flow-level dataset of WireGuard tunnel traffic with matched encrypted-side features and application labels

Key Points

Captured traffic from both inner and outer sides of a WireGuard tunnel.
Used NFStream for flow record generation and deep packet inspection.
Matched inner packet data to outer encrypted packets using timing and padding rules.
Aggregated statistics on encrypted-side flows and exported data in Parquet format.
Released two Parquet files with detailed flow-level data from two capture sessions.
Data includes application names and categories linked to encrypted tunnel metrics.
Supports advanced research on encrypted traffic classification and VPN detection.
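
The timing and padding rules mentioned in the key points can be illustrated with a minimal sketch. It relies on the WireGuard transport data message layout (a 16-byte header, the inner packet zero-padded to a multiple of 16 bytes, and a 16-byte authentication tag). The MTU cap of 1420 bytes, the 5 ms delay window, the greedy first-fit strategy, and the function names are assumptions for illustration only and are not taken from the article. The sketch also covers only the outbound direction, where the outer packet follows the inner one.

```python
WG_HEADER_LEN = 16  # type (1) + reserved (3) + receiver index (4) + counter (8)
WG_TAG_LEN = 16     # Poly1305 authentication tag
WG_PAD_BLOCK = 16   # plaintext is zero-padded to a multiple of 16 bytes


def expected_wg_payload_len(inner_ip_len, mtu=1420):
    """UDP payload length of the transport data message for one inner packet.

    The MTU cap is an assumption (1420 is a common tunnel MTU); padding is
    taken not to grow the plaintext beyond it.
    """
    padded = (inner_ip_len + WG_PAD_BLOCK - 1) // WG_PAD_BLOCK * WG_PAD_BLOCK
    padded = max(inner_ip_len, min(padded, mtu))
    return WG_HEADER_LEN + padded + WG_TAG_LEN


def match_packets(inner, outer, max_delay=0.005):
    """Greedily pair outbound inner packets with outer packets.

    inner, outer: lists of (timestamp_seconds, length), sorted by time.
    Outer lengths are UDP payload lengths. max_delay is a hypothetical window.
    Returns a list of (inner_index, outer_index) pairs.
    """
    matches = []
    used = set()
    start = 0
    for i, (t_in, len_in) in enumerate(inner):
        want = expected_wg_payload_len(len_in)
        # Skip outer packets that were seen before this inner packet.
        while start < len(outer) and outer[start][0] < t_in:
            start += 1
        for j in range(start, len(outer)):
            t_out, len_out = outer[j]
            if t_out - t_in > max_delay:
                break  # outside the time window, stop searching
            if j not in used and len_out == want:
                matches.append((i, j))
                used.add(j)
                break
    return matches
```

For example, a 40-byte inner packet is padded to 48 bytes, so the sketch looks for an outer UDP payload of 80 bytes within the delay window. A production matcher would also need to handle the inbound direction, reordering, and ambiguous candidates of equal length.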

Abstract

This data article describes a flow-level dataset derived from paired captures on both sides of a WireGuard virtual private network tunnel. Pre-tunnel traffic was recorded on the inner tunnel interface before encapsulation, and encrypted transport traffic was recorded on the outer side, using a GL.iNet Flint 2 (GL-MT6000) router, an inline network TAP, and a Linux capture host. Two capture sessions totaling approximately 80 hours of residential broadband traffic from 10 devices were recorded with nanosecond-precision packet timestamps; the released flow-level data uses millisecond resolution as exported by NFStream. The raw captures were cleaned to retain TCP and UDP packets and to remove non-initial IPv4 fragments. Flow records were generated from the cleaned inner-side captures using NFStream, which assigned each flow an application name and an application category label via deep packet inspection. Inner packets were matched to outer WireGuard transport data packets using time alignment and a padded-length consistency rule, and matched packets were attributed to flows using 5-tuple keys with temporal and capacity constraints. Encrypted-side statistics were then aggregated per flow. The released dataset consists of two Parquet files, one per capture session, that combine NFStream flow fields, including application labels and inner-side per-packet sequences for the first 255 packets, with encrypted-side derived attributes such as matched packet counts, byte totals, durations, rates, direction-specific byte volumes, packet-size statistics, inter-arrival time distributions, size-ratio metrics, and outer-side per-packet sequences for the first 255 packets. This cross-correlation structure, which pairs pre-tunnel application labels with encrypted tunnel-side features, can support research on encrypted traffic classification, application identification, VPN detection, and feature engineering for flow-level analysis under encryption.
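
The per-flow aggregation of encrypted-side statistics described in the abstract might look roughly like the sketch below. It is illustrative only: the input tuple layout, the direction strings, and every output field name are hypothetical and do not reflect the column names in the released Parquet files. Only the kinds of statistics (packet counts, byte totals, durations, rates, direction-specific bytes, size and inter-arrival summaries, and a 255-packet sequence cap) follow the abstract.

```python
from collections import defaultdict
from statistics import mean, pstdev

SEQ_LIMIT = 255  # per-packet sequences are kept for the first 255 packets


def aggregate_outer_stats(matched):
    """Aggregate matched outer packets into per-flow statistics.

    matched: iterable of (flow_id, timestamp_ms, outer_len, direction),
    where direction is "src2dst" or "dst2src" (hypothetical encoding).
    """
    by_flow = defaultdict(list)
    for flow_id, ts_ms, size, direction in matched:
        by_flow[flow_id].append((ts_ms, size, direction))

    stats = {}
    for flow_id, pkts in by_flow.items():
        pkts.sort(key=lambda p: p[0])  # order by timestamp
        times = [p[0] for p in pkts]
        sizes = [p[1] for p in pkts]
        iats = [b - a for a, b in zip(times, times[1:])]
        duration_ms = times[-1] - times[0]
        total_bytes = sum(sizes)
        stats[flow_id] = {
            "packets": len(pkts),
            "bytes": total_bytes,
            "duration_ms": duration_ms,
            # Rate is undefined for single-packet flows; report 0.0 there.
            "bytes_per_s": total_bytes * 1000 / duration_ms if duration_ms > 0 else 0.0,
            "src2dst_bytes": sum(s for _, s, d in pkts if d == "src2dst"),
            "dst2src_bytes": sum(s for _, s, d in pkts if d == "dst2src"),
            "size_mean": mean(sizes),
            "size_std": pstdev(sizes),
            "iat_mean_ms": mean(iats) if iats else 0.0,
            "size_seq": sizes[:SEQ_LIMIT],
        }
    return stats
```

Using the population standard deviation keeps single-packet flows well defined (the result is 0.0), which is one possible design choice among several.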
