
MAWIFlow benchmark: realistic flow-based evaluation for network intrusion detection

Publication date
2026
Document type
Conference paper
Author
Schraven, Joshua  
Windmann, Alexander  
Niggemann, Oliver  
Organisational unit
Informatik im Maschinenbau  
DTEC.bw  
DOI
10.5220/0014463900004061
URI
https://openhsu.ub.hsu-hh.de/handle/10.24405/23075
Conference
12th International Conference on Information Systems Security and Privacy (ICISSP 2026) ; Marbella, Spain ; March 4–6, 2026
Publisher
SciTePress
Book title
ICISSP 2026 : Proceedings of the 12th International Conference on Information Systems Security and Privacy
Volume (part of multivolume book)
1
ISBN
978-989-758-800-6
First page
549
Last page
556
Peer-reviewed
✅
Part of the university bibliography
✅
Additional Information
Language
English
Keyword
Network Intrusion Detection
Flow-Based Evaluation
Temporal Drift
Benchmark Dataset
MAWILab
dtec.bw
Abstract
Flow-based Network Intrusion Detection Systems (NIDS) are typically evaluated on synthetic or short-lived benchmarks that emphasize snapshot accuracy and neglect temporal robustness. Recent studies have shown that widely used datasets such as CIC-IDS2017 contain design flaws and artifacts, casting doubt on near-perfect headline scores. In contrast, operational NIDS must cope with long-term changes in traffic, attack patterns, and annotation quality. This position paper introduces MAWIFlow, a benchmark that derives labeled flows from MAWILab v1.1 over multiple years and preserves its anomaly semantics. We construct a scalable preprocessing pipeline, define strictly time-respecting training and test splits, and instantiate representative tabular baselines and a CNN-BiLSTM model. Long-horizon robustness is quantified via a horizon-limited normalized Area Under Time (nAUT) metric adapted from concept-drift-aware evaluation. Experiments on MAWILab flows from 2007–2024 show that all models suffer substantial performance decay on future years, with 2–3 year training windows offering the best trade-off between initial accuracy and long-term robustness. Code and sampled benchmark subsets are publicly available.
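The abstract's exact definition of the horizon-limited normalized Area Under Time (nAUT) is not reproduced on this record page. As a rough, hypothetical sketch only, assuming nAUT averages a per-period performance metric (e.g. yearly F1) over the first `horizon` future test periods and normalizes by the first-period score, so a drift-free model scores 1.0:

```python
def naut(scores, horizon):
    """Hypothetical sketch of a horizon-limited normalized Area Under Time.

    scores:  per-period metric values (e.g. F1 per future year), ordered by
             time; scores[0] is the first test period after the training cut.
    horizon: number of future periods to include.
    """
    window = scores[:horizon]
    if not window or scores[0] <= 0:
        raise ValueError("need a non-empty window and a positive reference score")
    # Mean performance over the horizon, normalized by the initial score:
    # values below 1.0 indicate decay on future data.
    return (sum(window) / len(window)) / scores[0]
```

For instance, `naut([0.90, 0.81, 0.72], 3)` yields 0.9, reflecting steady decay relative to the first test year; the precise formula used in the paper may differ.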
Description
Licensed under CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Version
Published version
Access right on openHSU
Metadata only access