Enhancing military decision-making through simulation-guided post-training of Large Language Models
Publication date
2026-01-20
Document type
Conference paper
Authors
Becker, Leon
Rose, Oliver
Organisational unit
Chair of Modeling and Simulation, University of the Bundeswehr Munich
Conference
Workshop AI4SD (Artificial Intelligence in Security and Defense)
Publisher
Universitätsbibliothek der HSU/UniBw H
Book title
Artificial Intelligence in Security and Defense: Proceedings of the Workshop AI4SD
First page
25
Last page
30
Peer-reviewed
Yes
Part of the university bibliography
No
Language
English
Keywords
Large Language Models (LLM)
Military decision-making
Reinforcement fine-tuning
Simulation-guided training
Combat simulation
Group relative policy optimization
Abstract
The capabilities of Large Language Models (LLMs) have rapidly evolved, enabling them to perform increasingly complex reasoning tasks. However, while their general reasoning abilities are shaped during large-scale pretraining, domain-specific reasoning, such as tactical decision-making in military contexts, requires dedicated post-training. This paper introduces a simulation-guided Reinforcement Fine-Tuning (RFT) approach in which reward signals are derived from the outcomes of a combat simulation environment. Embedding a military-grade simulator into the RFT loop via the Group Relative Policy Optimization (GRPO) algorithm allows model outputs to be evaluated for tactical effectiveness rather than against human annotations or rule-based correctness criteria. A proof-of-concept study demonstrates that the method significantly improves the feasibility and tactical quality of generated Courses of Action (COAs), even under limited training schedules. These findings establish simulation-guided RFT as a promising direction for equipping LLMs with tactically relevant reasoning skills and pave the way toward next-generation decision-support systems in military environments.
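For readers unfamiliar with GRPO, the following minimal Python sketch illustrates the group-relative reward mechanism the abstract describes: several candidate COAs sampled for the same prompt are scored by a simulator, and each candidate's advantage is its reward normalized against its own sampling group. All names here (e.g. simulate_coa) are hypothetical placeholders, not the authors' implementation; the paper's military-grade simulator is not reproduced.

    import random
    import statistics

    def simulate_coa(coa: str) -> float:
        """Placeholder for the combat simulation: returns a scalar
        tactical-effectiveness score for a generated Course of Action.
        (Hypothetical stand-in for the paper's simulator.)"""
        return random.random()

    def group_relative_advantages(coas: list[str]) -> list[float]:
        """GRPO-style advantages: each sampled output is rewarded relative
        to the mean and spread of its own group, so no human annotations
        or absolute reward scale are needed."""
        rewards = [simulate_coa(c) for c in coas]
        mean_r = statistics.mean(rewards)
        std_r = statistics.pstdev(rewards) or 1.0  # guard against zero spread
        return [(r - mean_r) / std_r for r in rewards]

    # Example: a group of G = 4 candidate COAs sampled for one tactical prompt.
    group = ["flank left", "frontal assault", "defend ridge", "delay and withdraw"]
    print(group_relative_advantages(group))

Candidates with above-average simulated outcomes receive positive advantages and are reinforced during fine-tuning; below-average ones are penalized, steering the model toward tactically effective COAs.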
Version
Published version
Access right on openHSU
Open access
