Enhancing military decision-making through simulation-guided post-training of Large Language Models

Publication date
2026-01-20
Document type
Conference paper
Author
Becker, Leon
Rose, Oliver
Organisational unit
Chair of Modeling and Simulation, University of the Bundeswehr Munich
DOI
10.24405/22133
URI
https://openhsu.ub.hsu-hh.de/handle/10.24405/22133
Conference
1st Workshop on AI in Security and Defense  
Publisher
Universitätsbibliothek der HSU/UniBw H
Book title
Artificial Intelligence in Security and Defense : Proceedings of the workshop AI4SD
First page
25
Last page
30
Peer-reviewed
✅
Part of the university bibliography
No
File(s)
openHSU_22133.pdf (631.78 KB)
Additional Information
Language
English
Keyword
Large Language Models (LLM)
Military decision-making
Reinforcement fine-tuning
Simulation-guided training
Combat simulation
Group relative policy optimization
Abstract
The capabilities of Large Language Models (LLMs) have rapidly evolved, enabling them to perform increasingly complex reasoning tasks. However, while their general reasoning abilities are shaped during large-scale pretraining, domain-specific reasoning such as tactical decision-making in military contexts requires dedicated post-training. This paper introduces a simulation-guided Reinforcement Fine-Tuning (RFT) approach in which reward signals are derived from the outcomes of a combat simulation environment. By embedding a military-grade simulator into the RFT loop via the Group Relative Policy Optimization (GRPO) algorithm, model outputs are evaluated based on tactical effectiveness rather than human annotations or rule-based correctness. A proof-of-concept study demonstrates that the method significantly improves the feasibility and tactical quality of generated Courses of Action (COAs), even under limited training schedules. These findings establish simulation-guided RFT as a promising direction for equipping LLMs with tactically relevant reasoning skills and pave the way toward next-generation decision-support systems in military environments.
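
As a rough illustration of the group-relative scoring that the abstract attributes to GRPO, the sketch below normalizes the rewards of a group of sampled COAs against the group's own mean and standard deviation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `toy_simulator_reward` lookup table, and the use of the population standard deviation are all hypothetical choices made here, and the toy reward only stands in for the outcome of a real combat simulation.

```python
import statistics
from typing import Callable, List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps), so no
    separate value model is needed to provide a baseline.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


def toy_simulator_reward(coa: str) -> float:
    """Hypothetical stand-in for a simulation outcome score in [0, 1].

    A real setup would run the generated COA in a simulator and derive the
    reward from the result; this lookup table is purely illustrative.
    """
    outcomes = {"flank": 1.0, "hold": 0.5, "frontal assault": 0.0}
    return outcomes.get(coa, 0.0)  # unknown or infeasible COAs score 0


def score_group(coas: List[str], simulate: Callable[[str], float]) -> List[float]:
    """Score one group of sampled COAs and return their relative advantages."""
    rewards = [simulate(coa) for coa in coas]
    return group_relative_advantages(rewards)


if __name__ == "__main__":
    group = ["flank", "frontal assault", "hold", "hold"]
    for coa, adv in zip(group, score_group(group, toy_simulator_reward)):
        print(f"{coa:>16}: advantage {adv:+.3f}")
```

In this sketch, COAs that score above the group average receive a positive advantage and those below receive a negative one; if every COA in a group scores the same, all advantages are zero and that group contributes no learning signal.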
Version
Published version
Access right on openHSU
Open access
