Back in 2016, Mike Avanzini published a multi-part WDS blog series on researching Orders of Battle. He explains where the data comes from, how it’s validated, and how a designer turns that research into something the game engine can actually use. This two-part post is meant as a follow-up in the same spirit, but with a wider lens: instead of focusing on one period, it examines how the “archive to OOB” problem changes over the WDS time span and across series.
Part I covers the pre–World War II world: from eras when sources are thin and often rhetorical, such as antiquity or the Middle Ages, through the early modern period, when muster rolls and reports exist but don’t always agree (Musket & Pike), and into the 19th century, when documentation grows yet interpretation still matters (Napoleonic Battles and Civil War Battles). Part II will shift to World War II, where the challenge flips: there’s no shortage of documents, but contradictions, definitions, and long-lived myths can be just as treacherous as missing data.
At its core, building a scenario is equal parts history and craft. An Order of Battle isn’t just a roster—it’s a set of decisions about who fought, in what strength and organization, with what equipment, and in what condition.
That means moving between primary accounts, administrative records, and modern scholarship, then translating all of it into the series’ language: unit strengths, quality, fatigue, etc. In practice, most battles require a tailored OOB because the same formation can look very different from week to week due to attrition and other factors. That said, designers do sometimes reuse a parent OOB across closely related scenarios, campaigns, or “variant” setups, especially when the underlying force structure is the same and only the situation or start time changes.
We cover battles across the ages in series like Sword & Siege, Musket & Pike, Civil War Battles, Napoleonic Battles, Panzer Campaigns, and more. Each era brings its own research traps. In ancient and medieval warfare, the problem is often scarcity: the sources are fragmentary, biased, or both, and you’re building a plausible reconstruction rather than copying a table. In the early modern period, paperwork improved, but it didn’t always tell you who actually showed up on the day. By the 19th century, the records are richer. Yet even here, there are still gaps, such as mismatched dates and competing interpretations. The through-line is simple: don’t rely on a single source, be honest about what you’re inferring, and keep testing until the scenario feels like the history you’re trying to model.
Choosing the Battle and Gathering Sources
Everything begins with choosing a battle to recreate. For a game designer, this involves balancing interest and information: the battle should be interesting to play, but also have enough historical data available to construct an OOB. We often focus on famous battles within each game series – for example, medieval sieges and field battles for Sword & Siege, or famous World War II operations for Panzer Campaigns. Often, these are iconic engagements (e.g., Agincourt 1415 in Age of Longbow, Vol. I, or Kursk 1943) that players care about. However, sometimes designers also tackle more obscure battles, which can excite grognards by shedding light on lesser-known clashes – provided the research is up to the task.
Once a battle is chosen, research kicks off in earnest. The scenario designer collects all available sources on that battle: primary accounts (such as chronicles, memoirs, and official reports) and secondary analyses by historians. The goal is to identify all the units involved and their approximate strengths, as well as leadership, equipment, and condition. For modern battles, our designers often start with a definitive book as the main reference, then drill down into archival records for details. Mike Avanzini noted that for World War II, "we live in a golden age of accessible scholarship" – authors like David Glantz, David Stahel, Jason Marks, and others have published detailed operational histories that include divisional-level OOBs. These can serve as a solid starting point. But to flesh out an OOB down to regiments, battalions, or even companies, designers need more granular data, such as tables of organization and equipment (TO&Es) and daily unit reports. Mike described building an impressive personal library of over 200 military books and collecting hundreds of gigabytes of primary documents – from German archival rolls (NARA) to Soviet archive files (TsAMO) – so that the team can reference original war diaries, strength reports, and organizational charts. In short, for a well-documented battle, no stone is left unturned: published histories, official archives, personal memoirs, unit war diaries, and even foreign-language sources are gathered. This provides multiple viewpoints and data points to compare.
Comparing sources is not just about collecting more books; it’s also about judging what kind of evidence each source actually is. A designer has to ask basic but important questions: was the author present, or did they know people who were? How long after the event did they write? Who was the intended audience, and what did the author gain by telling the story a certain way? A dispatch written for a superior, a memoir written for reputation, and a chronicle written to inspire patrons can all describe the same battle, yet pull the numbers in different directions. Even “good” sources can be precise about what they saw and careless about what they heard. This kind of source criticism matters as much as the arithmetic, because it tells you which figures are likely to be estimates, which are propaganda, and which are the closest thing to a headcount.
For earlier eras, the hunt is often more challenging. The sources might be limited to a few chronicles or archaeological finds. When designing scenarios in the Sword & Siege series, the research may involve reading medieval chronicles, studying academic papers on troop numbers, or consulting translations of ancient texts. Sometimes indirect sources must be used when exploring 6th-century late Roman or Byzantine battles. To reconstruct a late Roman army’s OOB, one might rely on the narrative of Procopius (who described Emperor Justinian’s wars) and on military manuals like Maurice’s Strategikon. The Strategikon is a Byzantine field manual from the late 6th century that describes how the army was organized and fought. Even though it was written after some of the battles in question, it provides invaluable clues about unit structures and tactics (“how the army fought, who fought… and how they were organised”).
In essence, designers use such treatises to fill gaps in the record. If we lack an exact OOB for, say, a Roman expedition in 540 AD, we can infer its composition from what the Strategikon says about typical cavalry and infantry units of that era. Likewise, the Notitia Dignitatum (a late Roman list of units from circa 400 AD) and later writings, such as those of Emperor Leo VI, can help triangulate which units might have existed and their relative sizes.
The Strategikon (presumably) by Emperor Maurice, on amazon.com
No matter the era, one principle stands: no single source is sufficient. Our designers emphasize that no single book or document provides everything needed for a full scenario OOB. For example, in researching Panzer Campaigns: Scheldt ’44 (a WWII campaign in the Netherlands), they found that “no single source, either in print or on the internet, outlined the Allied or German organization for this period in sufficient detail and accuracy” to build the OOB. Instead, they had to piece together clues from a variety of primary and secondary sources, many of which had incomplete or even contradictory information. The process became one of historical detective work: gathering snippets from unit war diaries, battle reports, organizational charts, and existing histories, then assessing the validity of each and reconciling conflicts. The result might be a composite OOB that isn’t found verbatim in any book, but emerges from comparing multiple references. This multi-source approach is what lifts a scenario from generic to authentically researched.
In the following sections, we will travel through time – from antiquity to modernity – seeing concrete examples of the challenges and methods of building historical OOBs. We’ll see how designers handle eras with almost no data, eras with biased or wildly divergent accounts, and eras with an overload of data that still needs careful sifting. Throughout, remember that the aim is a scenario that both reflects historical reality and provides a fair, engaging game. That means designers sometimes have to make tough choices when sources conflict – and those choices are exactly what fascinate grognards who read these behind-the-scenes insights.
Scarce Sources and Plausible Reconstruction (Antiquity & Medieval)
In the distant past, commanders did not leave neat staff reports or detailed unit returns for historians to find. Often, the only accounts of a battle come from one side – or are laden with propaganda. Scenario designers tackling these periods must be part historian, part sleuth, and part skeptic. The goal is to construct a plausible OOB that fits the fragmentary evidence. Let’s look at some examples from antiquity and the Middle Ages where sources are sparse or skewed, and how a designer can still create a credible scenario.
Case Study: The Battle of Kadesh (1274 BC)
While not (yet?!?) a battle featured in WDS games, Kadesh is a classic illustration of ancient source issues. Our knowledge of Kadesh, a chariot battle between Pharaoh Ramesses II of Egypt and the Hittite Empire, comes almost entirely from Egyptian records – and Ramesses had every incentive to exaggerate his glory. Egyptian inscriptions and the famous poetic account of the battle depict an Egyptian army caught in a Hittite ambush, only to be saved by Ramesses’s personal valor. These records claim enormous numbers for the enemy.
For instance, Ramesses boasted that the Hittite king Muwatalli II fielded 2,500 heavy chariots, each carrying three warriors, and that after a desperate fight, the Egyptians had destroyed “2,000 enemy chariots”, which would have been virtually the entire Hittite force.
Such figures strain credulity – 2,500 chariots would imply something like 7,500 Hittite charioteers, a number logistically and tactically daunting for the Bronze Age. Modern historians have treated these numbers with caution. In fact, analysis of the battle, comparing the Egyptian sources with the scant Hittite sources we have, suggests the actual Hittite chariot contingent was likely much smaller. One modern estimate is that Muwatalli may have had on the order of 1,500 chariots at Kadesh, not 2,500 – the Egyptian scribes appear to have inflated the figures, perhaps counting reinforcements or using “symbolic” numbers. The Poem of Pentaur (the Egyptian epic) “greatly exaggerates the power of Ramesses, and the position of Egypt when the battle is over”, essentially spinning a stalemate (or even a narrow escape by Ramesses) into a propaganda victory for Egypt.
Bas-relief of Ramesses II on his chariot during the Battle of Kadesh, Great Temple of Abu Simbel, Egypt (Diego Delso, delso.photo / CC BY-SA)
If one were building a scenario for Kadesh, how would one proceed? The designer would compare sources and modern research: Egyptian reliefs and texts give a general outline (the division of the Egyptian army into four named divisions, the Hittite use of an initial chariot ambush, etc.), but their numbers would be taken with a grain of salt. One would incorporate plausible numbers based on scholarly estimates – perhaps fielding 1,500 Hittite chariots instead of 2,500, and around 16,000–20,000 infantry (Egyptian accounts mention two Hittite infantry divisions of 18,000 and 19,000, but notably these never engaged). The Egyptian army’s size (perhaps ~20,000 men and 2,000 chariots, according to Ramesses’s own inscriptions) would also be cross-checked against logistical realities and modern Egyptology; some suggest Ramesses led about 20,000 troops with 4,000 charioteers (2 men per chariot). The discrepancy between sources is itself an important clue: when ancient accounts yield impossibly large or one-sided numbers, designers assume propaganda or error, and adjust towards consensus estimates. In the absence of a neutral third-party record, one might lean on what archaeology and warfare studies indicate about army sizes in that era. In short, a Kadesh scenario OOB would not naively copy Ramesses’s bombast; it would be a reconstruction that balances the Egyptian narrative with rational analysis, giving players a scenario that feels historically authentic rather than like legendary hyperbole.
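The crew arithmetic behind these figures is simple enough to check in a few lines. The following is only a sketch: the chariot counts and crew sizes are the ones quoted above, the function name is ours, and applying the three-man crew to the lower modern estimate is an assumption made for comparison.

```python
# Crew arithmetic for the chariot figures quoted in the text.
# Assumption: the three-man Hittite crew also applies to the lower estimate.

def crew_total(chariots: int, men_per_chariot: int) -> int:
    """Return the number of men needed to crew a chariot force."""
    return chariots * men_per_chariot

hittite_claimed = crew_total(2_500, 3)   # Ramesses's figure
hittite_estimate = crew_total(1_500, 3)  # one modern estimate
egyptian = crew_total(2_000, 2)          # two-man Egyptian chariots

print(hittite_claimed, hittite_estimate, egyptian)  # 7500 4500 4000
```

Even the reduced estimate still means several thousand charioteers, which is why the exact figure matters for the scenario.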
Case Study: Caesar’s Gaul
Fast forward to classical antiquity, and we still face one-sided sources. Gaius Julius Caesar’s Commentaries on the Gallic War are a seminal firsthand account, but they were also political documents meant to boost Caesar’s reputation in Rome. Caesar meticulously lists his own legionary forces, but when it comes to enemy numbers, he is less reliable. A famous example is the Siege of Alesia (52 BC), where Caesar claims he besieged 80,000 Gauls under Vercingetorix within Alesia while an external Gallic relief force of a quarter of a million men approached. He writes that his legions, numbering perhaps 50,000, faced over 330,000 Gauls in total – an implausibly huge number even by ancient standards. Later ancient writers, such as Plutarch and Strabo, repeated or embellished these figures (Plutarch says 470,000 Gauls; Strabo 400,000). Modern historians, armed with archaeological evidence from the probable site of Alesia, strongly doubt those figures. The consensus is that the relief army was much smaller, maybe on the order of 50,000–100,000 warriors. Caesar likely exaggerated to magnify the glory of his victory; today’s estimates put the Gallic forces at a fraction of his claim.

Vercingetorix throws down his arms at the feet of Julius Caesar (painting by Lionel Royer, 1899, Public Domain)
A scenario designer recreating Alesia must therefore make judgment calls: How many Gauls to include in the relief force? Using Caesar’s 250,000 would produce a ludicrous scenario in which the map is flooded with Gauls; using modern estimates around 80,000 (combining both the main force and the relief forces) might be far more reasonable. The designer might cite the analysis of Hans Delbrück or other military historians who examined Alesia’s numbers, and opt for a mid-range that fits the excavated fortifications (archaeology can tell us how big Caesar’s double lines of circumvallation were, hence roughly how many troops could man them). The key is that primary sources, even from literate generals, have bias – Caesar wanted to appear outnumbered and heroic, just as Ramesses did. A wargame scenario aims to represent what most likely happened, not the self-aggrandizing narrative. Thus, the OOB would be tweaked: perhaps 10 Roman legions (~50,000 men) vs ~70,000 Gauls (10,000 inside Alesia and ~60,000 in relief, for example). That still gives the Gauls a numerical edge, but not an absurd five-to-one ratio. Meanwhile, quality ratings could reflect Caesar’s professional legions versus the mixed quality of the Gallic tribal levies (some fierce warriors, but less organized). In effect, the designer produces a historically plausible balance: the Romans have a tough fight, but not an impossible one, which aligns with the fact that Caesar did prevail, yet with difficulty.
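To make the difference between those ratios concrete, here is a small sketch that computes them from the figures above. The function and variable names are ours; the numbers are Caesar’s claims and the example OOB from the text.

```python
# Force ratios at Alesia: Caesar's claimed numbers vs the example OOB above.

def force_ratio(side_a: int, side_b: int) -> float:
    """Return how many men side_a has for each man of side_b (1 decimal)."""
    return round(side_a / side_b, 1)

ROMANS = 50_000

caesar_relief_only = force_ratio(250_000, ROMANS)     # relief army alone
caesar_total = force_ratio(80_000 + 250_000, ROMANS)  # garrison + relief
example_oob = force_ratio(10_000 + 60_000, ROMANS)    # designer's example

print(caesar_relief_only, caesar_total, example_oob)  # 5.0 6.6 1.4
```

The example OOB keeps the Gauls ahead at roughly 1.4 to 1, while Caesar’s own figures would put them at five to one or more.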
Late Antiquity and Early Medieval
Moving into the late Roman and early medieval era (roughly the 4th–8th centuries), sources remain scarce. This is sometimes lazily bundled under the label “Dark Ages,” but that term is misleading—Peter Brown famously argued in his phenomenal work "The World of Late Antiquity" that what we’re seeing is less a straightforward decline than a long transformation of Roman society and its institutions. The documentation gaps, in other words, shouldn’t be waved away as “darkness,” but understood as the result of changing administrative practices, survival of archives, and what different authors chose to record. We might have a single main narrative source for a war (Procopius for Justinian’s 6th-century campaigns, or, for an earlier example, Ammianus Marcellinus for 4th-century battles).
These writers sometimes give numbers, but not always, and their reliability varies. Procopius, for instance, is generally detailed but might not always list exact troop counts – and later copyists or secondary chronicles sometimes garbled those figures. Designers benefit greatly from supplementary materials such as military manuals. We already mentioned the Strategikon of Maurice; this text provides insight into unit structures (e.g., how many men were in a typical cavalry tagma or infantry numerus, and how units were brigaded). By Procopius’s time, the Eastern Roman (Byzantine) army was in transition, but historians can correlate Procopius’s descriptions of field armies with those laid out in the Strategikon (written perhaps a generation later). For example, Procopius might say Belisarius took “all the cavalry” in a certain campaign – the Strategikon tells us what proportion of an army was cavalry and how they were armed. Using such cross-references, a designer can reconstruct an OOB even when numbers are not explicitly given. It becomes an educated reconstruction: “Belisarius had perhaps 15,000 troops, composed of X cavalry units (cataphracts, bucellarii, etc.) and Y infantry units (limitanei, foederati, etc.), based on analogies to Maurice’s reforms and other references.” Primary accounts might mention the names of key units or commanders, which can be included in the scenario for flavor and accuracy, even if we guess their strength.

Battlemap of the Battle of Taginae/Busta Gallorum (from A History Of The Art Of War, The Middle Ages From The Fourth To The Fourteenth Century (1898), Public Domain)
For instance, if designing a scenario for the Battle of Taginae (552 AD) from the Gothic War, Procopius tells us which Roman general led which wing and that the Ostrogoth king had a bodyguard of elite troopers – but the actual counts of each contingent aren’t precise. The designer would draw on research by modern historians (many of whom have tried to estimate those armies) and muster an OOB that looks right for the period. If need be, they borrow data from slightly later periods: the structure of a late Roman/early Byzantine army can partly be derived from other works, such as the Strategikon and the Notitia Dignitatum, because military institutions evolved gradually.
Medieval Chronicles and Conflicting Numbers
As we enter the High Middle Ages and Crusades (11th–14th centuries), the source base widens a bit – but paradoxically, that can create more confusion. There may be multiple chronicles (Christian, Muslim, etc.) describing the same battle, each with different biases and figures. Medieval chroniclers often had little direct knowledge of army sizes and sometimes repeated hearsay or symbolic numbers (for example, citing “40,000” or “100,000” in a very loose sense). During the Crusades, accounts of major battles or sieges might vary wildly: one source might say the Crusaders had 50,000 knights (a gross exaggeration), while another contemporary might say 10,000. The true figure could be lower still. A modern scenario designer working on, say, the Siege of Antioch (1097–1098) (in Crusades, Book I) or the Battle of Hattin (1187) (in the upcoming Crusades, Book II) has to sift through these chronicles.
Battle of Hattin, as shown in Sword & Siege: Crusades, Book II
They would likely consult modern works – academic histories that critically assess the primary sources – to get a consensus OOB. Even then, uncertainty is the norm. The designer might list certain contingents with a strength range in their notes (“perhaps 7,000–12,000 Crusader infantry at Hattin, split among these barons’ retinues”) and then choose a plausible midpoint for the game. What’s important is internal consistency and plausibility rather than claiming absolute certainty. Including a bibliography or notes can be helpful – indeed, the Sword & Siege: Crusades game provides a bibliography on its website, precisely because “medieval battle reports can vary wildly depending on where you’re getting your information”. We want to show players which sources were used to compile the OOB, underscoring that a lot of debate and interpretation went into those numbers.
Case Study: Agincourt (1415)
To illustrate medieval source conflicts, consider the Battle of Agincourt, one of Sword & Siege’s topics (specifically in Age of Longbow). Agincourt’s narrative is famous: a small English army led by Henry V defeated a much larger French feudal host. But how much larger? For centuries, it was accepted that the English were outnumbered about 5-to-1 – roughly 6,000 English against 25,000–30,000 French. This view was based on the accounts of chroniclers and later historians. However, in recent years, a scholarly debate erupted.
It is also worth remembering that modern historians are not neutral measuring instruments. They can have their own biases, and they also face incentives: reputations are built on new interpretations, and publishers like clear, dramatic claims. That does not make revisionist work wrong, but it does mean a designer should look closely at what the “new” argument rests on. In Agincourt’s case, the debate is not simply “old myths versus new facts,” but a question of what the surviving documentation can and cannot capture, and what it may systematically leave out.

French battle-plan preceding the Battle of Agincourt, drawn up by Marshal Boucicault, Captain of Normandy, and others. Between 13th-21st October 1415 (Public Domain)
In 2005, historian Anne Curry published a major study arguing that the French army at Agincourt was much smaller than traditionally stated and that the English were closer to parity than the familiar story suggests. Her case draws heavily on administrative survivals such as muster and pay records. This approach has real strengths: such documents are often closer to “countable” evidence than narrative chronicle figures. The problem is that they can also be incomplete in ways that are hard to quantify. They may omit men who were present but not properly recorded, undercount armed servants and attached personnel, or reflect paperwork created at a different moment than the battle itself.
From a common-sense perspective, there are reasons to be cautious about any conclusion that equates the English invasion force with, or makes it larger than, a French host raised locally. Henry V’s army had to cross the Channel and had already taken heavy losses at Harfleur (very likely more from illness than combat), while the French had been shadowing his march and had time to call up additional forces. None of that proves the traditional “vast numerical superiority” figures are right, but it does suggest that a strong French advantage in numbers is not inherently implausible. At the same time, Agincourt is a reminder that larger numbers are not automatically an advantage: in the constrained space and muddy conditions of the battlefield, mass and crowding could become a liability, which helps explain how a numerically superior French force could still lose decisively.
For the designer recreating Agincourt, this debate directly affects the OOB. They have to decide which interpretation to follow, or how to reflect the uncertainty. All those modern sources have their own numbers. The scenario could offer an alternative setup or commit to one set of numbers.
If the designer follows the traditional view, the French might be given, say, 20,000 troops (including thousands of crossbowmen and militia who historically were present but not very effective) against ~6,000 English (about 5,000 longbowmen and a thousand men-at-arms). If they lean toward Curry’s findings, the French might be reduced to around 12,000 total. (In the concrete case, the game has two variants, “068. Azincourt_a” and “069. Azincourt_b”, reflecting exactly this.)
In any case, the composition of the armies is clearer than the total numbers: both sides’ types of units are well known (English longbowmen and men-at-arms vs French knights, men-at-arms, crossbowmen, etc.). The quality differences are also part of the scenario design: the French army at Agincourt had many noble knights of high individual quality, but was poorly led and in disarray, which can be reflected by giving them strong combat stats but poor cohesion or command-and-control in-game. The English, though weary from marching, were disciplined and led by an inspired Henry V. English units might have higher morale or better leadership bonuses, and they were deployed ideally for the terrain.
Agincourt underscores that even with multiple primary sources, “truth” is elusive. Medieval sources “vary wildly,” and it’s hard to judge accuracy. The best a designer can do is consult the range of scholarship and make a case for their chosen OOB. Notably, we are transparent about this by listing bibliographies – inviting informed players to see why certain choices were made. Such notes turn the scenario into a miniature historical argument, which many wargamers appreciate.
The broader lesson is that “more documentation” does not automatically mean “more certainty”: sometimes the best-preserved records are still only a partial window, and sometimes the most quoted narratives are also the most interested.
In summary, building OOBs for antiquity through the medieval period involves coping with limited and biased sources. The strategies include: using later or analogous sources to fill structural gaps (like using the Strategikon for late Roman armies), taking all headcounts with skepticism and adjusting toward plausible values (discounting clear propaganda numbers), and reconciling multiple conflicting chronicles by favoring those supported by modern research (or a middle ground among them). The result isn’t a single “correct” OOB – it’s a reasoned reconstruction. And because scenario designers also must consider playability, they will ensure that whatever numbers they choose still yield a scenario that plays out credibly. A battle where one side has no chance might require careful victory conditions or be framed as an asymmetrical challenge. Often, though, the exercise of digging into sources reveals that few battles were as one-sided as myth would have it; there were usually opportunities for either side, which is exactly what a good wargame scenario provides.
Early Modern Conflicts: Piecing Together Musket & Pike Armies
By the 16th and 17th centuries, warfare had evolved – and so had record-keeping, albeit slowly. In the Musket & Pike era (roughly 1500–1700), armies grew more professional, and states kept better track of their troops (for pay, logistics, and related purposes). However, the reports we have are still imperfect. Many accounts from this era come from commanders' memoirs, diplomatic dispatches, early newspapers, and official musters. They often conflict or leave gaps. Creating an OOB for a Thirty Years’ War battle or an English Civil War clash still requires detective work, though of a different flavor than medieval battles.
During this era, sources began to provide regular musters and strength returns, although these were often monthly or irregular. They indicated the number of men on the rolls, but did not necessarily reflect how many actually appeared on the battlefield, as illness, desertion, and other losses could reduce the numbers. Commanders’ after-action reports frequently included casualty figures and sometimes estimates of enemy strength. But these reports could be biased, as few generals were willing to admit to heavy losses or acknowledge the enemy’s true strength. Additionally, multiple eyewitness accounts might exist for the same event, offering differing perspectives, such as a Swedish officer’s account versus a Catholic League report for a battle in the Thirty Years’ War. The concept of organized orders of battle was beginning to emerge, with some sources recording orders of battle by brigade or regimental names. However, standardization was lacking: one source might list units by the colonel’s name, while another might use the regiment's title, making it difficult for modern researchers to match them.
Case Study: A Thirty Years’ War Battle (Lützen 1632)
The Battle of Lützen (1632) saw Swedish King Gustavus Adolphus face the Imperial general Wallenstein. Lützen’s OOB is comparatively well documented for the time; we have a general sense of the brigades on both sides. Yet even here, sources differ on the exact numbers of men and guns. The Swedish army’s strength is often given as around 19,000, the Imperial army’s as around 18,000, though some records differ by a few hundred here or there. The infobox of the Wikipedia article, for instance, shows one source citing 19,175 men for the Swedes vs 18,738 for the Imperials. Another source might round those to 19,000 vs 18,000. The Swedes had about 60 guns according to one report, while another said 43 – possibly because the lower figure counts only heavy field guns and omits regimental pieces. Casualty figures are similarly debated: Swedish losses are often stated around 5,000, Imperial around 6,000, but how many of those “wounded” were lightly hurt or returned to ranks is unclear.
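One way to put these discrepancies in proportion is to express each pair of figures as a relative spread. This is only an illustrative sketch: the figures are the ones quoted above, while the function name and the choice to divide by the larger figure are our own.

```python
# Relative spread between two reported figures, as a share of the larger one.

def relative_spread(a: int, b: int) -> float:
    """Return |a - b| divided by the larger of the two figures."""
    return abs(a - b) / max(a, b)

swedes = relative_spread(19_175, 19_000)     # detailed vs rounded strength
imperials = relative_spread(18_738, 18_000)  # detailed vs rounded strength
guns = relative_spread(60, 43)               # two reports of Swedish guns

for label, value in [("Swedes", swedes), ("Imperials", imperials), ("Guns", guns)]:
    print(f"{label}: {value:.1%}")
```

The manpower figures disagree by only a few percent, whereas the gun counts differ by more than a quarter, which points to a difference in what is being counted rather than simple rounding.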
For a scenario designer, these discrepancies are relatively minor – whether the Swedes had 18,700 or 19,100 men doesn’t change the scenario much. A bigger challenge in Musket & Pike battles is often identifying unit quality and armament. By this time, armies were mixtures of veteran regiments and raw recruits, well-equipped units and under-supplied ones. The Thirty Years’ War was notorious for units dissolving and reforming, or serving under new paymasters. A scenario OOB has to decide, for example, whether Wallenstein’s infantry regiments at Lützen were at full strength or had detached garrisons, and whether the musketeer-to-pikeman ratios were standard or the units had run low on firearms. These details might be found in specialized studies or muster records discovered in archives. Often, scenario designers rely on military historians who compiled such data. Our research process calls for using archival records for accuracy whenever possible. For early modern battles, that could mean digging up a translated Swedish chancellery record or a captured regimental list from the Imperial side.

Death of Gustavus Adolphus of Sweden in the battle of Lützen (painting by Carl Wahlbom, Public Domain)
Another issue is that nomenclature and structure were not standardized like in modern armies. A “tercio” or “regiment” could vary in size. So, a Musket & Pike scenario might list a regiment of 600 musketeers in one battle, but 1,000 in another, reflecting its actual strength at that time. Designers must avoid the trap of assuming paper strength equals field strength. For example, a Swedish infantry regiment might theoretically have been 1,200 men, but at Lützen, many Swedish units had been in action earlier and might have had only 600-800 effectives left. If one source lists the full roster, the designer must scale it down to a realistic number for the day. This is where modern research and even archaeology help – remarkably, Lützen has been subject to archaeological study (mass graves were unearthed, etc.), giving insights into casualties and possibly the scale of forces present.
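The paper-versus-field adjustment can be sketched as a simple scaling step. The 1,200 paper strength and the 600–800 effectives come from the example above; the roster entries, the function name, and the idea of applying one uniform rate to every unit are hypothetical simplifications, since a real OOB would adjust unit by unit.

```python
# Scale paper strengths down to a plausible field strength.
# Assumption: one uniform rate for all units, derived from the example above.

def scale_to_field_strength(roster: dict, rate: float) -> dict:
    """Scale each unit's paper strength by a single effectiveness rate."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return {unit: round(paper * rate) for unit, paper in roster.items()}

PAPER_STRENGTH = 1_200
EFFECTIVES_LOW, EFFECTIVES_HIGH = 600, 800

midpoint = (EFFECTIVES_LOW + EFFECTIVES_HIGH) // 2  # 700 men
rate = midpoint / PAPER_STRENGTH                    # a little under 60%

# Hypothetical roster as one source might list it at full paper strength.
roster = {"Regiment A (hypothetical)": 1_200, "Regiment B (hypothetical)": 1_000}
field = scale_to_field_strength(roster, rate)
print(field)
```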
In general, for early modern OOBs, designers will:
- Use known OOB records (if available) for unit names and commanders. E.g., they know the Swedish brigades at Lützen were the Yellow, Blue, etc., brigades, each composed of certain regiments.
- Use estimated strengths from modern studies, then drill down into more detailed works and primary material where possible. If multiple sources disagree, either bracket the range and choose a defensible midpoint, or privilege the best-documented source closest to the event.
- Classify unit quality based on context. In Musket & Pike, not all pike units are equal – e.g., veteran mercenaries vs conscripted militia. For example, at the Battle of Rocroi (1643), a Spanish tercio of elite guards might be rated high in morale, whereas a newly raised French militia unit would be rated low. Contemporary accounts often comment on troops’ training or bravery, which designers translate into game ratings.
- Handle casualties and reinforcements carefully. Many battles have multiple phases; scenario designers sometimes create sub-scenarios (such as one that starts at a particular phase) to model this. Alternatively, they might introduce reinforcements entering later (if, say, a detachment arrived mid-battle). Initial fatigue levels can also reflect long marches – e.g., if one army force-marched to the field at dawn, the scenario can start those units at a higher fatigue level to simulate their tiredness.
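The second step in the list above, reconciling sources that disagree, can be sketched in a few lines of Python. This is only an illustration of the two options described there (bracket the range and take a midpoint, or prefer the best-documented figure closest to the event). The `Estimate` class, the source labels, the report dates, and the `documented` flag are all assumptions made for the example; only the two Swedish totals come from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Estimate:
    source: str        # hypothetical source label
    strength: int      # reported number of men
    reported_on: date  # date the figure refers to
    documented: bool   # True if backed by a muster roll or return


def bracket_midpoint(estimates):
    """Return (low, high, midpoint) across all reported strengths."""
    values = [e.strength for e in estimates]
    low, high = min(values), max(values)
    return low, high, (low + high) // 2


def closest_documented(estimates, battle_day):
    """Prefer documented figures, then the one dated nearest the battle."""
    pool = [e for e in estimates if e.documented] or list(estimates)
    return min(pool, key=lambda e: abs((e.reported_on - battle_day).days))


# Strengths echo the two Swedish totals quoted earlier;
# the labels, dates, and documented flags are invented for illustration.
LUTZEN = date(1632, 11, 16)  # New Style date of the battle
swedish = [
    Estimate("Source A", 18_700, date(1632, 11, 10), True),
    Estimate("Source B", 19_100, date(1632, 10, 1), False),
]

low, high, mid = bracket_midpoint(swedish)
best = closest_documented(swedish, LUTZEN)
print(low, high, mid)
print(best.source, best.strength)
```

Which of the two results ends up in the OOB remains a judgment call; the sketch only makes the choice explicit and repeatable.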
A concrete challenge in the Musket & Pike era is that sources often describe the same army in different—and sometimes incompatible—ways: units are named differently (by colonel, region, or title), grouped differently, and reported at different dates or on-paper strengths. The designer’s job is to reconcile those versions into one consistent hierarchy and set of strengths that the engine can use. Once that groundwork is done, the scenario can reflect what matters most: not just who was present, but why some formations—veteran brigades versus newly raised troops—behave very differently under pressure.
To illustrate with a shorter example: Suppose WDS is making a scenario for the Battle of Naseby (English Civil War, 1645). The designer would find that Parliamentarian sources list the New Model Army’s regiments and strengths fairly well (the New Model Army kept decent records), but Royalist numbers are sketchier – perhaps gleaned from letters or later estimates. The designer might rely on the scholarly consensus that Fairfax’s New Model had about 13,000 men and King Charles’s Royalists about 8,000, and then dig deeper: how many of those were cavalry vs infantry? Which units were understrength? There’s evidence (from muster rolls before the battle) that some New Model infantry regiments were understrength due to sickness. A thorough scenario might actually reduce those regiments’ strength accordingly. It might also represent the Royalist musketeers who ran out of ammunition by flagging them as low on ammunition in game terms. These nuances make the battle play out more historically: the Royalists initially hit hard (with many veterans in their ranks) but then falter, exactly as history records.

A representation of the Armies of King Charles I and Sir Thomas Fairfax exhibiting the exact order, preparatory to the Battle of Naseby, 1645 (Public Domain)
In summary, the Musket & Pike era offers more data than medieval times, but still requires interpretation. Designers cannot take one source at face value; they gather multiple records and often rely on modern historians to guide them. Reconstructing an OOB involves aligning names, numbers, and narratives from a variety of sources – effectively solving a puzzle. And as we’ll see next, even as we move into the 18th and 19th centuries with even better documentation, the job isn’t done yet – because history is full of debates at every stage.
Well-Documented Does Not Mean Simple (Napoleonic & Civil War Battles)
By the Napoleonic Wars (early 1800s) and the American Civil War (1860s), one might think that designing an OOB is straightforward: these are highly literate eras with copious records. Indeed, both the French Empire and its adversaries, and the Union and Confederate armies, left behind volumes of official reports, returns, and correspondence. We often have precise orders of battle preserved in archives for major campaigns – down to the names of regimental commanders and the number of men “present for duty.” However, even this wealth of information comes with caveats and conflicts. Let’s consider how a scenario designer tackles Napoleonic and Civil War OOBs.
Napoleonic Battles
Our Napoleonic Battles series covers famous engagements such as Waterloo, Austerlitz, and Borodino. The French Grande Armée and its foes (Austrians, Russians, Prussians, British, and allies) all had formal staff systems that produced OOB lists and casualty returns. For example, Napoleon’s army was well organized: corps, divisions, brigades, regiments. Historians today can often reconstruct exactly which regiments were in which brigade on a given day. So, the basic structure of a Napoleonic OOB is often known. However, the strength figures can be tricky. An official return might state that on 1 June 1815, a French corps had 20,000 men. But by 18 June 1815 (the day of Waterloo), after two weeks of marching and the preliminary battles of Ligny and Quatre Bras, the actual number in that corps could be far lower. Some soldiers straggled, some detachments were left guarding supplies, and some were wounded earlier. So, a scenario builder must decide: do I use the paper strength or estimate the effective strength? Typically, designers use effective strength – often gleaned from post-battle reports or later analyses.
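The step from a paper return to an effective strength is, at bottom, simple arithmetic, and a small Python sketch makes it concrete. Only the 20,000-man return comes from the example above. The deduction figures, the division names, and both function names are invented for illustration, and the proportional scaling is just one plausible way to spread a corps-level estimate across its subunits when no unit-level data survives.

```python
def effective_strength(paper, stragglers=0, detached=0, prior_casualties=0):
    """Subtract known deductions from a paper return, never going below zero."""
    return max(paper - (stragglers + detached + prior_casualties), 0)


def scale_units(paper_by_unit, target_total):
    """Scale each unit's paper strength so the parts sum exactly to target_total."""
    paper_total = sum(paper_by_unit.values())
    if paper_total <= 0:
        raise ValueError("paper strengths must sum to a positive number")
    exact = {name: s * target_total / paper_total for name, s in paper_by_unit.items()}
    scaled = {name: int(value) for name, value in exact.items()}
    # Hand the rounding leftovers to the units with the largest fractional parts.
    leftover = target_total - sum(scaled.values())
    by_fraction = sorted(exact, key=lambda n: exact[n] - scaled[n], reverse=True)
    for name in by_fraction[:leftover]:
        scaled[name] += 1
    return scaled


# 20,000 is the paper return from the example; the deductions are hypothetical.
corps_total = effective_strength(
    20_000, stragglers=800, detached=1_200, prior_casualties=2_500
)

# Hypothetical divisions whose paper strengths add up to the 20,000 return.
paper = {"1st Division": 7_000, "2nd Division": 6_500, "3rd Division": 6_500}
on_the_day = scale_units(paper, corps_total)
print(corps_total, on_the_day)
```

Where unit-level evidence exists (a regiment known to have been mauled at Ligny, say), it should override the proportional split; the scaling is a fallback, not a substitute for research.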

The Campaign Of Waterloo, Illustrated With Engravings Of Les Quatre Bras, La Belle Alliance, Hougoumont, La Haye Sainte, And Other Principal Scenes Of Action... (Public Domain)
For Waterloo, we have fairly good figures: Wellington’s Allied army had around 68,000–72,000 men on the field (depending on whether you count all Dutch-Belgian militia), and Napoleon had about 72,000. But some older sources might count only certain troops (for instance, excluding part of the Prussian contingent that arrived later). The presence of the Prussian army under Blücher is another factor: if you start a Waterloo scenario at 11 AM, how do you represent Blücher’s army? They historically began arriving on the French right flank in mid-afternoon. In a game, you would schedule Prussian units to enter as reinforcements around the historical times (with some randomness, perhaps). Their strength and timing might be drawn from historical records of who arrived when. But there is debate even here – for example, how many Prussians fought at Waterloo vs how many were held off by French rearguards at places like Wavre. Designers consult the literature: some sources say about 30,000 Prussians made it to Waterloo in time, others give different numbers. The scenario will reflect a reasoned choice (maybe 30,000 in the late game, spread over a few turns of arrivals).
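The idea of scheduling arrivals around the historical times with some randomness can be sketched as follows. This is not the game engine's actual reinforcement mechanism, just a minimal Python illustration of the principle: each group has a scheduled turn and a per-turn chance of entering. The group labels, turn numbers, probabilities, and the `max_delay` cap are all assumptions; only the 30,000 total echoes the figure discussed above.

```python
import random


def arrival_turn(scheduled_turn, arrival_chance, rng, max_delay=6):
    """Roll once per turn from the scheduled turn; enter on the first success.

    After max_delay failed rolls the group enters automatically.
    """
    for delay in range(max_delay):
        if rng.random() < arrival_chance:
            return scheduled_turn + delay
    return scheduled_turn + max_delay


# Hypothetical schedule: labels, turns, and chances are invented.
# Only the 30,000 total comes from the figure discussed above.
schedule = [
    {"group": "Prussian arrivals 1", "men": 12_000, "turn": 20, "chance": 0.5},
    {"group": "Prussian arrivals 2", "men": 10_000, "turn": 24, "chance": 0.5},
    {"group": "Prussian arrivals 3", "men": 8_000, "turn": 28, "chance": 0.5},
]

rng = random.Random(1815)  # fixed seed so a playtest run is repeatable
for entry in schedule:
    entry["arrives"] = arrival_turn(entry["turn"], entry["chance"], rng)
    print(entry["group"], entry["men"], "men, turn", entry["arrives"])

total_men = sum(entry["men"] for entry in schedule)
```

Capping the delay keeps the variability from turning the Prussians into a no-show, which would break the historical shape of the battle.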
Quality ratings in Napoleonic scenarios are crucial, and they can be contentious. Take Waterloo again: the French Grande Armée of 1815 was not the veteran force of 1805; many units were newly raised in a hurry after Napoleon’s return from exile. Some historians argue that French infantry quality in 1815 was mixed, whereas Wellington’s army had a core of veteran redcoats (though also many inexperienced Dutch-Belgian troops). In John Tiller’s system, units are classified by grades of quality (e.g., “A” elite, “B” seasoned, down to “F” militia). Assigning these grades is part art, part science: it involves reading unit histories to see their experience. For instance, the British 1st Foot Guards at Waterloo would definitely be an “A” or “B” elite unit. The French Middle Guard would also be elite. But the French line conscripts might be “C” or “D” to simulate shakiness. These judgments are informed by how units historically performed (whether they held or ran) and by known factors such as how long they had been in service.
American Civil War Battles
The Civil War Battles series benefits from a unique resource: the Official Records (OR) of the Union and Confederate armies, a massive compilation of reports and returns published after the war. From the OR, designers can pull fairly detailed OOBs for battles such as Gettysburg, Antietam, Chickamauga, etc. For example, the OR includes tables of “effective strength” for many battles – so we might read that on July 1, 1863, General Lee’s Army of Northern Virginia had X number of officers and Y number of men present for duty in each brigade. This seems like an OOB designer’s dream. And indeed, the John Tiller games, which WDS has continued to produce and maintain, often used the OR figures as a research base. However, caution is still needed. Some returns were incomplete or arrived late. Confederate records in particular are spotty (many were lost or never recorded accurately). Historians still debate, for instance, how many Confederate soldiers were at Gettysburg – estimates range from about 70,000 to 75,000 or more. The Union Army of the Potomac’s strength is better documented (around 82,000–85,000). A discrepancy of a few thousand can come from whether one counts support units, teamsters, stragglers, etc. A designer likely uses the numbers of “present for duty combatants” – which might be, say, ~71,000 Confederates and ~83,000 Federals at Gettysburg (these are within common estimates). But suppose one source uses a higher number for Confederates by including every man on the rolls, not just those with muskets in hand – that could inflate the count by 5-10%.
Case Study: Battle of Antietam (1862)
At the Battle of Antietam, 1862, General McClellan famously overestimated Confederate strength, thinking Lee had ~100,000 when Lee actually had perhaps 40,000. If one relied on McClellan’s report, one would more than double the Confederate OOB incorrectly. Thankfully, later analysis gives us a much clearer picture.
In ACW scenarios, unit strength is often given down to the regiment. Those are drawn from returns and battle reports. But note that even in the OR, a regimental strength might be listed weeks earlier or later than the battle, and not exactly on the day. Designers might review multiple dates and adjust accordingly. They also incorporate losses from earlier engagements if doing a multi-day battle or campaign. For example, a Gettysburg scenario covering days 1-3 might show that certain units were mauled on Day 1 and thus weaker on Day 2. The game engine might handle some of that by carrying over casualties in a campaign game, but individual scenarios might be set at specific moments.
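Carrying losses forward across a multi-day battle is easy to show in code. The sketch below is a minimal Python illustration with invented regiments and loss figures. It is not how the engine's campaign carry-over works, only the bookkeeping a designer might do by hand when setting separate Day 1, Day 2, and Day 3 scenarios.

```python
def strength_by_day(start_strength, daily_losses):
    """Return the strength at the start of each day of a multi-day battle."""
    strengths = []
    current = start_strength
    for loss in daily_losses:
        strengths.append(current)
        current = max(current - loss, 0)  # losses carry into the next day
    return strengths


# Hypothetical regiments: (strength on the morning of Day 1, losses per day).
regiments = {
    "Regiment A": (368, [150, 40, 0]),
    "Regiment B": (412, [0, 95, 60]),
}

for name, (start, losses) in regiments.items():
    print(name, strength_by_day(start, losses))
```

A regiment that was heavily engaged on Day 1 (Regiment A here) starts Day 2 visibly weaker, while one held in reserve (Regiment B) starts Day 2 at full strength.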
Cover sheet of War Of The Rebellion: A Compilation Of The Official Records..., Series I, Volume 19, containing Antietam (Public Domain)
However, the ACW also has its share of myth-busting. A classic myth is the idea that the Union had overwhelming manpower in every battle – not true for some battles early on, but Confederate post-war writers pushed that narrative. Modern data compilation shows many battles were closer in numbers than claimed. A scenario designer must be mindful not to perpetuate old myths when new research corrects them.
In both Napoleonic and ACW contexts, multiple perspectives are vital. Designers will compare, for example, a Union regimental history’s numbers with the Official Records and perhaps with a recent scholarly work that re-evaluated strengths using demographic methods. If errors or contradictions are found, they choose what aligns with the bulk of evidence. They also keep in mind that every number is a snapshot – a report might be taken in the morning, and by afternoon, after some fighting, the number is different. As noted in the Scheldt ’44 notes, even a precise archival report is “only a snapshot of a particular moment” and not necessarily the whole campaign. That principle holds for the Napoleonic and ACW periods, too. A clever scenario might actually break a battle into segments to reflect such changes (for example, one scenario for the morning, another for the afternoon, with adjusted OOBs).
Ultimately, Napoleonic and ACW OOB design is about fine-tuning a wealth of data. It’s perhaps less guesswork than medieval OOBs, but it involves a lot of data management and consistency checking. By this stage, players expect that if the history books say a certain brigade was present, it will be on the scenario’s unit roster. And we strive to meet that expectation: our scenarios are often lauded for including every historically present unit, sometimes even down to separate companies or artillery batteries. This obsessive detail is possible because of the rich sources available – but managing it is a painstaking process. The payoff is a scenario where a Civil War buff can find their great-great-grandfather’s regiment in the OOB and see that it has the correct strength and parent brigade. Or a Napoleonic enthusiast can admire that the game shows Quiot’s brigade at Waterloo with the battalions exactly as per history. That attention to detail is what grognards love, and why we invest so much effort in getting the numbers right.
Next time: The World War II OOB paradox
In Part II, we shift to World War II, where the OOB problem flips on its head. Instead of scarcity, you get abundance: daily strength states, equipment counts, organizational tables, and archives full of “answers.” And yet you also get a new battlefield: competing definitions (“on hand” vs “operational”), contradictory national records, and narratives that hardened into myths before many archives even opened.

Until then, the big takeaway from Part I is simple: pre-WW2 OOB work is rarely about perfect certainty. It’s about disciplined reconstruction—anchored in sources, honest about ambiguity, and always tested against what the battle was actually like when human beings tried to win it.
Books
Last but not least, below is a list of all books mentioned in the essay above.
Brown, Peter. The World of Late Antiquity: From Marcus Aurelius to Muhammad. London: Thames and Hudson, 1971.

Caesar, Julius. The Gallic War. Translated by Cynthia Damon. Loeb Classical Library. Cambridge, MA: Harvard University Press, 2025.

Curry, Anne. Agincourt: A New History. Stroud, UK: Tempus, 2005.

Leo VI. The Taktika of Leo VI. Text, translation, and commentary by George T. Dennis. Washington, DC: Dumbarton Oaks Research Library and Collection, 2010.

Maurikios (Maurice). Maurice’s Strategikon: Handbook of Byzantine Military Strategy. Translated by George T. Dennis. Philadelphia: University of Pennsylvania Press, 1984.

Procopius. The Complete Works of Procopius. Translated by H. B. Dewing. Delphi Classics, 2016. - This edition includes the Anekdota (Secret History), the controversial, highly salacious work in which Procopius airs the empire’s dirty laundry through a barrage of scandal, insinuation, and character assassination directed at the imperial couple and their entourage.

United States War Department. The War of the Rebellion: A Compilation of the Official Records of the Union and Confederate Armies. Washington, DC: Government Printing Office, 1880–1901. Reprint, Forgotten Books, 2018.

Great article. Shows the amount of research and analysis that goes into accurate game design. Looking forward to part 2.
thank you for sharing the backstage on how the research is done. I find it very impressive and proves that your work is far beyond gaming. Thank you.
Excellent article. @Gary wrote: “Since then I’ve been very aware that designers can bring their own biases into their games for all sorts of reasons. For example by overstating/understating statistics, or omission/invention.” My concern, in this regard, has been with “fixed units”.
I contend that “fixed units” are unrealistic. Basically, once a battle is initiated, the battle plan immediately becomes toilet paper. As such, you can’t truly recreate the original battle as conducted by the original generals since each (game general) player will have different thoughts on how the battle will proceed.
The OOB is important. The arrival of units and their quality to the battlefield is a valid known factor. However, the presence of “fixed units” and their capacity to join the battle is too subjective. The game generals will not be fighting the original underlying battle. Plans change once the battle is initiated. Their battle strategy should not be “forced” to mimic the original battle.
All this talk of ancient Roman battles makes me hope for a WDS ancient game series!
@STEFAN and GREG: Well, this part is about challenges in building pre-20th-century OOBs. There will be a part 2.