Pentagon Lock Security Defense Concept Illustration (Getty Images)

The rapid development of generative AI and large language models (LLMs) gives the US intelligence community an opportunity to break with its long-standing reluctance to use unclassified information. That reluctance has, until now, largely closed off an entire avenue of intelligence-gathering and left the US and its allies far more vulnerable to strategic or tactical surprise.

But while the adoption of AI models and LLMs by national security agencies could be technologically transformative, it must be done in the right way. If agencies transfer their longstanding bias that prizes classified information over unclassified information into the models, the opportunity to substantially improve America's intelligence capability will be missed.

Government agencies have long monitored news reports and online discourse, beginning with the Foreign Broadcast Information Service, but they have largely overlooked, or at best siloed off, the huge store of publicly available information from other reputable sources. There is now a rich stream of such data. For instance, in the days before Hamas's surprise attack on Israel last October, there was an upsurge in visits to Arabic-language web content about many of the locations subsequently targeted by Hamas. A properly trained AI model could have detected these signals and provided critical early warning.
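The kind of early-warning signal described above can be sketched as a simple traffic-spike check. This is a toy illustration with made-up numbers; any real system would draw on far richer data and models:

```python
# Minimal sketch: flag an unusual surge in daily page views for a topic.
# The view counts below are hypothetical; a real pipeline would ingest
# analytics feeds for many pages and locations at once.
from statistics import mean, stdev

def flag_surge(daily_views: list[int], threshold: float = 3.0) -> bool:
    """Return True if the latest day's views exceed the historical
    mean by more than `threshold` standard deviations."""
    history, latest = daily_views[:-1], daily_views[-1]
    mu, sigma = mean(history), stdev(history)
    return latest > mu + threshold * sigma

# A week of typical traffic for a location's page, then a sudden spike:
views = [120, 135, 110, 140, 125, 130, 118, 620]
print(flag_surge(views))  # True: the final day is a clear outlier
```

A production system would of course need to separate genuine precursors from routine news-driven spikes, which is exactly where a trained model, rather than a fixed threshold, earns its keep.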

The Intelligence Community has traditionally regarded anything that is secret as inherently better than anything that is not, the idea being "if only I know this, then it's much more valuable than something everybody else knows." At the height of the Cold War, that was largely true, because most good intelligence was secret; the internet and forums where data was publicly available did not yet exist. Yet the mindset around the perceived superiority of classified information has persisted even as good-quality online information has mushroomed. It is a cultural problem that many within the Intelligence Community believe should be addressed.

To its credit, the Intelligence Community has long recognized it has a challenge here. For over a decade, the prominence of open-source missions and agencies has risen significantly in Congressional language, in agency reviews, and in Presidential Executive Orders. This year's Intelligence Community strategy [PDF] uses the strongest language yet to address the problem. Yet despite extensive writing, several new agencies and offices, and multiple Executive Branch-backed initiatives, these efforts have still failed to break through long-held institutional biases.

Whether the IC uses AI to its fullest will hinge on how analysts and officers train generative AI models and LLMs to synthesize and analyze the vast amount of publicly available data that could enhance the classified sources they already employ. Critically, during this process, they will need to be careful not to transfer their inherent reservations about unclassified information to the models. Without such caution, the risk of human bias becoming machine bias is very high, and it will limit the potential benefits of widening the scope of data sources.

One way such biases could be introduced into AI models or LLMs is during training, when the trainer grades how well the model answers the questions it is asked. If an answer cites unclassified information where the trainer believes a better classified source was available, the trainer may mark it down. If this pattern is repeated time and time again during training, the model will effectively have been taught to disregard unclassified data. Therefore, when training generative AI models and LLMs, intelligence agencies need to be expansive both with the data they train the models on and with the set of graders who assess the quality of the sourced content.
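The grading dynamic above can be made concrete with a small sketch. The `Answer` type and scoring rules here are hypothetical, but they show how a source-sensitive grader produces a systematically skewed reward signal compared with a source-agnostic one:

```python
# Sketch of how grader bias can leak into a training reward signal.
# Both grading functions and the Answer type are illustrative only.
from dataclasses import dataclass

@dataclass
class Answer:
    correct: bool            # did the model answer the question correctly?
    cites_classified: bool   # did it lean on a classified source?

def biased_grade(a: Answer) -> float:
    """A grader who marks down correct answers that rely on
    unclassified sources teaches the model to avoid those sources."""
    if not a.correct:
        return 0.0
    return 1.0 if a.cites_classified else 0.5  # penalty for open sources

def neutral_grade(a: Answer) -> float:
    """Source-agnostic grading: only correctness matters."""
    return 1.0 if a.correct else 0.0

open_source_answer = Answer(correct=True, cites_classified=False)
print(biased_grade(open_source_answer))   # 0.5: penalized despite being right
print(neutral_grade(open_source_answer))  # 1.0
```

Repeated over millions of training examples, the half-credit penalty in `biased_grade` is enough to push a model away from citing open sources at all, which is precisely the institutional bias the training process should be designed to avoid.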

The other principal source of bias is the Intelligence Community's propensity to do things differently on high-side (classified) computer networks than on low-side (unclassified) networks. The community is no doubt already experimenting with LLMs and generative AI on both networks; because of security concerns, however, these are likely segregated efforts. The truly transformational opportunity is to let a model work across both networks, even if its results may only be displayed on the classified platform. If the models and their outputs are kept separate, the institutional bias will simply be further perpetuated.

This is a pivotal moment. There is now an opportunity to tackle a pre-existing and counterproductive prejudice that is widely recognized within the Intelligence Community. Yet if they are not careful, intelligence analysts tasked with training models could end up setting the bias problem in "technological stone," making it even more intractable.

In other words, there is a risk that analysts spend the next few decades teaching models to repeat their past mistake of excluding unclassified data from their assessments. But with training in which the only bias is toward the best-quality data, generative AI models and LLMs could transform intelligence-gathering for the US and its allies.

Joshua Haecker is Head of Product at FiscalNote Global Intelligence. He is a former Intelligence Specialist in the US Army.