For much of last year, about 2,500 US service members from the 15th Marine Expeditionary Unit sailed aboard three ships throughout the Pacific, conducting training exercises in the waters off South Korea, the Philippines, India, and Indonesia. At the same time, onboard the ships, an experiment was unfolding: The Marines in the unit responsible for sorting through foreign intelligence and making their superiors aware of possible local threats were for the first time using generative AI to do it, testing a leading AI tool the Pentagon has been funding.

Two officers tell us that they used the new system to help scour thousands of pieces of open-source intelligence (nonclassified articles, reports, images, and videos) collected in the various countries where they operated, and that it did so far faster than was possible with the old method of analyzing them manually. Captain Kristin Enzenauer, for instance, says she used large language models to translate and summarize foreign news sources, while Captain Will Lowdon used AI to help write the daily and weekly intelligence reports he provided to his commanders.

"We still need to validate the sources," says Lowdon. But the unit's commanders encouraged the use of large language models, he says, "because they provide a lot more efficiency during a dynamic situation."

The generative AI tools they used were built by the defense-tech company Vannevar Labs, which in November was granted a production contract worth up to $99 million by the Pentagon's startup-oriented Defense Innovation Unit, with the goal of bringing its intelligence tech to more military units.
The company, founded in 2019 by veterans of the CIA and US intelligence community, joins the likes of Palantir, Anduril, and Scale AI as a major beneficiary of the US military's embrace of artificial intelligence, not only for physical technologies like drones and autonomous vehicles but also for software that is revolutionizing how the Pentagon collects, manages, and interprets data for warfare and surveillance. Though the US military has been developing computer vision models and similar AI tools, like those used in Project Maven, since 2017, the use of generative AI (tools that can engage in human-like conversation, like those built by Vannevar Labs) represents a newer frontier.

The company applies existing large language models, including some from OpenAI and Microsoft, and some bespoke ones of its own to troves of open-source intelligence the company has been collecting since 2021. The scale at which this data is collected is hard to comprehend (and a large part of what sets Vannevar's products apart): terabytes of data in 80 different languages are hoovered up every day from 180 countries. The company says it is able to analyze social media profiles and breach firewalls in countries like China to get hard-to-access information; it also uses nonclassified data that is difficult to get online (gathered by human operatives on the ground), as well as reports from physical sensors that covertly monitor radio waves to detect illegal shipping activities.

Vannevar then builds AI models to translate information, detect threats, and analyze political sentiment, with the results delivered through a chatbot interface that's not unlike ChatGPT. The aim is to provide customers with critical information on topics as varied as international fentanyl supply chains and China's efforts to secure rare earth minerals in the Philippines.
"Our real focus as a company," says Scott Philips, Vannevar Labs' chief technology officer, is to "collect data, make sense of that data, and help the US make good decisions." That approach is particularly appealing to the US intelligence apparatus because for years the world has been awash in more data than human analysts can possibly interpret, a problem that contributed to the 2003 founding of Palantir, a company with a market value of over $200 billion and known for its powerful and controversial tools, including a database that helps Immigration and Customs Enforcement search for and track information on undocumented immigrants.

In 2019, Vannevar saw an opportunity to use large language models, which were then new on the scene, as a novel solution to the data conundrum. The technology could enable AI not just to collect data but to actually talk through an analysis with someone interactively.

Vannevar's tools proved useful for the deployment in the Pacific, and Enzenauer and Lowdon say that while they were instructed to always double-check the AI's work, they didn't find inaccuracies to be a significant issue. Enzenauer regularly used the tool to track any foreign news reports in which the unit's exercises were mentioned and to perform sentiment analysis, detecting the emotions and opinions expressed in text. Judging whether a foreign news article reflects a threatening or friendly opinion toward the unit is a task that on previous deployments she had to do manually. "It was mostly by hand: researching, translating, coding, and analyzing the data," she says. "It was definitely way more time-consuming than it was when using the AI."

Still, Enzenauer and Lowdon say there were hiccups, some of which would affect most digital tools: The ships had spotty internet connections much of the time, limiting how quickly the AI model could synthesize foreign intelligence, especially if it involved photos or video.
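At its simplest, the sentiment-analysis task Enzenauer describes amounts to scoring a piece of text as hostile, friendly, or neutral toward the unit. The toy sketch below illustrates the idea with an invented keyword lexicon; it is purely illustrative, and real systems like Vannevar's rely on large language models rather than word lists.

```python
# Toy sentiment scorer: counts hostile vs. friendly cue words.
# The lexicons and example sentences are invented for illustration;
# production systems use trained models, not keyword lists.
HOSTILE = {"threat", "provocation", "aggression", "condemn", "warmonger"}
FRIENDLY = {"cooperation", "partnership", "welcome", "ally", "goodwill"}

def sentiment(text: str) -> str:
    # Normalize words by lowercasing and stripping trailing punctuation.
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & FRIENDLY) - len(words & HOSTILE)
    if score > 0:
        return "friendly"
    if score < 0:
        return "hostile"
    return "neutral"

print(sentiment("Officials welcome the joint exercises as a sign of partnership."))
print(sentiment("State media condemn the drills as a provocation."))
```

Even this toy makes the brittleness of the task visible: the score depends entirely on surface wording, with no access to the context, irony, or framing that determines what an article actually means.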
With this first test completed, the unit's commanding officer, Colonel Sean Dynan, said on a call with reporters in February that heavier use of generative AI was coming; this experiment was "the tip of the iceberg."

This is indeed the direction that the entire US military is barreling toward at full speed. In December, the Pentagon said it will spend $100 million in the next two years on pilots specifically for generative AI applications. In addition to Vannevar, it's also turning to Microsoft and Palantir, which are working together on AI models that would make use of classified data. (The US is of course not alone in this approach; notably, Israel has been using AI to sort through information and even generate lists of targets in its war in Gaza, a practice that has been widely criticized.)

Perhaps unsurprisingly, plenty of people outside the Pentagon are warning about the potential risks of this plan, including Heidy Khlaaf, who is chief AI scientist at the AI Now Institute, a research organization, and has expertise in leading safety audits for AI-powered systems. She says this rush to incorporate generative AI into military decision-making ignores more foundational flaws of the technology: "We're already aware of how LLMs are highly inaccurate, especially in the context of safety-critical applications that require precision."

Khlaaf adds that even if humans are "double-checking" the work of AI, there's little reason to think they're capable of catching every mistake. "'Human-in-the-loop' is not always a meaningful mitigation," she says.
When an AI model relies on thousands of data points to come to conclusions, "it wouldn't really be possible for a human to sift through that amount of information to determine if the AI output was erroneous." One particular use case that concerns her is sentiment analysis, which she argues is "a highly subjective metric that even humans would struggle to appropriately assess based on media alone." If AI perceives hostility toward US forces where a human analyst would not, or if the system misses hostility that is really there, the military could make a misinformed decision or escalate a situation unnecessarily.

Sentiment analysis is indeed a task that AI has not perfected. Philips, the Vannevar CTO, says the company has built models specifically to judge whether an article is pro-US or not, but MIT Technology Review was not able to evaluate them.

Chris Mouton, a senior engineer for RAND, recently tested how well-suited generative AI is for the task. He evaluated leading models, including OpenAI's GPT-4 and an older version of GPT fine-tuned to do such intelligence work, on how accurately they flagged foreign content as propaganda compared with human experts. "It's hard," he says, noting that AI struggled to identify more subtle types of propaganda. But he adds that the models could still be useful in lots of other analysis tasks.

Another limitation of Vannevar's approach, Khlaaf says, is that the usefulness of open-source intelligence is debatable. Mouton says that open-source data can be "pretty extraordinary," but Khlaaf points out that unlike classified intel gathered through reconnaissance or wiretaps, it is exposed to the open internet, making it far more susceptible to misinformation campaigns, bot networks, and deliberate manipulation, as the US Army has warned.
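The kind of benchmark Mouton describes reduces, at its core, to comparing a model's labels against expert judgments over the same set of articles. A minimal sketch of that comparison, with invented labels standing in for the real evaluation data:

```python
# Toy evaluation of model labels against human expert labels,
# in the spirit of the RAND test described above. Both label
# lists are invented for illustration.
human = ["propaganda", "not", "propaganda", "not", "propaganda", "not"]
model = ["propaganda", "not", "not", "not", "propaganda", "propaganda"]

def accuracy(preds: list[str], truth: list[str]) -> float:
    """Fraction of items on which the model agrees with the experts."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

print(f"agreement with experts: {accuracy(model, human):.2f}")
```

A real study would go further, for instance breaking errors down by how subtle the propaganda is, which is exactly where Mouton says the models struggled.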
For Mouton, the biggest open question now is whether these generative AI technologies will be simply one investigatory tool among many that analysts use, or whether they'll produce the subjective analysis that's relied upon and trusted in decision-making. "This is the central debate," he says. What everyone agrees on is that AI models are accessible: you can just ask them a question about complex pieces of intelligence, and they'll respond in plain language. But it's still in dispute what imperfections will be acceptable in the name of efficiency.

Update: This story was updated to include additional context from Heidy Khlaaf.
Generative AI is learning to spy for the US military