ProPublica is a nonprofit newsroom that investigates abuses of power. This story originally appeared in the Dispatches newsletter.
In February, my colleague Ken Schwencke saw a post on a social media network about a database released by Sen. Ted Cruz. Cruz said the database showed more than 3,400 “woke” grants awarded by the National Science Foundation that “promoted Diversity, Equity, and Inclusion (DEI) or advanced neo-Marxist class warfare propaganda.”
Given that Schwencke is our senior editor for data and news apps, he downloaded the data, poked around and saw several grants that seemed far afield from what Cruz, a Texas Republican, calls “the radical left’s woke nonsense.” The grants included what Schwencke considered a “very cool sounding project” on the development of advanced mirror coatings for gravitational wave detectors at the University of Florida, his alma mater.
However, the grant description did state that the project “promotes education and diversity, provides research opportunities for students at various educational levels, and promotes the participation of women and underrepresented minorities.”
Schwencke thought it would be interesting to run the data through an AI large language model (one of the kind that powers ChatGPT) to understand the types of grants that made Cruz’s list and why they were flagged. He realized there was an accountability story to tell.
In that article, Agnel Philip and Lisa Song found that “Cruz’s dragnet had swept up numerous examples of scientific projects funded by the National Science Foundation that simply acknowledged social inequalities or were completely unrelated to the social or economic themes cited by his committee.”
Among them is a $470,000 grant to study the evolution of mint plants and how they spread across the continent. As best Philip and Song could tell, the project was flagged because of two specific words used in its application to the NSF: “diversify,” referring to plant biodiversity, and “female,” where the application noted how the project would support a young female scientist on the research team.
Another involved developing a device that could treat severe bleeding. It included the words “victims” (as in gunshot victims) and “trauma.”
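To see how word matching alone can sweep up grants like these, here is a minimal sketch. It assumes the flagging matched keywords without regard to context, which is an inference from the examples above and not a confirmed description of how the list was built. The keyword list, the function name and the sample descriptions are all hypothetical.

```python
import re

# Hypothetical keyword list, loosely modeled on the kinds of words mentioned above.
KEYWORDS = ["diversify", "female", "victims", "trauma"]

def flag_keywords(description, keywords=KEYWORDS):
    """Return the keywords that appear as whole words, ignoring case and context."""
    found = []
    for word in keywords:
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, description, re.IGNORECASE):
            found.append(word)
    return found

# Invented sample text, not quoted from any real grant.
mint = "We study how mint plants diversify and spread; the team includes a female graduate student."
device = "The device treats severe bleeding in gunshot victims and other trauma patients."
optics = "We develop mirror coatings for gravitational wave detectors."

for text in (mint, device, optics):
    print(flag_keywords(text), "<-", text)
```

A matcher like this has no way to tell plant biodiversity from workforce diversity, which is why each flag still needs a human to read the surrounding text.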
Neither Cruz’s office nor a spokesperson for Republicans on the Senate Committee on Commerce, Science and Transportation responded to our requests for comment for the article.
The story was a great example of how artificial intelligence can help reporters analyze large volumes of data and try to identify patterns.
First, we told the AI model to mimic an investigative journalist reading through each of these grants to identify whether they contained themes that someone looking for “wokeness” might have spotted. And crucially, we made sure to tell the model not to guess if it wasn’t sure. (AI models are known to hallucinate, and we wanted to guard against that.)
For newsrooms new to AI and readers who are curious about how this worked in practice, here is an excerpt of the actual prompt we used:
Background: We will be showing you grants from the National Science Foundation that have been targeted for cancellation because Republican Sen. Ted Cruz’s office said they contain themes that promote “woke” ideology; diversity, equity, and inclusion; or Marxist ideology. We are looking to analyze the themes in the award descriptions on this list to determine what makes a grant “woke” or whether it has terms or themes related to diversity, equity, and inclusion (DEI). Determine if the text contains these themes and tell us what you find. Only extract information from the NSF grant if it contains the information requested.
–
As an investigative journalist, I am looking for the following information
–
woke_description: A short description (at most a paragraph) of why this grant is being singled out for promoting “woke” ideology, diversity, equity, and inclusion (DEI) or advanced neo-Marxist class warfare propaganda. Leave this blank if you are unsure.
why_flagged: Look at the “Status”, “Social Justice Categories”, “Racism Categories”, “Gender Categories”, and “Environmental Justice Categories” fields. If one is filled in, it means that the author of this document believed the grant was promoting DEI ideology in that way. Analyze the “Award Description” field and see if you can figure out why the author may have flagged it in this way. Write only one explanation per type and award, and write it in a way that is thorough and easy to understand.
citation_for_flag: Extract a very concise text quoting the passage of the “Award Description” that backs up the “why_flagged” data.
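For readers who want to see the mechanics, here is a rough sketch of how a prompt with these three fields could be paired with a check on the model's reply. It is not the code we ran. The JSON reply format, the function names and the sample text are assumptions made for illustration, and the call to the model itself is left out.

```python
import json

# Field names taken from the prompt excerpt above.
FIELDS = ("woke_description", "why_flagged", "citation_for_flag")

# Assumed wording: a condensed stand-in for the full prompt, plus a JSON request.
INSTRUCTIONS = (
    "As an investigative journalist, I am looking for the following "
    "information: " + ", ".join(FIELDS) + ". "
    "Reply with one JSON object using exactly these keys. "
    "Leave a value blank if you are unsure."
)

def build_prompt(award_description):
    """Combine the instructions with one grant's award description."""
    return INSTRUCTIONS + "\n--\nAward description:\n" + award_description

def check_response(raw_reply, award_description):
    """Parse a model reply and drop any citation not found in the source text."""
    data = json.loads(raw_reply)
    result = {field: str(data.get(field) or "").strip() for field in FIELDS}
    quote = result["citation_for_flag"]
    # A quote that does not appear verbatim may be hallucinated, so blank it
    # and mark the record for a human to review.
    result["needs_review"] = bool(quote) and quote not in award_description
    if result["needs_review"]:
        result["citation_for_flag"] = ""
    return result

# Invented example; the reply stands in for what a model might return.
description = "This project promotes education and diversity in optics research."
reply = json.dumps({
    "woke_description": "Mentions diversity in its outreach goals.",
    "why_flagged": "The description uses the word diversity.",
    "citation_for_flag": "promotes education and diversity",
})
checked = check_response(reply, description)
print(build_prompt(description))
print(checked)
```

Checking that each citation appears word for word in the award description is one inexpensive guard against hallucination. It does not replace a person reading the result.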
Of course, members of our staff reviewed and confirmed every detail before we published our story, and we called all the named people and agencies seeking comment.
Philip, one of the journalists who wrote the prompt above and the story, is excited about the potential that new technologies hold, as is our entire newsroom.
“The technology holds a lot of promise in lead generation and pointing us in the right direction,” he told me. “But in my experience, it still requires a lot of human supervision and vetting. When used correctly, it can really speed up the process of understanding large sets of information. Being creative with your prompts and reading the output critically can help uncover things you may not have thought of.”
This is just the latest effort by ProPublica to experiment with using AI to do our work better and faster, while using it responsibly in ways that support our human journalists.
In 2023, in partnership with The Salt Lake Tribune, a Local Reporting Network partner, we used AI to help uncover patterns of sexual misconduct among mental health professionals disciplined by Utah’s licensing agency. The investigation relied on a large collection of disciplinary reports covering a wide range of potential violations.
To narrow in on the types of cases we were interested in, we prompted AI to review the documents and identify the ones related to sexual misconduct. To help the bot do its work, we gave it examples of confirmed cases of sexual misconduct that we already knew well, along with specific keywords to look for. Each result was then reviewed by two reporters, who used licensing records to confirm that it was classified correctly.
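The workflow described above, with known examples and keywords going into the prompt and two reporters checking what comes out, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the investigation: the prompt wording, the YES/NO/UNSURE answer format, the function names and the placeholder inputs are all hypothetical.

```python
def build_few_shot_prompt(confirmed_examples, keywords, document):
    """Assemble a classification prompt from known cases, keywords and a new report."""
    lines = [
        "Decide whether the report below involves sexual misconduct.",
        "Answer YES, NO or UNSURE.",
        "",
        "Keywords to look for: " + ", ".join(keywords),
        "",
        "Confirmed examples:",
    ]
    for text in confirmed_examples:
        lines.append("- " + text)
    lines += ["", "Report to classify:", document]
    return "\n".join(lines)

def final_label(model_answer, reviewer_votes):
    """Keep a model's YES only when two human reviewers both confirm it."""
    if model_answer != "YES":
        return "not flagged"
    if len(reviewer_votes) == 2 and all(reviewer_votes):
        return "confirmed"
    return "needs another look"

# Placeholder inputs; none of this text comes from real disciplinary records.
prompt = build_few_shot_prompt(
    confirmed_examples=["Placeholder summary of confirmed case A.",
                        "Placeholder summary of confirmed case B."],
    keywords=["placeholder keyword one", "placeholder keyword two"],
    document="Placeholder text of a disciplinary report.",
)
print(prompt)
print(final_label("YES", [True, True]))
print(final_label("YES", [True, False]))
```

The point of the second function is that the model only nominates candidates. Nothing counts as classified until both reviewers agree.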
In addition, during our reporting on the 2022 school shooting in Uvalde, Texas, ProPublica and The Texas Tribune obtained a trove of unreleased raw materials collected during the state’s investigation. This included hundreds of hours of audio and video recordings, which were difficult to sift through. The footage was not organized or clearly labeled, and some of it was very graphic and disturbing for journalists to watch.
We used self-hosted open-source AI software to securely transcribe and help classify the material. This allowed reporters to match up related files and reconstruct the events of the day.
We know full well that AI does not replicate the very time-intensive work we do. Our journalists write our stories, our newsletters, our headlines and the takeaways at the top of longer stories. We also know there is a lot about AI that needs to be investigated, including the companies that sell these products, how they train them and the risks they pose.
But for us, AI can also be one of many reporting tools that help us examine data creatively and pursue the stories that help you understand the forces shaping our world.