Scientists from the Army’s corporate research laboratory have developed a unique framework for a semantic search tool that enables faster, more targeted scientific research, helping researchers find needed information more quickly to aid in the fight against COVID-19.
The framework is also positioned to one day aid soldiers on the battlefield by letting them query systems in natural language, rather than having to learn how to query or interact with each system to achieve their goals.
For example, the semantic search tool could ultimately be deployed in the operational environment to translate commanders’ information requirements from one echelon to another and expedite searching through the huge volume of soldiers’ reports coming in from disparate information sources to address those needs. While the vocabulary and context of soldiers’ reports at different echelons will vary, semantic search enables the underlying meaning of those reports to be understood and retrieved as relevant to the translated commanders’ requirements.
The framework, called InfoForager, is the first of its kind and was designed over the course of six weeks as a collaborative effort among researchers at the U.S. Army Combat Capabilities Development Command, now known as DEVCOM, Army Research Laboratory.
“With the outbreak of the novel coronavirus in the spring of 2020 and the call for Army researchers to support our fight against the virus, four DEVCOM ARL natural language processing researchers quickly innovated and operationalized existing research on both semantic representation and information extraction to speed up the research of fellow ARL scientists in the Battlefield Environment Division, advancing their progress in ultraviolet inactivation of the virus,” said ARL researcher Dr. Claire Bonial.
Virology research shows that UV light can inactivate viruses under certain conditions by making them unable to infect cells, thereby reducing the transmission of viral diseases. While a wealth of literature pertaining to UV inactivation of viruses exists, searching the literature for trustworthy and relevant information can be inefficient and difficult.
Efficiently finding needed information may be critical in discovering improved methods, such as the use of germicidal UV, to reduce the transmission of COVID-19 and other diseases, Bonial said.
“Collaborating with the Battlefield Environment Division expert users, we designed a framework for a semantic search tool called InfoForager, which enables faster, more targeted scientific research,” Bonial said. “The framework that was prototyped as part of a six-week research sprint demonstrated, in a preliminary user study, that our approach with InfoForager allowed study participants to find answers to scientific queries more quickly and accurately than comparison systems using keyword matching, including PubMed search.”
To automatically sift through a mass of documents and find relevant answers to specific, targeted questions, InfoForager first parses a user’s research question into an abstract meaning representation (AMR), which captures not only the words in the question but also the semantic relationships between them, in the form of a directed acyclic graph.
Unlike many existing search systems that are limited to keyword search over front matter such as article titles and abstracts, InfoForager can take as input full natural language questions and understand the underlying meaning of those queries in attempting to find answers. For example: What is the concentration of the virus in saliva?
InfoForager then compares the AMR graph of the question to a collection of medical research papers that have already been parsed into AMR graphs. The query graph is paired with the graph of each sentence in each paper, every pair is scored for similarity, and InfoForager returns the highest-ranking answer sentence along with its source document.
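The scoring-and-ranking step described above can be illustrated with a minimal Python sketch. It is not InfoForager's implementation: it assumes each AMR graph has been reduced to a set of (source concept, relation, target concept) triples and scores a pair by the F1 overlap of those triples, a simplification of graph matching that skips the variable alignment performed by real AMR metrics. The triples, document IDs and sentences are hand-written for illustration and are not parser output or study data.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]  # (source concept, relation, target concept)
Graph = FrozenSet[Triple]


def triple_f1(query: Graph, sentence: Graph) -> float:
    """F1 overlap between two triple sets (assumed stand-in for graph similarity)."""
    if not query or not sentence:
        return 0.0
    shared = len(query & sentence)
    if shared == 0:
        return 0.0
    precision = shared / len(sentence)
    recall = shared / len(query)
    return 2 * precision * recall / (precision + recall)


def best_answer(query: Graph, corpus: List[Tuple[str, str, Graph]]) -> Tuple[float, str, str]:
    """Score every (query, sentence) pair and return (score, doc_id, sentence)."""
    scored = [(triple_f1(query, graph), doc_id, text) for doc_id, text, graph in corpus]
    return max(scored, key=lambda item: item[0])


# Hand-written, illustrative triples for the question
# "What is the concentration of the virus in saliva?"
QUERY: Graph = frozenset({
    ("concentrate-02", ":ARG1", "virus"),
    ("concentrate-02", ":location", "saliva"),
    ("concentrate-02", ":degree", "amr-unknown"),
})

# Hypothetical pre-parsed sentences; invented for this example.
CORPUS: List[Tuple[str, str, Graph]] = [
    ("doc-A", "The concentration of the virus in saliva was high.", frozenset({
        ("concentrate-02", ":ARG1", "virus"),
        ("concentrate-02", ":location", "saliva"),
        ("concentrate-02", ":degree", "high"),
    })),
    ("doc-B", "UV light inactivated the virus on surfaces.", frozenset({
        ("inactivate-01", ":ARG0", "light"),
        ("inactivate-01", ":ARG1", "virus"),
        ("inactivate-01", ":location", "surface"),
    })),
]

score, doc_id, text = best_answer(QUERY, CORPUS)
print(doc_id, round(score, 2), text)
```

In this toy setup the sentence from doc-A ranks first because it shares two of the question's three triples, while the doc-B sentence shares none even though it also mentions the virus.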
In assessing this semantic approach to search, the team’s user study supported their hypothesis that users can search medical documents more easily when they do not need to rephrase their questions, for example as keywords, to conform to the system’s search limitations and capabilities, Bonial said.
“Given the observations of user studies and an initial evaluation of the prototype framework, we found support for our working hypothesis that the InfoForager framework facilitates a semantic approach to search that allows for natural language questions, and finds answers matching on content, going beyond just keywords, to include the semantic relations between those keywords,” Bonial said.
This research goes beyond the current paradigm of question-answering. The researchers call their approach info-foraging instead, given that their expert users are not searching for a single right answer to each of their questions, Bonial said.
Researchers forage for different pieces of evidence from different areas of scientific research articles. The evidence must be weighed with respect to the particular research methods used in gathering it, as well as other factors in describing the results, such as the evolving scope and varied definitions of scientific terms across technical subfields, she said.
InfoForager offers a unique way to interact with a system to carry out scientific research, starting with natural language questions, as opposed to keyword searches where users have to adapt their research questions to a search system’s capabilities, Bonial said.
The researchers also identified facts about COVID-19 that are now more widely known: Testing saliva is more effective than testing nasopharyngeal swabs, given that the concentration of the virus is higher in saliva.
“Within six weeks, we pivoted our research program and operationalized semantic and information extraction research into a proof-of-concept framework for semantic search that allows researchers to find targeted answers to their questions in scientific research papers more quickly and efficiently,” Bonial said. “While the current framework has been designed to help researchers in the urgent battle against COVID-19, we demonstrated that we can supply core capabilities that can be dynamically adapted from one problem domain to another.”
This research can be operationalized to help future Army researchers address dynamically changing and urgent questions, Bonial said. It can also be pivoted to directly benefit future Soldiers in other types of information interaction, such as command and control processes, by letting people query systems in natural language rather than placing the burden on the user to learn how to query or interact with the system to achieve their goals.
This project also demonstrated that one aspect of the team’s previous approach, the use of an existing graph-matching algorithm called Smatch, does not allow them to take full advantage of the semantic structure that AMR offers in this task: pinpointing the concept sought in an answer and its direct semantic relations to other concepts.
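To make the idea of pinpointing the sought concept concrete, the following Python sketch shows one possible way to use it. It is an assumption-laden illustration, not the team's method: it relies on the AMR convention of marking the thing being asked about with the concept `amr-unknown`, treats graphs as sets of hand-written (source, relation, target) triples, and uses a hypothetical helper named `extract_answer`.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (source concept, relation, target concept)
UNKNOWN = "amr-unknown"        # AMR's marker for the concept a question asks about


def extract_answer(query: Set[Triple], sentence: Set[Triple]) -> Optional[Tuple[str, float]]:
    """Return (filler, support) for the sought concept, or None if nothing fits.

    filler  : the concept the sentence puts where the question has amr-unknown
    support : share of the governing concept's other direct relations that the
              sentence also contains (1.0 means all of them)
    """
    for src, rel, tgt in sorted(query):
        if tgt != UNKNOWN:
            continue
        # Direct relations of the concept that governs the unknown.
        context = {t for t in query if t[0] == src and t[2] != UNKNOWN}
        fillers = sorted(t[2] for t in sentence if t[0] == src and t[1] == rel)
        if fillers:
            support = len(context & sentence) / len(context) if context else 1.0
            return fillers[0], support
    return None


# Hand-written, illustrative triples; not parser output.
QUERY = {
    ("concentrate-02", ":ARG1", "virus"),
    ("concentrate-02", ":location", "saliva"),
    ("concentrate-02", ":degree", UNKNOWN),
}
SENT_A = {
    ("concentrate-02", ":ARG1", "virus"),
    ("concentrate-02", ":location", "saliva"),
    ("concentrate-02", ":degree", "high"),
}
SENT_B = {
    ("inactivate-01", ":ARG0", "light"),
    ("inactivate-01", ":ARG1", "virus"),
}

print(extract_answer(QUERY, SENT_A))
print(extract_answer(QUERY, SENT_B))
```

Unlike a whole-graph overlap score, this sketch looks only at the slot the question leaves open and at the relations directly attached to the same concept, which is the part of the structure the researchers say whole-graph matching does not exploit.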
This finding informs the team’s future directions, which focus on other types of semantic matching and on leveraging ontological resources to find alternatives for the words used in questions. For example, a question that asks about bodily fluids could be answered well by sentences that mention saliva or effluvia.
In their next steps, the research team is maintaining the promising aspects of their semantic search approach while exploring new ways of performing semantic matching to find answers and/or sentences that address a particular query or information requirement.
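A small Python sketch can show how an ontological resource might support this kind of matching. Everything in it is assumed for illustration: the toy `PARENT` table stands in for a real ontology, and the helper names `is_a` and `expand` are hypothetical. Only the saliva and effluvia examples come from the researchers' description.

```python
from typing import Dict, Set

# Hypothetical toy ontology: child concept -> parent concept.
PARENT: Dict[str, str] = {
    "saliva": "bodily fluid",
    "effluvia": "bodily fluid",
    "bodily fluid": "substance",
    "surface": "object",
}


def is_a(term: str, ancestor: str) -> bool:
    """True if `term` equals `ancestor` or lies below it in the toy ontology."""
    seen: Set[str] = set()
    while term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        if term not in PARENT:
            return False
        term = PARENT[term]
    return False


def expand(term: str) -> Set[str]:
    """All more specific concepts that could stand in for `term` in an answer."""
    return {child for child in PARENT if child != term and is_a(child, term)}


# A question about "bodily fluid" can now match sentences that mention saliva.
print(sorted(expand("bodily fluid")))
print(is_a("saliva", "bodily fluid"), is_a("surface", "bodily fluid"))
```

A matcher could use `expand` to treat a sentence concept as equivalent to a question concept whenever the former is a more specific kind of the latter, instead of requiring identical words.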
“We are broadening our research beyond UV inactivation of COVID-19 to explore how narratives surrounding COVID-19 scientific claims change over time, and the extent to which this may lead to discovering some narratives that lack primary scientific evidence and are potentially misinformation or disinformation,” Bonial said. “Additionally, we are exploring how semantic search could be deployed in command and control for ‘translating’ information needs from one echelon to another and reporting up the chain as information comes in.”
The research team includes Bonial, fellow ARL Content Understanding Branch researchers Drs. Clare Voss and Stephanie Lukin, former ARL researcher Dr. Stephen Tratz, and ARL Battlefield Environment Division researchers Drs. David Doughty and Steven Hill, all from the lab’s Computational and Information Sciences Directorate.
Read more: https://www.army.mil/article/241833