The United States Patent and Trademark Office’s (USPTO) Office of the Chief Economist (OCE) on July 1 released the Artificial Intelligence Patent Dataset (AIPD), which identifies which of 13.2 million U.S. patents and pre-grant publications include artificial intelligence (AI), to help researchers, policymakers, and the public explore the impacts of AI on invention.
The OCE constructed the AIPD using machine learning models for eight AI component technologies.
The AIPD consists of two data files. The first data file identifies U.S. patents issued between 1976 and 2020 and pre-grant publications (PGPubs) published through 2020 that contain one or more of eight AI component technologies: machine learning, natural language processing, computer vision, speech, knowledge processing, AI hardware, evolutionary computation, and planning and control. OCE generated this data file using a machine learning (ML) approach that analyzed patent text and citations to identify AI in U.S. patent documents. The second data file contains the patent documents used to train the ML models.
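For readers who plan to work with the first data file, the "one or more components" rule can be illustrated with a minimal sketch. It assumes a CSV layout with one row per document and one 0/1 flag per component technology. The column names (`doc_id`, `machine_learning`, and so on) and the sample rows are hypothetical and made up for illustration; check the dataset documentation for the actual file layout.

```python
import csv
import io

# The eight AI component technologies named in the announcement.
# These identifiers are hypothetical column names, not the real ones.
COMPONENTS = [
    "machine_learning",
    "natural_language_processing",
    "computer_vision",
    "speech",
    "knowledge_processing",
    "ai_hardware",
    "evolutionary_computation",
    "planning_and_control",
]

# Made-up sample rows; the real file's columns and values may differ.
SAMPLE = """doc_id,machine_learning,natural_language_processing,computer_vision,speech,knowledge_processing,ai_hardware,evolutionary_computation,planning_and_control
D1,1,0,0,0,0,0,0,0
D2,0,0,0,0,0,0,0,0
D3,0,1,0,1,0,0,0,0
"""


def flagged_components(row):
    """Return the components whose flag is set to "1" in this row."""
    return [name for name in COMPONENTS if row[name] == "1"]


def ai_documents(csv_text):
    """Map each document ID to its flagged components, skipping non-AI rows."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        hits = flagged_components(row)
        if hits:  # one or more components means the document counts as AI
            result[row["doc_id"]] = hits
    return result


ai_docs = ai_documents(SAMPLE)
```

In this sketch, a document counts as an AI document when at least one component flag is set, so a row with all zeros is dropped. To read a downloaded file instead of the in-memory sample, the same `csv.DictReader` loop can be pointed at an open file handle.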
A new working paper describes the machine learning approach used to generate the dataset and reports that it outperformed existing alternatives.
This effort was made possible through cross-business-unit collaboration among OCE, the Office of Policy and International Affairs, the Patents Business Unit, and the Office of the Chief Information Officer.
Read the working paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3866793