Available Technology

A Generic, Extensible, Configurable Push Pull Framework for Large Scale Science Missions

The CAS Crawling Framework supports many of the Nutch crawler's generic services, including metadata extraction, crawling, and ingestion. One service that was not ported over from Nutch, however, is the generic protocol layer. In Nutch, this layer lets the crawler obtain content through protocol plug-ins, each of which downloads content using an implementation of a remote protocol such as HTTP, HTTPS, FTP, or the WinNT file system. Such a layer would greatly aid the CAS Crawling Framework, because it would allow the framework to obtain content (i.e., data products) from remote sites in a protocol-independent way, using FTP and other protocols.
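
The idea of a generic protocol layer can be illustrated with a minimal sketch. It is not the CAS or Nutch implementation: the names Protocol, LocalFileProtocol, and ProtocolRegistry are hypothetical, Python is used only for brevity, and a local-file plug-in stands in for remote protocols such as FTP so that the example runs without a network. The caller asks the registry for content by URL, and the registry picks the plug-in from the URL scheme.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class Protocol(ABC):
    """Hypothetical plug-in interface: one implementation per protocol."""

    @abstractmethod
    def fetch(self, url):
        """Return the raw bytes found at the given URL."""


class LocalFileProtocol(Protocol):
    """Illustrative plug-in for file:// URLs; reads the local file system."""

    def fetch(self, url):
        # Convert the URL path (possibly percent-encoded) to an OS path.
        path = url2pathname(urlparse(url).path)
        return Path(path).read_bytes()


class ProtocolRegistry:
    """Maps URL schemes to plug-ins so callers never name a protocol."""

    def __init__(self):
        self._plugins = {}

    def register(self, scheme, plugin):
        self._plugins[scheme.lower()] = plugin

    def fetch(self, url):
        scheme = urlparse(url).scheme.lower()
        if scheme not in self._plugins:
            raise ValueError(f"no protocol plug-in registered for {scheme!r}")
        # Delegate to whichever plug-in handles this scheme.
        return self._plugins[scheme].fetch(url)
```

Under these assumptions, supporting another protocol means writing one more Protocol subclass and registering it under its scheme (for example "ftp"); the code that requests data products does not change.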
U.S. Government Purpose Release
