An open source framework for metadata exploration and discovery of Polar Data
This project would deliver an open source framework for metadata exploration, automatic text mining, and information retrieval of polar data, built on Apache Tika. Apache Tika is currently the de facto 'babel fish', aiding in automatic MIME type detection, text extraction, and metadata classification for over 1200 data formats. The PI would expand Tika to handle polar and scientific data formats, making polar data more easily available, searchable, and retrievable by all major content management systems. This activity would lay the foundation for a thorough, automatically generated inventory of polar metadata and data. Expanding Tika to handle polar data would also naturally invite the technology and open source communities to engage with polar use cases, helping to increase understanding of the Arctic. The software produced through this effort would be disseminated to the software and polar communities through the Apache Software Foundation. A computer science graduate student and a postdoc would be exposed to cryosphere and Arctic data, helping to train the next generation of cross-disciplinary data scientists in the domain. The PI's graduate courses at USC, Search Engines (20-40 students annual enrollment) and Software Architecture (30-50 students annual enrollment), would benefit from Arctic cyberinfrastructure use cases disseminated through course projects and lecture material. The PI would also collaborate with NSF-funded projects focused on the archiving, discovery, and access of polar data, such as ACADIS and the Antarctic Master Directory.