Data extraction from text is a fundamental human action and is key to gathering information in any field. With the exponential increase in material being published on any given subject, it is becoming increasingly hard for humans to read and extract data from every relevant source of information. Automating data extraction would save a tremendous amount of human resources and could result in more accurate and more extensive extraction.
Data extraction relies on the proper identification of objects and concepts in documents. A very important sub-task of data extraction is therefore finding and classifying names in text, also known as Named Entity Recognition.
Last year, the European Food Safety Authority (EFSA) ran a Challenge seeking a general algorithm for automated data extraction from text, graphs, and images in electronic documents. The primary outcome of that Challenge was the realization that general automated data extraction was an extremely difficult task and far beyond the scope of a single Challenge. As a result, EFSA is running the current Challenge focused on Named Entity Recognition (NER) in text documents and is offering a total award pool of $60,000 with at least $40,000 for the winning algorithm.
This is a Reduction-to-Practice Challenge that requires written documentation, output from the data extraction algorithm, and, upon request, submission of source code and executable.
Data extraction from texts is a fundamental human action. Anytime we read a book or newspaper we’re extracting data whether we realize it or not. Beyond everyday reading, data extraction is a key part of gathering information for almost any endeavor and spans all fields of work. Automation of data extraction would save a tremendous amount of human resources for organizations that depend on extracting data from published material, particularly considering the ever-increasing amount of such material available.
Some of the first researchers working to extract information from unstructured texts recognized the importance of “units of information”, like names (such as person, organization, and location names): data extraction relies on a proper identification and classification of entities in the documents. The task of identifying names of organizations, people and geographic locations in text was termed Named Entity Recognition (NER) in the nineties and since then there has been increasing interest in NER.
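To make the task concrete, the following is a minimal sketch of dictionary-based entity recognition in Python. It is only an illustration of what "identifying and classifying names" means, not the approach requested or evaluated in this Challenge, and practical NER systems typically rely on statistical or machine-learning models rather than fixed lists. The gazetteer contents, the label names, and the function name recognize_entities are all assumptions made up for this example.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str   # the matched name as it appears in the input
    label: str  # the entity class assigned to it
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical gazetteer: a hand-made mapping from known names to classes.
GAZETTEER = {
    "European Food Safety Authority": "ORGANIZATION",
    "EFSA": "ORGANIZATION",
    "Parma": "LOCATION",
}


def recognize_entities(text, gazetteer=GAZETTEER):
    """Return every gazetteer name found in text, with its class and position."""
    # Try longer names first so a long name is preferred over a shorter
    # name that starts at the same position.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return [
        Entity(m.group(), gazetteer[m.group()], m.start(), m.end())
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The European Food Safety Authority (EFSA) is based in Parma."
    for entity in recognize_entities(sample):
        print(entity)
```

A lookup of this kind only finds names it already knows, which is one reason the general problem of recognizing unseen entities in text is considerably harder.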
Last year EFSA ran a Challenge seeking a general algorithm for automated data extraction from text, graphs, and images in electronic documents. The primary outcome of that Challenge was the realization that general automated data extraction was an extremely difficult task and far beyond the scope of a single Challenge.
Named Entity Recognition (NER) is a crucial step towards information extraction. For the current Challenge, EFSA is therefore interested in obtaining a tool to aid data extraction from textual material, with a focus on NER or similar approaches. EFSA will provide specific datasets and entities (bioconcepts) to identify and classify for this Challenge.
A submission to the Challenge should include the following:
The Challenge award is contingent upon theoretical evaluation of the method/algorithm by the Seeker, and validation by the Seeker of the submitted software/algorithm/package.
To receive an award, the Solvers will not have to transfer their exclusive IP rights to the Seeker. Instead, Solvers will grant to the Seeker a non-exclusive license to practice their solutions. The award(s) will be paid by ADAS under the procurement contract referenced in the ABOUT THE SEEKER section.
Submissions to this Challenge must be received by 11:59 PM on 02 Sept 2019 US Eastern Time (05:59 AM on 03 Sept 2019 European Central Time). Late submissions will not be considered.
In line with the rights and obligations laid down in the Staff Regulations and CEOS deriving from their contract of employment with EFSA, EFSA staff shall seek permission prior to engaging in the Challenge (outside activity), since receiving the award will be equivalent to accepting from sources outside EFSA any honor, decoration, favor, gift or payment of any kind.
ABOUT THE SEEKER
EFSA is a decentralized agency of the European Union (EU), funded by the EU, that operates independently of the European legislative and executive institutions (European Commission, Council, and European Parliament) and EU Member States. EFSA contributes to the safety of the EU food chain by providing scientific advice to risk managers, by communicating on risks to the public, and by cooperating with Member States and other parties to deliver a coherent, trusted food safety system in the EU.
EFSA (http://www.efsa.europa.eu/) commissioned a project in 2016 titled “OC/EFSA/AMU/2015/03: Crowdsourcing: engaging communities effectively in food and feed risk assessment”. This project was awarded to ADAS (http://www.adas.uk/), and the current Challenge, run through the InnoCentive platform, is being conducted by ADAS on behalf of EFSA as part of this project. ADAS are therefore an intermediary in this process and the ultimate Seeker remains EFSA.
What is an RTP Challenge?
An InnoCentive RTP (Reduction to Practice) Challenge calls for a prototype that proves an idea, and is similar to an InnoCentive Theoretical Challenge in its high level of detail. However, an RTP requires the Solver to submit a validated solution, either in the form of original data or a physical sample. The Seeker is also allowed to test the proposed solution. For details about treatment of IP rights, please see the Challenge-Specific Agreement.