Data extraction from text and images is a fundamental human action and is key to gathering information in any field. With the exponential increase in material being published on any given subject, it is becoming increasingly hard for humans to read and extract data from every relevant source of information. Automating data extraction would save a tremendous amount of human resources and possibly result in more accurate and more extensive extraction. The Seeker is therefore looking for a general algorithm for automated data extraction from electronic documents, including graphs and images.
This is a Reduction-to-Practice Challenge that requires written documentation, output from the data extraction algorithm, and submission of source code and executable.
Data extraction from texts and images is a fundamental human action. Anytime we read a book or newspaper, we’re extracting data, whether we know it or not. This extraction may be non-targeted, such as when reading an article on a new topic and placing various key points into memory, or targeted, such as when searching for the score of a particular sporting event. Beyond everyday reading, data extraction is a key part of gathering information for almost any endeavor and spans all fields of work. Investors scan news items for companies of interest and stock prices, scientists read publications to extract data relevant to their own studies, auto mechanics look up torque specifications for tightening bolts, etc. These are targeted data extractions: the person is looking for and extracting specific information from the content, and the data elements to be extracted can be defined beforehand. Automation of this type of targeted data extraction would save a tremendous amount of human resources for organizations that depend on extracting data from published material, particularly considering the ever-increasing amount of such material available. The Seeker is interested in gathering and comparing the performance of different algorithms and methods for automated data extraction and will provide specific datasets and data elements to extract for this Challenge. The ideal solution will be a tool that could perform all meaningful and relevant information/data extraction from texts, graphs and images.
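As a rough illustration of what "targeted" extraction with predefined data elements can mean, the sketch below pulls two hypothetical elements out of plain text using regular expressions. The element names, the patterns, and the sample sentence are assumptions made up for this example. They are not the Seeker's datasets or data elements, and a real submission would need far more robust methods, particularly for graphs and images.

```python
import re

# Hypothetical data elements, defined before extraction starts.
# Each name maps to an illustrative pattern with one capturing group.
ELEMENT_PATTERNS = {
    "torque_nm": re.compile(r"(\d+(?:\.\d+)?)\s*Nm\b"),
    "stock_price_usd": re.compile(r"\$(\d+(?:\.\d+)?)"),
}


def extract_elements(text):
    """Return all numeric values found for each predefined data element."""
    return {
        name: [float(value) for value in pattern.findall(text)]
        for name, pattern in ELEMENT_PATTERNS.items()
    }


# Made-up sample text, used only to exercise the sketch.
sample = "Tighten the cylinder head bolts to 85 Nm. Shares closed at $12.50."
print(extract_elements(sample))
```

Keeping the element definitions separate from the extraction function mirrors the idea that the elements are fixed in advance while the documents vary.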
A submission to the Challenge should include the following:
The Challenge award is contingent upon theoretical evaluation of the method/algorithm by the Seeker, and validation by the Seeker of the submitted software/algorithm/package.
To receive an award, the Solvers will not have to transfer their exclusive IP rights to the Seeker. Instead, Solvers will grant to the Seeker a non-exclusive license to practice their solutions. The award(s) will be paid by ADAS under the procurement contract referenced in the ABOUT THE SEEKER section.
Submissions to this Challenge must be received by 11:59 PM on 10 July 2018 US Eastern Time (05:59 AM on 11 July 2018 Central European Summer Time).
Late submissions will not be considered.
In line with the rights and obligations laid down in the Staff Regulations and CEOS deriving from their contract of employment with EFSA, EFSA staff shall seek permission prior to engaging in the Challenge (outside activity), since receiving the award will be equivalent to accepting from sources outside EFSA an honor, decoration, favor, gift or payment of any kind.
ABOUT THE SEEKER
EFSA is a decentralized agency of the European Union (EU), funded by the EU, that operates independently of the European legislative and executive institutions (European Commission, Council, and European Parliament) and EU Member States. EFSA contributes to the safety of the EU food chain by providing scientific advice to risk managers, by communicating on risks to the public, and by cooperating with Member States and other parties to deliver a coherent, trusted food safety system in the EU.
EFSA (www.efsa.europa.eu) commissioned a two-year project in 2016 titled “OC/EFSA/AMU/2015/03: Crowdsourcing: engaging communities effectively in food and feed risk assessment”. This project was awarded to ADAS (www.adas.uk), and the current Challenge, run through the InnoCentive platform, is being conducted by ADAS on behalf of EFSA as part of this project. ADAS are therefore an intermediary in this process and the ultimate Seeker remains EFSA.
What is an RTP Challenge?
An InnoCentive RTP (Reduction to Practice) Challenge calls for a prototype that proves an idea, and is similar to an InnoCentive Theoretical Challenge in its high level of detail. However, an RTP requires the Solver to submit a validated solution, either in the form of original data or a physical sample. The Seeker is also allowed to test the proposed solution. For details about treatment of IP rights, please see the Challenge-Specific Agreement.