
▲ Performance comparison between the paraphrase recognition API developed by ETRI researchers and open-source technologies
ETRI Releases Korean Administrative Document QA and Paraphrase Recognition APIs
A domestic research team has developed a technology that can understand documents as intelligently as a human and find the information you want.
The Electronics and Telecommunications Research Institute (ETRI) announced that it has developed two APIs that answer users' questions from office documents and understand whether two sentences have the same meaning.
This technology is open to the public on ETRI’s public artificial intelligence open API/data service portal (https://aiopen.etri.re.kr/), so anyone can easily use it.
The technology lets users not only search for the information they want using AI software, but also check the answers to their questions together with the evidence behind them.
First, the administrative document question-answering (QA) API uses a deep learning language model to analyze paragraphs and tables and to identify both the correct answer and the sentences that support it.
For example, if you enter a question such as 'If the business trip expenses are 1 million won, what level of approval do I need?', the API returns the relevant internal regulation, such as 'If it is less than 1 million won, it must be approved by the manager.' In other words, it finds both the relevant document and the part of it that supports the answer.
The accuracy of this technology was measured through a blind evaluation by Hangul and Computer, a joint research partner.
In that evaluation, the top-5 accuracy for paragraph search was 89.65%, and the accuracy for table search was 81.5%.
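The article does not say exactly how these figures were computed. As a rough illustration, the sketch below assumes that "accuracy of the top five results" means the share of questions whose correct passage appears among the first five retrieved passages. The function name, passage IDs, and toy data are all hypothetical and are not part of the ETRI API.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_results: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of questions whose gold passage ID is among the first k results."""
    if len(ranked_results) != len(gold):
        raise ValueError("ranked_results and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        1 for ranked, answer in zip(ranked_results, gold) if answer in ranked[:k]
    )
    return hits / len(gold)


# Hypothetical toy data: one ranked list of passage IDs per question.
ranked = [
    ["p3", "p7", "p1", "p9", "p4", "p2"],  # gold "p1" is at rank 3: hit
    ["p5", "p6", "p8", "p2", "p0", "p1"],  # gold "p1" is at rank 6: miss for k=5
    ["p2", "p4"],                          # gold "p2" is at rank 1: hit
    ["p9", "p8", "p7"],                    # gold "p0" was not retrieved: miss
]
gold = ["p1", "p1", "p2", "p0"]

score = top_k_accuracy(ranked, gold, k=5)
print(f"top-5 accuracy: {score:.2%}")
```

On this toy data, two of the four questions have their gold passage in the top five, so the score is 50%.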
Additionally, the Paraphrase Recognition API is a technology that reads documents as intelligently as a human and determines whether different forms of sentences have the same meaning.
Like the administrative document QA API described above, it is a foundational technology that can be used in the development of Korean AI.
Unlike humans, artificial intelligence and deep learning technologies have a robustness problem: they can fail to recognize semantic relationships correctly when sentences change even slightly.
For example, the sentences 'He bought a red bicycle' and 'The bicycle he bought is red' are easy for both people and machines to recognize as having the same meaning, but machines have a hard time distinguishing them from the sentence 'He didn't buy a red bicycle'.
ETRI developed this technology to overcome the robustness limitations of deep learning technology and to recognize semantic relationships across various types of sentences.
In an evaluation on a robustness test set, it achieved 96.63% accuracy, a significant improvement over existing open-source deep learning technologies.
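To make the robustness problem concrete, the sketch below scores the article's example sentences with a deliberately naive word-overlap baseline. This is not ETRI's method or any of the open-source models in the comparison. It is a hypothetical stand-in that shows how surface similarity alone cannot separate a paraphrase from a negation. The threshold of 0.5 and the labels are assumptions made for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def overlap_baseline(a: str, b: str, threshold: float = 0.5) -> bool:
    """Naive rule: call two sentences paraphrases if enough words overlap."""
    return jaccard(a, b) >= threshold


# (sentence 1, sentence 2, assumed label: True if the meaning is the same)
pairs = [
    ("He bought a red bicycle", "The bicycle he bought is red", True),
    ("He bought a red bicycle", "He didn't buy a red bicycle", False),
]

predictions = [overlap_baseline(a, b) for a, b, _ in pairs]
accuracy = sum(
    pred == label for pred, (_, _, label) in zip(predictions, pairs)
) / len(pairs)

for (a, b, label), pred in zip(pairs, predictions):
    print(f"{jaccard(a, b):.3f}  predicted={pred}  label={label}  | {a} / {b}")
print(f"accuracy: {accuracy:.2f}")
```

Both pairs share four of seven distinct words, so the baseline gives them the same score and labels the negated sentence a paraphrase as well. A robustness evaluation set is built from pairs like the second one.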
The developed technology processes document formats based on standard XML.
Currently, the service is only provided for Korean documents, but the developed technology itself can be applied generally to other document formats such as Word and PDF.
Thanks to this, it is expected to be applied to various documents and fields such as internal regulations, manuals, and online announcements.
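The article does not describe the XML schema involved. As a minimal sketch of what working from an XML representation of a document can look like, the example below splits a document into paragraph and table units that a QA system could then search. The element names (`document`, `section`, `p`, `table`, `row`, `cell`) and the sample content are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the schema is an assumption, not ETRI's format.
SAMPLE = """<document>
  <section title="Travel expenses">
    <p>If it is less than 1 million won, it must be approved by the manager.</p>
    <table>
      <row><cell>Amount</cell><cell>Approver</cell></row>
      <row><cell>Less than 1 million won</cell><cell>Manager</cell></row>
    </table>
  </section>
</document>"""


def extract_units(xml_text: str) -> list[dict]:
    """Collect paragraphs and tables, in document order, as searchable units."""
    root = ET.fromstring(xml_text)
    units = []
    for elem in root.iter():
        if elem.tag == "p":
            text = "".join(elem.itertext()).strip()
            if text:
                units.append({"type": "paragraph", "text": text})
        elif elem.tag == "table":
            rows = [
                ["".join(cell.itertext()).strip() for cell in row.findall("cell")]
                for row in elem.findall("row")
            ]
            units.append({"type": "table", "rows": rows})
    return units


units = extract_units(SAMPLE)
for unit in units:
    print(unit)
```

Keeping tables as rows of cells, rather than flattening them into text, preserves the structure needed to answer questions from a table.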
The research team explained that although the diversity and lack of standardization of office document formats made it difficult to apply artificial intelligence technology, they were able to achieve this result by identifying the problems, building highly robust data, and improving the performance of the algorithm.
Going forward, the team plans to develop a deep learning language model that learns language understanding and generation simultaneously, as a counterpart to GPT-3, and to release the related technologies in order to advance AI technology and contribute to platform development.
ETRI Language Intelligence Lab's Dr. Lim Jun-ho said, "We hope that this technology will further invigorate the Korean AI service market, prevent foreign AI solutions from encroaching on the domestic market, and help the public acquire useful knowledge easily and quickly."