You cannot escape the hype about Artificial Intelligence and Machine Learning. Indeed, advances in computing power and specialized processors have enabled machines to outperform humans in many tasks. This also puts Natural Language Processing back on the radar. After all, humans use language to interact and to gain, store, and communicate knowledge. Still, machines have a hard time dealing with the variety, richness, and ambiguity of language. A bigger problem, though, is the availability of high-quality data. Today’s AI approaches require mountains of it. This need can be reduced by several orders of magnitude by adding multilingual knowledge to the equation. Combined with Coreon, AI and ML algorithms can also do wonders for companies with less data and computing power than Google or Amazon.
Why Machine Learning still Needs Humans for Language

Machine Learning (ML) is beginning to outperform humans in many tasks that seemingly require intelligence. The hype about ML has even made it into the mass media. ML can read lips, recognize faces, or transform speech to text. But when ML has to deal with the ambiguity, variety, and richness of language, when it has to understand text or extract knowledge, it continues to need human experts.

ML is only as good as the relevant training material available to it. For many tasks, mountains of data are needed, and the data had better be of supreme quality. For language-related tasks, these mountains of data are often required per language and per domain.

It is the human-curated Multilingual Knowledge System that enables ML and Artificial Intelligence solutions to work for specific domains with only small amounts of textual data, and also for less-resourced languages.
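To make the idea concrete, here is a minimal sketch of how curated multilingual concepts can label text in several languages without any training data. Everything in it is an assumption made for illustration: the `CONCEPTS` dictionary, its concept IDs, its terms, and the `classify` function are hypothetical and are not taken from Coreon or any real taxonomy. The matching is deliberately naive (exact whole-word lookup, no inflection or compound handling).

```python
import re

# Hypothetical mini knowledge system, invented for illustration only.
# Each concept has an optional broader concept and terms per language.
CONCEPTS = {
    "solar_energy": {
        "broader": "renewable_energy",
        "terms": {
            "en": ["solar energy", "photovoltaics"],
            "de": ["solarenergie", "photovoltaik"],
            "fr": ["énergie solaire"],
        },
    },
    "wind_energy": {
        "broader": "renewable_energy",
        "terms": {
            "en": ["wind power"],
            "de": ["windkraft"],
            "fr": ["énergie éolienne"],
        },
    },
    "renewable_energy": {
        "broader": None,
        "terms": {
            "en": ["renewable energy"],
            "de": ["erneuerbare energie"],
            "fr": ["énergie renouvelable"],
        },
    },
}


def classify(text, lang):
    """Return sorted concept IDs whose terms occur in text, plus their broader concepts."""
    # Normalize to lowercase words separated by single spaces, padded so that
    # a term only matches on whole-word boundaries.
    padded = " " + " ".join(re.findall(r"\w+", text.lower())) + " "
    found = set()
    for concept_id, concept in CONCEPTS.items():
        for term in concept["terms"].get(lang, []):
            if f" {term} " in padded:
                found.add(concept_id)
                # Walk up the hierarchy so a hit on a narrow concept also
                # labels the text with every broader concept.
                parent = concept["broader"]
                while parent is not None:
                    found.add(parent)
                    parent = CONCEPTS[parent]["broader"]
    return sorted(found)


print(classify("New photovoltaics plant opens in Spain", "en"))
print(classify("Die Windkraft wächst weiter", "de"))
```

Because the language-independent concept carries the meaning, adding a language only requires adding terms, not collecting and labeling a new corpus. A real system would combine such lookups with ML rather than replace it.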


CORDIS Golden Taxonomy

In a recent call for tenders by the EU Publications Office for its Community R&D Information Service (CORDIS), the institution explained the central role of its Golden Taxonomy. The taxonomy is used to automatically classify the subjects of EU-funded scientific projects stored in CORDIS. Anyone who knows the incredible breadth and depth of FP7 and H2020 projects will quickly grasp how important such an effort is for the wider leverage of EU-funded research.

The taxonomy was built in an iterative process. Each step involved a specific natural language processing (NLP) technique to progressively enrich the taxonomy with new categories, more coherent relations, and a proper association of categories with their relevant keywords. First, a taxonomy framework was created by merging existing taxonomies. Next, new generic categories were detected by clustering projects. Then, keyword indexation identified the relevant keywords at both document and corpus level. Finally, automatic consolidation and classification validation detected any missing categories and suggested new ones, along with their possible path in the taxonomy.
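The keyword indexation step can be illustrated with a small sketch. The call for tender, as described here, does not say which algorithm was used, so the TF-IDF weighting below is an assumption chosen only because it is a common baseline. The function names `tokenize` and `keyword_index` and the sample project texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_index(docs, top_n=2):
    """Rank keywords by TF-IDF per document and, summed up, for the whole corpus."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n_docs = len(docs)

    doc_keywords = {}
    corpus_scores = Counter()
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Terms that occur in every document get log(1) = 0 and drop to the bottom.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        doc_keywords[doc_id] = ranked[:top_n]
        corpus_scores.update(scores)

    corpus_ranked = sorted(corpus_scores, key=lambda term: (-corpus_scores[term], term))
    return doc_keywords, corpus_ranked[:top_n]


# Hypothetical project abstracts, not real CORDIS records.
projects = {
    "p1": "graphene battery graphene electrode research",
    "p2": "cancer immunotherapy cancer trial research",
    "p3": "battery recycling policy research",
}
per_document, per_corpus = keyword_index(projects)
print(per_document)
print(per_corpus)
```

A word such as "research" that appears in every project scores zero and never surfaces as a keyword, which is the behavior one wants before associating keywords with taxonomy categories.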

One of these existing taxonomies is EuroVoc, the Publications Office’s taxonomy in all official EU languages. You can explore it for free in Coreon.

Coreon API: Multilingual Knowledge Powering Your Application

Machine Learning, semantic search, auto-classification, computer-aided translation, technical authoring, or just a smart lookup widget in your intranet portal: through the API, your valuable Coreon assets are available at a user’s fingertips, always up to date!

Which functions are available? What are their parameters and uses? The Coreon API is fully documented. Samples and a tutorial give software developers all the technical information they need to build impressive solutions. Download them from the Coreon resources page.

