Almost everybody assumes that simply translating content means they can operate in another country, and many even believe that letting a machine do the translation is good enough. Senior officials in the European Union are among them, a position that sidesteps the problem of cross-border interoperability rather than addressing it. In fact, recent events show that we are not very successful at extracting information from unstructured data even monolingually.
Structuring Unstructured Information
A text, such as this one, is an unstructured information resource. Before it can be of use, it must first be found, and it becomes far more valuable when it is combined with and understood alongside other such texts. Only then can conclusions be drawn and actions taken based on the information the texts contain. This has been done for centuries, for example by lawyers and courts in countries that work with case-based law. The multilingual dimension of texts has become increasingly important because of international trade, globalization, and social media, and in Europe specifically because the European Commission centrally governs so many diverse countries.
Everyone knows from painful searches that smart organization of filed documents is key to finding anything again. The best strategy for retrieving information was the library hierarchy and the controlled terms used to organize its resources. This proven approach has been neglected in the digital age, because users were told that search, folders, and titles would do the job. Today, most organizations barely know or care whether terms are used correctly in authoring or in translation, so their texts are no longer useful when queried to support decision making. This is even more evident across borders, where imperfect translation, especially automated translation, not only loses terms but also introduces errors.
Interoperability by Machine Learning or Linked Open Data?
Eager research and software companies will say: never mind, Big Data methods using Machine Learning or Linked Open Data will come to the rescue. Indeed, both are great for gisting, but they fail to function reliably beyond that point.
Imagine asking them what 1 + 1 is. Machine Learning will tell you the answer is in the range of 1.5 to 2.5. Linked Open Data will tell you that the answer might be in the following 42 documents.
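To make the caricature concrete, here is a minimal sketch, my own illustration rather than anything from the article, of the two behaviours: a statistical estimator answers "1 + 1" with a number and an uncertainty band, while a retrieval-style approach answers with a list of documents that might contain the result. The document contents and the keyword match are invented for the example.

```python
# Toy sketch (illustrative assumption): a statistical learner returns an
# estimate with an uncertainty band, while a retrieval/linked-data approach
# returns documents in which the answer might be found.
import random
import statistics

random.seed(0)

# "Machine Learning": estimate 1 + 1 from noisy observations of the sum.
observations = [1 + 1 + random.gauss(0, 0.25) for _ in range(50)]
estimate = statistics.mean(observations)
spread = 2 * statistics.stdev(observations)
print(f"ML-style answer: {estimate:.2f} +/- {spread:.2f}")  # roughly "2.0 +/- 0.5"

# "Linked Open Data": return the documents where the answer might be.
documents = {
    "doc-001": "Peano axioms define addition via the successor function",
    "doc-002": "A primer on integer addition and binary arithmetic",
    "doc-003": "Annual report, chapter 1 and appendix 1",
}
hits = [doc_id for doc_id, text in documents.items() if "addition" in text.lower()]
print(f"LOD-style answer: the result may be in {len(hits)} documents: {hits}")
```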
Ensuring Cross-border Interoperability
We seem to have forgotten that the instruments of library science, such as classification systems, taxonomy hierarchies, and thesauri, are the core of textual data reuse. When these knowledge resources are multilingual, they become a Multilingual Knowledge System (MKS). An MKS can extract insights from texts in and across multiple languages.
I am not saying that terminology is the answer: terms are mostly flat, unrelated, and compiled for translation support. Instead, we need a structure that gives us context and lets us drill through a concept map to find relationships, as sketched below. Term resources are rather an asset that can be elevated into a knowledge structure.
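As a rough sketch of what elevating flat term lists might look like, the following toy model, my own SKOS-inspired assumption rather than anything the article specifies, attaches multilingual labels to concepts and links concepts with broader and related relations, so that a surface term in any language can be resolved to its conceptual context. The concept IDs and labels are invented for illustration.

```python
# Toy, SKOS-inspired concept map (assumption for illustration): each concept
# carries labels in several languages plus broader/related links.
from dataclasses import dataclass, field

@dataclass
class Concept:
    concept_id: str
    labels: dict[str, str]                          # language code -> preferred label
    broader: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)

CONCEPTS = {
    "C001": Concept("C001", {"en": "contract", "de": "Vertrag", "fr": "contrat"}),
    "C002": Concept("C002", {"en": "employment contract", "de": "Arbeitsvertrag",
                             "fr": "contrat de travail"}, broader=["C001"]),
    "C003": Concept("C003", {"en": "termination notice", "de": "Kündigung",
                             "fr": "préavis de licenciement"}, related=["C002"]),
}

def find_concept(term: str) -> Concept | None:
    """Resolve a surface term in any language to its concept."""
    term = term.lower()
    for concept in CONCEPTS.values():
        if term in (label.lower() for label in concept.labels.values()):
            return concept
    return None

def context_of(term: str, lang: str = "en") -> list[str]:
    """Drill from a term to the labels of its broader and related concepts."""
    concept = find_concept(term)
    if concept is None:
        return []
    neighbours = concept.broader + concept.related
    return [CONCEPTS[c].labels[lang] for c in neighbours if c in CONCEPTS]

print(context_of("Arbeitsvertrag"))         # ['contract']
print(context_of("Kündigung", lang="de"))   # ['Arbeitsvertrag']
```

The point of the structure is that the relationships, not the term strings, carry the context a query can drill through.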
Multilingual Knowledge Systems provide the cross-border data processing capability often called semantic or information interoperability; indeed, an MKS is the only viable path to achieving cross-border interoperability. An MKS retrieves the needle in the haystack in every language, pinpointing exactly the unit sought while linking to all information related to that unit. If an MKS is in place to support Big Data and Linked Open Data, those technologies will in turn efficiently support cross-border interoperability.
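To illustrate the retrieval claim, here is a small self-contained sketch of how an MKS-backed index might work; the term-to-concept table, document texts, and function names are my own assumptions. Documents in different languages are indexed by shared concept identifiers instead of surface strings, so a query phrased in one language finds matching documents in all of them.

```python
# Concept-based cross-lingual retrieval sketch (illustrative only): surface
# terms in any language map to a shared concept ID, and documents are indexed
# by the concept IDs they mention rather than by the words themselves.
TERM_TO_CONCEPT = {
    "data protection": "C100", "datenschutz": "C100", "protection des données": "C100",
    "invoice": "C200", "rechnung": "C200", "facture": "C200",
}

DOCUMENTS = {
    "en-17": "The invoice must comply with data protection rules.",
    "de-08": "Die Rechnung unterliegt den Vorgaben zum Datenschutz.",
    "fr-23": "La facture doit respecter la protection des données.",
}

def concepts_in(text: str) -> set[str]:
    """Map every known surface term occurring in the text to its concept ID."""
    lowered = text.lower()
    return {cid for term, cid in TERM_TO_CONCEPT.items() if term in lowered}

# Build the concept index once: concept ID -> documents mentioning it.
INDEX: dict[str, set[str]] = {}
for doc_id, text in DOCUMENTS.items():
    for cid in concepts_in(text):
        INDEX.setdefault(cid, set()).add(doc_id)

def search(query: str) -> set[str]:
    """Return documents, in any language, that share a concept with the query."""
    hits: set[str] = set()
    for cid in concepts_in(query):
        hits |= INDEX.get(cid, set())
    return hits

print(search("Datenschutz"))   # all three documents, regardless of language
```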