Automate Exports Of Custom Data Into Another Solution

Imagine you want to regularly export selected information from your Multilingual Knowledge System into another solution, such as an authoring tool or a search engine. As a live connection to your repository via the Coreon APIs is not an option, you would have to export the data and import it into the target system, like a mirror.

This entails four specific requirements that all have to be covered. In this post, we will outline how various functionalities and components can be leveraged to get there:

1. Information Selection: You do not want to export ‘everything’, but only selected information such as specific languages, properties, or concepts.

2. Repeatability: You do not want to configure all the settings again and again. Rather, you want to define them once and reuse them next time.

3. Optimization for the Targeted Solution: The ‘receiving’ solution may have some specific conditions that the process should take into account.

4. Automation: Instead of triggering the process manually once a day, you want a daily ‘auto-update’ without any activity on your side.

1. Information Selection

Often you won’t want to export a whole repository, with all its concepts, languages, relations and history log. Coreon provides two mechanisms that enable only ‘needed’ information to be selected and mirrored into the targeted solution:

Filtering concepts for export

A so-called Filter applies one or more rules to select concepts and/or terms via criteria you specify. Such a filter should be given a clearly recognizable name (so that you can refer to it later) and contain one or more rules. Some typical examples for filters might be:

‘All concepts with at least one definition’: To make sure that at least one definition has been written. Could be useful when exporting concepts for a glossary.
‘All concepts where English and German terms exist’: To make sure that a translation in German exists. Could be useful when exporting concepts for a translation tool.
‘All concepts that are approved’: To make sure that only reviewed and signed-off concepts are exported. Could be useful when exporting concepts for a publication.

You can use a filter when working with Coreon in the UI, but such a filter can also be applied when exporting data.
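To make the idea of a filter as a named set of rules concrete, here is a minimal sketch in Python. It is not Coreon's implementation: the dictionary shape of a concept and the names `has_definition`, `has_terms_in`, and `apply_filter` are assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical in-memory shape of a concept; Coreon's real data model may differ.
Concept = Dict[str, Any]
Rule = Callable[[Concept], bool]


def has_definition(concept: Concept) -> bool:
    """Rule: the concept carries at least one non-empty definition."""
    return any(d.strip() for d in concept.get("definitions", []))


def has_terms_in(*languages: str) -> Rule:
    """Rule factory: the concept has at least one term in every given language."""
    def rule(concept: Concept) -> bool:
        present = {term["lang"] for term in concept.get("terms", [])}
        return set(languages) <= present
    return rule


def apply_filter(concepts: List[Concept], rules: List[Rule]) -> List[Concept]:
    """Keep only the concepts that satisfy all rules of the filter."""
    return [c for c in concepts if all(rule(c) for rule in rules)]


# Illustrative sample data only.
concepts = [
    {"id": "c1", "definitions": ["A threaded fastener."],
     "terms": [{"lang": "en", "value": "screw"}, {"lang": "de", "value": "Schraube"}]},
    {"id": "c2", "definitions": [],
     "terms": [{"lang": "en", "value": "nut"}, {"lang": "de", "value": "Mutter"}]},
    {"id": "c3", "definitions": ["A flat ring."],
     "terms": [{"lang": "en", "value": "washer"}]},
]

glossary_filter = [has_definition]               # 'All concepts with at least one definition'
translation_filter = [has_terms_in("en", "de")]  # 'All concepts where English and German terms exist'

glossary_ids = [c["id"] for c in apply_filter(concepts, glossary_filter)]
translation_ids = [c["id"] for c in apply_filter(concepts, translation_filter)]
```

Because a filter is just a list of rules, the same filter can be reused for browsing and for exporting.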

Exporting specific languages or properties only

The above-mentioned filters help you to select which concepts or which terms you wish to export. Now, instead of exporting all the information contained in a concept (that has already passed through your filter), you want to select which languages or properties should be written into the export. For instance:

Languages: While your concepts may contain terms in up to 30 languages, you only want to pick specific ones. E.g., English, German, Spanish.
Properties: While your concepts and terms are elaborated with many details, only certain properties are relevant for the target application. E.g., Definition, Term Type, Usage Status.
Relations: Your target application may not be able to cater for semantic information stored in the knowledge graph. Therefore you would deselect Broader-Narrower Relations and Associative Relations.
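The selection step can be pictured as a projection applied to each concept that has passed the filter. The sketch below assumes a hypothetical dictionary layout for concepts and terms, and `select_for_export` is an illustrative name, not a Coreon function.

```python
from typing import Any, Dict, Iterable


def select_for_export(
    concept: Dict[str, Any],
    languages: Iterable[str],
    term_properties: Iterable[str],
    include_relations: bool = False,
) -> Dict[str, Any]:
    """Return a reduced copy of the concept with only the selected values."""
    wanted_languages = set(languages)
    wanted_properties = set(term_properties)

    selected_terms = [
        {
            "lang": term["lang"],
            "value": term["value"],
            # Keep only the term properties the target application needs.
            "properties": {
                key: value
                for key, value in term.get("properties", {}).items()
                if key in wanted_properties
            },
        }
        for term in concept.get("terms", [])
        if term["lang"] in wanted_languages
    ]

    result: Dict[str, Any] = {"id": concept["id"], "terms": selected_terms}
    if include_relations:
        # Relations are only written when explicitly selected.
        result["broader"] = list(concept.get("broader", []))
    return result


# Illustrative sample concept only.
concept = {
    "id": "c1",
    "broader": ["c0"],
    "terms": [
        {"lang": "en", "value": "screw",
         "properties": {"Usage Status": "Preferred", "Note": "internal"}},
        {"lang": "de", "value": "Schraube",
         "properties": {"Usage Status": "Preferred"}},
        {"lang": "fr", "value": "vis", "properties": {}},
    ],
}

exported = select_for_export(concept, ["en", "de"], ["Usage Status"])
```

Here the French term, the internal note, and the broader relation are left out of `exported`, while the original concept stays unchanged.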

[Figure: What content to export. Applying a filter and selecting which values to export.]

2. Repeatability: Setting up a Reusable Template

Filtering and export selection allow you to fine-tune which data is selected and which is not. When working with real repositories, this may already mean quite a large set of selections and settings. You do not want to have to select these again and again for each export.

To make your life easier, so-called Export Templates can store all relevant settings for an export, so that you can easily repeat it using the same conditions and settings. A template should be given a clearly recognizable name (so that you can refer to it later) and contain the following settings:

Export method: Coreon XML, Coreon Spreadsheet, ISO TBX, or W3C RDF.
General settings: Export filename; plain text file or compressed into a zip archive, etc.
Filters to be applied (see Section 1 above).
Selectors to be applied (see Section 1 above).
Transformation Chain (see Section 3 below): Should the generated export file be further transformed into another text or PDF format?

Through your filters, property selections, and the further settings also stored in a template, repeating an export job is easy. Instead of selecting and adjusting what to export again and again, you can simply start an export job via a template knowing that the generated file contains the information that is required.
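As a rough mental model, a template is a named, serializable bundle of the settings listed above. The following sketch uses a Python dataclass; the field names are assumptions chosen for readability and do not reflect how Coreon actually stores templates.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# The built-in export formats named in this post.
SUPPORTED_FORMATS = ("Coreon XML", "Coreon Spreadsheet", "ISO TBX", "W3C RDF")


@dataclass(frozen=True)
class ExportTemplate:
    """Hypothetical representation of a reusable export template."""
    name: str
    export_format: str
    filename: str
    compress_to_zip: bool = False
    filters: List[str] = field(default_factory=list)          # filter names
    languages: List[str] = field(default_factory=list)        # selected languages
    properties: List[str] = field(default_factory=list)       # selected properties
    include_relations: bool = False
    transformations: List[str] = field(default_factory=list)  # ordered chain

    def __post_init__(self) -> None:
        if self.export_format not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported export format: {self.export_format}")


template = ExportTemplate(
    name="Nightly authoring tool export",
    export_format="ISO TBX",
    filename="terminology.tbx",
    compress_to_zip=True,
    filters=["All concepts that are approved"],
    languages=["en", "de", "es"],
    properties=["Definition", "Term Type", "Usage Status"],
    transformations=["rename-usage-status.xslt"],
)

# Defined once, the settings can be stored and loaded again for every later run.
stored = json.dumps(asdict(template))
restored = ExportTemplate(**json.loads(stored))
```

The point of the sketch is that every later export only needs the template's name or ID, not the individual settings.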

[Figure: Export Templates list. Reusable, custom “templates” to apply a large set of settings.]

3. Optimization Of Your Export: Adapting Formats and Structures

With the tools above you can comfortably generate targeted export files in the built-in export formats that Coreon supports, such as Coreon XML, Coreon Spreadsheet, ISO TBX, and W3C RDF. However, even if the targeted solution can read the format you are exporting in (so-called syntactic interoperability), it may be that the values from your repository still do not match some constraints of the targeted solution.

Let’s look at an example. In your repository model you have defined a term property named Usage Status with the values Preferred | Allowed | Forbidden. However, the target application for the export (in this example, done in the ISO TBX format) has a fixed set of data fields and expects this field to be named Usage, never Usage Status.

That means even when Coreon and the target application both nicely support ISO TBX, the export file needs to be changed. To achieve this so-called semantic interoperability, an export process can apply one or more custom transformations. In our example, the change would be to replace/rename one or more property keys, as well as their values.

[Figure: Export Extensions XSLT list. Typical list of transformations that reformat the generated export files.]

Such a custom extension (via XSLT) can even be chained: you can apply a transformation on top of another transformation. Of course, the settings of such a transformation are also stored in the export template.
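In Coreon these transformations are written in XSLT. To illustrate the same idea of renaming a property key and chaining a second step on top, the sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in, not XSLT itself. The XML fragment is a simplified, TBX-like example, and the lowercase target values are assumptions about what a target application might expect.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, Dict, List

Transformation = Callable[[ET.Element], ET.Element]

# Simplified, TBX-like fragment for illustration; a real export is richer.
SAMPLE = """<termEntry id="c1">
  <langSet>
    <tig>
      <term>screw</term>
      <termNote type="Usage Status">Preferred</termNote>
    </tig>
  </langSet>
</termEntry>"""


def rename_property_key(old: str, new: str) -> Transformation:
    """Step that renames a property key, e.g. 'Usage Status' to 'Usage'."""
    def transform(root: ET.Element) -> ET.Element:
        for note in root.iter("termNote"):
            if note.get("type") == old:
                note.set("type", new)
        return root  # the tree is modified in place and passed on
    return transform


def map_property_values(key: str, mapping: Dict[str, str]) -> Transformation:
    """Step that replaces the values of one property."""
    def transform(root: ET.Element) -> ET.Element:
        for note in root.iter("termNote"):
            if note.get("type") == key and note.text in mapping:
                note.text = mapping[note.text]
        return root
    return transform


def apply_chain(root: ET.Element, chain: List[Transformation]) -> ET.Element:
    """Apply each transformation to the output of the previous one."""
    return reduce(lambda tree, step: step(tree), chain, root)


chain = [
    rename_property_key("Usage Status", "Usage"),
    # Runs on top of the first step, so it looks for the new key 'Usage'.
    map_property_values("Usage", {"Preferred": "preferred",
                                  "Allowed": "allowed",
                                  "Forbidden": "forbidden"}),
]

transformed = apply_chain(ET.fromstring(SAMPLE), chain)
usage_note = transformed.find(".//termNote")
```

The order of the chain matters: the second step only finds its property because the first step has already renamed it.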

With these mechanisms all in place — which content to filter down to, with which settings, and with suitable customizations and formatting for a direct import into the target application — we are almost ready to export.

4. Automation: Trigger from a Script

What is missing? Well, even when all the above information is stored in a template that enables the running of perfectly customized exports, there is one final step required. Namely, to click the template and start the export. This can be automated.

Via the Coreon API, you can trigger an export job and tell it which template to use by specifying the template’s ID. For instance, this trigger could be part of a Python script (we recently developed one for a customer). Such a script would run nightly and generate the most recent, targeted export file every 24 hours. If the target application supports scripted imports, the newly generated file would be immediately picked up and imported.
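A script of this kind could look roughly like the sketch below. It is not the script mentioned above, and it does not document the actual Coreon API: the endpoint path, the JSON body, the bearer-token header, and the function names are all hypothetical placeholders to be replaced with the values from the API documentation of your installation.

```python
import json
import urllib.request


def build_export_request(
    base_url: str, template_id: str, api_token: str
) -> urllib.request.Request:
    """Prepare the HTTP request that starts an export for a given template.

    The URL path and payload are hypothetical, not the real Coreon API.
    """
    url = f"{base_url.rstrip('/')}/export-templates/{template_id}/run"
    body = json.dumps({"template_id": template_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def trigger_export(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder values; nothing is sent until trigger_export(request) is called.
request = build_export_request(
    "https://coreon.example.com/api/", "tmpl-42", "secret-token"
)
```

Calling `trigger_export(request)` from a scheduler such as cron once per night would then give the daily update without manual steps.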

Summing Up

So, to achieve an automated, custom export of all or parts of a repository, we’ve outlined four mechanisms that complement each other: 1) defining the data via filtering and export selection, 2) using templates to repeat detailed export configurations, 3) using XSLT transformations to fine-tune the generated format, and 4) using a script that calls the Coreon API so that the whole process runs without tedious manual intervention.

Michael Wetzel
