Leverage term store taxonomy when creating an extractor in Microsoft Syntex

Applies to:   ✓ Unstructured document processing



When you create an extractor in your unstructured document processing model using Microsoft Syntex, you can take advantage of global term sets in the term store to display preferred terms for data that you extract.

As an example, your model identifies and classifies all Contract documents that are uploaded to the document library. Additionally, the model also extracts a Contract Service value from each contract, and will display it in a column in your library view. Among the various Contract Services values in the contracts, there are several older values that your company no longer uses and have been renamed. For example, all references to the terms Design, Graphics, or Topography contract services should now be called Creative. Whenever your model extracts one of the outdated terms from a contract document, you want it to display the current term—Creative—in your library view. In the following example, while training the model we see that one sample document contains the outdated term of Design.

Term store.

Use a managed metadata column in your extractor

Term sets are configured in the Managed Metadata services (MMS) term store in the SharePoint admin center. In the example below, the Contract Services term set is configured to include several terms, including Creative. The details for it show that the term has three synonyms (Design, Graphics, and Topography) and the synonyms should be translated to Creative.

Term set.

There could be many reasons why you might want to use a synonym in your term set. For example, there could be outdated terms, renamed terms, or variations between your organizations departments on naming.

To make the managed metadata field available to select when you create your extractor in your model, you need to add it as a managed-metadata site column. After you add the site column, you can select it when you create the extractor for your model.

Contract service.

After applying your model to the document library, when documents are uploaded to library, the Creative Services column will display the preferred term (Creative) when the extractor finds any of the synonym values (Design, Graphics, and Topography).

Contract service column.

Note

If the term set is open, then any extracted values that do not match a preferred term or synonym value will be added as a new term to the root of the term set. These new terms can be moved, merged, or made synonyms in the term store where the term set resides.

See also

Introduction to managed metadata

Create an extractor

Create a managed metadata column