Categorization

This is a specialized multiclass text classifier that assigns a category to a media post based on the 17 top-level IPTC Media Topic NewsCodes, inferred from the post's text. Some related IPTC topics are combined, so the classifier returns one of the following 12 labels. The “Other” label is returned when a post covers multiple topics or does not contain enough information to classify.

  • Entertainment -> arts, culture, and entertainment
  • Conflicts -> conflicts, war, and peace
  • Crime -> crime, law, and justice
  • Disaster -> disaster, accident, and emergency incident
  • Business -> economy, business, and finance
  • Health -> health
  • Lifestyle -> lifestyle and leisure
  • Politics -> politics, religion, and labor
  • Technology -> science and technology, environment
  • Society -> society and education
  • Sport -> sport
  • Other -> multiple topics, short text, or not enough information
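The label list above can be expressed as a lookup table, which helps when mapping the classifier's output back to IPTC topic groups. The following is a minimal Python sketch. The names `LABEL_TO_IPTC_TOPICS` and `IPTC_TOPIC_TO_LABEL` are hypothetical, and the topic strings are copied from the list above rather than from the official IPTC vocabulary.

```python
# Hypothetical lookup table: each classifier label mapped to the IPTC topic
# groups it covers, using the wording from the label list above.
LABEL_TO_IPTC_TOPICS: dict[str, list[str]] = {
    "Entertainment": ["arts, culture, and entertainment"],
    "Conflicts": ["conflicts, war, and peace"],
    "Crime": ["crime, law, and justice"],
    "Disaster": ["disaster, accident, and emergency incident"],
    "Business": ["economy, business, and finance"],
    "Health": ["health"],
    "Lifestyle": ["lifestyle and leisure"],
    "Politics": ["politics", "religion", "labor"],  # combined topics
    "Technology": ["science and technology", "environment"],  # combined topics
    "Society": ["society", "education"],  # combined topics
    "Sport": ["sport"],
    "Other": [],  # multiple topics, short text, or not enough information
}

# Reverse index: IPTC topic group -> classifier label.
IPTC_TOPIC_TO_LABEL: dict[str, str] = {
    topic: label
    for label, topics in LABEL_TO_IPTC_TOPICS.items()
    for topic in topics
}

print(IPTC_TOPIC_TO_LABEL["environment"])  # Technology
```

Because “Other” is not tied to any single IPTC topic, it maps to an empty list and has no entry in the reverse index.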

Statistics

| Type | Speed | Partner Type |
| --- | --- | --- |
| Post-Processing Classifier | Instant | Datastreamer Internal |

Example Use Cases

  • Marketing and security companies can use a category classifier in combination with hard news classifiers to identify and track recent news about organizations or people.
  • Media companies can aggregate news, such as technology news, by category and publication date.
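As an illustration of the aggregation use case, the Python sketch below counts classified posts per category and publication day. It assumes that each result stores its label as a plain string under `operations.category` (the destination path used in the Usage example) and that each post has a hypothetical `published_date` field holding an ISO 8601 timestamp. The actual output shape of the operation may differ.

```python
from collections import Counter
from datetime import date


def count_by_category_and_day(posts: list[dict]) -> Counter:
    """Count classified posts per (category, publication day) pair.

    Assumptions: the label is a plain string at post["operations"]["category"],
    and "published_date" is a hypothetical ISO 8601 timestamp field.
    """
    counts: Counter = Counter()
    for post in posts:
        category = post["operations"]["category"]
        # Keep only the YYYY-MM-DD part so posts are grouped by calendar day.
        day = date.fromisoformat(post["published_date"][:10])
        counts[(category, day)] += 1
    return counts


posts = [  # hypothetical sample posts
    {"operations": {"category": "Technology"}, "published_date": "2024-05-01T09:30:00Z"},
    {"operations": {"category": "Technology"}, "published_date": "2024-05-01T17:05:00Z"},
    {"operations": {"category": "Sport"}, "published_date": "2024-05-02T08:00:00Z"},
]
counts = count_by_category_and_day(posts)
print(counts[("Technology", date(2024, 5, 1))])  # 2
```

A `Counter` keyed by `(category, day)` tuples returns 0 for missing combinations, which keeps downstream reporting code simple.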

Compatible Data Sources

As a Post-Processing operation, it can be run on any data source.


Recipe Available

A recipe is available that shows how to use this post-processing operation and how to integrate it into your own data pipeline.

Usage

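This operation lets you specify the destination field, the source fields, and the separator. In the example below, `main` points to the field containing the text to classify, `language` points to the field holding the post's language, and the result is written to the field given by `destination_path`.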

```json
{
    "query": {
        ...
    },
    "operations": [
        {
            "name": "category",
            "destination_path": "operations.category",
            "parameters": {
                "language": "enrichment.language",
                "main": "content.body"
            }
        }
    ]
}
```
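The request body above can also be assembled programmatically. This Python sketch builds the same structure with plain dictionaries and the standard `json` module. The function names `build_category_request` and `get_path` are hypothetical. The `query` contents are elided in the original and are left for you to supply. `get_path` assumes that a dotted `destination_path` such as `operations.category` refers to nested fields in the returned document.

```python
import json


def build_category_request(query: dict,
                           destination_path: str = "operations.category",
                           main_field: str = "content.body",
                           language_field: str = "enrichment.language") -> dict:
    """Build a request body that runs the `category` post-processing operation."""
    return {
        "query": query,  # your own query; its contents are not shown in this page
        "operations": [
            {
                "name": "category",
                "destination_path": destination_path,
                "parameters": {
                    "language": language_field,
                    "main": main_field,
                },
            }
        ],
    }


def get_path(document: dict, dotted_path: str):
    """Follow a dotted path such as 'operations.category' into nested dicts."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


body = build_category_request(query={})  # placeholder query; supply your own
print(json.dumps(body, indent=2))
```

Keeping the field paths as parameters makes it easy to point the classifier at a different text field or write the label to a different destination without editing the request template.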