Bring Your Own Data

Add your own data source, combine and run operations on the data.

Datastreamer lets you quickly add new data sources to your data pipeline and run transformations and AI predictions on them.

You can also follow the Bring Your Own Data recipe.

Schema Mapping and Operations

The first step is to create a JSON schema mapping and, optionally, define operations. Datastreamer classifiers and other sources are mapped to the official Datastreamer unified schema. Mapping your data to the unified schema is not required, but it is recommended because it lets you take advantage of the other features and functionality of the Datastreamer platform.

📘

Datastreamer Unified Schema

You can use the following Google Doc as a metadata field template to help develop your schema.

Datastreamer Unified Schema

To create the schema, specify for each field the source metadata field (source_path), the destination metadata field (destination_path), and the data type (string, date, etc.). The example below defines a schema named sentiment_demo with four field mappings. Each source_path in the original data is mapped to a destination_path in the Datastreamer data schema, along with a data type.

Please visit the Operations section for more details about operations.

{
    "name": "sentiment_demo",
    "mappings": [
        {
            "source_path": "id", "destination_path": "id",
            "type": "string"
        },
        {
            "source_path": "published", "destination_path": "doc_date",
            "type": "date"
        },
        {
            "source_path": "name", "destination_path": "author.name",
            "type": "string"
        },
        {
            "source_path": "text", "destination_path": "content.body",
            "type": "string"
        }
    ],
    "operations": [
        {
            "name": "sentiment",
            "type": "classifier",
            "stage": "dataflow",
            "destination_path": "enrichment.sentiment",
            "parameters": {
                "main": "content.body"
            }
        }
    ]
}
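
To make the mapping semantics concrete, here is a minimal local sketch of what a source_path to destination_path mapping does to one document. It is an illustration only, not Datastreamer's implementation: the helper names (get_path, set_path, apply_mappings) are hypothetical, dotted paths are assumed to denote nested objects, and the sketch ignores the type field and the operations list, so it does not normalize dates or run the sentiment classifier.

```python
from typing import Any

# Mappings copied from the sentiment_demo schema above.
MAPPINGS = [
    {"source_path": "id", "destination_path": "id", "type": "string"},
    {"source_path": "published", "destination_path": "doc_date", "type": "date"},
    {"source_path": "name", "destination_path": "author.name", "type": "string"},
    {"source_path": "text", "destination_path": "content.body", "type": "string"},
]


def get_path(doc: dict, dotted_path: str) -> Any:
    """Read a value at a dotted path; return None if any key is missing."""
    node: Any = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(doc: dict, dotted_path: str, value: Any) -> None:
    """Write a value at a dotted path, creating nested objects as needed."""
    keys = dotted_path.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def apply_mappings(source_doc: dict, mappings: list) -> dict:
    """Copy each mapped field from its source path to its destination path."""
    result: dict = {}
    for mapping in mappings:
        value = get_path(source_doc, mapping["source_path"])
        if value is not None:
            set_path(result, mapping["destination_path"], value)
    return result


source = {
    "id": "1",
    "published": "2023-04-04T16:00:00Z",
    "text": "I really like the new design of your website!",
    "name": "Bob",
}
mapped = apply_mappings(source, MAPPINGS)
print(mapped["author"]["name"])   # Bob
print(mapped["content"]["body"])  # I really like the new design of your website!
```

The sketch copies values unchanged, so doc_date keeps the original "Z" suffix here. The platform applies its own type handling on top of the mapping.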


Submitting a Schema

The next step is to create the schema using the Create Schema API. The Schema API also provides operations to validate, modify, or delete a schema.
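
As a rough sketch of what submitting the schema could look like from Python, the snippet below builds (but does not send) an HTTP POST request using only the standard library. Everything endpoint-specific is an assumption: SCHEMA_URL is a placeholder, not the real Create Schema endpoint, and the "apikey" header name is hypothetical. Take the actual URL and authentication scheme from the Create Schema API reference.

```python
import json
import urllib.request

# ASSUMPTIONS: placeholder endpoint and hypothetical header name.
# Replace both with the values from the Create Schema API reference.
SCHEMA_URL = "https://api.example.invalid/schema"
API_KEY = "YOUR_API_KEY"

# Trimmed version of the sentiment_demo schema shown earlier.
schema = {
    "name": "sentiment_demo",
    "mappings": [
        {"source_path": "text", "destination_path": "content.body", "type": "string"},
    ],
    "operations": [
        {
            "name": "sentiment",
            "type": "classifier",
            "stage": "dataflow",
            "destination_path": "enrichment.sentiment",
            "parameters": {"main": "content.body"},
        }
    ],
}


def build_schema_request(schema: dict, url: str, api_key: str) -> urllib.request.Request:
    """Serialize the schema as JSON and wrap it in a POST request object."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "apikey": api_key,  # hypothetical header name
        },
    )


request = build_schema_request(schema, SCHEMA_URL, API_KEY)
print(request.get_method(), request.full_url)

# To actually send it once the real endpoint is filled in:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```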

Pushing the Data

The Documents API lets you connect your streaming data source. The following example contains two JSON documents that match the source fields of the schema mapping defined in the previous step.

📘

Your Data is Private

The data you submit can only be accessed with your Datastreamer API key.

[
  {
    "id": "1",
    "published": "2023-04-04T16:00:00Z",
    "text": "I really like the new design of your website!",
    "name": "Bob"
  },
  {
    "id": "2",
    "text": "The new design is awful!",
    "published": "2023-04-05T16:00:00Z",
    "name": "Steve"
  }
]

Use the Post Documents API to send the JSON content.
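
Before sending, it can help to check locally that each document carries the source fields the schema expects. The sketch below is one way to do that. The helper missing_fields is hypothetical, and treating every source_path as mandatory is an assumption made for this check; the API itself may accept documents with missing fields.

```python
import json

# Source paths taken from the sentiment_demo mappings.
SOURCE_PATHS = ["id", "published", "name", "text"]

documents = [
    {
        "id": "1",
        "published": "2023-04-04T16:00:00Z",
        "text": "I really like the new design of your website!",
        "name": "Bob",
    },
    {
        "id": "2",
        "text": "The new design is awful!",
        "published": "2023-04-05T16:00:00Z",
        "name": "Steve",
    },
]


def missing_fields(doc: dict, source_paths: list) -> list:
    """Return the source paths that are absent from a document."""
    return [path for path in source_paths if path not in doc]


# Collect only the documents that have at least one missing field,
# keyed by document id (or by position if the id itself is missing).
problems = {}
for index, doc in enumerate(documents):
    missing = missing_fields(doc, SOURCE_PATHS)
    if missing:
        problems[doc.get("id", f"#{index}")] = missing

if problems:
    raise ValueError(f"Documents with missing fields: {problems}")

# JSON array body to send with the Post Documents API.
payload = json.dumps(documents)
print(len(documents), "documents ready to send")
```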

The pipeline then transforms and enriches the content as defined by the schema in the section above. The transformation maps each field from its source path to its destination path and runs the sentiment operation on the value of content.body. Below is the result of the transformation and sentiment operation.

[
  {
    "data_source": "datastream_sentiment_demo",
    "id": "1",
    "doc_date": "2023-04-04T16:00:00+00:00",
    "author": {
      "name": "Bob"
    },
    "content": {
      "body": "I really like the new design of your website!"
    },
    "enrichment": {
      "sentiment": {
        "label": "positive",
        "confidence": 0.9996
      }
    }
  },
  {
    "data_source": "datastream_sentiment_demo",
    "id": "2",
    "doc_date": "2023-04-05T16:00:00+00:00",
    "author": {
      "name": "Steve"
    },
    "content": {
      "body": "The new design is awful!"
    },
    "enrichment": {
      "sentiment": {
        "label": "negative",
        "confidence": 0.9882
      }
    }
  }
]
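
Once documents come back in this enriched shape, application code can work with the enrichment.sentiment field directly. The sketch below filters the two results above by label and confidence. The function name with_sentiment and the 0.9 threshold are illustrative choices, not part of the Datastreamer API.

```python
# The enriched documents from the result above, reduced to the fields used here.
enriched = [
    {
        "id": "1",
        "content": {"body": "I really like the new design of your website!"},
        "enrichment": {"sentiment": {"label": "positive", "confidence": 0.9996}},
    },
    {
        "id": "2",
        "content": {"body": "The new design is awful!"},
        "enrichment": {"sentiment": {"label": "negative", "confidence": 0.9882}},
    },
]


def with_sentiment(docs: list, label: str, min_confidence: float = 0.0) -> list:
    """Keep documents whose sentiment label matches and is confident enough."""
    selected = []
    for doc in docs:
        # Tolerate documents that were not enriched.
        sentiment = doc.get("enrichment", {}).get("sentiment", {})
        if (
            sentiment.get("label") == label
            and sentiment.get("confidence", 0.0) >= min_confidence
        ):
            selected.append(doc)
    return selected


negative = with_sentiment(enriched, "negative", min_confidence=0.9)
print([doc["id"] for doc in negative])  # ['2']
```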

Using the Data in Datastreamer

Once the schema is validated and the streaming data source is successfully connected to Datastreamer, the new data source is available in the Datastreamer platform. You can query the data with all the features of the Search API, or send it to another API using Monitored Search. Use the Datastreamer APIs and the metadata fields defined in your schema to start using the integrated data in your application.