Uploading Data from Contentful into DX Graph

This recipe shows how to use DX Graph jobs to import, validate, and transform data from Contentful's web service.

Overall Orchestration Flow

DX Graph can load records from a wide variety of sources, from flat files to API web services. In this recipe, we connect to Contentful's web service and load blog records into DX Graph.

The steps look like this:

Preconditions:

  1. Create incoming, processed, invalidated, and skipped buckets (Working with Data Buckets)
  2. Create a Collection for the Contentful blog data (Creating Collections)
  3. Update the left-hand navigation in DX Graph Studio with the new Collection (Adding the Collection to the Sources Navigation)

Recipe:

  1. Upload blog records into DX Graph
  2. Create the Collection schema
  3. Import the blog records into the Collection
  4. Query the Collection

You can also review your buckets and the Collection, and edit the schema, within DX Graph Studio. Try creating an inspector card after your records are imported. See the Business User Guide for further instructions.

The overall orchestration flow:

[Figure: Contentful to DX Graph Flow]

DX Graph Jobs

This recipe includes jobs to upload records into DX Graph and to import records into a Collection. There are also API calls to create a schema and query the new Collection. Let's examine each of these more closely.

Upload blog records into DX Graph

DX Graph has a number of job types available, and we'll make use of the downloadDataFromWebservice job type.

Here we're running the job on demand for an initial data import.

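This job downloads all "fashion-blog" content type records from our Contentful sandbox into a jsonl file and uploads it into the incoming bucket. The job is paginated: each request fetches 50 records, and the hasMore expression keeps requesting pages while the total reported by Contentful, divided by 50, exceeds the current batchNumber.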

POST https://io-staging.conscia.ai/vue/_api/v1/buckets/incoming/files/_download-from-webservice

{
"targetBucketCode": "incoming",
"targetFilename": "articles_{{timestamp}}_{{batchNumber}}.jsonl",
"isPaged": true,
"isJson": true,
"hasMore": "`responseBody.total/50 > batchNumber`",
"initialRequest": {
"url": "https://cdn.contentful.com/spaces/{{space}}/environments/{{environment}}/entries",
"method": "GET",
"queryParameters": {
"access_token": "{{access_token}}",
"content_type": "fashion-blog",
"skip": 0,
"limit": 50
}
},
"nextPageRequest": {
"url": "https://cdn.contentful.com/spaces/{{space}}/environments/{{environment}}/entries",
"method": "GET",
"queryParameters": {
"access_token": "{{access_token}}",
"content_type": "fashion-blog",
"skip": "`batchNumber * 50`",
"limit": 50
}
},
"responsePrep": "`response.items`"
}
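
To see how the paging expressions interact, the following Python sketch reproduces the arithmetic outside DX Graph. It assumes that batchNumber counts the batches already downloaded and equals 1 after the initial request; this is an assumption made for illustration, not documented behavior, and page_offsets is a hypothetical helper name.

```python
def page_offsets(total: int, page_size: int = 50) -> list[int]:
    """Return the `skip` values the job would request for `total` records.

    Mirrors the expressions in the job body:
      hasMore -> responseBody.total/50 > batchNumber
      skip    -> batchNumber * 50
    Assumption: batchNumber is 1 once the initial request has completed.
    """
    offsets = [0]      # initialRequest always uses skip=0
    batch_number = 1   # assumed value after the first batch
    while total / page_size > batch_number:
        offsets.append(batch_number * page_size)
        batch_number += 1
    return offsets


if __name__ == "__main__":
    for total in (0, 100, 120):
        print(total, page_offsets(total))
```

Under that assumption, 120 records produce three requests (skip 0, 50, 100), and exactly 100 records produce two, so no empty trailing page is fetched.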

One line (or record) of our jsonl file from Contentful looks like this:

{
"metadata": {
"tags":[]
},
"sys": {
"space": {
"sys":{
"type":"Link",
"linkType":"Space",
"id":"{{space}}"
}
},
"id":"1dsYFCJ48C3YSYMMEkZca0",
"type":"Entry",
"createdAt":"2023-12-22T20:25:42.317Z",
"updatedAt":"2023-12-22T20:30:50.643Z",
"environment":{
"sys":{
"id":"{{environment}}",
"type":"Link",
"linkType":"Environment"
}
},
"revision":2,
"contentType":{
"sys":{
"type":"Link",
"linkType":"ContentType",
"id":"fashion-blog"
}
},
"locale":"en-US"
},
"fields": {
"title":"GeekDad Review: Vasque Breeze LT Low GTX Waterproof Hiking Shoes",
"description":"That means wearing hiking boots is something I’d rather not do if possible. [cut for length]",
"url":"/geekdad-review-vasque-breeze-lt-low-gtx-waterproof-hiking-shoes",
"urlToImage":"https://149455152.v2.pressablecdn.com/wp-content/uploads/2021/05/Vasque-Breeze-LT-Low-GTX-review.jpg",
"content":"We are well into the 2021 hiking season in my area. Temperatures are getting hot. [cut for length]",
"publishedAt":"dec 2023"
}
}
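
The record above is pretty-printed for readability; in a jsonl file each record occupies a single line. As a minimal sketch of reading such a file, the Python below parses a two-line sample. The second record and the read_jsonl helper are made up for illustration.

```python
import json


def read_jsonl(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two abbreviated records; the second one is a hypothetical example.
sample = "\n".join([
    json.dumps({"sys": {"id": "1dsYFCJ48C3YSYMMEkZca0"},
                "fields": {"title": "GeekDad Review"}}),
    json.dumps({"sys": {"id": "example-id-2"},
                "fields": {"title": "Second entry"}}),
])

records = read_jsonl(sample)
ids = [record["sys"]["id"] for record in records]
```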

Create the Collection schema

Next we'll create a schema for our inbound records. We're going to want to flatten the raw response and remove unneeded metadata, so we will define our schema with only the necessary fields.

POST https://io-staging.conscia.ai/vue/_api/v1/collections/{{collectionCode}}/schema

{
"fields": {
"article_id": {
"jsonSchema": { "type": "string" }
},
"title": {
"jsonSchema": { "type": "string" },
"options": {},
"displaySchema": {},
"shadow": false,
"computed": false
}, ...
}
}
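
Writing the schema body by hand gets repetitive when there are many flat fields. The Python sketch below builds a body of the same shape programmatically. It assumes every field is a plain string and takes the field names from the list kept by the import job in this recipe; only article_id and title appear in the schema excerpt above, so the other entries are illustrative. The build_schema helper is hypothetical.

```python
import json

# Field names kept by the import step of this recipe.
FIELD_NAMES = [
    "article_id", "title", "description", "url", "urlToImage",
    "content", "publishedAt", "_space", "_environment", "_contentType",
]


def build_schema(field_names: list[str], json_type: str = "string") -> dict:
    """Build a schema body with one jsonSchema entry per field.

    Assumption: all fields share the same JSON type. Optional properties
    such as options or displaySchema are left out, as in the article_id
    entry of the excerpt above.
    """
    return {
        "fields": {
            name: {"jsonSchema": {"type": json_type}} for name in field_names
        }
    }


schema_body = build_schema(FIELD_NAMES)

if __name__ == "__main__":
    print(json.dumps(schema_body, indent=2))
```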

Import the blog records into the Collection

This call imports records into our Collection. The import process can validate and transform the data as necessary. In this example, we're skipping validation and only transforming the fields we want to import.

POST https://io-staging.conscia.ai/vue/_api/v1/buckets/incoming/files/_import

{
"skippedBucketCode": "skipped",
"processedBucketCode": "processed",
"invalidBucketCode": "invalidated",
"filenamePattern": "articles*.jsonl",
"skipInvalidRecords": false,
"recordIdentifierField": "article_id",
"transformers": [
{ "type": "javascript", "config": { "expression": "_.set(data, 'title', data.fields.title)" } }, // copy nested Contentful fields to the top level
{ "type": "javascript", "config": { "expression": "_.set(data, 'description', data.fields.description)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'url', data.fields.url)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'urlToImage', data.fields.urlToImage)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'content', data.fields.content)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'publishedAt', data.fields.publishedAt)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, '_space', data.sys.space.sys.id)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, '_environment', data.sys.environment.sys.id)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, '_contentType', data.sys.contentType.sys.id)" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'article_id', data.sys.id)" } },
{ "type": "javascript", "config": { "expression": "_.pick(data, ['title','description', 'url', 'urlToImage', 'content', 'publishedAt', '_space', '_environment', '_contentType', 'article_id'])" } } // keep only these fields; the rest will not be imported
],
"collectionCode": "contentful-articles",
"parseOptions": {
"format": "JSONL"
}
}
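
The transformer chain copies nested values to the top level and then keeps only the listed keys. To preview the result before running the import, the Python sketch below applies the same mapping locally. It is an equivalent written for illustration, not the code DX Graph runs, and flatten_entry is a hypothetical helper.

```python
def flatten_entry(entry: dict) -> dict:
    """Flatten one Contentful entry the way the transformers above do.

    Each key corresponds to one _.set expression; building a new dict
    with only these keys plays the role of the final _.pick.
    """
    fields = entry.get("fields", {})
    sys = entry["sys"]
    return {
        "title": fields.get("title"),
        "description": fields.get("description"),
        "url": fields.get("url"),
        "urlToImage": fields.get("urlToImage"),
        "content": fields.get("content"),
        "publishedAt": fields.get("publishedAt"),
        "_space": sys["space"]["sys"]["id"],
        "_environment": sys["environment"]["sys"]["id"],
        "_contentType": sys["contentType"]["sys"]["id"],
        "article_id": sys["id"],
    }


# Abbreviated entry with placeholder space and environment ids.
entry = {
    "metadata": {"tags": []},
    "sys": {
        "space": {"sys": {"id": "my-space"}},
        "environment": {"sys": {"id": "master"}},
        "contentType": {"sys": {"id": "fashion-blog"}},
        "id": "1dsYFCJ48C3YSYMMEkZca0",
    },
    "fields": {"title": "GeekDad Review", "publishedAt": "dec 2023"},
}

flat = flatten_entry(entry)
```

The flattened record no longer carries metadata, sys, or fields, which matches the intent of removing unneeded metadata before import.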

Query the Collection

Lastly, to verify the data import, we can query the collection via the API.

POST https://io-staging.conscia.ai/vue/_api/v1/collections/{{collectionCode}}/records/_query

The response will show our data in the schema model we defined, with the addition of @iat and @uat properties to track import and update timestamps:

{
"data": [
{
"@iat": 1710956620065,
"@uat": 1710956620065,
"_contentType": "fashion-blog",
"_environment": "{{environment}}",
"_space": "{{space}}",
"article_id": "1dsYFCJ48C3YSYMMEkZca0",
"content": "We are well into the 2021 hiking season in my area. Temperatures are getting hot. That means wearing hiking boots is something I’d rather not do if possible. They tend to be hot, heavy, and not very breathable. [cut for length]",
"dataRecordIdentifier": "1dsYFCJ48C3YSYMMEkZca0",
"description": "That means wearing hiking boots is something I’d rather not do if possible. [cut for length]",
"publishedAt": "dec 2023",
"title": "[Contentful] GeekDad Review: Vasque Breeze LT Low GTX Waterproof Hiking Shoes",
"url": "/geekdad-review-vasque-breeze-lt-low-gtx-waterproof-hiking-shoes",
"urlToImage": "https://149455152.v2.pressablecdn.com/wp-content/uploads/2021/05/Vasque-Breeze-LT-Low-GTX-review.jpg"
},
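
As a quick sanity check on a queried record, the Python sketch below confirms that the expected fields are present and converts @iat to a readable date. It assumes @iat and @uat are Unix timestamps in milliseconds, which their magnitude suggests but which this recipe does not state; missing_fields and millis_to_utc are hypothetical helpers.

```python
from datetime import datetime, timezone

EXPECTED_FIELDS = {
    "article_id", "title", "description", "url", "urlToImage",
    "content", "publishedAt", "_space", "_environment", "_contentType",
}


def missing_fields(record: dict) -> set[str]:
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


def millis_to_utc(millis: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Abbreviated copy of the record shown in the response above.
record = {
    "@iat": 1710956620065,
    "@uat": 1710956620065,
    "_contentType": "fashion-blog",
    "_environment": "{{environment}}",
    "_space": "{{space}}",
    "article_id": "1dsYFCJ48C3YSYMMEkZca0",
    "content": "[cut for length]",
    "description": "[cut for length]",
    "publishedAt": "dec 2023",
    "title": "[Contentful] GeekDad Review",
    "url": "/geekdad-review",
    "urlToImage": "https://example.com/image.jpg",
}

imported_at = millis_to_utc(record["@iat"])
```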

Alternative: Create Reusable Job Definitions

Instead of running the jobs on demand, you can create reusable Job Definitions for the upload and import jobs at the endpoint below. Put the request body of each job we ran above into the params field of its Job Definition.

POST https://io-staging.conscia.ai/vue/_api/v1/job-definitions

And then execute them here:

POST https://io-staging.conscia.ai/vue/_api/v1/job-definitions/{{jobDefinitionCode}}/_execute

The response returns a jobId, which you can use to check the job's status at:

GET https://io-staging.conscia.ai/vue/_api/v1/jobs/{{jobId}}
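
Checking the status usually means polling until the job finishes. The Python sketch below shows one way to structure that loop without assuming anything about the response format: the caller supplies both the function that fetches the job and the predicate that decides when it is finished. The wait_for_job helper, the "state" key, and its values are all hypothetical stand-ins.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until is_finished(job) is true, then return the job.

    fetch_job would wrap the GET request for the job status;
    is_finished encodes whatever the response uses to signal completion.
    """
    for _ in range(max_attempts):
        job = fetch_job()
        if is_finished(job):
            return job
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")


# Stand-in for real HTTP calls: three canned responses.
_fake_responses = iter([
    {"state": "queued"},
    {"state": "running"},
    {"state": "finished"},
])

result = wait_for_job(
    fetch_job=lambda: next(_fake_responses),
    is_finished=lambda job: job["state"] == "finished",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Passing the fetch and completion logic in as callables keeps the loop testable and avoids hard-coding response fields that this recipe does not describe.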

References