Uploading Data from Contentful into DX Graph
You can implement a solution based on this recipe that uses DX Graph jobs to import, validate, and transform data from Contentful's webservice.
Overall Orchestration Flow
DX Graph can load records from a wide variety of sources, from flat files to API webservices. In this recipe, we demonstrate connecting to Contentful's webservice to load blog records into DX Graph.
The steps look like this:
Preconditions:
- Create incoming, processed, invalidated, and skipped buckets; the import job in this recipe references all four (Working with Data Buckets)
- Create a Collection for the Contentful blog data (Creating Collections)
- Update the left-hand navigation in DX Graph Studio with the new Collection (Adding the Collection to the Sources Navigation)
Recipe:
- Upload blog records into DX Graph
- Create the Collection schema
- Import the blog records into the Collection
- Query the Collection
You can also review your buckets and the Collection, and edit the schema, within DX Graph Studio. Try creating an inspector card after your records are imported. See the Business User Guide for further instructions.
[Figure: the overall orchestration flow]
DX Graph Jobs
This recipe includes jobs to upload records into DX Graph and to import records into a Collection. There are also API calls to create a schema and query the new Collection. Let's examine each of these more closely.
Upload blog records into DX Graph
DX Graph has a number of job types available, and we'll make use of the downloadDataFromWebservice job type.
Here we're running the job on demand for an initial data import.
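This job downloads all "fashion-blog" content type records from our Contentful sandbox into a JSONL file and uploads it into the incoming bucket. The job is set up for pagination: it requests 50 entries per page and keeps requesting pages for as long as the hasMore expression evaluates to true.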
POST https://io-staging.conscia.ai/vue/_api/v1/buckets/incoming/files/_download-from-webservice
{
  "targetBucketCode": "incoming",
  "targetFilename": "articles_{{timestamp}}_{{batchNumber}}.jsonl",
  "isPaged": true,
  "isJson": true,
  "hasMore": "`responseBody.total/50 > batchNumber`",
  "initialRequest": {
    "url": "https://cdn.contentful.com/spaces/{{space}}/environments/{{environment}}/entries",
    "method": "GET",
    "queryParameters": {
      "access_token": "{{access_token}}",
      "content_type": "fashion-blog",
      "skip": 0,
      "limit": 50
    }
  },
  "nextPageRequest": {
    "url": "https://cdn.contentful.com/spaces/{{space}}/environments/{{environment}}/entries",
    "method": "GET",
    "queryParameters": {
      "access_token": "{{access_token}}",
      "content_type": "fashion-blog",
      "skip": "`batchNumber * 50`",
      "limit": 50
    }
  },
  "responsePrep": "`response.items`"
}
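To see how the two paging expressions in this body interact, the following Python sketch replays them locally. It is only an illustration and rests on an assumption the recipe does not state: that batchNumber is 1 when hasMore is first evaluated after the initial request, and increases by 1 with each further page. The function name page_offsets is hypothetical and not part of DX Graph.

```python
from typing import List

PAGE_SIZE = 50  # matches the "limit" query parameter in the job body


def page_offsets(total: int, page_size: int = PAGE_SIZE) -> List[int]:
    """Return the "skip" values the job would request for `total` entries.

    Assumption (not confirmed by the recipe): batchNumber is 1 after the
    initial request and grows by 1 after each further page.
    """
    offsets = [0]  # initialRequest always uses skip = 0
    batch_number = 1
    # hasMore: `responseBody.total/50 > batchNumber`
    while total / page_size > batch_number:
        # nextPageRequest: skip = `batchNumber * 50`
        offsets.append(batch_number * page_size)
        batch_number += 1
    return offsets


if __name__ == "__main__":
    print(page_offsets(120))
```

Under that assumption, 120 entries are fetched with skip values 0, 50, and 100; paging stops when batchNumber reaches 3, because 120/50 = 2.4 is no longer greater than it.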
One line (or record) of our jsonl file from Contentful looks like this:
{
  "metadata": {
    "tags": []
  },
  "sys": {
    "space": {
      "sys": {
        "type": "Link",
        "linkType": "Space",
        "id": "{{space}}"
      }
    },
    "id": "1dsYFCJ48C3YSYMMEkZca0",
    "type": "Entry",
    "createdAt": "2023-12-22T20:25:42.317Z",
    "updatedAt": "2023-12-22T20:30:50.643Z",
    "environment": {
      "sys": {
        "id": "{{environment}}",
        "type": "Link",
        "linkType": "Environment"
      }
    },
    "revision": 2,
    "contentType": {
      "sys": {
        "type": "Link",
        "linkType": "ContentType",
        "id": "fashion-blog"
      }
    },
    "locale": "en-US"
  },
  "fields": {
    "title": "GeekDad Review: Vasque Breeze LT Low GTX Waterproof Hiking Shoes",
    "description": "That means wearing hiking boots is something I’d rather not do if possible. [cut for length]",
    "url": "/geekdad-review-vasque-breeze-lt-low-gtx-waterproof-hiking-shoes",
    "urlToImage": "https://149455152.v2.pressablecdn.com/wp-content/uploads/2021/05/Vasque-Breeze-LT-Low-GTX-review.jpg",
    "content": "We are well into the 2021 hiking season in my area. Temperatures are getting hot. [cut for length]",
    "publishedAt": "dec 2023"
  }
}
Create the Collection schema
Next we'll create a schema for our inbound records. We're going to want to flatten the raw response and remove unneeded metadata, so we will define our schema with only the necessary fields.
POST https://io-staging.conscia.ai/vue/_api/v1/collections/{{collectionCode}}/schema
{
  "fields": {
    "article_id": {
      "jsonSchema": { "type": "string" }
    },
    "title": {
      "jsonSchema": { "type": "string" },
      "options": {},
      "displaySchema": {},
      "shadow": false,
      "computed": false
    }, ...
  }
}
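Before importing, it can help to check a flattened record against the types declared in the schema. The Python sketch below is a local pre-check only; it is not how DX Graph validates records, and the names CHECKS and type_errors are hypothetical. It reads just the jsonSchema type of each field and uses only the two fields shown above.

```python
from typing import Any, Callable, Dict, List

# Simple checks for a few JSON Schema primitive types.
CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def type_errors(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """List fields in `record` whose value does not match the schema type."""
    errors = []
    for name, definition in schema["fields"].items():
        expected = definition.get("jsonSchema", {}).get("type")
        # Skip fields that are absent or whose type this sketch cannot check.
        if name not in record or expected not in CHECKS:
            continue
        if not CHECKS[expected](record[name]):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected}, got {actual}")
    return errors


if __name__ == "__main__":
    schema = {
        "fields": {
            "article_id": {"jsonSchema": {"type": "string"}},
            "title": {"jsonSchema": {"type": "string"}},
        }
    }
    print(type_errors({"article_id": "1dsYFCJ48C3YSYMMEkZca0", "title": 42}, schema))
```

An empty list means every checked field matched its declared type; the sample call reports the numeric title.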
Import the blog records into the Collection
This call imports the uploaded records into our Collection. The import job can validate and transform the data as necessary. In this example, we skip validation and only transform the records: each transformer copies one nested Contentful value to a top-level field, and the final _.pick expression specifies which fields to keep. The rest will not be imported.
POST https://io-staging.conscia.ai/vue/_api/v1/buckets/incoming/files/_import
{
  "skippedBucketCode": "skipped",
  "processedBucketCode": "processed",
  "invalidBucketCode": "invalidated",
  "filenamePattern": "articles*.jsonl",
  "skipInvalidRecords": false,
  "recordIdentifierField": "article_id",
  "transformers": [
    { "type": "javascript", "config": { "expression": "_.set(data, 'title', data.fields.title)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, 'description', data.fields.description)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, 'url', data.fields.url)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, 'urlToImage', data.fields.urlToImage)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, 'content', data.fields.content)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, 'publishedAt', data.fields.publishedAt)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, '_space', data.sys.space.sys.id)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, '_environment', data.sys.environment.sys.id)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, '_contentType', data.sys.contentType.sys.id)" } },
    { "type": "javascript", "config": { "expression": "_.set(data, 'article_id', data.sys.id)" } },
    { "type": "javascript", "config": { "expression": "_.pick(data, ['title','description', 'url', 'urlToImage', 'content', 'publishedAt', '_space', '_environment', '_contentType', 'article_id'])" } }
  ],
  "collectionCode": "contentful-articles",
  "parseOptions": {
    "format": "JSONL"
  }
}
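The effect of the transformer chain can be previewed locally. The Python sketch below mirrors the _.set and _.pick expressions with plain dictionary operations; it does not run inside DX Graph, and the names KEEP_FIELDS and flatten_entry are hypothetical. One simplification to note: a value missing from the Contentful entry becomes None here.

```python
from typing import Any, Dict

# Same field list as the final _.pick expression in the import job.
KEEP_FIELDS = [
    "title", "description", "url", "urlToImage", "content", "publishedAt",
    "_space", "_environment", "_contentType", "article_id",
]

# Values copied straight from data.fields.<name>.
CONTENT_FIELDS = ["title", "description", "url", "urlToImage", "content", "publishedAt"]


def flatten_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one Contentful entry the way the transformers above do."""
    fields = entry.get("fields", {})
    sys_info = entry["sys"]
    flat = dict(entry)  # shallow copy so the input record is left untouched
    for name in CONTENT_FIELDS:
        flat[name] = fields.get(name)  # like _.set(data, name, data.fields[name])
    flat["_space"] = sys_info["space"]["sys"]["id"]
    flat["_environment"] = sys_info["environment"]["sys"]["id"]
    flat["_contentType"] = sys_info["contentType"]["sys"]["id"]
    flat["article_id"] = sys_info["id"]
    # Like _.pick: keep only the listed fields, dropping metadata, sys, fields.
    return {key: flat[key] for key in KEEP_FIELDS}


if __name__ == "__main__":
    sample = {
        "metadata": {"tags": []},
        "sys": {
            "space": {"sys": {"id": "{{space}}"}},
            "id": "1dsYFCJ48C3YSYMMEkZca0",
            "environment": {"sys": {"id": "{{environment}}"}},
            "contentType": {"sys": {"id": "fashion-blog"}},
        },
        "fields": {"title": "GeekDad Review", "publishedAt": "dec 2023"},
    }
    print(flatten_entry(sample))
```

Running a few downloaded lines through a helper like this is a quick way to confirm that the field names match the schema before starting the import job.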
Query the Collection
Lastly, to verify the data import, we can query the collection via the API.
POST https://io-staging.conscia.ai/vue/_api/v1/collections/{{collectionCode}}/records/_query
The response will show our data in the schema model we defined, with the addition of @iat and @uat properties to track import and update timestamps:
{
  "data": [
    {
      "@iat": 1710956620065,
      "@uat": 1710956620065,
      "_contentType": "fashion-blog",
      "_environment": "{{environment}}",
      "_space": "{{space}}",
      "article_id": "1dsYFCJ48C3YSYMMEkZca0",
      "content": "We are well into the 2021 hiking season in my area. Temperatures are getting hot. That means wearing hiking boots is something I’d rather not do if possible. They tend to be hot, heavy, and not very breathable. [cut for length]",
      "dataRecordIdentifier": "1dsYFCJ48C3YSYMMEkZca0",
      "description": "That means wearing hiking boots is something I’d rather not do if possible. [cut for length]",
      "publishedAt": "dec 2023",
      "title": "[Contentful] GeekDad Review: Vasque Breeze LT Low GTX Waterproof Hiking Shoes",
      "url": "/geekdad-review-vasque-breeze-lt-low-gtx-waterproof-hiking-shoes",
      "urlToImage": "https://149455152.v2.pressablecdn.com/wp-content/uploads/2021/05/Vasque-Breeze-LT-Low-GTX-review.jpg"
    },
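The @iat and @uat values in this response look like Unix timestamps in milliseconds. That reading is an assumption based on their magnitude, not something the recipe states. If it holds, a short Python helper (the name millis_to_utc is hypothetical) turns them into readable UTC datetimes:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(milliseconds: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime.

    Integer timedelta arithmetic avoids floating-point rounding.
    """
    return EPOCH + timedelta(milliseconds=milliseconds)


if __name__ == "__main__":
    # "@iat" value taken from the sample response above
    print(millis_to_utc(1710956620065).isoformat())
```

For the sample record this gives a time on 20 March 2024, which makes it easy to confirm that a record was written by the import run you just executed.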
Alternative: Create Reusable Job Definitions
Alternatively, create reusable Job Definitions for the upload and import jobs at this endpoint. Put the body of each job we ran above into the params field of its Job Definition.
POST https://io-staging.conscia.ai/vue/_api/v1/job-definitions
And then execute them here:
POST https://io-staging.conscia.ai/vue/_api/v1/job-definitions/{{jobDefinitionCode}}/_execute
The response will return a jobId, and you can check the job's status at:
GET https://io-staging.conscia.ai/vue/_api/v1/jobs/{{jobId}}
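Executing a Job Definition and then checking the job status lends itself to a small polling loop. The Python sketch below builds the two URLs shown above and polls until a caller-supplied check passes. The shape of the status response is not documented in this recipe, so the sketch takes the HTTP call (fetch_json) and the completion check (is_finished) as parameters rather than guessing field names. All function names here are hypothetical.

```python
import time
from typing import Any, Callable, Dict

BASE_URL = "https://io-staging.conscia.ai/vue/_api/v1"


def execute_url(job_definition_code: str) -> str:
    """URL to POST to in order to execute a Job Definition."""
    return f"{BASE_URL}/job-definitions/{job_definition_code}/_execute"


def job_status_url(job_id: str) -> str:
    """URL to GET in order to check the status of a job."""
    return f"{BASE_URL}/jobs/{job_id}"


def wait_for_job(
    job_id: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    is_finished: Callable[[Dict[str, Any]], bool],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the job status URL until `is_finished` returns True.

    `fetch_json` performs the authenticated GET and returns parsed JSON;
    it is injected so this sketch makes no assumption about the HTTP client.
    """
    url = job_status_url(job_id)
    for _ in range(max_attempts):
        job = fetch_json(url)
        if is_finished(job):
            return job
        sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} checks")
```

Passing the fetch function and the completion check as arguments keeps the loop independent of the HTTP library and of the actual status fields, which you would fill in from the DX Graph API reference.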
References
As a follow-up, you may want to explore a recipe that uses DX Engine Orchestration to update a search engine index based on record created and updated events in your DX Graph Collection.