Lift and Shift Migration Recipe from Legacy CMS to Contentful
This recipe accomplishes a "lift and shift" migration from one CMS to another. In this recipe, we are working with an older "legacy" CMS that is not API-accessible, and moving data into Contentful, a modern CMS.
The entire migration, including ingestion, schema modeling and data transformation, a business-user validation check, and the final load into the destination system, is done through DX Graph.
Overall Orchestration Flow
The steps for the Orchestration flow look like this:
Preconditions:
- Buckets named incoming, processed, invalidated, skipped, and for-export have been created in DX Graph
- Article content from the legacy CMS has been exported as a JSONL file
- An Article content model has been created in Contentful
- A mapping exercise has been done to determine which fields will be imported and how they map to the new content model
Import Flow:
- Upload Data File to Bucket
- Create the Collection and set its schema, including a boolean column for Validated/Ready to Migrate
- Configure the left navigation so the Collection is visible in the DX Graph UI
- Import Data File into Collection
Review Flow:
- Business users do their review through the DX Graph UI and check off the Validated column for records they deem ready to migrate. They may update and correct records as necessary.
Export Flow:
- Export the Collection into a Data File, filtering the data and transforming the schema
- Push records into Contentful through a Process File with Webservice Endpoint job
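The calls behind this flow can be summarized as an ordered list. The following minimal JavaScript sketch only describes them and sends nothing; the `migrationPlan` name and its object shape are illustrative assumptions, while the methods and paths (relative to `{{engineUrl}}`) are the ones used later in this recipe.

```javascript
// Hypothetical summary of the recipe's DX Graph calls, in execution order.
// Nothing is sent; each entry records the phase, HTTP method, and path
// (relative to {{engineUrl}}) used later in this recipe.
const migrationPlan = [
  { phase: "import", method: "POST", path: "/vue/_api/v1/buckets/incoming/upload" },
  { phase: "import", method: "POST", path: "/vue/_api/v1/collections" },
  { phase: "import", method: "POST", path: "/vue/_api/v1/collections/legacy-articles/schema" },
  { phase: "import", method: "PUT", path: "/vue/_api/v1/applications/dx-graph/pages/source/_configureLeftNav" },
  { phase: "import", method: "POST", path: "/vue/_api/v1/buckets/incoming/files/_import" },
  // The review phase happens in the DX Graph UI, so it has no API call here.
  { phase: "export", method: "POST", path: "/vue/_api/v1/job-types/exportCollection/_execute" },
  { phase: "export", method: "POST", path: "/vue/_api/v1/job-types/processFileWithWebserviceEndpoint/_execute" },
];

// Print the plan as "PHASE METHOD PATH" lines.
for (const step of migrationPlan) {
  console.log(`${step.phase.toUpperCase()} ${step.method} ${step.path}`);
}
```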
Import Flow
Upload Data File to Bucket
This recipe assumes that the CMS we're migrating from is not API-accessible, so we need to export data some other way. We further assume we've been able to export a file containing article data in JSONL format. The file contains records like the following excerpt:
legacy_cms_articles.jsonl
{"type": "article", "id": 1, "created_timestamp": "2023-01-07T00:00:00", "updated_timestamp": "2023-02-06T00:00:00", "headline": "Understanding the Basics of Coding", "subtitle": "Essential tips for achieving your goals", "summary": "Explore practical steps to improve workflow in daily tasks.", "content": "Suspendisse potenti. Nulla facilisi. Pellentesque mollis eros vel purus consectetur, non pulvinar est tincidunt.", "author": "Chris Green", "publish_date": "2023-01-22T00:00:00", "image_url": "https://example.com/image1.jpg", "tags": ["Workflow", "Productivity"]}
{"type": "article", "id": 2, "created_timestamp": "2023-01-22T00:00:00", "updated_timestamp": "2023-02-09T00:00:00", "headline": "The Ultimate Guide to Productivity", "subtitle": "Strategies to stay productive in any environment", "summary": "Learn essential strategies to enhance productivity levels.", "content": "Curabitur venenatis ut elit quis tempus, sed eget sem pretium. Donec bibendum nisl eu eros volutpat.", "author": "Jane Smith", "publish_date": "2023-01-30T00:00:00", "image_url": "https://example.com/image4.jpg", "tags": ["Workflow"]}
{"type": "article", "id": 3, "created_timestamp": "2023-02-27T00:00:00", "updated_timestamp": "2023-03-21T00:00:00", "headline": "Top 10 Tips for Success", "subtitle": "A beginner's guide to coding languages", "summary": "A comprehensive look at coding fundamentals for beginners.", "content": "Curabitur venenatis ut elit quis tempus, sed eget sem pretium. Donec bibendum nisl eu eros volutpat.", "author": "John Doe", "publish_date": "2023-03-14T00:00:00", "image_url": "https://example.com/image5.jpg", "tags": ["Events", "News"]}
{"type": "article", "id": 4, "created_timestamp": "2023-03-13T00:00:00", "updated_timestamp": "2023-03-17T00:00:00", "headline": "How to Improve Your Workflow", "subtitle": "Boosting your efficiency with proven methods", "summary": "This article covers recent major events impacting the world.", "content": "Suspendisse potenti. Nulla facilisi. Pellentesque mollis eros vel purus consectetur, non pulvinar est tincidunt.", "author": "Alex Brown", "publish_date": "2023-03-27T00:00:00", "image_url": "https://example.com/image4.jpg", "tags": ["Productivity"]}
{"type": "article", "id": 5, "created_timestamp": "2023-02-01T00:00:00", "updated_timestamp": "2023-02-26T00:00:00", "headline": "Breaking News: Major Event Unfolds", "subtitle": "Strategies to stay productive in any environment", "summary": "Explore practical steps to improve workflow in daily tasks.", "content": "Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Morbi ac purus sem.", "author": "Chris Green", "publish_date": "2023-02-04T00:00:00", "image_url": "https://example.com/image4.jpg", "tags": ["Tips", "Success"]}
{"type": "article", "id": 6, "created_timestamp": "2023-02-17T00:00:00", "updated_timestamp": "2023-02-20T00:00:00", "headline": "Breaking News: Major Event Unfolds", "subtitle": "Essential tips for achieving your goals", "summary": "Explore practical steps to improve workflow in daily tasks.", "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus lacinia odio vitae vestibulum vestibulum.", "author": "Alex Brown", "publish_date": "2023-02-18T00:00:00", "image_url": "https://example.com/image1.jpg", "tags": ["Success"]}
{"type": "article", "id": 7, "created_timestamp": "2023-03-16T00:00:00", "updated_timestamp": "2023-03-27T00:00:00", "headline": "The Ultimate Guide to Productivity", "subtitle": "Essential tips for achieving your goals", "summary": "Explore practical steps to improve workflow in daily tasks.", "content": "Curabitur venenatis ut elit quis tempus, sed eget sem pretium. Donec bibendum nisl eu eros volutpat.", "author": "John Doe", "publish_date": "2023-03-24T00:00:00", "image_url": "https://example.com/image3.jpg", "tags": ["Updates", "News"]}
{"type": "article", "id": 8, "created_timestamp": "2023-02-25T00:00:00", "updated_timestamp": "2023-03-27T00:00:00", "headline": "The Ultimate Guide to Productivity", "subtitle": "Boosting your efficiency with proven methods", "summary": "A comprehensive look at coding fundamentals for beginners.", "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus lacinia odio vitae vestibulum vestibulum.", "author": "Jane Smith", "publish_date": "2023-03-10T00:00:00", "image_url": "https://example.com/image1.jpg", "tags": ["Coding", "Development"]}
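Before uploading, it can be worth confirming that every line parses as JSON and carries a unique `id`, since `id` becomes the Collection's `dataRecordIdentifierProperty` and cannot be changed later. This is a minimal Node.js sketch; `checkJsonl` is a hypothetical helper, not part of DX Graph.

```javascript
// Hypothetical pre-upload check for a JSONL export (not a DX Graph API).
// Parses each non-empty line and reports invalid JSON, missing ids, and duplicate ids.
function checkJsonl(text, idField = "id") {
  const records = [];
  const problems = [];
  const seen = new Set();
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines, e.g. a trailing newline
    const lineNo = index + 1;
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      problems.push(`line ${lineNo}: invalid JSON (${err.message})`);
      return;
    }
    const id = record[idField];
    if (id === undefined || id === null) {
      problems.push(`line ${lineNo}: missing "${idField}"`);
    } else if (seen.has(String(id))) {
      // ids are compared as strings, so 1 and "1" count as duplicates
      problems.push(`line ${lineNo}: duplicate "${idField}" ${id}`);
    } else {
      seen.add(String(id));
    }
    records.push(record);
  });
  return { records, problems };
}

// Usage with two lines shaped like the export above (fields trimmed).
const sample = [
  '{"type": "article", "id": 1, "headline": "Understanding the Basics of Coding"}',
  '{"type": "article", "id": 2, "headline": "The Ultimate Guide to Productivity"}',
].join("\n");
const { records, problems } = checkJsonl(sample);
console.log(records.length, problems);
```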
Our first step is to upload this file into the incoming Data Bucket, which was created as a precondition for this recipe.
A POST request to the upload endpoint is all that's needed. The bucket code goes in the URL, the customer code in the X-Customer-Code header, and the file itself in the file[] form field:
curl --request POST \
  --url '{{engineUrl}}/vue/_api/v1/buckets/incoming/upload' \
  -H 'Authorization: Bearer ***' \
  -H 'X-Customer-Code: {{customerCode}}' \
  -F 'file[]=@Documents/legacy_cms_articles.jsonl'
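For teams scripting the migration in Node.js rather than the shell, the same upload can be expressed with the built-in `fetch`, `FormData`, and `Blob` globals (Node.js 18 or later). This sketch only builds the request so it can be inspected before sending; `buildUploadRequest` is a hypothetical helper, and the `file[]` field name is taken from the curl command above.

```javascript
// Hypothetical helper mirroring the curl upload above (Node.js 18+ globals).
// Builds, but does not send, a multipart POST to the bucket's upload endpoint.
function buildUploadRequest({ engineUrl, customerCode, token, bucketCode, filename, contents }) {
  const form = new FormData();
  // Same multipart field name as the curl example: file[]
  form.append("file[]", new Blob([contents]), filename);
  return {
    url: `${engineUrl}/vue/_api/v1/buckets/${bucketCode}/upload`,
    init: {
      method: "POST",
      // fetch derives the multipart Content-Type (with boundary) from the FormData body.
      headers: { Authorization: `Bearer ${token}`, "X-Customer-Code": customerCode },
      body: form,
    },
  };
}

// Usage (placeholder values); pass the result to fetch(url, init) to send it.
const { url, init } = buildUploadRequest({
  engineUrl: "https://engine.example.com",
  customerCode: "my-customer",
  token: "***",
  bucketCode: "incoming",
  filename: "legacy_cms_articles.jsonl",
  contents: '{"type": "article", "id": 1}\n',
});
console.log(init.method, url);
```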
If the CMS we're migrating from is API-accessible, we can instead use the Loading Data from Webservices job, which is demonstrated in a separate recipe.
Create Collection and Set Schema
Before importing the data into a Collection for validation, we need to create a Collection and set the Schema. This will include creating a column with a checkbox for business users to mark when they've completed their record review.
Create the Collection
Creating a Collection through the API requires a code and a name for the Collection, plus the dataRecordIdentifierProperty, which sets the beginnings of the data schema.
The dataRecordIdentifierProperty is the property in the incoming data files that serves as the unique identifier for each record. This value cannot be updated later.
POST {{engineUrl}}/vue/_api/v1/collections
Header: X-Customer-Code : {{customerCode}}
Authorization: Bearer ***
content-type: application/json
{
"collectionCode": "legacy-articles",
"name": "Legacy Articles",
"description": "Articles from Legacy CMS",
"dataRecordIdentifierProperty": "id"
}
Set the Schema
After the Collection is created, set the schema. You are free to define whatever schema suits your purposes. In this case, the schema focuses on the fields relevant to the migration to Contentful, and it controls which of those fields business users can edit to correct or update the data before migration.
Therefore, the schema does not include every field in the original data file (type, created_timestamp, and updated_timestamp are left out), and most fields are set to "readonly": false. We also add a validated boolean field at the end to support business user review.
POST {{engineUrl}}/vue/_api/v1/collections/legacy-articles/schema
Header: X-Customer-Code : {{customerCode}}
Authorization: Bearer ***
content-type: application/json
{
"fields": {
"id": {
"jsonSchema": { "type": "string", "title": "Article ID" },
"displaySchema": {},
"options": { "readonly": true, "required": true }
},
"headline": {
"jsonSchema": { "type": "string", "title": "Headline" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"subtitle": {
"jsonSchema": { "type": "string", "title": "Subtitle" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"summary": {
"jsonSchema": { "type": "string", "title": "Summary" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"content": {
"jsonSchema": { "type": "string", "title": "Content" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"author": {
"jsonSchema": { "type": "string", "title": "Author" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"publish_date": {
"jsonSchema": { "type": "string", "title": "Publish Date" },
"displaySchema": {},
"options": { "readonly": true, "required": false }
},
"image_url": {
"jsonSchema": { "type": "string", "title": "Image URL" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"tags": {
"jsonSchema": {
"type": "array",
"title": "Tags",
"items": {
"jsonSchema": { "type": "string" }
}
},
"displaySchema": {},
"options": { "readonly": false, "required": false }
},
"validated": {
"jsonSchema": { "type": "boolean", "title": "Ready to Migrate" },
"displaySchema": {},
"options": { "readonly": false, "required": false }
}
}
}
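Because most schema entries follow the same pattern, the fields object can also be generated from a compact list. The sketch below is a hypothetical JavaScript helper (`buildSchemaFields` and its tuple format are illustrative, not part of DX Graph) that produces the same field structure as the request above, with the content field titled Content; the extra items definition for the tags array is copied from that request.

```javascript
// Hypothetical generator for the "fields" object posted to the schema endpoint.
// Each spec: [code, title, jsonSchemaType, readonly, required].
function buildSchemaFields(specs) {
  const fields = {};
  for (const [code, title, type, readonly, required] of specs) {
    fields[code] = {
      jsonSchema: { type, title },
      displaySchema: {},
      options: { readonly, required },
    };
  }
  return fields;
}

const fields = buildSchemaFields([
  ["id", "Article ID", "string", true, true],
  ["headline", "Headline", "string", false, false],
  ["subtitle", "Subtitle", "string", false, false],
  ["summary", "Summary", "string", false, false],
  ["content", "Content", "string", false, false],
  ["author", "Author", "string", false, false],
  ["publish_date", "Publish Date", "string", true, false],
  ["image_url", "Image URL", "string", false, false],
  ["tags", "Tags", "array", false, false],
  ["validated", "Ready to Migrate", "boolean", false, false],
]);
// Array fields also need an "items" definition, copied from the request above.
fields.tags.jsonSchema.items = { jsonSchema: { type: "string" } };

console.log(JSON.stringify({ fields }, null, 2));
```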
Configure Left Nav for the UI
Collections do not appear in the UI by default; we decide how our Collections are organized and where they are visible.
Use the following call to display the Legacy CMS Articles Collection on the Sources page.
PUT {{engineUrl}}/vue/_api/v1/applications/dx-graph/pages/source/_configureLeftNav
Header: X-Customer-Code : {{customerCode}}
Authorization: Bearer ***
content-type: application/json
{
"navigationConfiguration": [
{
"title": "Legacy CMS",
"ordinal": 1,
"active": true,
"actions": [
{
"label": "Articles",
"dataRepositoryCode": "content-master",
"dataCollectionCode": "legacy-articles"
}
]
}
]
}
Import Data File into Collection
With a Data File sitting in a Bucket in DX Graph and a Collection defined, we can use the Import endpoint to import the Article records from the Data File into the Collection.
In the URL, specify the Bucket that holds the file. In the body, specify the file name (filenamePattern), the Collection we are importing into, and the Buckets that receive processed, skipped, and invalid files.
POST {{engineUrl}}/vue/_api/v1/buckets/incoming/files/_import
Header: X-Customer-Code : {{customerCode}}
Authorization: Bearer ***
content-type: application/json
{
"skippedBucketCode": "skipped",
"processedBucketCode": "processed",
"invalidBucketCode": "invalidated",
"filenamePattern": "legacy_cms_articles.jsonl",
"skipInvalidRecords": false,
"recordIdentifierField": "id",
"collectionCode": "legacy-articles",
"parseOptions": {
"format": "JSONL"
}
}
If the import failed, we'll see relevant details in the response and/or find an error file in the Invalidated Bucket with more information specific to each record. If it succeeded, we'll see a response like this:
{
"filenamesToProcess": [
"legacy_cms_articles.jsonl"
],
"filenamesToSkip": [],
"processedFiles": [
{
"filename": "legacy_cms_articles.jsonl",
"nbrValidRecords": 20,
"nbrValidationIssues": 0,
"nbrInserts": 20,
"nbrUpdates": 0,
"message": "Successfully loaded file: legacy_cms_articles.jsonl into legacy-articles. "
}
],
"hasInvalidRecords": false
}
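When the import runs from a script, the response can be checked programmatically rather than by eye. This hypothetical `summarizeImport` helper relies only on the response fields shown above (processedFiles, nbrInserts, nbrUpdates, nbrValidationIssues, filenamesToSkip, hasInvalidRecords); treating any validation issue as a failure is a policy choice for this sketch, not documented DX Graph behavior.

```javascript
// Hypothetical check of an _import response, using only the fields shown above.
function summarizeImport(response) {
  const files = response.processedFiles ?? [];
  const totals = files.reduce(
    (acc, f) => ({
      inserts: acc.inserts + (f.nbrInserts ?? 0),
      updates: acc.updates + (f.nbrUpdates ?? 0),
      issues: acc.issues + (f.nbrValidationIssues ?? 0),
    }),
    { inserts: 0, updates: 0, issues: 0 },
  );
  // Policy for this sketch: any invalid record or validation issue counts as a failure.
  const ok = !response.hasInvalidRecords && totals.issues === 0;
  return { ok, ...totals, skipped: (response.filenamesToSkip ?? []).length };
}

// Usage with the successful response above.
const result = summarizeImport({
  filenamesToProcess: ["legacy_cms_articles.jsonl"],
  filenamesToSkip: [],
  processedFiles: [
    {
      filename: "legacy_cms_articles.jsonl",
      nbrValidRecords: 20,
      nbrValidationIssues: 0,
      nbrInserts: 20,
      nbrUpdates: 0,
      message: "Successfully loaded file: legacy_cms_articles.jsonl into legacy-articles. ",
    },
  ],
  hasInvalidRecords: false,
});
console.log(result); // { ok: true, inserts: 20, updates: 0, issues: 0, skipped: 0 }
```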
We can verify the imported Collection data by checking the UI or by querying the collection through the API at this endpoint: {{engineUrl}}/vue/_api/v1/collections/legacy-articles/records/_query
Review Flow
The next step is to give business users access to the UI so they can review the records. They can update the fields we've made editable, and when they decide a record is ready to migrate to Contentful, they tick its Ready to Migrate (validated) checkbox.
This step needs no separate system, portal, or process outside DX Graph's normal capabilities and flows. DX Graph is designed to make ETL processes transparent to both technical and business users, and to let both participate.
Export Flow
When the review is complete, we export data from the DX Graph Collection into the for-export Bucket, transforming and filtering as required, and then send the records directly into Contentful to create new Article entries.
If our transformation and schema mapping requirements were very straightforward, we could skip the Export Collection to File step and use the Process Collection with Webservice Endpoint job instead. Its parameters are similar to those of the Process File with Webservice Endpoint job used below, and filtering is available. However, any transformation and mapping would then need to be written directly in the body field of the Process Collection job.
Export Collection to File
This is the step where we will filter and transform the data for consumption by Contentful's APIs.
The key fields here are filter, where we specify that only records our business users have marked as validated are exported, and transformers, where we build each field required for migration into Contentful.
This recipe keeps the transformations simple for demonstration's sake, but they could just as easily handle multi-level schemas or split and combine data from different Collections through relationships. Here, each transformer maps a field to its name in the Contentful Article content model and wraps the value in an en-US locale object (legacyId is also converted to a number). The final _.pick keeps only the Contentful fields, so validated and any unmapped source fields are not exported.
POST {{engineUrl}}/vue/_api/v1/job-types/exportCollection/_execute
Header: X-Customer-Code : {{customerCode}}
Authorization: Bearer ***
content-type: application/json
{
"params" : {
"collectionCode": "legacy-articles",
"targetBucketCode": "for-export",
"filenamePattern": "legacy_cms_articles_for_migration.jsonl",
"filter" : {
"$eq": {
"field": "validated",
"value": true
}
},
"transformers": [
{ "type": "javascript", "config": { "expression": "_.set(data, 'title', { 'en-US' : data.headline } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'description', { 'en-US' : data.summary } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'author', { 'en-US' : data.author } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'urlToImage', { 'en-US' : data.image_url } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'content', { 'en-US' : data.content } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'publishedAt', { 'en-US' : data.publish_date } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'category', { 'en-US' : data.tags } )" } },
{ "type": "javascript", "config": { "expression": "_.set(data, 'legacyId', { 'en-US' : _.toNumber(data.id) } )" } },
{ "type": "javascript", "config": { "expression": "_.pick(data, ['title', 'description', 'author', 'urlToImage', 'content', 'publishedAt', 'category', 'legacyId'])" } }
]
}
}
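To preview what the filter and transformer chain produce before running the job, the same logic can be written in plain JavaScript. This sketch mirrors the $eq filter and the _.set/_.pick expressions above without lodash (Number stands in for _.toNumber, which behaves the same for numeric strings); `toContentfulFields`, `isValidated`, and `LOCALE` are hypothetical names, and en-US is assumed to be the Contentful space's locale, as in the transformers.

```javascript
// Hypothetical local equivalent of the exportCollection filter and transformers above.
const LOCALE = "en-US"; // assumed to match the Contentful space's locale

// Mirrors the { "$eq": { "field": "validated", "value": true } } filter.
const isValidated = (record) => record.validated === true;

// Mirrors the _.set(...) calls followed by the final _.pick(...).
function toContentfulFields(record) {
  const localized = (value) => ({ [LOCALE]: value });
  return {
    title: localized(record.headline),
    description: localized(record.summary),
    author: localized(record.author),
    urlToImage: localized(record.image_url),
    content: localized(record.content),
    publishedAt: localized(record.publish_date),
    category: localized(record.tags),
    legacyId: localized(Number(record.id)), // stands in for _.toNumber(data.id)
  };
}

// Usage: only validated records are transformed; subtitle and validated are dropped.
const collection = [
  { id: "1", headline: "Understanding the Basics of Coding", subtitle: "Essential tips for achieving your goals",
    summary: "Explore practical steps to improve workflow in daily tasks.", content: "Suspendisse potenti.",
    author: "Chris Green", publish_date: "2023-01-22T00:00:00", image_url: "https://example.com/image1.jpg",
    tags: ["Workflow", "Productivity"], validated: true },
  { id: "2", headline: "The Ultimate Guide to Productivity", validated: false },
];
const exported = collection.filter(isValidated).map(toContentfulFields);
console.log(JSON.stringify(exported, null, 2));
```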
Process File with Webservice Endpoint
In this step, we use the processFileWithWebserviceEndpoint job type to call the Contentful API and insert one record at a time, creating a new Entry of our predefined Article content type. If the API allowed batch uploads, this job could accommodate that through the batchSize property.
Because of the transformation in the previous step, we can insert records[0] directly into the body of the API call to Contentful. In this example, we let Contentful generate a new Entry ID for us, but we could instead have used the legacyId as the new Entry ID. The records object is available to any field in the webserviceEndpoint property.
POST {{engineUrl}}/vue/_api/v1/job-types/processFileWithWebserviceEndpoint/_execute
Header: X-Customer-Code : {{customerCode}}
Authorization: Bearer ***
content-type: application/json
{
"params" : {
"dataBucketCode": "for-export",
"filename": "legacy_cms_articles_for_migration.jsonl",
"batchSize" : 1,
"webserviceEndpoint" : {
"url": "https://api.contentful.com/spaces/{{space_id}}/environments/{{environment_id}}/entries",
"method": "POST",
"headers": {
"Authorization": "Bearer ***",
"Content-Type": "application/vnd.contentful.management.v1+json",
"X-Contentful-Content-Type": "{{contentType_id}}"
},
"body": { "fields" : "`records[0]`" }
}
}
}
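The sketch below approximates what the job sends for each line of the export file when batchSize is 1: one request to Contentful's Content Management API whose body is { "fields": record }. It is an illustration under assumptions, not DX Graph's implementation, and `buildContentfulRequests` and `useLegacyIdAsEntryId` are hypothetical names. The optional branch shows the legacyId alternative mentioned above: Contentful creates an entry with a caller-chosen ID when you send a PUT to /entries/{entry_id} instead of a POST to /entries.

```javascript
// Hypothetical approximation of processFileWithWebserviceEndpoint with batchSize 1:
// one Contentful Management API request per JSONL line. Builds requests; sends nothing.
function buildContentfulRequests(jsonl, { spaceId, environmentId, contentTypeId, token, useLegacyIdAsEntryId = false }) {
  const base = `https://api.contentful.com/spaces/${spaceId}/environments/${environmentId}/entries`;
  return jsonl
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const fields = JSON.parse(line); // already shaped by the export transformers
      // Entry IDs are strings, so the numeric legacyId is converted when used as an ID.
      const entryId = useLegacyIdAsEntryId ? String(fields.legacyId["en-US"]) : null;
      return {
        url: entryId ? `${base}/${encodeURIComponent(entryId)}` : base,
        init: {
          method: entryId ? "PUT" : "POST", // PUT with a chosen ID, POST to let Contentful pick one
          headers: {
            Authorization: `Bearer ${token}`,
            "Content-Type": "application/vnd.contentful.management.v1+json",
            "X-Contentful-Content-Type": contentTypeId,
          },
          body: JSON.stringify({ fields }),
        },
      };
    });
}

// Usage with one exported line (placeholder IDs); send each with fetch(url, init).
const line = JSON.stringify({ title: { "en-US": "Understanding the Basics of Coding" }, legacyId: { "en-US": 1 } });
const requests = buildContentfulRequests(line, {
  spaceId: "SPACE_ID", environmentId: "master", contentTypeId: "article", token: "***",
});
console.log(requests[0].init.method, requests[0].url);
```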