Loading Data from Web Services
The DXO provides the Job Type Download Data From Webservice to download data from a webservice into a Bucket. The approach for loading data from web services is to download the data into files in a Data Bucket and then use the Data File jobs to validate, transform, and import the data into a Collection (as described in Working with Data Files).
Job Type Code | Description |
---|---|
downloadDataFromWebservice | The Job Type Download Data From Webservice is used to download data from a webservice into a Bucket. |
Input Parameters
The following parameters are used to configure the webservice request(s).
When fetching data from a webservice, the data may be paginated. If it is paginated, multiple requests are made to fetch all the data.
Parameter | Required | Description |
---|---|---|
targetBucketCode | Yes | The name of the Bucket that the data will be written to. |
targetFilename | Yes | The name of the file that the data will be written to. It can contain the placeholders {{timestamp}}, {{batchNumber}}, and {{batchId}} to give each batch of data a unique filename: {{timestamp}} is the timestamp of the initial request; {{batchNumber}} is a sequential number starting from 1; {{batchId}} is {{batchNumber}} padded with zeros to 5 characters (e.g. 00001, 00014). Example: products_{{timestamp}}_{{batchId}}.jsonl results in files such as products_20230514_131001_00001.jsonl followed by products_20230514_131001_00002.jsonl. |
isPaged | Yes | Indicates whether the webservice returns paginated data. |
isJson | Yes | Indicates whether the webservice returns JSON data. Otherwise, it is assumed the webservice returns text (like XML). |
hasMore | Yes | A Javascript expression that is evaluated after each webservice request to determine if another batch should be fetched. The expression has access to the following variables: responseBody , responseHeaders , responseStatus , batchNumber , _ (the lodash library). It must evaluate to true or false. |
initialRequest | Yes | The initial webservice request. See the example below for details. |
nextPageRequest | Yes | The webservice request to fetch the next batch of data. See the example below for details. |
responsePrep | No | A Javascript expression that is evaluated after each webservice request to extract the data from the response that will be written to the file. Defaults to writing the entire response body to the file. |
- In the hasMore expression, responseBody is the data from the last webservice request after any filtering, mapping, and transforming you may have done in responsePrep. Keep this in mind if you reference a next-page type object from the response body in hasMore. responseHeaders and responseStatus are from the last webservice request. batchNumber is the number of the last fetched batch, i.e. the count of batches fetched so far. _ is the lodash library.
- If isJson is true, responsePrep should evaluate to a JSON array, and each JSON object is written to its own line in targetFilename. If isJson is false, responsePrep should evaluate to a string, which is written to targetFilename as-is. If responsePrep is not specified, the entire response body is written to targetFilename.
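The placeholder rules for targetFilename in the parameter table can be illustrated with a small JavaScript sketch. The function names are hypothetical, and the timestamp format (YYYYMMDD_HHMMSS, taken in UTC) is an assumption inferred from the example filenames, not documented behaviour.

```javascript
// Hypothetical sketch: expand the targetFilename placeholders for one batch.
// Assumption: {{timestamp}} is rendered as YYYYMMDD_HHMMSS (inferred from the
// example filenames) and taken in UTC; the real job may format it differently.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}` +
    `_${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`
  );
}

function expandFilename(template, initialRequestTime, batchNumber) {
  const values = {
    timestamp: formatTimestamp(initialRequestTime), // same for every batch of one run
    batchNumber: String(batchNumber),               // 1, 2, 3, ...
    batchId: String(batchNumber).padStart(5, "0"),  // 00001, 00002, ...
  };
  // Replace each {{name}}; unknown placeholders are left untouched.
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}

const started = new Date(Date.UTC(2023, 4, 14, 13, 10, 1)); // 2023-05-14 13:10:01 UTC
for (const n of [1, 2, 14]) {
  console.log(expandFilename("products_{{timestamp}}_{{batchId}}.jsonl", started, n));
}
// products_20230514_131001_00001.jsonl
// products_20230514_131001_00002.jsonl
// products_20230514_131001_00014.jsonl
```

Because the timestamp comes from the initial request, every file of one run shares it, and only the batch part changes.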
Example
Let's take a look at an example. The following configuration fetches all the products from the https://api.example.com/products endpoint, which returns paginated JSON data. The data is written to the incoming Bucket in files named products_{{timestamp}}_{{batchId}}.jsonl. The hasMore expression evaluates to true if the x-total-pages response header is greater than the current batch number. The initialRequest and nextPageRequest are the same, except for the offset value. The responsePrep expression extracts the data property from the response body.
POST {{engineUrl}}/buckets/incoming/files/_download-from-webservice
Content-Type: application/json
Authorization: Bearer {{apiKey}}
X-Customer-Code: {{customerCode}}
{
"targetBucketCode": "incoming",
"targetFilename": "products_{{timestamp}}_{{batchId}}.jsonl",
"isPaged": true,
"isJson": true,
"hasMore": "`responseHeaders['x-total-pages'] > batchNumber`",
"initialRequest": {
"url": "https://api.example.com/products",
"method": "GET",
"queryParameters": {
"offset": 0,
"limit": 2
},
"headers": {
"authorization": "Bearer xyz123"
}
},
"nextPageRequest": {
"url": "https://api.example.com/products",
"method": "GET",
"queryParameters": {
"offset": "`batchNumber * 2`",
"limit": 2
},
"headers": {
"authorization": "Bearer xyz123"
}
},
"responsePrep": "`response.body.data`"
}
The above configuration will result in two files being written to the incoming Bucket:
- products_20230514_131001_00001.jsonl
- products_20230514_131001_00002.jsonl
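To make the control flow concrete, here is a minimal JavaScript sketch of the loop the example configuration implies: make the initial request, write a batch, evaluate hasMore, and build the next request from batchNumber. It runs against an in-memory stand-in for the products endpoint; fakeFetch, runDownload, the fixed timestamp, and the Map used in place of the incoming Bucket are all hypothetical.

```javascript
// Hypothetical sketch of the paging loop behind the example configuration,
// run against an in-memory stand-in for https://api.example.com/products.
const PRODUCTS = [
  { product_id: "123", name: "iPhone 12", brand: "Apple", price: 999.99 },
  { product_id: "456", name: "Galaxy S21", brand: "Samsung", price: 899.99 },
  { product_id: "789", name: "Pixel 5", brand: "Google", price: 799.99 },
  { product_id: "101112", name: "OnePlus 9", brand: "OnePlus", price: 699.99 },
];

// Stand-in webservice: honours offset/limit and reports the page count in a header.
function fakeFetch({ offset, limit }) {
  return {
    body: { data: PRODUCTS.slice(offset, offset + limit) },
    headers: { "content-type": "application/json", "x-total-pages": Math.ceil(PRODUCTS.length / limit) },
    status: 200,
  };
}

function runDownload(limit, timestamp = "20230514_131001") {
  const files = new Map(); // stands in for the incoming Bucket
  let batchNumber = 0;
  let query = { offset: 0, limit }; // initialRequest
  while (true) {
    const response = fakeFetch(query);
    batchNumber += 1;
    const responseBody = response.body.data; // responsePrep: keep the data array
    const batchId = String(batchNumber).padStart(5, "0");
    files.set(
      `products_${timestamp}_${batchId}.jsonl`,
      responseBody.map((item) => JSON.stringify(item)).join("\n") // one object per line
    );
    // hasMore: evaluated after every request, with batchNumber = batches fetched so far.
    if (!(response.headers["x-total-pages"] > batchNumber)) break;
    query = { offset: batchNumber * limit, limit }; // nextPageRequest
  }
  return files;
}

for (const name of runDownload(2).keys()) console.log(name);
// products_20230514_131001_00001.jsonl
// products_20230514_131001_00002.jsonl
```

Calling runDownload(1) instead would make four requests and produce four files, since the stand-in then reports four pages. A real loop would likely also want an upper bound on the number of batches in case hasMore never evaluates to false.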
Let's look at each step in more detail.
Initial Request
The initial request is the first request made to the webservice, built from initialRequest (offset 0, limit 2).
The response, along with the batch number and batch ID, may look like this:
{
"responseBody": {
"data": [
{
"product_id": "123",
"name": "iPhone 12",
"brand": "Apple",
"price": 999.99
},
{
"product_id": "456",
"name": "Galaxy S21",
"brand": "Samsung",
"price": 899.99
}
]
},
"responseHeaders": {
"content-type": "application/json",
"x-total-pages": 2
},
"responseStatus": 200,
"batchNumber": 1,
"batchId": "00001"
}
The file products_20230514_131001_00001.jsonl will be written to the incoming Bucket and will contain the following:
{"product_id":"123","name":"iPhone 12","brand":"Apple","price":999.99}
{"product_id":"456","name":"Galaxy S21","brand":"Samsung","price":899.99}
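The step from the responsePrep result to these two lines can be sketched as follows, assuming the isJson rules from the parameter notes. toFileContent is a hypothetical name, undefined stands for "no responsePrep configured", and whether the real job adds a trailing newline is not documented.

```javascript
// Hypothetical sketch: turn the value produced by responsePrep into file content.
// prepped === undefined stands for "no responsePrep configured" (an assumption).
function toFileContent(isJson, responseBody, prepped) {
  if (prepped === undefined) {
    // No responsePrep: write the entire response body.
    return typeof responseBody === "string" ? responseBody : JSON.stringify(responseBody);
  }
  if (isJson) {
    if (!Array.isArray(prepped)) {
      throw new TypeError("responsePrep must evaluate to a JSON array when isJson is true");
    }
    return prepped.map((item) => JSON.stringify(item)).join("\n"); // JSONL: one object per line
  }
  if (typeof prepped !== "string") {
    throw new TypeError("responsePrep must evaluate to a string when isJson is false");
  }
  return prepped; // text such as XML is written as-is
}

const responseBody = {
  data: [
    { product_id: "123", name: "iPhone 12", brand: "Apple", price: 999.99 },
    { product_id: "456", name: "Galaxy S21", brand: "Samsung", price: 899.99 },
  ],
};
console.log(toFileContent(true, responseBody, responseBody.data));
// {"product_id":"123","name":"iPhone 12","brand":"Apple","price":999.99}
// {"product_id":"456","name":"Galaxy S21","brand":"Samsung","price":899.99}
```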
The hasMore expression evaluates to true, so another request is made:
responseHeaders['x-total-pages'] > batchNumber
= 2 > 1
= true
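Because hasMore is a JavaScript expression, a header name containing hyphens has to be read with bracket notation: written as responseHeaders.x-total-pages, JavaScript parses it as responseHeaders.x - total - pages. The sketch below shows this with a hypothetical evaluate helper built on the standard Function constructor; it illustrates plain JavaScript semantics and is not the DXO's own evaluator.

```javascript
// Hypothetical evaluator: runs an expression string with the context's keys as
// local variables. It only demonstrates plain JavaScript semantics.
function evaluate(expression, context) {
  const names = Object.keys(context);
  const fn = new Function(...names, `return (${expression});`);
  return fn(...names.map((name) => context[name]));
}

const context = {
  responseHeaders: { "content-type": "application/json", "x-total-pages": 2 },
  batchNumber: 1,
};

// Dot notation parses as responseHeaders.x - total - pages, so it cannot read the header.
try {
  evaluate("responseHeaders.x-total-pages > batchNumber", context);
} catch (err) {
  console.log(err.name); // prints ReferenceError (total is not defined)
}

// Bracket notation reads the hyphenated header name.
console.log(evaluate("responseHeaders['x-total-pages'] > batchNumber", context)); // true
console.log(evaluate("responseHeaders['x-total-pages'] > batchNumber", { ...context, batchNumber: 2 })); // false
```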
Next Page Request
The next page request fetches the next batch of data from the webservice. With batchNumber equal to 1, the offset expression batchNumber * 2 evaluates to 2, so the request is:
{
"url": "https://api.example.com/products",
"method": "GET",
"queryParameters": {
"offset": 2,
"limit": 2
},
"headers": {
"authorization": "Bearer xyz123"
}
}
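The resolved request above follows from substituting batchNumber into the backtick-wrapped offset expression of nextPageRequest. A minimal sketch of that substitution, assuming a value counts as an expression only when the whole string is wrapped in backticks; resolveTemplate and evaluateExpression are hypothetical names.

```javascript
// Hypothetical sketch: resolve a request template whose string values may be
// backtick-wrapped expressions such as "`batchNumber * 2`".
function evaluateExpression(expression, context) {
  const names = Object.keys(context);
  return new Function(...names, `return (${expression});`)(...names.map((n) => context[n]));
}

function resolveTemplate(value, context) {
  if (typeof value === "string") {
    const match = /^`([\s\S]*)`$/.exec(value); // whole value wrapped in backticks?
    return match ? evaluateExpression(match[1], context) : value;
  }
  if (Array.isArray(value)) return value.map((v) => resolveTemplate(v, context));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key, resolveTemplate(v, context)])
    );
  }
  return value; // numbers, booleans and null pass through unchanged
}

const nextPageRequest = {
  url: "https://api.example.com/products",
  method: "GET",
  queryParameters: { offset: "`batchNumber * 2`", limit: 2 },
  headers: { authorization: "Bearer xyz123" },
};

// After the first batch, batchNumber is 1, so the offset resolves to 2.
console.log(resolveTemplate(nextPageRequest, { batchNumber: 1 }).queryParameters);
// { offset: 2, limit: 2 }
```

The template object is left unmodified, so the same nextPageRequest can be resolved again for each subsequent batch.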
The response may look like this:
{
"responseBody": {
"data": [
{
"product_id": "789",
"name": "Pixel 5",
"brand": "Google",
"price": 799.99
},
{
"product_id": "101112",
"name": "OnePlus 9",
"brand": "OnePlus",
"price": 699.99
}
]
},
"responseHeaders": {
"content-type": "application/json",
"x-total-pages": 2
},
"responseStatus": 200,
"batchNumber": 2,
"batchId": "00002"
}
The file products_20230514_131001_00002.jsonl will be written to the incoming Bucket and will contain the following:
{"product_id":"789","name":"Pixel 5","brand":"Google","price":799.99}
{"product_id":"101112","name":"OnePlus 9","brand":"OnePlus","price":699.99}
The hasMore expression evaluates to false, so no more requests are made:
responseHeaders['x-total-pages'] > batchNumber
= 2 > 2
= false