Loading Data from Web Services

DX Graph provides the Download Data From Webservice Job Type for downloading data from a webservice into a Bucket. Loading data from a web service happens in two stages. First, this job downloads the data into files in a Data Bucket. Then the Data File jobs validate, transform and import the data into a Collection (as described in Working with Data Files).

Download Data From Webservice Job Type

When a webservice returns paginated data, the job makes multiple requests to fetch all of it. The following parameters configure the webservice request(s):

Parameter	Description
targetBucketCode	The name of the Bucket that the data will be written to.
targetFilename	The name of the file that the data will be written to. It supports placeholders (`{{timestamp}}`, `{{batchNumber}}`, and `{{batchId}}`) that give each batch of data a unique filename. `{{timestamp}}` is the timestamp of the initial request. `{{batchNumber}}` is a sequential number starting from `1`. `{{batchId}}` is `{{batchNumber}}` padded with leading zeros to 5 characters (e.g. `00001`, `00014`). Example: `products_{{timestamp}}_{{batchId}}.jsonl` results in files named `products_20230514_131001_00001.jsonl`, followed by `products_20230514_131001_00002.jsonl`.
isPaged	Indicates whether the webservice returns paginated data.
isJson	Indicates whether the webservice returns JSON data. If false, the response is treated as text (such as XML).
hasMore	A JavaScript expression that is evaluated after each webservice request to determine whether another batch should be fetched. The expression is evaluated using the following variables: `responseBody`, `responseHeaders`, `responseStatus`, `batchNumber`, `_` (the lodash library).
initialRequest	The first webservice request, configured with properties such as url, method, queryParameters and headers.
nextPageRequest	The webservice request used to fetch each subsequent batch of data. Any property of this configuration may be given as a JavaScript expression by specifying a string wrapped in backticks.
responsePrep	A JavaScript expression that is evaluated after each webservice request to extract the data from the response that will be written to the file. The expression is evaluated using the following variables: `responseBody`, `responseHeaders`, `responseStatus`, `batchNumber`, `_` (the lodash library).
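
As a sketch of how the targetFilename placeholders expand, the hypothetical helper expandFilename below substitutes each placeholder for a given batch. It assumes the `{{timestamp}}` value is passed in already formatted (the examples on this page use `20230514_131001`). How DX Graph itself formats the timestamp is not specified here.

```javascript
// Hypothetical helper: expands the targetFilename placeholders for one batch.
// `timestamp` is assumed to be pre-formatted, e.g. "20230514_131001".
function expandFilename(template, timestamp, batchNumber) {
  // {{batchId}} is {{batchNumber}} padded with leading zeros to 5 characters.
  const batchId = String(batchNumber).padStart(5, "0");
  return template
    .replaceAll("{{timestamp}}", timestamp)
    .replaceAll("{{batchNumber}}", String(batchNumber))
    .replaceAll("{{batchId}}", batchId);
}

const template = "products_{{timestamp}}_{{batchId}}.jsonl";
console.log(expandFilename(template, "20230514_131001", 1)); // products_20230514_131001_00001.jsonl
console.log(expandFilename(template, "20230514_131001", 14)); // products_20230514_131001_00014.jsonl
```

Using `{{batchNumber}}` instead of `{{batchId}}` in the same template would produce unpadded names such as `products_20230514_131001_2.jsonl`.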

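In the hasMore and responsePrep expressions, responseBody, responseHeaders and responseStatus come from the most recent webservice request, and responseBody is a parsed JSON value when isJson is true. batchNumber is the number of the most recently fetched batch (starting at 1), and _ is the lodash library.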
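
The following sketch shows one way such an expression could be evaluated. It assumes that the wrapping backticks only mark the text as an expression, and that the remaining text runs as JavaScript with the documented variables in scope. The function name evaluateExpression is hypothetical, and lodash (`_`) is omitted to keep the example self-contained; it could be added to the variables object. The sketch also shows why a header whose name contains hyphens must be read with bracket notation.

```javascript
// Sketch: evaluates a (possibly backtick-wrapped) expression with the given
// variables in scope. Assumes the backticks only mark the text as an expression.
function evaluateExpression(expression, variables) {
  const source = expression.replace(/^`|`$/g, "");
  const names = Object.keys(variables);
  const values = names.map((name) => variables[name]);
  // Each variable name becomes a parameter of the generated function.
  return new Function(...names, `return (${source});`)(...values);
}

const context = {
  responseBody: { data: [{ product_id: "123" }, { product_id: "456" }] },
  responseHeaders: { "content-type": "application/json", "x-total-pages": 2 },
  responseStatus: 200,
  batchNumber: 1,
};

// A header name containing hyphens needs bracket notation.
console.log(evaluateExpression("`responseHeaders['x-total-pages'] > batchNumber`", context)); // true
console.log(evaluateExpression("`responseBody.data.length`", context)); // 2

// Dot notation does not work here: responseHeaders.x-total-pages parses as
// responseHeaders.x - total - pages, which throws a ReferenceError.
```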

If isJson is true, responsePrep should evaluate to a JSON array, and each element of the array is written as one line of targetFilename. If isJson is false, responsePrep should evaluate to a string, which is written to targetFilename as-is. If responsePrep is not specified, the entire response body is written to targetFilename.
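
The two cases above can be sketched as a small function that turns the value produced by responsePrep into file content. The name toFileContent is hypothetical. Ending every JSON line with a newline is also an assumption. The case where responsePrep is not specified is not modelled.

```javascript
// Hypothetical helper: converts the value returned by responsePrep into file content.
function toFileContent(prepared, isJson) {
  if (isJson) {
    if (!Array.isArray(prepared)) {
      throw new TypeError("responsePrep must evaluate to an array when isJson is true");
    }
    // JSON Lines: each array element is serialized onto its own line.
    return prepared.map((item) => JSON.stringify(item) + "\n").join("");
  }
  if (typeof prepared !== "string") {
    throw new TypeError("responsePrep must evaluate to a string when isJson is false");
  }
  // Text responses: the string is written as-is.
  return prepared;
}

const content = toFileContent(
  [
    { product_id: "123", name: "iPhone 12", brand: "Apple", price: 999.99 },
    { product_id: "456", name: "Galaxy S21", brand: "Samsung", price: 899.99 },
  ],
  true
);
console.log(content.trimEnd());
// {"product_id":"123","name":"iPhone 12","brand":"Apple","price":999.99}
// {"product_id":"456","name":"Galaxy S21","brand":"Samsung","price":899.99}
```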

Let's take a look at an example. The following configuration fetches all the products from the https://api.example.com/products endpoint, which returns paginated JSON data. The data is written to the incoming Bucket in files named products_{{timestamp}}_{{batchId}}.jsonl. The hasMore expression evaluates to true while the x-total-pages response header is greater than the current batch number. The initialRequest and nextPageRequest are identical except for the offset value, which nextPageRequest computes from batchNumber. The responsePrep expression extracts the data property from the response body.

POST https://io-staging.conscia.ai/vue/_api/v1/buckets/incoming/files/_download-from-webservice
Content-Type: application/json
Authorization: Bearer {{apiKey}}
X-Customer-Code: {{customerCode}}

{
  "targetBucketCode": "incoming",
  "targetFilename": "products_{{timestamp}}_{{batchId}}.jsonl",
  "isPaged": true,
  "isJson": true,
  "hasMore": "`responseHeaders['x-total-pages'] > batchNumber`",
  "initialRequest": {
    "url": "https://api.example.com/products",
    "method": "GET",
    "queryParameters": {
      "offset": 0,
      "limit": 2
    },
    "headers": {
      "authorization": "Bearer xyz123"
    }
  },
  "nextPageRequest": {
    "url": "https://api.example.com/products",
    "method": "GET",
    "queryParameters": {
      "offset": "`batchNumber * 2`",
      "limit": 2
    },
    "headers": {
      "authorization": "Bearer xyz123"
    }
  },
  "responsePrep": "`responseBody.data`"
}
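
To submit the same request from code, here is a minimal sketch for Node.js 18 or later, which provides a global fetch. The function name buildDownloadRequest is hypothetical. The URL and headers are copied from the request above, and apiKey and customerCode are values you supply.

```javascript
// URL and headers copied from the example request above.
const DOWNLOAD_URL =
  "https://io-staging.conscia.ai/vue/_api/v1/buckets/incoming/files/_download-from-webservice";

// Hypothetical helper: returns the arguments to pass to fetch(url, init).
function buildDownloadRequest(apiKey, customerCode, config) {
  return {
    url: DOWNLOAD_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
        "X-Customer-Code": customerCode,
      },
      // The job configuration is sent as the JSON request body.
      body: JSON.stringify(config),
    },
  };
}

// Usage (inside an async function or an ES module):
// const { url, init } = buildDownloadRequest(apiKey, customerCode, config);
// const response = await fetch(url, init);
```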

Given the responses shown below, the configuration above writes two files to the incoming Bucket:

products_20230514_131001_00001.jsonl
products_20230514_131001_00002.jsonl
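
The following simulation shows how the pieces fit together. It runs the paging loop in memory against a fake webservice that serves the four products from this example. It is a hypothetical model, not DX Graph's implementation. It handles only isJson true, evaluates backtick-wrapped strings as JavaScript with the documented variables (lodash omitted), and reads the x-total-pages header with bracket notation. The names runDownload, fakeSend and resolveRequest are illustrative.

```javascript
// Evaluates a backtick-wrapped string as a JavaScript expression over `vars`.
function evaluate(expression, vars) {
  const source = expression.replace(/^`|`$/g, "");
  return new Function(...Object.keys(vars), `return (${source});`)(...Object.values(vars));
}

// Recursively replaces backtick-wrapped string properties with their evaluated values.
function resolveRequest(value, vars) {
  if (typeof value === "string" && /^`[\s\S]*`$/.test(value)) return evaluate(value, vars);
  if (Array.isArray(value)) return value.map((item) => resolveRequest(item, vars));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, resolveRequest(item, vars)])
    );
  }
  return value;
}

// Runs the request loop and returns { filename: fileContent } for every batch.
function runDownload(config, send, timestamp) {
  const files = {};
  let request = config.initialRequest;
  for (let batchNumber = 1; ; batchNumber++) {
    const response = send(request); // fake webservice: returns { body, headers, status }
    const vars = {
      responseBody: response.body,
      responseHeaders: response.headers,
      responseStatus: response.status,
      batchNumber,
    };
    const rows = evaluate(config.responsePrep, vars);
    const filename = config.targetFilename
      .replaceAll("{{timestamp}}", timestamp)
      .replaceAll("{{batchNumber}}", String(batchNumber))
      .replaceAll("{{batchId}}", String(batchNumber).padStart(5, "0"));
    files[filename] = rows.map((row) => JSON.stringify(row) + "\n").join("");
    if (!config.isPaged || !evaluate(config.hasMore, vars)) return files;
    // nextPageRequest is resolved with the variables of the batch just fetched.
    request = resolveRequest(config.nextPageRequest, vars);
  }
}

const products = [
  { product_id: "123", name: "iPhone 12", brand: "Apple", price: 999.99 },
  { product_id: "456", name: "Galaxy S21", brand: "Samsung", price: 899.99 },
  { product_id: "789", name: "Pixel 5", brand: "Google", price: 799.99 },
  { product_id: "101112", name: "OnePlus 9", brand: "OnePlus", price: 699.99 },
];

// Fake webservice: pages through `products` using offset and limit.
function fakeSend(request) {
  const { offset, limit } = request.queryParameters;
  return {
    body: { data: products.slice(offset, offset + limit) },
    headers: { "content-type": "application/json", "x-total-pages": 2 },
    status: 200,
  };
}

const config = {
  targetFilename: "products_{{timestamp}}_{{batchId}}.jsonl",
  isPaged: true,
  isJson: true,
  hasMore: "`responseHeaders['x-total-pages'] > batchNumber`",
  initialRequest: { url: "https://api.example.com/products", method: "GET", queryParameters: { offset: 0, limit: 2 } },
  nextPageRequest: { url: "https://api.example.com/products", method: "GET", queryParameters: { offset: "`batchNumber * 2`", limit: 2 } },
  responsePrep: "`responseBody.data`",
};

const files = runDownload(config, fakeSend, "20230514_131001");
console.log(Object.keys(files).join("\n"));
// products_20230514_131001_00001.jsonl
// products_20230514_131001_00002.jsonl
```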

Let's look at each step in more detail.

Initial Request

The initial request is the first request made to the webservice. It uses the initialRequest configuration (offset 0, limit 2).

The response and the current batch values may look like this:

{
  "responseBody": {
    "data": [
      {
        "product_id": "123",
        "name": "iPhone 12",
        "brand": "Apple",
        "price": 999.99
      },
      {
        "product_id": "456",
        "name": "Galaxy S21",
        "brand": "Samsung",
        "price": 899.99
      }
    ]
  },
  "responseHeaders": {
    "content-type": "application/json",
    "x-total-pages": 2
  },
  "responseStatus": 200,
  "batchNumber": 1,
  "batchId": "00001"
}

The file products_20230514_131001_00001.jsonl will be written to the incoming Bucket and will contain the following:

{"product_id":"123","name":"iPhone 12","brand":"Apple","price":999.99}
{"product_id":"456","name":"Galaxy S21","brand":"Samsung","price":899.99}

The hasMore expression evaluates to true, so another request is made:

x-total-pages > batchNumber = 2 > 1 = true

Next Page Request

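The next page request is built from nextPageRequest to fetch the next batch of data. Its offset expression, batchNumber * 2, is evaluated with batchNumber equal to 1 (the batch just fetched), which gives an offset of 2. The resolved request is: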

{
  "url": "https://api.example.com/products",
  "method": "GET",
  "queryParameters": {
    "offset": 2,
    "limit": 2
  },
  "headers": {
    "authorization": "Bearer xyz123"
  }
}

The response and the current batch values may look like this:

{
  "responseBody": {
    "data": [
      {
        "product_id": "789",
        "name": "Pixel 5",
        "brand": "Google",
        "price": 799.99
      },
      {
        "product_id": "101112",
        "name": "OnePlus 9",
        "brand": "OnePlus",
        "price": 699.99
      }
    ]
  },
  "responseHeaders": {
    "content-type": "application/json",
    "x-total-pages": 2
  },
  "responseStatus": 200,
  "batchNumber": 2,
  "batchId": "00002"
}

The file products_20230514_131001_00002.jsonl will be written to the incoming Bucket and will contain the following:

{"product_id":"789","name":"Pixel 5","brand":"Google","price":799.99}
{"product_id":"101112","name":"OnePlus 9","brand":"OnePlus","price":699.99}

The hasMore expression evaluates to false, so no more requests are made:

x-total-pages > batchNumber = 2 > 2 = false
