Best Practices

Abstract

Describes best practices for optimal performance and expected results when working with DataX APIs and endpoints.

Overview

Understanding each DataX API endpoint and its expected results is important for DataX partners and developers. This document describes a series of best practices, with accompanying code examples, along with notes and clarifications on Production and Sandbox differences, polling, and FAQs.

The code examples focus on how best to use the following APIs:

  • POST /taxonomy

  • PUT /taxonomy/append/<parent id>

  • POST /usermatch

  • POST /audience

  • GET /self and /errors

Note

A full list of all status codes and messages is also provided.

POST /taxonomy API

Use this API endpoint to create a new version of your Yahoo Ad Tech taxonomy. If successful, the new taxonomy replaces and supersedes the currently active taxonomy.

The example below shows how the current taxonomy changes when you make a POST call to /taxonomy with the segments shown:

  1. Segment 222 is soft-deleted. Note that segment membership will not be cleared.

  2. Segment 333 is created.

  3. Segment 111 is updated.

Example:

Current Taxonomy:

[
    {
        "id": "111",
        "name": "Segment 111"
    },
    {
        "id": "222",
        "name": "Segment 222"
    }
]

POST to /taxonomy with the following:

[
    {
        "id": "111",
        "name": "Segment 112"
    },
    {
        "id": "333",
        "name": "Segment 333"
    }
]
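
The effect described above can be previewed locally before posting. The following is a minimal sketch, not part of the DataX API: the helper name diff_taxonomy is hypothetical, and it only compares top-level segments by id (nested subTaxonomy entries are ignored).

```python
def diff_taxonomy(current, new):
    """Classify top-level segments by comparing the active taxonomy with the one to be posted."""
    current_by_id = {seg["id"]: seg for seg in current}
    new_by_id = {seg["id"]: seg for seg in new}

    # Present now but missing from the new taxonomy: soft-deleted.
    soft_deleted = sorted(set(current_by_id) - set(new_by_id))
    # Missing now but present in the new taxonomy: created.
    created = sorted(set(new_by_id) - set(current_by_id))
    # Present in both with different content: updated.
    updated = sorted(
        seg_id
        for seg_id in set(current_by_id) & set(new_by_id)
        if current_by_id[seg_id] != new_by_id[seg_id]
    )
    return {"soft_deleted": soft_deleted, "created": created, "updated": updated}


current = [{"id": "111", "name": "Segment 111"}, {"id": "222", "name": "Segment 222"}]
new = [{"id": "111", "name": "Segment 112"}, {"id": "333", "name": "Segment 333"}]
result = diff_taxonomy(current, new)
print(result)
# {'soft_deleted': ['222'], 'created': ['333'], 'updated': ['111']}
```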

An Exception Thrown

Making a POST call to /taxonomy with the taxonomy below causes an exception to be thrown, because the parent segment 111 is assigned to mdm id 100 while its child segment 333 is assigned to mdm id 123.

To avoid the exception, the parent segment in this example must also include mdm id 123.

[
    {
        "id": "111",
        "name": "Segment 111",
        "user": {
            "include": ["100"]
        },
        "subTaxonomy": [
            {
                "id": "333",
                "name": "Segment 333",
                "user": {
                    "include": ["123"]
                }
            }
        ]
    },
    {
        "id": "222",
        "name": "Segment 222"
    }
]

As a best practice, omit the user field on the child segment. In the example below, the child segment 333 inherits the parent's permissions (mdm id 100).

[
    {
        "id": "111",
        "name": "Segment 111",
        "user": {
            "include": ["100"]
        },
        "subTaxonomy": [
            {
                "id": "333",
                "name": "Segment 333"
            }
        ]
    },
    {
        "id": "222",
        "name": "Segment 222"
    }
]
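
A client-side check can catch this kind of conflict before the request is sent. The sketch below is an illustration only: the function name find_permission_conflicts is hypothetical, and the rule it applies (a child's include list must be covered by its nearest ancestor's include list) is inferred from the two examples above, so the server's actual validation may differ.

```python
def find_permission_conflicts(segments, inherited=None):
    """Return (segment id, uncovered mdm ids) pairs for segments not covered by their parent."""
    conflicts = []
    for seg in segments:
        include = seg.get("user", {}).get("include")
        if include is None:
            # No explicit permissions: the segment inherits from its parent.
            effective = inherited
        else:
            effective = set(include)
            if inherited is not None:
                extra = effective - inherited
                if extra:
                    conflicts.append((seg["id"], sorted(extra)))
        # Recurse into children with this segment's effective permissions.
        conflicts.extend(
            find_permission_conflicts(seg.get("subTaxonomy", []), effective)
        )
    return conflicts


failing = [{
    "id": "111", "name": "Segment 111", "user": {"include": ["100"]},
    "subTaxonomy": [
        {"id": "333", "name": "Segment 333", "user": {"include": ["123"]}}
    ],
}]
print(find_permission_conflicts(failing))
# [('333', ['123'])]
```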

POST /usermatch API

Expected status

Most successful requests to this API endpoint will return a status of ACCEPTED_WITH_ERRORS. The reason is that unmatched records will count as an error, and in most cases there will be unmatched records.

Event Log Data (DSP API) Primary ID Behavior

The primary id reported in the event-level data coming from the DSP API switches to PXID once the first usermatch request is successfully processed. If DXIDs were used before, they will no longer be returned. This behavior requires planning on the partner's side, so that you don't accidentally trigger the switch before you are ready to consume PXID-keyed event logs.

POST /audience API

Use this API to upload any and all types of user data – for example, to upload one or more segments, scores and/or a set of attributes, or a disjoint mixture of all of these in a single upload for a set of users.

Note

Audience requests sent within 8 hours of each other will be batch processed together. The API does not guarantee that requests are processed in the order in which they were received.

As a best practice, to delete users at the segment level, resubmit the user with the segment's expiry set to 0:

{ "urn":{string}, "seg":[{"id":"1111","exp":0}] }
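
A record of this shape can be generated with a small helper. This is a minimal sketch: the function name build_segment_removal and the placeholder urn value are hypothetical, and only the urn, seg, id, and exp fields shown above are used.

```python
import json


def build_segment_removal(urn, segment_ids):
    """Build one audience record that removes the user from the given segments (exp = 0)."""
    record = {
        "urn": urn,
        "seg": [{"id": str(seg_id), "exp": 0} for seg_id in segment_ids],
    }
    # Compact separators keep the record on a single line with no extra spaces.
    return json.dumps(record, separators=(",", ":"))


line = build_segment_removal("example-urn", ["1111"])
print(line)
# {"urn":"example-urn","seg":[{"id":"1111","exp":0}]}
```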

Example Zip+4

Example: 94519-2710
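
Values in this format can be checked before upload. The sketch below assumes the format is exactly five digits, a hyphen, and four digits, as in the example above; whether DataX accepts other variants is not stated here, and the name is_zip4 is hypothetical.

```python
import re

# Five digits, a hyphen, then four digits, for example 94519-2710.
ZIP4_PATTERN = re.compile(r"\d{5}-\d{4}")


def is_zip4(value):
    """Return True if the whole string matches the assumed ZIP+4 layout."""
    return ZIP4_PATTERN.fullmatch(value) is not None


print(is_zip4("94519-2710"))  # True
print(is_zip4("94519"))       # False
```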

Streamline urn types list

The list should include these (since they are the most common):

  • ZIP4

  • IXID

  • Mobile IDs: IDFA (iOS ID for Advertisers) and GPADVID (Google Play Advertising ID)

  • PXID

  • DXID (Deprecated)

Filename field in the body

The filename field is required, so always include it. If it is missing, the request returns a 500 error code.

Example:

POST /v1/audience HTTP/1.1
Host: datax.yahooapis.com
Content-Type: multipart/form-data;boundary=xyz

--xyz
Content-Type: application/json;charset=UTF-8
Content-Disposition: form-data; name="metadata"

{
        "description" : "user qualifications – daily bucket 05/19/2013",
        "extensions" : { "urnType" : "IXID" }
}
--xyz
Content-Type: application/octet-stream;charset=UTF-8
Content-Disposition: form-data; name="data"; filename="somefilename.bz2"

< bz2 compressed data >
--xyz--
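
The request body above can be assembled with the Python standard library. This is a sketch under assumptions: the helper name build_audience_body is hypothetical, the records are assumed to be one per line, and the code only builds the body without sending it. Choose a boundary that cannot occur inside the compressed payload.

```python
import bz2
import json


def build_audience_body(metadata, records, filename, boundary="xyz"):
    """Return (content_type, body_bytes) for a multipart/form-data audience upload."""
    if not filename:
        # The data part must carry a filename, otherwise the request fails.
        raise ValueError("filename is required for the data part")

    compressed = bz2.compress("\n".join(records).encode("utf-8"))
    metadata_part = (
        f"--{boundary}\r\n"
        "Content-Type: application/json;charset=UTF-8\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "\r\n"
        f"{json.dumps(metadata)}\r\n"
    ).encode("utf-8")
    data_header = (
        f"--{boundary}\r\n"
        "Content-Type: application/octet-stream;charset=UTF-8\r\n"
        f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")

    body = metadata_part + data_header + compressed + closing
    return f"multipart/form-data;boundary={boundary}", body


content_type, body = build_audience_body(
    {"description": "daily bucket", "extensions": {"urnType": "IXID"}},
    ['{"urn":"example-urn","seg":[{"id":"1111","exp":0}]}'],
    "somefilename.bz2",
)
```

The returned content_type goes into the Content-Type request header and body is sent as the POST payload.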

Expected Status for Hashed Email

Most successful requests to this API endpoint will return a status of ACCEPTED_WITH_ERRORS. The reason is that unmatched records will count as an error, and in most cases there will be unmatched records.

GET /self and /errors

GET /self

You can get the status of a taxonomy or audience request using the /self endpoint.

The request id is returned as part of the response when the request is submitted.

https://datax.yahooapis.com/link/self/{requestid}

The response that is returned is documented in this section: Metadata.

GET /errors

If the status indicates an error, you can download the full error using this endpoint:

https://datax.yahooapis.com/link/errors/{requestid}

Polling

Partners typically poll for status at a regular interval until the request completes or an error is returned.

As a best practice, we recommend that you use the following polling intervals:

  • Taxonomy requests - every 15 minutes.

  • Audience/partner match requests - every 2 hours.
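
The recommended intervals can be wired into a simple polling loop. The sketch below makes several assumptions: the names poll_until_done, fetch_status, and is_done are hypothetical, the HTTP call is left to the caller (fetch_status receives the /self URL and returns the status), and the caller decides which statuses count as finished, because the full list of in-progress statuses is not given here.

```python
import time

BASE_URL = "https://datax.yahooapis.com/link"

# Recommended polling intervals, in seconds.
POLL_INTERVAL_SECONDS = {
    "taxonomy": 15 * 60,
    "audience": 2 * 60 * 60,
    "usermatch": 2 * 60 * 60,
}


def self_url(request_id):
    return f"{BASE_URL}/self/{request_id}"


def errors_url(request_id):
    return f"{BASE_URL}/errors/{request_id}"


def poll_until_done(request_id, kind, fetch_status, is_done,
                    sleep=time.sleep, max_polls=48):
    """Poll /self at the recommended interval until is_done(status) is true."""
    interval = POLL_INTERVAL_SECONDS[kind]
    for attempt in range(max_polls):
        status = fetch_status(self_url(request_id))
        if is_done(status):
            return status
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"request {request_id} not finished after {max_polls} polls")
```

If the returned status indicates an error, errors_url(request_id) gives the location of the full error report.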

Email Formatting

For purposes of email normalization, note the following:

Before uploading your JSON, you need to hash your email list with SHA256. That is, you must convert each email address to its SHA256 hash and place the hash, not the raw address, in your file.

For example, if the original email is barry@verizonmedia.com, the hashed value in the file would be d48adb3c108a657adf7597921f3bfc591ee3f00d658d2d288e0bb396ac0d5964.

Important

Your file name must be properly normalized using lowercase characters and contain no spaces.

DataX only supports pre-encrypted files. Once the files are encrypted, all the personal data that resides on the files will be protected, using the SHA256 function, so that no raw emails are ever stored.

If an email address is not hashed in the proper format, DataX will not process the audience records.
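
Hashing an address takes a few lines with Python's hashlib. This is a minimal sketch: the function name hash_email is hypothetical, and the normalization applied before hashing (trimming surrounding whitespace and lowercasing) is an assumption based on the notes above, so confirm the exact rules before relying on it.

```python
import hashlib


def hash_email(email):
    """Return the lowercase hex SHA256 digest of a normalized email address."""
    # Assumed normalization: strip surrounding whitespace and lowercase.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


digest = hash_email("  Barry@VerizonMedia.com ")
print(len(digest))  # 64 hexadecimal characters
```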

FAQs

Q. Do all endpoint URLs for Partner Match use HTTPS://?

A. Yes, all of our DataX suite APIs use HTTPS:// for API calls.

Q. Is the 100 requests/hour quota shared between the Partner Match API and DataX, or does each have its own 100-request limit? That is, could you in theory send 100 audience requests, 100 taxonomy requests, and 100 Partner Match API requests in 1 hour?

A. Each 100 request/hour limit is separate.
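
A client can track each quota separately to avoid hitting the limit. The sketch below is an assumption-laden illustration: the class name HourlyQuota is hypothetical, and it uses a sliding one-hour window, whereas how the server actually measures the hour is not documented here.

```python
import time
from collections import deque


class HourlyQuota:
    """Client-side counter that allows at most `limit` requests per window."""

    def __init__(self, limit=100, window_seconds=3600, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._sent = deque()  # timestamps of requests inside the window

    def try_acquire(self):
        """Record a request and return True, or return False if the quota is used up."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) >= self._limit:
            return False
        self._sent.append(now)
        return True


# One independent quota per API, since each limit is separate.
quotas = {name: HourlyQuota() for name in ("audience", "taxonomy", "usermatch")}
```

Call quotas["audience"].try_acquire() before each audience request and delay the request when it returns False.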

Q. If you execute multiple partner match calls, will this not overwrite the previous match output?

A. No, it is an append call, and will just add to the existing partner match IDs.

Q. Does the line ending in the CSV body support only the newline character (backslash n, \n)?

A. Yes.
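
Python's csv module writes \r\n line endings by default, so the terminator has to be set explicitly. The following is a small sketch; the function name rows_to_csv and the sample column values are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text using \\n as the only line terminator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


text = rows_to_csv([["a", "1"], ["b", "2"]])
print(repr(text))  # 'a,1\nb,2\n'
```

When writing to a file instead of a string, open it with newline="" so the platform does not translate \n into \r\n.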

Q. Is a full list of all the statuses and their descriptions available?

A. Yes. The statuses and their meanings are listed in the following table:

Status Code                    Status Message                                Description
-----------------------------  --------------------------------------------  ---------------------------------------------------
202 Accepted                   Uploaded request accepted                     The upload request is finished.
400 DxInvalidRequest           <urnType> is not supported                    Provided urnType is not supported.
400 DxJobNotFound              Cannot Find Job with id <request_id>          The request_id is not found in the DataX database.
400 Bad Request                Bad Application Id                            The application id is not correct.
500 DxInternalError            Unable to Create Job                          The upload job cannot be created.
500 UNABLE_TO_PROCESS_REQUEST  Failed to process. Please try after sometime  Server is not available during the processing time.