Syntax rules for data formats

When you design the data format for your API, you will inevitably have to decide how to structure its syntax.

Thankfully, the Unix shell gives us some battle-tested guidelines that we can follow:

Provide support for comments

Invariably you will want a way to add comments to the data that the parser ignores. In that case, simply use # to start any section that is meant to be a comment.

Ignore whitespace

Now, I am a big fan of the Python language; however, your data format is not where you want whitespace to matter. Any run of spaces and tabs should be treated as a single space, no matter how long it is or where it appears.

This is because data structures are meant to be seen by human eyes and humans just aren’t that good at distinguishing spaces.

A corollary is that multiple blank lines should also be treated as one.
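The comment and whitespace rules can be sketched together in a few lines of Python. This is only an illustration under simplifying assumptions: the function name normalize is made up for this example, and the sketch does not know about quoted strings, so a # inside quotes would also be treated as the start of a comment.

```python
import re


def normalize(text: str) -> str:
    """Strip # comments, collapse spaces/tabs, and squeeze blank lines."""
    cleaned = []
    for line in text.splitlines():
        # Everything after the first '#' is a comment (quotes are ignored here).
        line = line.split("#", 1)[0]
        # Any run of spaces and tabs counts as a single space.
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Keep a blank line only if the previous kept line was not blank.
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()
```

With this sketch, a line such as "name   =\tfoo  # the name" is read exactly like "name = foo", and four blank lines in a row are read as one.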

Quotations

In the shell and in some programming languages, single and double quotes mean different things. As far as possible, avoid replicating this. Whatever you choose quotes to mean, let it be the same for both single and double quotes.

Special characters

Support special and unprintable characters using the common backslash (\) escape. This prevents any kind of surprise for people consuming your data. Most consumers of your data will, for example, take for granted that \n means newline and \t means tab.
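As a rough sketch of the quoting and escaping rules, the Python function below treats single and double quotes identically and decodes a small set of backslash escapes. The name unquote and the exact set of supported escapes are assumptions made for this example, not part of any standard.

```python
# Escapes this sketch understands; a real format would document its own list.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def unquote(token: str) -> str:
    """Decode a quoted token; ' and " behave exactly the same way."""
    if len(token) < 2 or token[0] not in "\"'" or token[-1] != token[0]:
        raise ValueError("token must start and end with the same quote")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError("unknown or dangling escape")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because both quote styles go through the same code path, a consumer never has to remember which one expands escapes.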

Keep it simple

We have already talked about the KISS principle before. It applies here too. Complex lexical rules violate the principle wholesale and should be avoided at all costs.

If you already use a standard data format such as JSON or XML then you may have noticed they tend to respect the general guidelines provided above.

Have you ever used a custom data format in your code before? Let’s keep the conversation going.

Multipart media types

We have already talked about various data types you may use with your API (http://206.189.161.181/2015/06/common-hypermedia-types/). However, at times you may need to include binary data in your representations.

In the interest of transparency, it makes sense to declare the payload as a multipart type.

A multipart message gives you the power to pass more than one media type in a single HTTP message. Each part occupies its own space in the message.

A sample message for a user representation is shown below.

Content-Type: multipart/mixed; boundary="multi_sep"

--multi_sep
Content-Type: application/xml;charset=UTF-8

<user>  ... </user>
--multi_sep
Content-Type: image/png

... image
--multi_sep--

In the HTTP message above, we use an arbitrary boundary, multi_sep, to separate the two media types.
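To make the role of the boundary concrete, here is a minimal Python sketch that assembles such a body by hand. The function name build_multipart is hypothetical, and a real service would normally rely on its HTTP library for this; the sketch only shows how each part is introduced by the delimiter and how the message is closed with a final delimiter followed by two dashes.

```python
def build_multipart(parts, boundary="multi_sep"):
    """Assemble a multipart body from (content_type, body_bytes) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for content_type, body in parts:
        # The boundary must never occur inside a part's payload.
        if delimiter in body:
            raise ValueError("boundary appears inside a part")
        chunks += [
            delimiter,
            b"Content-Type: " + content_type.encode("ascii"),
            b"",  # blank line between the part headers and the part body
            body,
        ]
    chunks.append(delimiter + b"--")  # closing delimiter ends the message
    return b"\r\n".join(chunks) + b"\r\n"
```

The caller would send the result with a Content-Type header of multipart/mixed that names the same boundary.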

The most common multipart media types are listed below:

multipart/form-data

By far the most common. This multipart media type is used to encode name-value pairs from HTML forms that include binary data, say a profile picture upload.

multipart/mixed

This type mixes two or more representations in one message, such as the one above, where we mixed the user details in XML with the user’s profile picture in PNG.

multipart/alternative

In some cases one resource needs multiple representations. For example, you may want to send both an HTML and a JSON representation of the same resource. In this case, declare the media type as multipart/alternative.

multipart/related

When you have related parts in one message, such as a user’s biographical information and their social media information, you can send both using the multipart/related media type and have one part refer to another by its Content-ID.

You may be tempted to use a binary-to-text encoding such as Base64 to keep your representations in the same HTTP message. This is a bad idea. To understand why, check out the paper Base64 Can Get You Pwned.

On the flip side, if you can avoid multipart altogether and instead provide a rel link so that each asset travels in its own HTTP message, that is the better option.

Let’s keep the conversation going; comment below. As usual, don’t forget to sign up for our newsletter!

Common Hypermedia Types

In the design of APIs you are quite likely to end up in a quagmire over how to represent your data. We have already looked at part of this problem in the past (http://206.189.161.181/2015/05/dry-up-your-api-with-microformats/). However, microformats can only take you so far; you may need to design a full-on API service, and that is where hypermedia comes into play.

There are innumerable hypermedia types, so I will stick to the main ones.

HTML (Hyper Text Markup Language)

This is by far the most familiar media type of all.

It is characterized by use of tags to differentiate document content.

An example of a representation in HTML would be:

    <ul>
      <li>
        <a href="/list/1" rel="item" class="item">
          <span class="identifier">1</span>
        </a>
        <span class="name">First item in the list</span>
        <span class="scheduledTime">2014-12-01</span>
        <span class="status">pending</span>
      </li>
      <li>
        <a href="/list/2" rel="item" class="item">
          <span class="identifier">2</span>
        </a>
        <span class="name">Second item in the list</span>
        <span class="scheduledTime">2014-12-01</span>
        <span class="status">pending</span>
      </li>
      <li>
        <a href="/list/3" rel="item" class="item">
          <span class="identifier">3</span>
        </a>
        <span class="name">Third item in the list</span>
        <span class="scheduledTime">2014-12-01</span>
        <span class="status">complete</span>
      </li>
    </ul>

You can learn more about HTML from http://www.w3.org/TR/html5/
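A client can treat such an HTML representation as data by following the rel attributes. The sketch below uses Python's standard html.parser module to collect the href of every link marked rel="item"; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class ItemLinkParser(HTMLParser):
    """Collect the href of every <a> element whose rel includes 'item'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel is a space-separated list of relation names.
        rels = (attributes.get("rel") or "").split()
        if tag == "a" and "item" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def item_links(html: str) -> list:
    parser = ItemLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

Run against the list above, this would return the three /list/... links in document order.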

JSON-API

This hypermedia type uses JSON (JavaScript Object Notation) to define content in your representation. It has been around for a while and is stable at v1 at the time of this writing.

{
  "data": {
    "type": "articles",
    "id": "1",
    "attributes": {
      "title": "JSON API paints my bikeshed!",
      "body": "The shortest article. Ever.",
      "created": 1432306588,
      "updated": 1432306589
    },
    "relationships": {
      "author": {
        "data": {"id": "42", "type": "people"}
      }
    }
  },
  "included": [
    {
      "type": "people",
      "id": "42",
      "attributes": {
        "name": "John",
        "age": 80,
        "gender": "male"
      }
    }
  ]
}

You can read more on this media type from here http://jsonapi.org/
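The relationships and included members work together: a relationship names a resource by type and id, and the client looks that pair up in included. A minimal Python sketch of that lookup follows; the function name resolve_included is an assumption for this example, and ids are compared as strings so the lookup does not depend on how they were serialized.

```python
def resolve_included(document: dict, relationship: str):
    """Return the included resource that a to-one relationship points at."""
    linkage = document["data"]["relationships"][relationship]["data"]
    wanted = (linkage["type"], str(linkage["id"]))
    for resource in document.get("included", []):
        if (resource["type"], str(resource["id"])) == wanted:
            return resource
    return None  # the server did not include the related resource
```

For the document above, resolving "author" would return the people resource whose name attribute is John.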

Collection+JSON

Another media type based on JSON.

Collection+JSON can be used to do much more than just describe data. It provides query templates that tell the client how to format requests properly.

It provides specifications for the entire conversation between server and client.

It borrows a lot of its semantics from the Atom format.

*** REQUEST ***
GET /my-collection/1 HTTP/1.1
Host: www.example.org
Accept: application/vnd.collection+json

*** RESPONSE ***
HTTP/1.1 200 OK
Content-Type: application/vnd.collection+json
Content-Length: xxx

{ "collection" : { "href" : "...", "items" : [ { "href" : "...", "data" : [...] } ] } }
// query template sample
{
  "queries" :
  [
    {
      "href" : "http://example.org/search",
      "rel" : "search",
      "prompt" : "Enter search string",
      "data" :
      [
        {"name" : "search", "value" : ""}
      ]
    }
  ]
}

You can read more about this media type here http://amundsen.com/media-types/collection/format/
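A query template is meant to be filled in by the client and turned into a GET request. The Python sketch below shows one plausible way to do that with the standard urllib.parse module; the function name build_query_url is hypothetical, and the sketch assumes the template's href does not already carry a query string.

```python
from urllib.parse import urlencode


def build_query_url(query: dict, **values) -> str:
    """Fill a query template's data fields and return the request URL."""
    params = {
        # Use the caller's value, else the default carried by the template.
        field["name"]: values.get(field["name"], field.get("value", ""))
        for field in query.get("data", [])
    }
    return query["href"] + "?" + urlencode(params)
```

For the search template above, passing search="hello world" would produce a URL ending in ?search=hello+world.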

Siren

Like the above two, this media type is also based on JSON.

Siren emphasizes your data structures (it calls them entities) and the relationships between them.

The media type is however not yet stable and still in active development.

{
  "class": [ "order" ],
  "properties": { 
      "orderNumber": 42, 
      "itemCount": 3,
      "status": "pending"
  },
  "entities": [
    { 
      "class": [ "items", "collection" ], 
      "rel": [ "http://x.io/rels/order-items" ], 
      "href": "http://api.x.io/orders/42/items"
    },
    {
      "class": [ "info", "customer" ],
      "rel": [ "http://x.io/rels/customer" ], 
      "properties": { 
        "customerId": "pj123",
        "name": "Peter Joseph"
      },
      "links": [
        { "rel": [ "self" ], "href": "http://api.x.io/customers/pj123" }
      ]
    }
  ],
  "actions": [
    {
      "name": "add-item",
      "title": "Add Item",
      "method": "POST",
      "href": "http://api.x.io/orders/42/items",
      "type": "application/x-www-form-urlencoded",
      "fields": [
        { "name": "orderNumber", "type": "hidden", "value": "42" },
        { "name": "productCode", "type": "text" },
        { "name": "quantity", "type": "number" }
      ]
    }
  ],
  "links": [
    { "rel": [ "self" ], "href": "http://api.x.io/orders/42" },
    { "rel": [ "previous" ], "href": "http://api.x.io/orders/41" },
    { "rel": [ "next" ], "href": "http://api.x.io/orders/43" }
  ]
}

You can contribute to and learn more about this media type at the official repo: https://github.com/kevinswiber/siren.
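The actions member is what lets a client discover how to change state without hard-coding URLs. As a sketch, the Python function below turns a named action into the method, URL and form fields of a request; prepare_action is a made-up name, and sending the request is left to whichever HTTP client you use.

```python
def prepare_action(entity: dict, name: str, **inputs):
    """Return (method, href, fields) for the named action of a Siren entity."""
    action = next(
        (a for a in entity.get("actions", []) if a["name"] == name), None
    )
    if action is None:
        raise KeyError(name)
    fields = {}
    for field in action.get("fields", []):
        if field["name"] in inputs:
            fields[field["name"]] = inputs[field["name"]]
        elif "value" in field:
            # Fall back to the value the server supplied (e.g. hidden fields).
            fields[field["name"]] = field["value"]
    # Siren treats GET as the default when an action names no method.
    return action.get("method", "GET"), action["href"], fields
```

For the order above, preparing "add-item" with a productCode and a quantity would yield a POST to the order's items URL, with the hidden orderNumber field filled in from the representation.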

UBER (Uniform Basis for Exchanging Representations)

Of all the representations above, this is the most comprehensive.

UBER can be represented in both XML and JSON. It is designed to enable the developer to represent transitions in their APIs.

{
  "uber" :
  {
    "version" : "1.0",
    "data" :
    [
      {
        "rel" : ["self"],
        "url" : "http://example.org/"
      },
      {
        "name" : "list",
        "label" : "ToDo List",
        "rel" : ["collection"],
        "url" : "http://example.org/list/"
      },
      {
        "name" : "search",
        "label" : "Search",
        "rel" : ["search","collection"],
        "url" : "http://example.org/search{?title}",
        "templated" : "true"
      },
      {
        "name" : "todo",
        "rel" : ["item","http://example.org/rels/todo"],
        "url" : "http://example.org/list/1",
        "data" :
        [
          {"name" : "title", "label" : "Title", "value" : "Clean house"},
          {"name" : "dueDate", "label" : "Date Due", "value" : "2014-05-01"}
        ]
      },
      {
        "name" : "todo",
        "rel" : ["item","http://example.org/rels/todo"],
        "url" : "http://example.org/list/2",
        "data" :
        [
          {"name" : "title", "label" : "Title", "value" : "Paint the fence"},
          {"name" : "dueDate", "label" : "Date Due", "value" : "2014-06-01"}
        ]
      }
    ]
  }
}

<uber version="1.0">
  <data rel="self" url="http://example.org/" />
  <data name="list" label="ToDo List" rel="collection" url="http://example.org/list/"/>
  <data name="search" label="Search" rel="search collection" url="http://example.org/search{?title}" templated="true" />
  <data name="todo" rel="item http://example.org/rels/todo" url="http://example.org/list/1">
    <data name="title" label="Title">Clean House</data>
    <data name="dueDate" label="Date Due">2014-05-01</data>
  </data>
  <data name="todo" rel="item http://example.org/rels/todo" url="http://example.org/list/2">
    <data name="title" label="Title">Paint the fence</data>
    <data name="dueDate" label="Date Due">2014-06-01</data>
  </data>
</uber>

To read more on this check out https://rawgit.com/uber-hypermedia/specification/master/uber-hypermedia.html
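Because every UBER data element can nest further data elements, a client usually walks the tree looking for the link relations it understands. A small Python sketch of that walk over the JSON variant follows; the function name find_by_rel is an assumption for this example.

```python
def find_by_rel(data: list, rel: str):
    """Yield every data element, at any depth, whose rel list contains rel."""
    for item in data:
        if rel in item.get("rel", []):
            yield item
        # Recurse into nested data elements, if there are any.
        yield from find_by_rel(item.get("data", []), rel)
```

Called on the uber.data array above with rel set to "item", this would yield the two todo elements.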

These are obviously not the only formats. Check out IANA for a comprehensive list of published formats.

That’s it, folks. If you know of even more interesting formats, please comment below.

Don’t forget to sign up for our newsletter.
