Syntax rules for data formats

When you design a data format for your API, you will inevitably have to decide how to structure its syntax.

Thankfully, the Unix shell provides us with some battle-tested guidelines that we can use:

Provide support for comments

Invariably you will want a way to annotate the data with comments that the parser ignores. In such a case, simply use # to start any section that is meant to be a comment.
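
As a rough illustration, here is a minimal Python sketch of how a parser might drop comments. The function name strip_comment is made up for this example, and it assumes that # always starts a comment, so a # inside a quoted value is not handled.

```python
def strip_comment(line: str) -> str:
    """Return the line with any '#' comment removed."""
    # Assumption: '#' always starts a comment, even inside quotes.
    content, _, _ = line.partition("#")
    # Drop the whitespace that separated the data from the comment.
    return content.rstrip()


print(strip_comment("port = 8080  # default port"))
```

A line that holds nothing but a comment comes back as an empty string, which the rest of the parser can simply skip.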

Ignore whitespace

Now, I am a big fan of the Python language. However, your data format is not where you want whitespace to matter. Any run of spaces and tabs should be taken to mean a single space, no matter how often it repeats or where it appears.

This is because data files are meant to be read by human eyes, and humans just aren’t that good at telling how many spaces they are looking at.

A corollary is that multiple blank lines should also be treated as one.
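
One possible way to apply both rules is sketched below in Python. The name normalize_whitespace is hypothetical, and the sketch makes one extra assumption: leading and trailing whitespace on a line is dropped entirely rather than kept as a single space.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and runs of blank lines."""
    out = []
    for raw in text.splitlines():
        # str.split() with no arguments splits on any run of whitespace
        # and ignores leading/trailing whitespace.
        line = " ".join(raw.split())
        # Keep only the first blank line of a run of blank lines.
        if line == "" and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out)


print(normalize_whitespace("name\t\t=   value\n\n\n\nnext  =  1"))
```

After this step, two files that differ only in spacing look identical to the rest of the parser.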

Quotations

In the shell and in some programming languages, single and double quotes mean different things. As far as possible, avoid replicating this. Whatever you choose the quotes to mean, let it be the same for both single and double quotes.
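
For example, a parser could strip either kind of quote in exactly the same way. This is only a sketch; the helper name unquote is an assumption of mine, and it does nothing about escape characters inside the quotes.

```python
def unquote(token: str) -> str:
    """Remove one pair of matching quotes, single or double."""
    # Both quote characters are treated identically.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1]
    # Unquoted or mismatched tokens are returned unchanged.
    return token


print(unquote("'hello world'") == unquote('"hello world"'))
```

Because there is a single code path for both characters, there is no way for the two quoting styles to drift apart in meaning.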

Special characters

Support special and unprintable characters using the common backslash escape (\). This prevents any kind of surprise for people working with your data. Most consumers of your data will, for example, take for granted that \n means newline and \t means tab.
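
Here is a small Python sketch of what decoding such escapes might look like. The ESCAPES table and the unescape function are illustrative names, and the handling of unknown escapes (keep the character after the backslash) is an assumption rather than a rule from any standard.

```python
# Hypothetical table of supported escapes: character after '\' -> meaning.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def unescape(text: str) -> str:
    """Replace backslash escapes such as \\n and \\t with real characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Assumption: an unknown escape yields the escaped character itself.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary characters (and a lone trailing backslash) pass through.
            out.append(ch)
            i += 1
    return "".join(out)


print(unescape(r"name\tvalue\nnext"))
```

A real format would also need to decide whether unknown escapes are an error, which is a stricter but equally reasonable choice.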

Keep it simple

We have talked about the KISS principle before. It also applies in this case. Complex lexical rules violate the principle wholesale and should be avoided at all costs.

If you already use a standard data format such as JSON or XML, then you may have noticed that they tend to respect the general guidelines provided above.

Have you ever used a custom data format in your code before? Let's keep the conversation going.


Published by

jchencha

Software Project Manager