Yet another article on YAML
How to write configuration files
By Martin Helm in Data Science ML Tools
October 28, 2021
If you have ever written a configuration file, it was probably in the YAML format. It is very commonly used for this task nowadays, but you can also use it to store actual data. The abbreviation reflects this: originally, YAML stood for “Yet Another Markup Language”, but it now stands for “YAML Ain’t Markup Language”. All the more reason to look at what we can do with it!
Usually, I like to start with an example showing all the possibilities of the format. But because YAML can do so much, this would be very long, so this time I put it after the syntax overview. I will first go over some general syntax here, introducing the different structures. You can then find all the details in the example, which is basically a YAML file describing itself!
General syntax
Each document starts with ---. That way, you can store multiple documents in one file by starting each of them with ---. If a file contains only one document, you do not strictly need the --- at the start; the document is then an implicit document. It is still good practice to include it, and you need it as soon as a file contains more than one document. You can also end each document explicitly with the end-of-document marker (three dots, ...), but usually you don't have to.
Comments start with #. A comment can either span an entire line or follow a key-value pair on the same line.
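To make the markers a bit more tangible, here is a minimal sketch of a file with two documents. The key name and the values are made up for illustration only.

```yaml
# A full-line comment before the first document
---
name: first document   # a comment after a key-value pair
...
---
name: second document
...
```

A parser that supports multiple documents should return two separate documents for this file, each with a single key.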
Structures
YAML supports three basic structures, or nodes, as the specification calls them: scalars, sequences, and maps. In a typical file, every entry is a key-value pair separated by a colon followed by a space. Keys can contain white space, but this can be awkward to handle in the programming language that reads the file, so one usually avoids it. The space after the colon is required; any additional white space around the colon is ignored, so it should not be used to convey information.
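A small sketch of how keys and the colon behave. The keys and values are invented for this illustration, and the last line is commented out on purpose, because it would not be a key-value pair.

```yaml
---
first name: Ada        # a key with a space: legal, but awkward to access from code
first_name: Ada        # the more common spelling
age:    36             # extra spaces after the colon do not change the value
# age:36               # without the space this would be read as the single string "age:36"
```

Sticking to keys without spaces and exactly one space after the colon avoids most surprises.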
Scalars
Scalars are the simplest structure: a single value without any further nesting. There are many scalar types, covering basically everything one would expect: numbers, strings, booleans, and null values. Have a look at the example YAML file further down, which describes them all!
Sequences
Sequences are ordered collections of items. As with JSON, you can either write them inline using [] or spread them over multiple lines using indentation and -. Keep in mind that the values do not need to be of the same type:
---
key: [value1, 2, .NaN]
another key:
  - value1
  - 2
  - .NaN
This brings us to indentation, which can be any number of spaces, but never tabs; items on the same level must use the same indentation. There is no fixed convention for the number of spaces, but typically 2 or 4 are used.
Sequences can also be nested:
---
World:
  - Europe:
      - UK
      - Germany
      - France
  - North America:
      - USA
      - Canada
Maps
Finally, maps are a collection of key-value pairs. You can construct them inline using {} or again spread over multiple lines with indentation. They behave basically the same way as sequences, just that every value has an additional key. And of course they can be nested with other maps or sequences:
---
Company: {name: EvilOrg, Employees: 254}
Team:
  Name: Superteam
  Members:
    - Mark:
        age: 23
        haircolor: blonde
    - Jack:
        age: 25
        haircolor: brown
All examples so far have a map at the top level, but this is not a requirement. The root node of a YAML document can be any valid node, so a single sequence without a key is also a valid YAML file. Keep in mind, though, that tools reading configuration files may still expect a map at the root:
---
- something
- in a
- sequence
Full example
# Comments start with a hash sign and can appear before the actual document
---
# --- denotes the start of a document
key: value
strings: strings can be outside quotes
strings2: "strings can also be in quotes"
strings3: 'or in single quotes. See below how they differ'
integers: 3
decimal: +3
octal: 0o12 # Octals start with 0o in YAML 1.2 (YAML 1.1 used a plain leading zero)
hex: 0xC # Start with 0x
boolean: true # Previous YAML versions also supported On/Off, but they are no longer valid in the current version.
float: 3.14
exponential: 0.0314e+2
infinity: .inf # Capitalization is flexible, so .inf, .Inf and .INF are all valid.
negative infinity: -.inf
keys can have whitespaces: true
not available: .NaN
not defined: null # ~ works as well
indentation: "matters. Use any number of white spaces but no tabs!"
sequences:
  - can contain different types
  - 1
  - true
inline sequences: [Sequences, defined, inline]
maps:
  types: maps can contain different types of data
  a number: 5
  nesting:
    possible: true
    also with other types:
      - another
      - sequence
inline maps: {key: value, one: two}
complex_strings: "a deer\n" # \n will be converted to newline
complex_strings1: 'a deer\n' # \n will be interpreted as part of the string!
complex_strings2: a deer\n # \n will be interpreted as part of the string!
modifiers: "> and | modify how a multiline string gets interpreted"
folded style: >
  a multiline
  string interpreted
  as one line
literal style: |
  A multiline string
  interpreted as
  a multiline string
chomp modifiers:
  general: "+ and - modify how the final linefeed and trailing blank lines are preserved. Use with | or >"
  "+": Multiline strings keep the final linefeed and any trailing blank lines
  "-": Multiline strings are stripped of the final linefeed and any trailing blank lines
# ... denotes end of document
...
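The chomp modifiers are only described in words in the example, so here is a small sketch of how they are written in practice. The key names are made up; the modifier is placed directly after | or >.

```yaml
---
# No modifier: the final linefeed is kept, the blank line after it is dropped
default: |
  text

# - strips the final linefeed and the blank line after it
strip: |-
  text

# + keeps the final linefeed and the blank line after it
keep: |+
  text

# Modifiers also work with the folded style
folded and stripped: >-
  two lines
  become one
```

For config values, the stripping variants (|- and >-) are usually the least surprising choice, because the resulting string has no trailing linefeed.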
Comparison to JSON
YAML is a superset of JSON as of version 1.2. That means you can do everything you can do with JSON and much more, and every valid JSON document should also be a valid YAML document. Let’s review the main differences between YAML and JSON:
JSON | YAML |
---|---|
Comments are not allowed | Comments are denoted with # |
Objects and arrays are denoted in curly braces and brackets respectively | Hierarchy is denoted by indentation with spaces (tab characters are not allowed); braces and brackets also work inline |
Strings must be in double quotes | String quotes are optional, and support double and single quotes. |
Root node had to be an array or an object in the original specification (RFC 4627); later revisions allow any value | Root node can be any of the valid data types |
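To illustrate the superset relationship, the following sketch stores the same data twice in one file: once as plain JSON and once in YAML block style. The company name is borrowed from the map example above; the other keys are assumptions for illustration.

```yaml
# Document 1: plain JSON, which a YAML 1.2 parser should accept as is
---
{"name": "EvilOrg", "employees": 254, "teams": ["Superteam"]}
# Document 2: the same data in YAML block style
---
name: EvilOrg
employees: 254
teams:
  - Superteam
```

Both documents should load into the same structure: a map with a string, a number, and a sequence with one item.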
Summary
As we can see from the example, YAML is a very flexible format that can hold basically any kind of data or information you can think of. But this flexibility also comes with complexity, which is especially tricky for strings.
Since YAML is usually used for config files, where one does not have complicated string manipulation, this is usually not an issue though. Most data is still transferred via JSON.
In case you still run into one of the peculiarities of YAML, check out the resources below to help you!
Resources
Photo by Ferenc Almasi on Unsplash