In part one I examined the XBRL specification in general terms, with a view to understanding its constituent parts. In this post I take a deeper dive, looking at how one might parse a specific taxonomy.

Parsing the UK GAAP DTS

The entry-point schema file of a DTS is generally named by combining the name of the DTS with the word ‘main’, e.g. uk-gaap-main-2009-09-01.xsd

Each schema imports many other schemas and linkbases which together make up the complete DTS.

Schema-location attribute URLs that are not relative need to be rewritten to reference local files. The remote location may not always exist, and file size, bandwidth and DTS complexity make downloading from remote sources infeasible, especially when validating an XBRL instance against the overall schema.

Newer taxonomies may include a catalog.xml or taxonomyPackage.xml that lists entry points and data to help with rewriting location URLs to local paths.
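
The rewriting step can be sketched in Python as a simple prefix substitution. This is a minimal illustration and not a catalog implementation: the function name to_local_path, the REWRITES table and the local directory layout are all assumptions made for the example.

```python
import posixpath

# Hypothetical mapping of remote URL prefixes to local mirror directories.
REWRITES = {
    "http://www.xbrl.org/2003/": "taxonomies/xbrl.org/2003/",
}


def to_local_path(location, importing_dir, rewrites=REWRITES):
    """Map a schemaLocation or href value to a local file path.

    Absolute URLs are rewritten using the prefix table; relative
    locations are resolved against the directory of the importing file.
    """
    for remote_prefix, local_prefix in rewrites.items():
        if location.startswith(remote_prefix):
            remainder = location[len(remote_prefix):]
            return posixpath.normpath(posixpath.join(local_prefix, remainder))
    if "://" in location:
        # Fail loudly rather than silently fetching from the network.
        raise LookupError(f"no local copy configured for {location}")
    return posixpath.normpath(posixpath.join(importing_dir, location))
```

Raising on an unmapped absolute URL keeps the parser from silently reaching out to a remote server halfway through loading the DTS.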

Parsing the concepts

Financial concepts like "Gross Profit/(Loss)" are defined in the schema files (.xsd) using element nodes which may be single items or tuples. A tuple (in the context of XBRL) is an element that has many other child elements.

Only elements that have the abstract attribute set to false and a substitution group of xbrli:item can be used to tag items in an XBRL instance.

Abstract items (abstract set to true) tend to be headers and other presentational information, while domainItem and hypercubeItem items are used to organize dimensions.

Each element has an id that is unique within the DTS, so concepts can be parsed and stored on the assumption of uniqueness. The id, for example uk-bus_NameEntityOfficer, is referenced by locators in each of the various linkbases to place the concept in all its applicable contexts.
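
As a rough sketch of the above, the following Python reads element declarations from a schema and picks out the ones that can be used for tagging. The sample schema is invented for illustration (EntityOfficersHeading is a hypothetical abstract element), and the code assumes the substitution group is written with the conventional xbrli prefix rather than resolving the namespace properly.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented sample; a real taxonomy schema has many more attributes.
SAMPLE_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xbrli="http://www.xbrl.org/2003/instance">
  <xs:element id="uk-bus_NameEntityOfficer" name="NameEntityOfficer"
              abstract="false" substitutionGroup="xbrli:item"/>
  <xs:element id="uk-bus_EntityOfficersHeading" name="EntityOfficersHeading"
              abstract="true" substitutionGroup="xbrli:item"/>
</xs:schema>
"""


def parse_concepts(schema_xml):
    """Return a dict of concept id -> basic attributes."""
    concepts = {}
    root = ET.fromstring(schema_xml)
    for element in root.findall(f"{XS}element"):
        concept_id = element.get("id")
        if concept_id is None:
            continue  # not addressable from a linkbase locator
        concepts[concept_id] = {
            "name": element.get("name"),
            "abstract": element.get("abstract", "false") == "true",
            "substitution_group": element.get("substitutionGroup"),
        }
    return concepts


def taggable(concepts):
    """Ids of non-abstract items (assumes the 'xbrli' prefix is used)."""
    return sorted(
        concept_id
        for concept_id, attrs in concepts.items()
        if not attrs["abstract"] and attrs["substitution_group"] == "xbrli:item"
    )
```

Keying the dictionary on the id makes the later linkbase steps a simple lookup, since locators point at concepts by that id.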

Presentation network (Hierarchical representation)

In XBRL terms, presentation is a scheme for representing financial concepts (e.g. “Sales”) in a tree that resembles the actual layout of a set of financial statements, albeit one with every possible variation included.

Financial concepts themselves are unique and are defined as elements in the schemas of the DTS, but they can be repeated at different points in the presentation tree.

The “Operating Profit” concept for example, occurs in the following locations:

  • Profit and Loss
    • profit and loss account heading
  • Notes and Detailed Disclosures
    • Profits from joint venture
    • Profit grouped by business segment
    • Profit grouped by geographic location
    • Notes on cashflow statement.

Parsing:

Presentation linkbases are made up of presentationLinks, presentationArcs and locs. They have one arc role (network) called ‘parent-child’.

For each linkbase you should parse the extended links individually, as loc elements are unique within each extended link but would otherwise collide if parsed together. Walk the tree from the root node adding branches and nodes as appropriate.

The root node (or nodes) are locators whose label is never referenced by the “to” attribute of an arc, i.e. they have no parent.

Leaf nodes are locators whose label is never referenced by the “from” attribute of an arc, i.e. they have no children.
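
A minimal Python sketch of parsing one extended link follows. It treats a root as a concept that no arc points to (never the target of a “to” attribute). The sample linkbase, the ex_ concept ids and the function name are invented for illustration, and the sketch ignores extensions (prohibited or overridden arcs) entirely.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"

# Invented sample with one extended link and hypothetical concept ids.
SAMPLE_LINKBASE = """\
<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:presentationLink xlink:type="extended"
                         xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:href="s.xsd#ex_ProfitAndLoss" xlink:label="pl"/>
    <link:loc xlink:type="locator" xlink:href="s.xsd#ex_OperatingProfit" xlink:label="op"/>
    <link:loc xlink:type="locator" xlink:href="s.xsd#ex_Turnover" xlink:label="to"/>
    <link:presentationArc xlink:type="arc"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
        xlink:from="pl" xlink:to="op" order="2"/>
    <link:presentationArc xlink:type="arc"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
        xlink:from="pl" xlink:to="to" order="1"/>
  </link:presentationLink>
</link:linkbase>
"""


def parse_presentation_link(link_el):
    """Return (roots, tree) for a single presentationLink element.

    Labels are resolved per extended link, so they cannot collide
    with identical labels used in another link.
    """
    concept = {
        loc.get(f"{XLINK}label"): loc.get(f"{XLINK}href").split("#", 1)[1]
        for loc in link_el.findall(f"{LINK}loc")
    }
    children = defaultdict(list)
    targets = set()
    for arc in link_el.findall(f"{LINK}presentationArc"):
        parent = concept[arc.get(f"{XLINK}from")]
        child = concept[arc.get(f"{XLINK}to")]
        children[parent].append((float(arc.get("order", "1")), child))
        targets.add(child)
    # Sort siblings by the arc's order attribute.
    tree = {p: [c for _, c in sorted(kids)] for p, kids in children.items()}
    roots = sorted(set(concept.values()) - targets)
    return roots, tree
```

Leaf nodes fall out of the same structure: they are the concepts that never appear as a key in the returned tree.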

There are a few gotchas to watch out for, namely handling extensions and nodes with multiple parents.

For a full example of how to parse presentation linkbases refer to this module.

Definition networks (Dimensional representation)

While the presentation network gives us a basic structure with which to label rows of data, financial information is multidimensional (i.e. it has many possible columns for each row of information).

The definition linkbases and their associated networks are used to label items according to their dimension. Think of the columns of a report:

  • Current year
  • Prior year
  • Motor vehicles
  • Land and Buildings
  • Ordinary Share Class A

...etc.

Definition linkbases are made up of definitionLinks, definitionArcs and locs.

Locs are unique within each extended link (definitionLink element).

Definition arcs have many arc roles (networks). They are named as follows:

  • All;
  • Dimension-default;
  • Dimension-domain;
  • Domain-member;
  • Essence-alias;
  • Hypercube-dimension.

Although each of these networks can be represented as trees as in the example of the ‘parent-child’ network, there is no practical use for doing so.

In a real-world example, the creator of an XBRL instance will use dimensions to choose the column label(s) for a particular row, in a valid context.

The tree view, such as it is, is only really useful when thinking about the tagged concept as the root. "Tangible Fixed Assets at cost" would be the root of a dimensional tree of Asset Classes (amongst other things) which might lead to the "Motor Vehicles" domain member. Getting from A to B in this fashion involves traversing multiple networks, unlike the presentation network, which required only one: 'parent-child'.

Connecting a financial concept like “Tangible Fixed Assets at cost” with all its possible valid dimensions is not straightforward.

Many valid dimensions can be added to a single financial concept at one time (for granularity and multidimensional analysis) but they may not be mutually valid.

As a general rule, all dimension domain members with the same parent (hypercube) are mutually valid.

Parsing part I - going up:

Starting with a concept like "uk-bus_NameEntityOfficer", find each node with that locator in the ‘domain-member’ network and recursively walk up the tree until the root node(s) are reached (i.e. there is no domain-member arc with a “to” attribute referencing the current node’s label). Each root is a primary or grouping item.

There may be multiple matches for a locator label in the network and hence multiple primary items, e.g. ‘Finance income’ has two primary items: ‘items inheriting all income data dimensions’ and ‘items inheriting operating activities dimensions’.

If multiple primary items exist in the ‘domain-member’ network graph, each has one parent hypercube. We'll find them next.
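
The upward walk can be sketched in Python as follows. It assumes the domain-member arcs have already been resolved into (from, to) pairs of concept ids; the ex_ ids and the function name are hypothetical.

```python
def primary_items(concept, domain_member_arcs):
    """Walk up the domain-member network and return the root concepts.

    domain_member_arcs is an iterable of (from_id, to_id) pairs, where
    the arc runs from the parent to the child.
    """
    parents = {}
    for parent, child in domain_member_arcs:
        parents.setdefault(child, set()).add(parent)

    roots, seen, stack = set(), set(), [concept]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        node_parents = parents.get(node)
        if node_parents:
            stack.extend(node_parents)
        else:
            roots.add(node)  # no arc points to this node: a primary item
    return sorted(roots)


# Hypothetical arcs: one concept reachable from two primary items.
ARCS = [
    ("ex_ItemsInheritingIncomeDimensions", "ex_IncomeGroup"),
    ("ex_IncomeGroup", "ex_FinanceIncome"),
    ("ex_ItemsInheritingOperatingActivitiesDimensions", "ex_FinanceIncome"),
]
```

With these arcs, primary_items("ex_FinanceIncome", ARCS) yields both primary items, one of them found through the intermediate ex_IncomeGroup node.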

Parsing part II - going down:

Find the hypercubes that are related to the primary items.
Hypercubes are collections of dimensions. Starting with the primary item walk down the “All” network tree to find its hypercubes.

Each hypercube has many dimensions. Starting with each hypercube parse the “hypercube-dimension” network.

Repeat the process to find each dimension’s domains (“dimension-domain” network) and finally each domain’s member (“domain-member” network).

The domain-member is essentially the column label for the concept which is the row label.

Remember that the relationships between these networks aren't flat has-many associations but rather linked lists, so make sure you walk the whole tree and collect all applicable members.
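
The downward walk might look like the Python sketch below. It assumes all definition arcs have been flattened into (network, from, to) triples, with the network given by its short name; the ex_ ids and function names are hypothetical. Only members below each domain are collected here, so whether the domain head itself counts as a member is left out of the sketch.

```python
from collections import defaultdict


def index_arcs(arcs):
    """Index (network, from_id, to_id) triples by (network, from_id)."""
    index = defaultdict(list)
    for network, source, target in arcs:
        index[(network, source)].append(target)
    return index


def descendants(index, network, start):
    """All nodes below start in one network, depth first, in arc order."""
    found = []
    stack = list(reversed(index.get((network, start), [])))
    while stack:
        node = stack.pop()
        if node in found:
            continue
        found.append(node)
        stack.extend(reversed(index.get((network, node), [])))
    return found


def members_by_dimension(index, primary_item):
    """Map each dimension reachable from a primary item to its members."""
    result = {}
    for cube in index.get(("all", primary_item), []):
        for dimension in index.get(("hypercube-dimension", cube), []):
            members = result.setdefault(dimension, [])
            for domain in index.get(("dimension-domain", dimension), []):
                # Walk the whole domain-member chain, not just one level.
                members.extend(descendants(index, "domain-member", domain))
    return result


# Hypothetical arcs for a fixed-asset hypercube.
ARCS = [
    ("all", "ex_TangibleFixedAssetsGroup", "ex_FixedAssetsHypercube"),
    ("hypercube-dimension", "ex_FixedAssetsHypercube", "ex_AssetClassDimension"),
    ("dimension-domain", "ex_AssetClassDimension", "ex_AllAssetClasses"),
    ("domain-member", "ex_AllAssetClasses", "ex_Vehicles"),
    ("domain-member", "ex_Vehicles", "ex_MotorVehicles"),
    ("domain-member", "ex_AllAssetClasses", "ex_LandAndBuildings"),
]
```

Note that ex_MotorVehicles is two levels below the domain, which is exactly the case a flat one-level lookup would miss.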

Use of domain members as column labels

Labels are not applied directly to columns as they might be in a report.

Rather they are defined separately as contexts which have an id linking them to row items.

Contexts require dimension domain and member pairs to be defined or else default dimensions are inferred (see next section on validity).

<!-- example of context for NameEntityOfficer -->
<xbrli:context id="instant-2016-10-31-uk-bus_Director1">
  <xbrli:entity>
    <xbrli:identifier>XXXXX</xbrli:identifier>
    <xbrli:segment>
      <xbrldi:explicitMember dimension="uk-bus:EntityOfficersDimension">uk-bus:Director1</xbrldi:explicitMember>
    </xbrli:segment>
  </xbrli:entity>
  <xbrli:period>
    <xbrli:instant>2016-10-31</xbrli:instant>
  </xbrli:period>
</xbrli:context>
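
Reading the dimension and member pairs back out of a context can be sketched in Python as below. The sample repeats the context above with namespace declarations added so that it parses on its own; the scheme value on the identifier is a placeholder assumption, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XBRLDI = "{http://xbrl.org/2006/xbrldi}"

# Same context as above, made self-contained for parsing.
SAMPLE_CONTEXT = """\
<xbrli:context xmlns:xbrli="http://www.xbrl.org/2003/instance"
               xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
               id="instant-2016-10-31-uk-bus_Director1">
  <xbrli:entity>
    <xbrli:identifier scheme="http://example.com/scheme">XXXXX</xbrli:identifier>
    <xbrli:segment>
      <xbrldi:explicitMember dimension="uk-bus:EntityOfficersDimension">uk-bus:Director1</xbrldi:explicitMember>
    </xbrli:segment>
  </xbrli:entity>
  <xbrli:period>
    <xbrli:instant>2016-10-31</xbrli:instant>
  </xbrli:period>
</xbrli:context>
"""


def context_dimensions(context_el):
    """Return {dimension QName: member QName} for explicit members."""
    return {
        member.get("dimension"): (member.text or "").strip()
        for member in context_el.iter(f"{XBRLDI}explicitMember")
    }
```

The QNames are kept as prefixed strings here; a fuller parser would resolve the prefixes against the instance's namespace declarations.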

Validity of Dimensions:

A dimension domain member that is a descendant of a primary item is valid for any of the financial concepts that are children of that primary item.

Multiple dimension domain members are valid in a single context as long as they share a common hypercube ancestor.

Financial concepts that have hypercubes via the all network must include every child dimension in their context to be valid. Most dimensions have default members, however, which are inferred in lieu of an explicit choice by the XBRL instance preparer.

The 'All' network:

All hypercubes in the UK and Ireland taxonomies are closed and related by an all relationship.

This means each fact must have a dimension domain member (or a default) defined for each dimension belonging to the hypercube, and must not have any dimension domain members outside of that hypercube, in order to be valid.

In practice most dimension domains will have a default member, so the end user will not have to define them unless they choose to do so.
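
The closed-hypercube rule can be sketched as a small Python check. The data shapes, the ex_ ids and the function name are assumptions for illustration; the inputs are taken to be already derived from the taxonomy for one hypercube.

```python
def check_context(context_dims, cube_dims, defaults):
    """Validate a context's dimensions against one closed hypercube.

    context_dims: {dimension: member} stated explicitly in the context
    cube_dims:    {dimension: set of valid members} for the hypercube
    defaults:     {dimension: default member} from dimension-default
    Returns a list of error messages (empty when valid).
    """
    errors = []
    # Closed: nothing outside the hypercube is allowed.
    for dimension in context_dims:
        if dimension not in cube_dims:
            errors.append(f"{dimension} is not part of the hypercube")
    # Every dimension needs an explicit member or an inferred default.
    for dimension, members in cube_dims.items():
        member = context_dims.get(dimension, defaults.get(dimension))
        if member is None:
            errors.append(f"{dimension} has no member and no default")
        elif member not in members:
            errors.append(f"{member} is not valid for {dimension}")
    return errors


# Hypothetical hypercube with two dimensions, one of which has a default.
CUBE = {
    "ex_AssetClassDimension": {"ex_AllAssetClasses", "ex_MotorVehicles"},
    "ex_OwnershipDimension": {"ex_Owned", "ex_Leased"},
}
DEFAULTS = {"ex_AssetClassDimension": "ex_AllAssetClasses"}
```

A context that states only ex_OwnershipDimension passes, because the asset class falls back to its default; one that omits ex_OwnershipDimension fails, because that dimension has no default in this example.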

Code

For a full example of how to parse presentation linkbases refer to this module and for test cases look here.

Labels and References

Label and reference linkbases consist of extended links that define straightforward has-many associations between the elements.

A concept id will have a locator that is joined to many labels via label arcs. Labels have various roles, including documentation and the main concept label, and various languages defined by the lang attribute.

Reference linkbases are parsed similarly to labels, with the exception that reference elements may be complex, having a varying number of child elements such as name, number, paragraph, sub-paragraph etc.
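
A label linkbase can be read along these lines in Python. The sample linkbase and its label text are invented for illustration, and the function name is hypothetical; reference linkbases would need the extra step of reading each reference's child elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: one concept joined to two label resources.
SAMPLE_LABELS = """\
<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:labelLink xlink:type="extended"
                  xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:label="c1"
              xlink:href="s.xsd#uk-bus_NameEntityOfficer"/>
    <link:label xlink:type="resource" xlink:label="l1" xml:lang="en"
        xlink:role="http://www.xbrl.org/2003/role/label">Name of entity officer</link:label>
    <link:label xlink:type="resource" xlink:label="l1" xml:lang="en"
        xlink:role="http://www.xbrl.org/2003/role/documentation">Sample documentation text.</link:label>
    <link:labelArc xlink:type="arc" xlink:from="c1" xlink:to="l1"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"/>
  </link:labelLink>
</link:linkbase>
"""


def parse_labels(linkbase_xml):
    """Return {concept id: [ {role, lang, text}, ... ]}."""
    labels = defaultdict(list)
    root = ET.fromstring(linkbase_xml)
    for link in root.findall(f"{LINK}labelLink"):
        concept = {
            loc.get(f"{XLINK}label"): loc.get(f"{XLINK}href").split("#", 1)[1]
            for loc in link.findall(f"{LINK}loc")
        }
        # Several resources may share one xlink:label.
        resources = defaultdict(list)
        for res in link.findall(f"{LINK}label"):
            resources[res.get(f"{XLINK}label")].append({
                "role": res.get(f"{XLINK}role"),
                "lang": res.get(XML_LANG),
                "text": (res.text or "").strip(),
            })
        for arc in link.findall(f"{LINK}labelArc"):
            concept_id = concept[arc.get(f"{XLINK}from")]
            labels[concept_id].extend(resources[arc.get(f"{XLINK}to")])
    return dict(labels)
```

Grouping resources by their xlink:label before following the arcs means a single arc can attach every role and language variant at once.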

Calculations

Calculations are not used in the Irish and UK taxonomies and are therefore outside the scope of this document.

Sample code

Now that you know in theory how to parse and validate XBRL, it's time to dive into some code!

Below you will find links to example applications that form the basis of a complete XBRL stack in the context of Irish / UK GAAP.

Taxonomy Parser
Tag picker
IXBRL document validation

For a complete solution that also incorporates US XBRL standards and presentation of XBRL instances check out Arelle.

Other reading

http://docstore.mik.ua/orelly/xml/xmlnut/ch10_04.htm
https://en.wikipedia.org/wiki/XBRL#Label_Linkbase
https://en.wikipedia.org/wiki/XML_Schema_(W3C)
https://www.corefiling.com/publications/documents/XMLflattened.pdf
http://reinout.vanrees.org/afstudeerverslag/x1407.html
http://www.xml.com/pub/a/2003/11/12/schematron.html