Config Layer
  • 27 Jul 2022

Parsers are the workhorses of the data configuration layer.

A typical Data Product employs between 10 and 100 parsers to define and generate the collections and variables in its data model.

To function, a parser must be registered in one of two places: the parsers attribute of the config file or the parsers attribute of a table object.

Each parser is affiliated with a single table object, and will only operate on the source data defined by that table object.

naming convention

All parsers for a given table object are typically combined into a single text file, formatted as a JSON array. The file is conventionally named mytablealias_parsers.json, where mytablealias is replaced with the value of the table_alias attribute of the table object.

By convention, each parsers array file is located at config/parsers/mytablealias_parsers.json within the Data Product directory.
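
The naming convention can be sketched as a small helper. This is an illustrative Python sketch and not part of the Tag.bio tooling: the function name parsers_file_path and the product_root parameter are assumptions introduced here.

```python
from pathlib import PurePosixPath


def parsers_file_path(table_alias: str, product_root: str = ".") -> PurePosixPath:
    """Return the conventional location of the parsers array file for a table object.

    Both the function name and `product_root` are hypothetical, for illustration only.
    """
    if not table_alias:
        raise ValueError("table_alias must be a non-empty string")
    # Convention from the text: config/parsers/<table_alias>_parsers.json
    return PurePosixPath(product_root) / "config" / "parsers" / f"{table_alias}_parsers.json"
```

For a table object whose table_alias is patients, parsers_file_path("patients") yields config/parsers/patients_parsers.json.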

a few more details

During the Tag.bio data ingestion process, each table object is initialized and then loads all rows from its source data table, one row at a time. The parsers assigned to each table object capture values from some of those rows and columns and convert them into collections and variables in the Data Product.

Parsers, as peer elements within a JSON array, are designed to be completely independent of each other - i.e. none of the parsers listed in the parsers array for a table object will cross-communicate. With this design, parsers can be added, removed, or updated without impacting the behavior of any peer parsers. This pattern also enables multithreaded parallelization of data parsing over both rows and columns.
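
The independence property can be illustrated with a toy model. The sketch below is not the Tag.bio implementation: it assumes, purely for illustration, that a parser is a function from one row to one captured value, and the names run_parsers, Row, and Parser are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical parser shape: takes one row and returns the value it captured.
Parser = Callable[[Row], object]


def run_parsers(rows: List[Row], parsers: Dict[str, Parser]) -> Dict[str, List[object]]:
    """Apply every parser to every row; each parser writes only to its own output list."""
    outputs: Dict[str, List[object]] = {name: [] for name in parsers}
    for row in rows:  # rows are processed one at a time
        for name, parse in parsers.items():
            # A parser sees only the row, never another parser's output.
            outputs[name].append(parse(row))
    return outputs


rows = [{"sex": "F", "age": "34"}, {"sex": "M", "age": "51"}]
parsers = {
    "sex": lambda row: row["sex"],
    "age": lambda row: float(row["age"]),
}

full = run_parsers(rows, parsers)
# Removing the "sex" parser leaves the output of the "age" parser unchanged.
only_age = run_parsers(rows, {"age": parsers["age"]})
assert full["age"] == only_age["age"]
```

Because no parser reads another parser's state, the inner loop could be distributed across threads without coordination, which is the property the text attributes to the design.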

In contrast, individual parsers are allowed to - and often expected to - use inner parsers nested within their own JSON specification. As each row of source data is processed, the inner parsers pipe their output to their parent parser, supplying specific attribute values. This pattern enables a wide variety of conditional behavior and useful data transformations, with multiple parsers combining to form a complex data loading function.
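
The nesting pattern resembles function composition. The following Python sketch is only an analogy under stated assumptions: the helpers column, upper, and labelled are hypothetical and do not correspond to actual Tag.bio parser types or JSON attributes.

```python
from typing import Callable, Dict

Row = Dict[str, str]
Inner = Callable[[Row], str]


def column(name: str) -> Inner:
    """Inner parser (hypothetical): read one column from the current row."""
    return lambda row: row[name]


def upper(source: Inner) -> Inner:
    """Inner parser (hypothetical): transform the output of another inner parser."""
    return lambda row: source(row).upper()


def labelled(label: str, value: Inner) -> Callable[[Row], Dict[str, str]]:
    """Parent parser (hypothetical): its `value` attribute is supplied per row by an inner parser."""
    return lambda row: {"label": label, "value": value(row)}


# The inner parsers are evaluated for each row and feed the parent's attribute.
parse_country = labelled("Country", upper(column("country")))
result = parse_country({"country": "fr"})
```

Here result is {"label": "Country", "value": "FR"}: the column reader feeds the transformer, which in turn feeds the parent, so several small parsers combine into one data loading function.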

variety

There are around 40 different types of parser available within the Tag.bio system, although two specific types - categorical and numeric - comprise about 80% of the parsers utilized across all Data Products to date.

Some parsers will only load small amounts of data from the source table compared to other parsers. For example, the categorical and numeric parser types typically load data from only one column of a source data table, while the numeric-matrix parser type can easily be configured to ingest all rows and columns into a single numeric-matrix variable.
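
The difference in scope can be pictured with a toy example. This sketch assumes a small, fully numeric source table and uses two hypothetical functions, load_numeric_column and load_numeric_matrix; it illustrates only how much of the table each style touches, not how the numeric or numeric-matrix parser types are actually implemented.

```python
from typing import Dict, List

Row = Dict[str, str]

rows: List[Row] = [
    {"gene_a": "1.5", "gene_b": "0.2"},
    {"gene_a": "2.0", "gene_b": "0.9"},
]


def load_numeric_column(rows: List[Row], column: str) -> List[float]:
    """Single-column style (hypothetical): capture one value per row from one column."""
    return [float(row[column]) for row in rows]


def load_numeric_matrix(rows: List[Row]) -> List[List[float]]:
    """Matrix style (hypothetical): capture every column of every row."""
    columns = list(rows[0].keys()) if rows else []
    return [[float(row[c]) for c in columns] for row in rows]


one_column = load_numeric_column(rows, "gene_a")
whole_table = load_numeric_matrix(rows)
```

With the sample rows, one_column is [1.5, 2.0] while whole_table is [[1.5, 0.2], [2.0, 0.9]].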

