dirty-word

The dirty-word parser will load data from a column in a source data table and map values to other values before producing categorical variables within a collection.

The dirty-word parser will tokenize text from a column into words, removing all non-alphabetical characters. The tokenized words will become the categorical variables within the collection.

{
  "parser_type": "dirty-word",
  "table_alias": "tata",
  "column": "cccc",
  "minimum": ####,
  "collection": "collection_name"
}

minimum

usage: optional
The minimum attribute defines the minimum length, in characters, that a word must have to be included. The default is 3 characters.
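
The tokenizing and minimum-length behaviour described above can be illustrated with a short Python sketch. This is not the parser's actual implementation: the function name dirty_word_tokens is hypothetical, and the sketch assumes that non-alphabetical characters act as word separators, that case is left unchanged, and that a word whose length equals the minimum is kept.

```python
import re


def dirty_word_tokens(text, minimum=3):
    """Hypothetical helper: return the alphabetic words in `text`
    that are at least `minimum` characters long."""
    # Assumption: each non-alphabetical character is treated as a separator,
    # so "re-test" yields "re" and "test" rather than "retest".
    words = re.sub(r"[^A-Za-z]", " ", text).split()
    # Assumption: the minimum is inclusive (a 3-letter word passes minimum=3).
    return [word for word in words if len(word) >= minimum]


print(dirty_word_tokens("Pump #2 is OK; re-test at 10am!"))
# prints ['Pump', 'test']
```

With the default minimum of 3, short fragments such as "is", "OK" and "am" are dropped; passing minimum=2 in this sketch would keep them.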