dirty-word
  • 27 Jul 2022

The dirty-word parser will load data from a column in a source data table and map values to other values before producing categorical variables within a collection.

The dirty-word parser will tokenize text from a column into words, removing all non-alphabetical characters. The tokenized words will become the categorical variables within the collection.
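The tokenizing step can be pictured with a short Python sketch. This is an illustration only, not the parser's actual implementation: the function name `tokenize` is hypothetical, and it assumes that text is split on whitespace, that only ASCII letters count as alphabetical, and that letter case is left unchanged.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text on whitespace, then drop every non-alphabetical character.

    Hypothetical helper: the splitting rule and the ASCII-only definition
    of "alphabetical" are assumptions, not documented parser behaviour.
    """
    words = []
    for chunk in text.split():
        # Remove anything that is not an ASCII letter (assumption).
        cleaned = re.sub(r"[^A-Za-z]", "", chunk)
        # Chunks with no letters at all (e.g. "#42:") produce no word.
        if cleaned:
            words.append(cleaned)
    return words


print(tokenize("Order #42: can't ship!"))  # ['Order', 'cant', 'ship']
```

Under these assumptions, punctuation inside a word is deleted rather than treated as a separator, so "can't" yields the single word "cant".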

{
  "parser_type": "dirty-word",
  "table_alias": "tata",
  "column": "cccc",
  "minimum": ####,
  "collection": "collection_name"
}


minimum

usage: optional
The minimum attribute defines the minimum length a word must have to be included. The default is 3 characters.
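As a rough sketch of the effect of minimum, the Python snippet below filters a list of already tokenized words by length. The function name `filter_by_minimum` is hypothetical, and the snippet assumes the comparison is inclusive (a word whose length equals minimum is kept) and that length is measured after non-alphabetical characters have been removed.

```python
def filter_by_minimum(words: list[str], minimum: int = 3) -> list[str]:
    """Keep only words that are at least `minimum` characters long.

    Hypothetical helper: the default of 3 comes from the documentation;
    the inclusive comparison is an assumption.
    """
    return [word for word in words if len(word) >= minimum]


words = ["an", "old", "dirty", "word"]

print(filter_by_minimum(words))             # ['old', 'dirty', 'word']
print(filter_by_minimum(words, minimum=5))  # ['dirty']
```

With the default, the two-letter word "an" is dropped; raising minimum to 5 leaves only "dirty".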

