The dirty-word parser will load data from a column in a source data table and map values to other values before producing categorical variables within a collection.
The dirty-word parser tokenizes text from a column into words, removing all non-alphabetical characters. The tokenized words become the categorical variables within the collection.
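The tokenization described above can be illustrated with a minimal Python sketch. This is not the parser's actual implementation: the function name tokenize is hypothetical, and the sketch assumes that every run of non-alphabetical characters acts as a word separator and that letter case is left unchanged, neither of which is confirmed by this article.

```python
import re

def tokenize(text):
    """Split text into words made only of the letters A-Z and a-z.

    Assumption: any run of non-alphabetical characters (digits,
    punctuation, whitespace) separates words and is discarded.
    """
    return re.findall(r"[A-Za-z]+", text)

tokens = tokenize("Pump #3 failed; replaced O-ring, re-tested OK.")
print(tokens)
# ['Pump', 'failed', 'replaced', 'O', 'ring', 're', 'tested', 'OK']
```

Under these assumptions, digits and punctuation never appear in the resulting words, and a hyphenated term such as "O-ring" is split into two separate words.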
{
"parser_type": "dirty-word",
"table_alias": "tata",
"column": "cccc",
"minimum": ####,
"collection": "collection_name"
}
minimum
usage: optional
The minimum attribute defines the minimum length a word must have to be included. The default is 3 characters.
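The effect of the minimum attribute can be sketched in Python as follows. The function name filter_by_minimum and the sample word list are hypothetical, and the sketch assumes that a word whose length equals the minimum is kept; the parser's real behavior at that boundary is not stated in this article.

```python
def filter_by_minimum(words, minimum=3):
    """Keep only words with at least `minimum` characters.

    Assumption: the comparison is inclusive, so a word whose length
    equals `minimum` is kept. The default of 3 mirrors the documented
    default for the attribute.
    """
    return [word for word in words if len(word) >= minimum]

words = ["Pump", "failed", "O", "ring", "re", "tested", "OK"]

print(filter_by_minimum(words))
# ['Pump', 'failed', 'ring', 'tested']

print(filter_by_minimum(words, minimum=2))
# ['Pump', 'failed', 'ring', 're', 'tested', 'OK']
```

Lowering the value lets short tokens such as two-letter codes through, while raising it drops more short words from the collection.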