---
title: "dirty-word"
slug: "dirty-word-parser"
updated: 2022-07-27T23:52:44Z
published: 2022-07-27T23:52:44Z
---

# dirty-word

The `dirty-word` **parser** will load data from a column in a source data table and map values to other values before producing categorical **variables** within a **collection**.

The `dirty-word` **parser** will tokenize text from a column into words, removing all non-alphabetical characters. The tokenized words will become the categorical **variables** within the **collection**.

```
{
  "parser_type": "dirty-word",
  "table_alias": "tata",
  "column": "cccc",
  "minimum": ####,
  "collection": "collection_name"
}
```

## minimum

**usage: *optional*** The `minimum` attribute defines the minimum length, in characters, that a word must have to be included. The default is 3 characters.
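
The tokenizing and length filtering described on this page can be approximated with a short Python sketch. This is an illustration only, not the parser's actual implementation: the function name `dirty_word_tokens` is hypothetical, and the sketch assumes that any run of non-alphabetical characters acts as a word boundary, that only ASCII letters count as alphabetical, that case is preserved, and that `minimum` is inclusive.

```python
import re


def dirty_word_tokens(text, minimum=3):
    """Hypothetical helper approximating the dirty-word tokenizing rules."""
    # Assumption: every run of ASCII letters is one word; digits,
    # punctuation, and whitespace are dropped and split words apart.
    words = re.findall(r"[A-Za-z]+", text)
    # Assumption: a word exactly `minimum` characters long is kept.
    return [word for word in words if len(word) >= minimum]


sample = "Pt. has Type-2 diabetes; on metformin 500mg"
print(dirty_word_tokens(sample))             # default minimum of 3
print(dirty_word_tokens(sample, minimum=5))  # stricter minimum
```

Under these assumptions, raising `minimum` drops short tokens such as `has` and `Type` while longer words such as `diabetes` and `metformin` are kept as candidate **variables**.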
