Tokenize
Tokenize: This field specifies whether or not the data values in the selected table columns should be tokenized. Tokenization involves splitting a data value into smaller units or tokens, which can then be manipulated and synthesized separately.
Tokenize enables the modification of data patterns for different users based on their specific requirements.
Tokenize can be set to YES or NO.
By default, NO is selected.

Once a related pattern is selected from the dropdown and Tokenize is set to YES, the following options are displayed: Field length (Fixed or Variable), Sample, and Columnlet, which is used to add semantic data.
Selecting YES adds the following options to the form.

Pattern: The options available under Pattern depend on the selected column.
Field length: Field length can be set to Fixed or Variable.
Fixed selection
With Fixed selected, the character length of the sample data is fixed and cannot be varied.

The "Sample" field shows the user a fixed-length string of characters that will be used in the pattern for synthesizing the data.
As an example, in the image above, the sample field has 10 characters: the first 4 characters represent the year, the next 2 represent the month, and the last 2 represent the day, with a space between each part.
To change the year constraint in the sample field, the user must select the position range 1 to 4 and assign semantics and pattern to that range.

To change the month constraint in the sample field, the user must select the position range 6 to 7 and assign semantics and pattern to that range.
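
The position ranges described above can be illustrated with a short Python sketch. This is not the product's implementation; the helper `slice_by_positions`, the sample value, and the day range 9 to 10 are assumptions made for illustration (the text only states the ranges 1 to 4 and 6 to 7).

```python
def slice_by_positions(sample, ranges):
    """Return {name: substring} for 1-based, inclusive position ranges."""
    tokens = {}
    for name, (start, end) in ranges.items():
        if not 1 <= start <= end <= len(sample):
            raise ValueError(f"range {start}-{end} is outside the sample")
        # Convert the 1-based inclusive range to a Python slice.
        tokens[name] = sample[start - 1:end]
    return tokens


# Hypothetical 10-character sample: year, space, month, space, day.
sample = "2023 04 15"

# Year and month ranges follow the text; the day range is an assumption.
ranges = {"year": (1, 4), "month": (6, 7), "day": (9, 10)}

tokens = slice_by_positions(sample, ranges)
print(tokens)
```

Each resulting token corresponds to one position range in the sample field, which is the unit that semantics and a pattern are assigned to.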

After assigning semantics and adding patterns to all the fields, use the "Test" button to preview the synthesized data based on the assigned patterns and constraints. The test runs against the whole selected column and displays the data before and after synthesis for comparison.
Variable selection
If Variable is selected, the character length of the sample data can be chosen as needed.

Delimiter
In variable sampling, the delimiter separates the different parts of the generated data. For example, if you are generating a date with a variable length, you can use a delimiter such as a hyphen (-) to separate the year, month, and day. This makes the data easier to read and understand, and can also be useful for importing the data into other systems that require a specific format.
The delimiter can be any separator character, such as a hyphen, comma, space, or full stop.
The columnlets are displayed based on the delimiter provided.
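
As a rough illustration of how a delimiter yields columnlets, the Python sketch below splits a value, changes one token, and joins the tokens back together. The function names `tokenize` and `detokenize` and the replacement value are hypothetical and only stand in for whatever the product does internally.

```python
def tokenize(value, delimiter):
    """Split a value into columnlet tokens on the given delimiter."""
    return value.split(delimiter)


def detokenize(tokens, delimiter):
    """Join columnlet tokens back into a single value."""
    return delimiter.join(tokens)


value = "2023-04-15"           # hypothetical variable-length date value
tokens = tokenize(value, "-")  # one token per columnlet
print(tokens)

# Stand-in for synthesis: replace only the year token, keep the rest.
tokens[0] = "1999"
result = detokenize(tokens, "-")
print(result)
```

Because each token is handled separately, one part of the value can be changed while the others and the delimiter are preserved.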

The Columnlet section provides separate fields for each individual columnlet: its name, its semantics and pattern, and its dependent fields. After naming the columnlet, the user selects the appropriate semantics for it. Once the semantics have been selected, the user can add a pattern based on those semantics. This helps to ensure that the data is properly formatted and structured for easier analysis and manipulation.

In the example provided, the user has selected the "Date" semantics, so the data will contain date values, such as dates of transactions or events. Once the "Date" semantics have been selected, the user can choose a date format from the dropdown to apply as the pattern. This helps to ensure that all dates are formatted consistently and can be properly analyzed and sorted. Examples of common date formats include "MM/DD/YYYY," "DD/MM/YYYY," and "YYYY-MM-DD."
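
To show what the date formats named above look like in practice, here is a small Python sketch that renders one date in each of them. The mapping `PATTERN_TO_STRFTIME` and the helper `format_date` are illustrative assumptions that use Python's standard `strftime` codes; they are not part of the product.

```python
from datetime import date

# Assumed mapping from the pattern labels in the text to strftime codes.
PATTERN_TO_STRFTIME = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}


def format_date(value, pattern):
    """Render a date using one of the pattern labels above."""
    return value.strftime(PATTERN_TO_STRFTIME[pattern])


example = date(2023, 4, 15)  # hypothetical sample date
for pattern in PATTERN_TO_STRFTIME:
    print(pattern, "->", format_date(example, pattern))
```

Applying a single pattern to every value is what keeps the dates consistent and sortable.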

After all the names and constraints are provided, Test button 1 tests the synthesis of an individual columnlet, while Test button 2 tests the whole selected column and displays the before and after results of the synthesized data.