2. Assign Semantics And Synthesis Strategy

Step 2

In the "Assign Semantics and Synthesis Strategy" step, you determine how synthetic data will be generated for the selected database tables. This involves choosing a data synthesis method that creates new data while preserving the statistical properties and patterns of the original data. The strategy may vary with the type of data being synthesized and the intended use of the generated data.

All available tables are displayed in the left pane of the page.

Select a table in the left pane to assign it a synthesis strategy.

The tables listed depend on the Database and Locale selected for the Job.

When you click a table in the list, Brewdata scans that table for PII (personally identifiable information) and displays the columns it finds.

Only the columns that may contain personal details are shown here.

Once you select a column, the following fields become available:

  • Tokenize

  • Assign semantics

  • Pattern

  • Dependent field

Tokenize: This field specifies whether the data values in the selected table columns should be tokenized. Tokenization is the process of breaking a piece of text or data into smaller units called tokens. Each token represents a meaningful unit of the data, such as a word, number, or symbol, and can then be manipulated and synthesized separately.
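To illustrate the idea of tokenization described above, here is a minimal sketch in Python. It is not Brewdata's implementation; the `tokenize` function and its splitting rule (runs of letters, runs of digits, and single symbols) are assumptions chosen only for illustration.

```python
import re

def tokenize(value: str) -> list[str]:
    """Split a value into runs of letters, runs of digits, and single symbols.

    Hypothetical helper for illustration; whitespace is discarded.
    """
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]", value)

tokens = tokenize("John Smith, 42 Elm St.")
print(tokens)  # ['John', 'Smith', ',', '42', 'Elm', 'St', '.']
```

Each token in the resulting list could then be replaced or regenerated on its own, for example swapping the name tokens while leaving the punctuation in place.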

Assign semantics: In this field, you can define the intended meaning or context for the data values within selected table columns. Brewdata automatically assigns semantics based on the content of the chosen column, ensuring that the generated synthetic data aligns with the original data's purpose. For instance, if you select a "Name" column, Brewdata will automatically set the semantic type to "Name," maintaining the data's intended significance.
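As a rough illustration of how a semantic type might be inferred from column content, the sketch below checks sample values against a few regular expressions. The rule set, the type names, and the `infer_semantic` function are hypothetical and do not reflect how Brewdata actually detects semantics.

```python
import re

# Hypothetical semantic types and detection rules, checked in order.
# Date comes before Phone because a date string also looks like digits and dashes.
RULES = {
    "Email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "Date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "Phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}

def infer_semantic(values: list[str]) -> str:
    """Return the first type whose rule matches every non-empty sample value."""
    samples = [v for v in values if v]
    if not samples:
        return "Unknown"
    for name, rule in RULES.items():
        if all(rule.match(v) for v in samples):
            return name
    return "Unknown"

print(infer_semantic(["ana@example.com", "li.wei@example.org"]))  # Email
print(infer_semantic(["2024-01-05", "1999-12-31"]))               # Date
```

Requiring every sample to match keeps the sketch conservative: a column with mixed content falls back to "Unknown" so that a user can assign the semantic type manually.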

Pattern: The Pattern field governs the structural format of the data values in the selected table columns. Brewdata identifies the appropriate pattern based on the data content. For example, when a "Name" column is chosen, Brewdata automatically configures a pattern for generating randomized person names, so that the synthesized data consistently matches the desired structure and format. Automating this step makes the workflow both faster and more precise.
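The sketch below shows one way a structural pattern can drive value generation. The pattern syntax (`A` for an uppercase letter, `a` for a lowercase letter, `9` for a digit, anything else copied literally) and the `generate_from_pattern` function are assumptions made for this example, not Brewdata's pattern language.

```python
import random
import string

def generate_from_pattern(pattern: str, rng: random.Random) -> str:
    """Build a random value whose structure follows the given pattern.

    Hypothetical syntax: 'A' = uppercase letter, 'a' = lowercase letter,
    '9' = digit; every other character is copied unchanged.
    """
    out = []
    for ch in pattern:
        if ch == "9":
            out.append(rng.choice(string.digits))
        elif ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        elif ch == "a":
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random(0)  # seeded so repeated runs give the same values
print(generate_from_pattern("AA-9999", rng))
print(generate_from_pattern("Aaaaa Aaaaaa", rng))
```

Every generated value has the same length and separator positions as the pattern, which is the property the Pattern field is meant to preserve.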

Dependent field: This field lets you select another field in the same table that depends on the current field. This relationship can be used to generate synthetic data that maintains the correlation between related fields. For instance, a person's age depends on their date of birth, so the age cannot be generated until the date of birth has been generated.
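The age and date-of-birth example can be sketched as follows. This is an illustration of the dependency ordering only: the function names, the column names, and the chosen age range are assumptions, not Brewdata behavior.

```python
import random
from datetime import date, timedelta

def age_on(dob: date, today: date) -> int:
    """Compute age in whole years, subtracting one if the birthday has not yet occurred."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

def synthesize_row(rng: random.Random, today: date) -> dict:
    """Generate the independent field first, then derive the dependent one."""
    # Hypothetical range: a birth date roughly 18 to 80 years before 'today'.
    dob = today - timedelta(days=rng.randint(18 * 365, 80 * 365))
    return {"date_of_birth": dob, "age": age_on(dob, today)}

row = synthesize_row(random.Random(42), date(2024, 6, 15))
print(row)
```

Because `age` is computed from the generated `date_of_birth` rather than drawn independently, the two columns can never contradict each other in the synthetic output.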
