Assign semantics
Assigning semantics to data means defining the meaning or context of each data attribute or field. For instance, assigning the semantic type "date of birth" to a data attribute indicates that the attribute represents a person's date of birth. Correct semantics are crucial for accurate data processing and analysis.
Once a column is selected, Brewdata automatically assigns semantics and patterns based on the content of that column. This automation helps the synthesized data match the intended interpretation and structure of the original data, and it makes the data processing workflow more efficient and accurate.
The "Assign Semantics" field is preconfigured based on the selected column, but it also offers a dropdown menu with various options (e.g., Date, Address, Phone Number, etc.) that can be manually selected to further refine the meaning and context assigned to the data values in the chosen table columns. This flexibility allows users to either accept the automated assignment or make specific semantic selections according to their data processing needs.
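The content-based assignment described above can be illustrated with a minimal sketch. This is not Brewdata's actual detection logic, which the documentation does not describe. The function name `infer_semantic`, the 80% threshold, the accepted date formats, and the phone number pattern are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical phone pattern: optional "+", then digits and common separators.
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")


def _is_date(value):
    # Assumed date formats; a real detector would support many more.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_phone(value):
    digits = sum(c.isdigit() for c in value)
    return bool(PHONE_RE.match(value)) and digits >= 7


def infer_semantic(values, threshold=0.8):
    """Guess a semantic label from the share of values that pass each check."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "Unknown"
    checks = [("Date", _is_date), ("Numeric", _is_number), ("Phone Number", _is_phone)]
    for label, check in checks:
        share = sum(check(v) for v in cleaned) / len(cleaned)
        if share >= threshold:
            return label
    # Fall back to a generic label when no specific check matches.
    return "Categorical"


print(infer_semantic(["1990-04-12", "2001-11-30", "1985-01-05"]))
print(infer_semantic(["555-123-4567", "+1 (555) 987-6543"]))
```

A guess like this is only a starting point, which is why a manual override such as the dropdown matters: a column of seven-digit numbers, for example, would be labelled Numeric here even if it actually holds phone numbers.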

If "Address" is selected in the Assign Semantics field, the user can further customize the options related to address synthesis by selecting the "Address" option in the Synthesis Strategy field.
If the "Address" column is chosen from the table, select only the "Address" option in the "Assign Semantics" dropdown menu. Choosing any other option may result in incorrect data synthesis.
Some semantics are essential for synthesizing data, especially when dealing with sensitive or confidential information. For this reason, we provide several semantics that help safeguard privacy and data security.
Numeric: Numeric semantics are crucial when you need to generate numerical data, such as values, quantities, or measurements. This is common in scenarios where you want to preserve the statistical properties and distribution of the data while protecting the actual values.
Categorical: Categorical semantics are used when you have data with discrete categories, such as product types, customer segments, or any non-numeric labels. Synthesizing categorical data ensures that the relationships and proportions between categories are maintained while hiding the actual category labels.
PII and PHI Redaction: In data synthesis, PII (Personally Identifiable Information) and PHI (Protected Health Information) redaction are fundamental semantics. They involve the careful removal or replacement of sensitive personal and healthcare data, such as names, addresses, social security numbers, and medical records. Redaction ensures data privacy and security while preserving the overall structure and utility of the information. It's especially crucial in scenarios where confidentiality is paramount, such as in healthcare, financial services, or legal contexts.
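To make the three semantics above more concrete, here is a rough sketch of what each kind of synthesis could look like. It is an illustration under stated assumptions, not Brewdata's implementation: the function names are hypothetical, the numeric case assumes a normal distribution fitted to the original values, and the redaction case covers only US social security numbers written as `NNN-NN-NNNN`.

```python
import random
import re
import statistics

# Assumed SSN layout: three digits, two digits, four digits, separated by hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def synthesize_numeric(values, rng):
    """Draw new values from a normal distribution fitted to the originals."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return [rng.gauss(mean, stdev) for _ in values]


def synthesize_categorical(values, rng):
    """Resample labels by observed frequency and hide them behind opaque tokens."""
    labels = sorted(set(values))
    tokens = {label: f"category_{i}" for i, label in enumerate(labels)}
    counts = [values.count(label) for label in labels]
    sampled = rng.choices(labels, weights=counts, k=len(values))
    return [tokens[label] for label in sampled]


def redact_ssn(text):
    """Replace anything that looks like an SSN with a fixed placeholder."""
    return SSN_RE.sub("[REDACTED]", text)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(synthesize_numeric([10.0, 12.0, 14.0], rng))
print(synthesize_categorical(["gold", "silver", "gold", "bronze"], rng))
print(redact_ssn("SSN 123-45-6789 on file"))
```

The normal fit keeps only the mean and spread of the original column, so it would not reproduce a skewed or multi-modal distribution. Likewise, a single regular expression covers one kind of identifier; names, addresses, and medical records need their own detection.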

Once the semantics have been selected, a pattern can be chosen based on the identified semantics.