Pattern
In the context of data processing and synthetic data generation, a pattern is a specific structure or format that data values must adhere to. A pattern can be defined with a regular expression, a sequence of characters that describes the set of text values considered a match.
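As a rough illustration of how a regular expression can express such a format, the sketch below checks values against a pattern using Python's standard re module. The ZIP code pattern and the helper names (matches_pattern, nonconforming) are assumptions made for this example; they are not Brewdata built-ins.

```python
import re

# Hypothetical pattern: a US-style ZIP code, five digits with an optional
# four-digit extension. Chosen for illustration only.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def matches_pattern(value: str, pattern: re.Pattern = ZIP_PATTERN) -> bool:
    """Return True when the entire value conforms to the pattern."""
    return pattern.fullmatch(value) is not None


def nonconforming(values, pattern: re.Pattern = ZIP_PATTERN):
    """Return the values that do not adhere to the pattern, in input order."""
    return [v for v in values if not matches_pattern(v, pattern)]


if __name__ == "__main__":
    column = ["94105", "94105-1234", "9410", "ABCDE"]
    print(nonconforming(column))
```

Using fullmatch rather than search means a value only passes when all of it fits the format, which is usually what "must adhere to" implies for a column value.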
The "Pattern" field streamlines the selection of a suitable data synthesis pattern. When a column is chosen, Brewdata automatically suggests a pattern based on the column's content. Users can also pick a pattern from a dropdown menu of options, such as "Fake Address," "Street Name," "Random City," "Random State," and more.
This dual functionality ensures that the selected pattern is contextually aligned with the chosen column while giving users the flexibility to choose a specific pattern that fits their data processing requirements. Patterns are a key tool for producing organized, consistent data outputs.

Pattern Options for Categorical and Numeric Semantics:
When working with Categorical Semantics (e.g., product types, customer segments), you have several pattern options to choose from:
CTGAN Neural Network: CTGAN, which stands for Conditional Tabular GAN, is a neural network-based pattern that's well-suited for generating synthetic categorical data. It ensures the preservation of categorical relationships and distributions.
TVAE Neural Network: The TVAE (Tabular Variational Autoencoder) neural network pattern is well-suited to categorical data synthesis. It uses the principles of variational autoencoders to generate synthetic categorical data while retaining the statistical characteristics of the original.
CopulaGAN Neural Network: CopulaGAN is another neural network-based pattern that excels in generating synthetic data with complex dependencies and correlations, making it suitable for categorical data with intricate relationships.
Transformer Neural Network: The Transformer neural network pattern is known for its adaptability and capacity to model various types of data. It's particularly effective in generating synthetic data with both categorical and numeric attributes, offering versatility.
NoGAN: If you don't need neural network-based modeling and prefer a simpler approach, you can opt for "NoGAN," a pattern that does not rely on complex neural networks. It may be suitable for straightforward categorical data synthesis.
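To make "preserving categorical distributions" concrete, here is a minimal non-neural baseline that samples synthetic categories in proportion to their observed frequencies. It is only a sketch under stated assumptions: it is not how Brewdata implements NoGAN or any of the neural network patterns, it treats a single column in isolation (so it does not capture relationships between columns), and the function name synthesize_categorical is hypothetical.

```python
import random
from collections import Counter


def synthesize_categorical(values, n, seed=None):
    """Draw n synthetic categories, weighted by observed frequency.

    Illustrative baseline only: one column, no cross-column dependencies.
    """
    if not values:
        raise ValueError("values must contain at least one observation")
    rng = random.Random(seed)          # local generator for reproducibility
    counts = Counter(values)           # observed frequency of each category
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


if __name__ == "__main__":
    segments = ["retail"] * 70 + ["wholesale"] * 25 + ["partner"] * 5
    synthetic = synthesize_categorical(segments, n=1000, seed=42)
    print(Counter(synthetic))
```

With enough samples the synthetic column's category shares approach those of the original column, while no individual original row is copied.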

For Numeric Semantics (e.g., amounts, quantities), the same pattern options can be employed. These patterns are designed to handle the complexities of numerical data, preserving its statistical properties and distributions without exposing the original values.

Pattern Option for PII and PHI Redaction Semantic:
When you choose the PII and PHI Redaction Semantic, you can opt for the "Redact PII PHI" pattern. This pattern is specifically designed to ensure the secure handling of personally identifiable information (PII) and protected health information (PHI).
The "Redact PII PHI" pattern is a robust mechanism for data redaction. It replaces any PII or PHI in the data with standardized terms such as "PII" and "PHI." This approach conceals sensitive details while maintaining the structure and format of the data.
This pattern is vital in scenarios where data privacy and compliance with regulations (such as GDPR or HIPAA) are of utmost importance. It safeguards sensitive personal and health-related data, reducing the risk of privacy breaches and data exposure.
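The replacement behavior described above can be sketched with a few regular expressions. This is a simplified illustration, not Brewdata's detection logic: the detectors below (a US SSN-like number, an email address, and a made-up "MRN" medical record number format) are assumptions chosen for the example, and real PII/PHI detection needs far broader coverage.

```python
import re

# Illustrative detectors only. Each pairs a regex with the standardized
# term that replaces whatever it matches.
DETECTORS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "PII"),                   # SSN-like number
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "PII"),        # email address
    (re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE), "PHI"),    # hypothetical record number
]


def redact(text: str) -> str:
    """Replace each detected sensitive span with its standardized term."""
    for pattern, label in DETECTORS:
        text = pattern.sub(label, text)
    return text


if __name__ == "__main__":
    record = "SSN 123-45-6789, email jane@example.com, MRN: 00123456"
    print(redact(record))
```

Only the matched spans are replaced, so the surrounding text, delimiters, and field order stay intact, which is what keeps the structure of the data unchanged.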

If the desired pattern is not available, users can click the "Add New Pattern" button to create a new pattern with constraints.
Only selected semantics have an option for adding a pattern.
When the user clicks "Add New Pattern," a pop-up window appears with the option to create a new pattern. The user selects a semantic group from the dropdown menu and then selects a strategy. The system then displays options for adding constraints to the allocated fields, such as gender, religion, and nativity. Finally, the user enters a name for the pattern in the designated field.

Once the "Save" button is clicked, the newly created pattern will be saved and added to the dropdown list of available patterns for selection.