Bias detection and removal
Detecting and removing bias is a crucial part of data synthesis. This chapter walks through a scenario that shows the steps taken to address bias within our application, using the adult_bias_train dataset.
1. Dataset Selection and Column Configuration:
The process begins with selecting the adult_bias_train dataset, the original dataset without any bias handling. Users are then prompted to choose the columns for synthesis. Columns deemed sensitive with respect to bias should be set to either CTGAN or Copula so that they can be analysed and handled for bias later. Likewise, if a 'target' column exists, it should also be set to CTGAN or Copula.
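Conceptually, this step amounts to a mapping from each selected column to a model choice. A minimal sketch of such a mapping is shown below; the dictionary layout and column names are illustrative assumptions, not the application's actual configuration format.

```python
# Columns chosen for synthesis, each mapped to a synthesis model.
# Sensitive columns and the target column are set to CTGAN or Copula.
column_models = {
    "race":   "CTGAN",
    "sex":    "CTGAN",
    "age":    "Copula",
    "income": "CTGAN",  # 'income' is the target column
}
```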
2. Targeted Columns and Model Selection:
For the adult_bias_train dataset, special attention is given to columns that may contain sensitive attributes. The goal is a thorough synthesis process that addresses attributes prone to bias, such as race, sex, age, and income, where income is marked as the target for the synthesis model. These columns are central to model training and analysis and enable a comprehensive assessment of potential bias.
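For illustration, the sketch below trains a CTGAN-style model on these four columns using the open-source SDV library (CTGANSynthesizer, SingleTableMetadata). This is an assumption about how such a step could be implemented; the application performs the equivalent work internally and its API may differ.

```python
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

# Illustrative file name for the original dataset.
df = pd.read_csv("adult_bias_train.csv")
cols = ["race", "sex", "age", "income"]  # sensitive columns plus the target

# Describe the selected columns so the synthesizer knows their types.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df[cols])

# Fit a CTGAN-based synthesizer on the selected columns.
synthesizer = CTGANSynthesizer(metadata, epochs=100)
synthesizer.fit(df[cols])

# Sample synthetic rows for the later bias analysis.
synthetic = synthesizer.sample(num_rows=len(df))
```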
3. Bias Check and Correction:
In the third step, users are prompted to check for bias, with income selected as the target. The system generates graphs that visualize any observed bias. After this assessment, users can correct bias by selecting the decision attribute value (e.g., >=50k) and specifying the sensitive attribute (e.g., sex) along with its corresponding value (e.g., female).
After reviewing the graphs, users can click "Correct Bias" to open an interface for targeted intervention.
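Under the hood, such a bias check boils down to comparing how often the favourable decision value occurs in each group of the sensitive attribute. A minimal sketch with pandas is shown below; the file name and the exact value labels (">=50k", "female", "male") follow this chapter's example and are assumptions about how the data is encoded.

```python
import pandas as pd

df = pd.read_csv("adult_bias_train.csv")  # illustrative file name

# Rate of the favourable outcome (income >= 50k) within each sex group.
favourable = df["income"].eq(">=50k")
rates = favourable.groupby(df["sex"]).mean()
print(rates)

# Disparate impact: ratio of the underprivileged group's rate to the
# privileged group's rate. Values well below 1.0 indicate bias.
print("disparate impact:", rates["female"] / rates["male"])
```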
3.1 Correcting Bias:
Upon selecting "Correct Bias," users can specify the sensitive attribute that exhibits bias. The system prompts users to identify the underprivileged value associated with the sensitive attribute and the desired value that they aim to either increase or decrease.
3.2 Underprivileged Value and Desired Value:
Underprivileged Value: the existing value of the sensitive attribute that is deemed underprivileged or subject to bias.
Desired Value: the state users want to reach, used to either lessen or augment the bias; for instance, specifying that the underprivileged value should be less than or equal to the desired value.
3.3 Initiating Correction:
After confirming the correction parameters, users execute the correction. This lets them address bias directly during synthesis, so that the synthesized data aligns with fairness and ethical considerations.
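The application does not expose how the correction is executed, but one common way to realise it is conditional sampling: generating additional synthetic rows for the underprivileged value and keeping those with the favourable outcome until its rate reaches the desired level. The sketch below illustrates that idea, continuing the SDV-based example from step 2 (Condition and sample_from_conditions belong to the open-source SDV library, not to the application).

```python
import pandas as pd
from sdv.sampling import Condition

# 'synthesizer' and 'synthetic' come from the training sketch in step 2.
# Request extra rows for the underprivileged value of the sensitive attribute.
condition = Condition(num_rows=2000, column_values={"sex": "female"})
extra = synthesizer.sample_from_conditions(conditions=[condition])

# Keep the rows with the favourable decision value and add them to the
# synthetic data to raise the rate of >=50k within the female group.
boosted = extra[extra["income"] == ">=50k"]
corrected = pd.concat([synthetic, boosted], ignore_index=True)

# Re-check the favourable-outcome rate per group after the correction.
print(corrected.groupby("sex")["income"].apply(lambda s: s.eq(">=50k").mean()))
```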
4. Decision Attribute and Sensitivity Selection:
A crucial part of bias handling is specifying the decision attribute value (e.g., >=50k) and designating the sensitive attribute (e.g., sex) together with its value; the recommended value here is female.
5. Saving and Job Submission:
Finally, users save their configurations and submit the job for bias detection and removal.
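Once the submitted job completes, the delivered synthetic data can be sanity-checked with the same group-rate comparison used for the bias check. A minimal sketch is shown below, assuming the output is available as a CSV file; the file name and the 0.8 threshold (the common four-fifths rule of thumb) are assumptions, not application defaults.

```python
import pandas as pd

# Illustrative output file name; the application defines the actual artifact.
out = pd.read_csv("adult_bias_train_synthetic.csv")

# Favourable-outcome rates per sex group should now be comparable.
rates = out["income"].eq(">=50k").groupby(out["sex"]).mean()
ratio = rates["female"] / rates["male"]
print(rates)
print("disparate impact after correction:", ratio)

# Four-fifths rule of thumb: flag the output if the ratio is still below 0.8.
if ratio < 0.8:
    print("Warning: residual bias against the female group.")
```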
Usage Scenario:
Consider a scenario where a financial institution aims to synthesize data for income prediction. Recognizing the potential biases associated with attributes such as race, sex, age, and income, the organization utilizes our application.
Dataset Configuration:
The financial institution selects the original dataset, adult_bias_train.
Columns related to race, sex, age, and income are chosen for synthesis.
Model Training:
The chosen columns are set as CTGAN or Copula for unbiased analysis.
The target column, income, is designated as CTGAN or Copula.
Bias Detection and Correction:
The system performs a bias check, focusing on income as the target.
Visual graphs are presented for users to identify and understand biases.
Users correct bias by specifying decision attributes and sensitive attributes, fostering fair data representation.
Decision Attribute and Sensitivity Configuration:
The decision attribute value is set, for instance, as >=50k.
The sensitive attribute, sex, is designated with the recommended value being female.
Saving and Job Submission:
Configurations are saved, and the job is submitted for bias detection and removal.
In this scenario, the financial institution ensures that the synthesized data for income prediction is free of gender-related bias. This aligns with ethical data practices and supports fair, unbiased decision-making. The bias detection and removal capabilities of the Brewdata application let organizations synthesize data that upholds fairness and equality.