Machine learning (ML), a branch of artificial intelligence (AI), has the potential to revolutionize the way we solve complex problems and make decisions across domains, from healthcare and finance to social media and autonomous vehicles. However, as with any technology, ML is not immune to bias and unfairness, which can lead to serious consequences for individuals and communities, particularly those already marginalized or vulnerable.
In its simplest terms, bias occurs when a model consistently produces distorted predictions because of incorrect assumptions. A high-bias model underfits: it produces significant losses or errors even when evaluated on the very data it was trained on. Bias can be introduced at different stages of the machine learning process, from data collection and preprocessing to model training and evaluation.
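To make this concrete, here is a minimal sketch (assuming NumPy and scikit-learn, which this article does not otherwise reference) of the statistical face of bias: a model whose assumptions are too simple produces large errors even on its own training data.

```python
# Minimal sketch: a high-bias model underfits and shows large error
# even on the data it was trained on; a more flexible model does not.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # nonlinear target

# High-bias model: a straight line cannot capture the sine shape.
linear = LinearRegression().fit(X, y)
print("linear train MSE:", mean_squared_error(y, linear.predict(X)))

# Lower-bias model: polynomial features capture the curvature.
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)
print("poly train MSE:", mean_squared_error(y, poly.predict(X)))
```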
For example, consider Amazon's recruitment tool, which used machine learning to sort through job applications and identify the most promising candidates. The tool was trained on resumes submitted to Amazon over a 10-year period, which predominantly came from male candidates. As a result, the tool learned to prioritize resumes containing terms that were more commonly used by men, such as "executed" or "captured". An ML model is trained to be a generalized solver; but because men dominated the historical data and the vocabulary of their resumes, the model was misled into treating male-associated language as a predictor of selection, purely through data imbalance and keyword association. The tool therefore discriminated against female candidates, whose resumes were less likely to contain these terms and, therefore, less likely to be selected.
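This mechanism can be illustrated with a toy example. The sketch below uses entirely synthetic data and a plain logistic regression, not Amazon's actual system: when historical hiring outcomes skew toward one group, a model learns that group's word usage as a proxy for the outcome.

```python
# Toy illustration (synthetic data, not Amazon's actual system):
# skewed historical hiring makes a model reward a gendered proxy term.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000
male = rng.random(n) < 0.8                  # historical imbalance: 80% male
# Terms like "executed"/"captured" appear more often in male resumes.
uses_term = np.where(male, rng.random(n) < 0.7, rng.random(n) < 0.2)
skill = rng.normal(size=n)
# Past labels favored male applicants, independent of skill.
hired = (skill + 1.5 * male + rng.normal(0, 0.5, n)) > 1.0

X = np.column_stack([skill, uses_term]).astype(float)
model = LogisticRegression().fit(X, hired)
print("coefficients [skill, gendered-term]:", model.coef_[0])
# The large weight on the gendered term shows the model penalizing
# applicants who do not use those words, regardless of skill.
```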
Types of bias in machine learning
- Sampling bias: Occurs when the data used to train a model is not representative of the population or problem being studied. This can happen when the sample is not randomly selected or when certain groups are overrepresented or underrepresented in the sample (see the sketch after this list).
- Selection bias: Occurs when the training data is systematically skewed by the way it was collected or sampled, for example when samples are chosen based on preconceived notions or gathered in a non-random way.
- Measurement bias: Occurs when the way data is collected or measured introduces errors or inaccuracies, for example through flawed measurement tools or techniques, human error, or inconsistent measurement methods across different samples or groups.
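As a concrete illustration of checking for sampling bias, the sketch below compares group proportions in a training sample against known population shares. The group names, counts, and the 10% flagging threshold are illustrative assumptions, not a standard.

```python
# Minimal sketch: flag sampling bias by comparing group proportions
# in the training sample against known population shares.
population_share = {"group_a": 0.50, "group_b": 0.30, "group_c": 0.20}
sample_counts = {"group_a": 720, "group_b": 230, "group_c": 50}

total = sum(sample_counts.values())
for group, pop_share in population_share.items():
    sample_share = sample_counts[group] / total
    gap = sample_share - pop_share
    # 10% gap threshold is an arbitrary illustrative choice.
    flag = "UNDERREPRESENTED" if gap < -0.10 else ("OVERREPRESENTED" if gap > 0.10 else "ok")
    print(f"{group}: population {pop_share:.0%}, sample {sample_share:.0%} ({flag})")
```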
Relevance of bias
Bias in machine learning can have significant real-world consequences, particularly when it affects decisions that have important implications for people's lives, such as hiring, lending, or healthcare. There are several reasons why bias in machine learning is a critical issue:
- Model accuracy: When bias occurs, errors are magnified in the final analytical results, rendering the model unreliable and ineffective. Inaccurate predictions or skewed outcomes can have serious consequences, especially in domains like healthcare and finance.
- Legal issues: Unfair or incorrect treatment of certain groups of people based on their personal characteristics or historical patterns can expose organizations to legal challenges, since they can be held accountable for discriminatory outcomes. It can also significantly damage an organization's reputation.
- Societal impact: When some groups or attributes in a dataset are given more weight or representation than others of equal significance, the resulting models can significantly affect people's lives. Biased models may perpetuate unfair practices in loan approvals, job applications, credit card applications, and more.
Therefore, it's essential to be aware of the potential sources and manifestations of bias in machine learning and to use appropriate techniques to detect and mitigate it.
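One simple detection technique is to compare a model's positive-prediction rates across groups, often called demographic parity. The sketch below computes it from scratch on made-up predictions; a real audit would use actual model outputs and legally relevant group definitions.

```python
# Minimal sketch: demographic parity difference as a bias check.
# Predictions and group labels here are purely illustrative.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])           # model decisions
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
print("selection rates:", rates)
print("demographic parity difference:", max(rates.values()) - min(rates.values()))
# A large gap suggests the model treats groups differently and
# warrants investigation before deployment.
```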
Regulatory purview around bias
The regulatory impact of bias in machine learning is growing, with regulatory bodies increasingly focusing on ensuring that models are ethical, accurate, and unbiased.
When Apple launched its credit card with Goldman Sachs in 2019, observers reported bias (Vigdor 2019) in underwriting that favored male applicants over other genders. This was quickly highlighted, resulted in a PR mess, and was later investigated by New York's Department of Financial Services (BBC News 2019).
In the US, the most heavily regulated AI use case is recruitment and employment. For example, New York City joined several states, including Illinois and Maryland, in regulating automated employment decision tools (AEDTs) that leverage AI to make or substantially assist candidate screening or employment decisions.
Local Law 144 of 2021 prohibits employers and employment agencies from using an automated employment decision tool unless the tool has undergone a bias audit within one year of its use, information about the bias audit is publicly available, and certain notices have been provided to employees or job candidates.
Since enforcement began on July 5, 2023, employers must determine whether they use an AEDT to make employment decisions. If they do, they must commission an independent bias audit, publish a summary of the audit report, provide notice to applicants and employees of the tool's use and functioning, and notify affected individuals that they may request an alternative selection process or accommodation.
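The core of such a bias audit is typically a selection-rate and impact-ratio calculation. The sketch below shows the arithmetic on hypothetical numbers; an actual LL144 audit must be performed by an independent auditor and cover the categories the law specifies.

```python
# Sketch of the selection-rate / impact-ratio arithmetic underlying
# LL144-style bias audits (hypothetical numbers only).
applicants = {"men": 400, "women": 350}    # candidates screened by the AEDT
selected = {"men": 120, "women": 70}       # candidates advanced by the tool

rates = {g: selected[g] / applicants[g] for g in applicants}
best = max(rates.values())
for g, r in rates.items():
    print(f"{g}: selection rate {r:.2%}, impact ratio {r / best:.2f}")
# Ratios well below roughly 0.8 (the EEOC's four-fifths rule of thumb)
# are a common flag for adverse impact that an audit would disclose.
```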
Similarly, in 2021 the Equal Employment Opportunity Commission (EEOC) launched an initiative on “algorithmic fairness” in employment. As an initial measure in this initiative, the EEOC and the Department of Justice jointly issued guidance on using AI tools in employee hiring. The initiative aims to guide employers, employees, job applicants, and vendors in ensuring that these technologies are used fairly and consistently with federal equal employment opportunity laws.
This serves as a reminder for companies to remain responsible for hiring decisions made by the AI they use.
In India, the RBI issued recommendations for digital lending to support efficient and scalable underwriting. It discusses the critical components required to use algorithmic underwriting. A snippet (Para 4.4.2.3 of the WGDL Report) of the Recommendations accepted for Immediate Implementation states (a sketch of what the auditability requirement could look like in practice follows the snippet):
i. REs to ensure that the algorithm used for underwriting is based on extensive, accurate and diverse data to rule out any prejudices. Further algorithm should be auditable to point out minimum underwriting standards and potential discrimination factors used in determining credit availability and pricing.
ii. Digital lenders should adopt ethical AI which focuses on protecting customer interest, promotes transparency, inclusion, impartiality, responsibility, reliability, security and privacy.
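The sketch below suggests what "auditable" might look like in code: it trains a toy underwriting model on synthetic data and uses permutation importance to surface features, such as a location proxy, that could act as discrimination factors. All feature names and data are hypothetical.

```python
# Hedged sketch of an underwriting auditability check: inspect which
# inputs drive the model to surface potential discrimination factors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
n = 2000
income = rng.normal(50, 15, n)
history = rng.normal(0, 1, n)             # repayment history score
pincode = rng.integers(0, 2, n)           # location proxy an auditor should flag
approved = (0.03 * income + history + 0.8 * pincode + rng.normal(0, 0.5, n)) > 2.0

X = np.column_stack([income, history, pincode])
model = LogisticRegression().fit(X, approved)
imp = permutation_importance(model, X, approved, n_repeats=10, random_state=0)
for name, score in zip(["income", "history", "pincode"], imp.importances_mean):
    print(f"{name}: importance {score:.3f}")
# A high importance on a location proxy like pincode is the kind of
# "potential discrimination factor" the audit should surface.
```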
The EU Agency for Fundamental Rights (FRA) published a report in December 2022, ‘Bias in algorithms – Artificial intelligence and discrimination’. Given its findings, which point to the need for a comprehensive assessment of algorithms, FRA called on the EU institutions and EU countries to:
- Test the algorithms for bias
- Provide guidance on sensitive data to assess potential discrimination
- Assess ethnic and gender biases
- Consider all grounds of discrimination biases
- Strive for more language diversity, and
- Increase access for evidence-based oversight
As we navigate the increasingly pervasive role of artificial intelligence, bias is a critical issue that demands our attention. A biased model can perpetuate discrimination, reinforce inequalities, and have profound implications for individuals and communities.
Tackling bias effectively requires active collaboration among policymakers, technology developers, researchers, and civil society to continuously refine and enhance regulations and standards. A proactive approach from organizations and individuals to ensure responsible data collection, robust algorithmic design, and ongoing monitoring of AI systems for bias is also necessary.