(Originally published in ABA Risk and Compliance, July/August 2023)
Bill Fair and Earl Isaac created the first modern credit scoring system, the Credit Application Scoring Algorithms, in 1958, and the first FICO score was introduced in 1989. Since then, the use of credit algorithms has become ubiquitous in the lending industry. These algorithms use statistical models to predict the likelihood of a borrower defaulting on a loan, which then informs the lender’s decision to approve or deny the loan application. While credit algorithms offer potential advantages, such as improving consistency of applicant treatment and increasing access to credit, they also present significant potential fair lending risks, including continuation or acceleration of bias against protected classes of borrowers and rapid propagation of risks to broad consumer credit markets. Concerns about algorithmic bias and fair lending are being renewed by the growth of artificial intelligence (“AI”) and machine learning (“ML”) models and the use of alternative data.
These concerns were highlighted by a recent Joint Statement on Enforcement Efforts against Discrimination and Bias in Automated Systems (“Joint Statement”)[i] issued by the Consumer Financial Protection Bureau (“CFPB”), the Department of Justice (“DOJ”), the Equal Employment Opportunity Commission (“EEOC”), and the Federal Trade Commission (“FTC”). Referring to automated or AI systems, the Joint Statement noted that “Private and public entities use these systems to make critical decisions that impact individuals’ rights and opportunities, including fair and equal access to a job, housing, credit opportunities, and other goods and services. . . Although many of these tools offer the promise of advancement, their use also has the potential to perpetuate unlawful bias, automate unlawful discrimination, and produce other harmful outcomes.”
Compliant Algorithms
The CFPB has provided guidance on complying with the Equal Credit Opportunity Act (“ECOA”) and Regulation B, its implementing regulation, when using complex algorithms in credit processes. For example, Consumer Financial Protection Circular 2022-03[ii] concludes that adverse action notices (“AAN”) for credit decisions based on complex algorithms must include statements of specific reasons a credit application was denied. As the Official Interpretations of Regulation B[iii] discuss:
“If a creditor bases the denial or other adverse action on a credit scoring system, the reasons disclosed must relate only to those factors actually scored in the system. Moreover, no factor that was a principal reason for adverse action may be excluded from disclosure. The creditor must disclose the actual reasons for denial (for example, “age of automobile”) even if the relationship of that factor to predicting creditworthiness may not be clear to the applicant.”[iv]
Regulation B also specifies the requirements a credit scoring system must meet to be deemed an “empirically derived, demonstrably and statistically sound, credit scoring system.” These requirements include:
- Training data must be based on creditworthy and non-creditworthy applicants who applied during a reasonable preceding period of time.
- The algorithm was developed for the purpose of evaluating the creditworthiness of applicants to meet legitimate business interests of the creditor(s) using the scoring system.
- The algorithm was developed and validated using accepted statistical principles and methodologies.
- The algorithm is periodically revalidated using appropriate statistical principles and methodologies and adjusted as needed to maintain predictive ability.[v]
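To make the revalidation requirement concrete, the sketch below shows one simple way a lender might monitor whether a score’s predictive ability has degraded since development. It is a minimal illustration only: the data layout, the AUC benchmark, and the tolerance are hypothetical assumptions, not regulatory standards.

```python
# Illustrative revalidation check: compare the current discriminatory power of a
# score against its development benchmark. Column names, the benchmark AUC, and
# the allowed degradation are hypothetical, not regulatory standards.
import numpy as np
from sklearn.metrics import roc_auc_score

def ks_statistic(scores: np.ndarray, defaulted: np.ndarray) -> float:
    """Kolmogorov-Smirnov separation between defaulters' and non-defaulters' scores."""
    good = np.sort(scores[defaulted == 0])
    bad = np.sort(scores[defaulted == 1])
    grid = np.unique(scores)
    cdf_good = np.searchsorted(good, grid, side="right") / len(good)
    cdf_bad = np.searchsorted(bad, grid, side="right") / len(bad)
    return float(np.max(np.abs(cdf_good - cdf_bad)))

def revalidate(scores, defaulted, dev_auc=0.75, max_auc_drop=0.05):
    """Flag the model for review if its discrimination has degraded materially."""
    # Higher scores are assumed to indicate lower default risk, so invert for AUC.
    current_auc = roc_auc_score(defaulted, -scores)
    return {"current_auc": current_auc,
            "ks": ks_statistic(scores, defaulted),
            "needs_review": current_auc < dev_auc - max_auc_drop}
```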
In a recent panel discussion at the National Community Reinvestment Coalition’s Just Economy conference, Patrice Ficklin, the Associate Director of the CFPB’s Office of Fair Lending and Equal Opportunity, reminded the audience of regulatory expectations for model validation and usage in the context of fair lending. In her comments, Ficklin noted, “Rigorous searches for less discriminatory alternatives are a critical component of fair lending compliance management. I worry that firms may sometimes shortchange this key component of testing. Testing and updating models regularly to reflect less discriminatory alternatives is critical to ensure that models are fair lending compliant.” She added, “I want to express the importance of robust fair lending testing of models, including regular testing for disparate treatment and disparate impact, including searches for less discriminatory alternatives. And of course, testing should also include an evaluation of how the models are implemented, including cutoffs and thresholds, and other overlays that may apply to the model’s output.”
Maintaining a Strong Fair Lending CMS for Models
The Joint Statement noted algorithmic bias risks may arise from many sources, including unrepresentative, imbalanced, erroneous or biased data; lack of model transparency, especially with “black box” algorithms; and inappropriate design and usage.[vi] Robust systems for model risk governance and compliance management can address these and other issues. As a component of both a robust fair lending compliance management system and strong model risk management, model validation and revalidation should include testing for fair lending risk. In addition to common model validation and governance techniques,[vii] such testing should include at least the following scope to address fair lending risk:
- Purpose evaluation: Is the model being used for its intended purpose? If not, has it been validated for the current use case? Using a model outside of its intended purpose without validating its appropriateness and efficacy for the new use case increases both fair lending and model risk.
- Documentation review: The model documentation should adequately explain the model’s purpose, conceptual soundness, and the reasons for each variable’s inclusion. In addition, the documentation or the validation report should incorporate assessments of the training data for bias, accuracy, and completeness; the results of testing for discrimination; and the consideration of less discriminatory alternatives.
- Regulatory requirements: Does the model conform to Regulation B’s “empirically derived, demonstrably and statistically sound” standard and other regulatory requirements? If not, credit decisions made with the model are effectively judgmental decisions for the purpose of fair lending risk.
- Outcomes testing: Models should be tested to determine whether their use results in disparities in outcomes on a prohibited basis (a minimal illustration follows this list). If use of the model has a disparate impact on a prohibited basis, the creditor must determine whether use of the model serves a legitimate business need and, if so, whether there is a less discriminatory alternative that would meet that need.
- Assessment of training data: If the answer to any of the following three questions is no, the choice of training data may pose both model and fair lending risks. The training data should be evaluated carefully and may need to be supplemented or enhanced.
1. Are all demographic groups and geographies subject to the model sufficiently represented in the training data?
2. Is data quality, accuracy, reliability, sufficiency, and availability similar for all demographic groups?
3. Have the data been validated for their current use?
- Evaluation of model features: Model variables should be evaluated to determine whether any of the following statements are true. Use of such variables may be restricted under certain federal or state laws or regulations. Even when not prohibited, additional scrutiny is warranted, as inclusion of model variables with these characteristics increases fair lending risk.
1. The model features contain any prohibited basis variables;[viii]
2. Any independent variables serve as a proxy for membership in a protected class;
3. The model features have an unclear nexus with creditworthiness;[ix] and
4. Any predictive variables pose a substantial fair lending risk. Examples of such variables include ZIP code, social media data, neighborhood demographic and socioeconomic factors, student loan Cohort Default Rate (“CDR”), criminal history, type of lender used in the past (prime v. subprime, payday or title lenders, etc.), cell phone provider, college or university attended, college major, and job title.[x]
- Assessment of model overrides and overlays: It is critical that any fair lending review of credit risk models includes an assessment of any situations where the model decision may be overridden or altered by human intervention. If model overrides are permitted, fair lending monitoring and testing should include assessing both high-side and low-side overrides[xi] for disparities. Similarly, if there are credit policy overlays or “knock-out rules,” those should also be evaluated for their impact on fair lending.
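As a concrete illustration of the outcomes testing described above, the sketch below computes one commonly used disparity metric, the adverse impact ratio (the approval rate for one group divided by the approval rate for a comparison group). The group labels, data, and 0.8 reference point are hypothetical illustrations rather than legal thresholds, and real testing would also account for credit factors and cover overrides and overlays.

```python
# Illustrative outcomes test: adverse impact ratio (AIR) on approval decisions.
# Group labels, data, and the 0.8 reference point are hypothetical illustrations,
# not legal standards.
import numpy as np

def adverse_impact_ratio(approved: np.ndarray, group: np.ndarray,
                         protected: str, control: str) -> float:
    """Approval rate of the protected group divided by that of the control group."""
    rate_protected = approved[group == protected].mean()
    rate_control = approved[group == control].mean()
    return rate_protected / rate_control

# Example with made-up data: 1 = approved, 0 = denied.
approved = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
air = adverse_impact_ratio(approved, group, protected="A", control="B")
print(f"AIR = {air:.2f}")  # Ratios well below roughly 0.8 often prompt further review.
```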
Less Discriminatory Alternatives
As Ficklin noted, for any model with potential disparate impact, lenders must search for less discriminatory alternatives (LDAs) both before implementation and periodically throughout the model’s life. The concept of less discriminatory alternatives arises from the Discriminatory Effects Rule[xii] promulgated by the U.S. Department of Housing and Urban Development (“HUD”) under Fair Housing Act authority, but has also been applied to fair lending more broadly.[xiii] Under the Discriminatory Effects Rule, disparate impact is assessed with a three-pronged approach:
- Does the policy or practice have an unjustified disproportionate adverse effect on a prohibited basis?
- If there is a discriminatory effect, is the policy or practice necessary to achieve a “substantial, legitimate, nondiscriminatory interest?”[xiv]
- If there is a substantial, legitimate, nondiscriminatory interest, can that interest be served by another practice that has a less discriminatory effect?
In the case of models, the search for LDAs may be considered a form of champion-challenger testing that considers both model performance and fairness. Developing challengers using debiasing approaches may involve intervention at the pre-processing, in-processing (model training), or post-processing stages.
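A minimal sketch of what such champion-challenger testing might look like appears below, assuming a set of fitted candidate models, an approval rule, and the adverse impact ratio as the fairness metric; the function names, metrics, and approval rule are illustrative assumptions rather than a prescribed methodology.

```python
# Illustrative champion-challenger comparison on both predictive power and
# fairness. The candidate models, approval rule, and metrics are hypothetical;
# an actual LDA search would be far more extensive.
import numpy as np
from sklearn.metrics import roc_auc_score

def compare_candidates(candidates, X, y_default, group, protected, control,
                       approve=lambda p: p < 0.10):
    """candidates: dict of name -> fitted classifier exposing predict_proba."""
    results = []
    for name, model in candidates.items():
        p_default = model.predict_proba(X)[:, 1]
        approved = approve(p_default)
        air = approved[group == protected].mean() / approved[group == control].mean()
        results.append({"model": name,
                        "auc": roc_auc_score(y_default, p_default),
                        "adverse_impact_ratio": air})
    # A challenger with comparable AUC and a higher adverse impact ratio is a
    # candidate less discriminatory alternative warranting further review.
    return results
```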
Removing bias at the pre-processing stage involves techniques to increase the representativeness of the training data or to reduce the correlation between protected classes and other data points. For example, sampling, augmenting, anonymizing sensitive data, or reweighting data may reduce bias if the available development data are skewed, imbalanced, or lack diversity.[xv] If the available data reflect evidence of historical discrimination or bias, other techniques, such as suppressing features highly correlated with protected class membership, relabeling similar target and control observations,[xvi] or transforming variables to reduce correlation with demographic groups, may reduce bias.[xvii] Adversarial training, in which an additional neural network is trained to detect and mitigate biases in the training data, may also be used.
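As one concrete example of a pre-processing approach, the sketch below implements the reweighting idea described by Kamiran and Calders: each combination of group and outcome label receives a weight so that group membership and the label are statistically independent in the weighted training data. The data layout and the learner shown in the comment are assumptions for illustration.

```python
# Illustrative reweighting in the spirit of Kamiran and Calders (2012): each
# (group, label) cell receives weight P(group) * P(label) / P(group, label), so
# that group membership and the outcome label are independent in the weighted
# training data. Arrays and labels here are hypothetical.
import numpy as np

def reweighing_weights(group: np.ndarray, label: np.ndarray) -> np.ndarray:
    n = len(label)
    weights = np.empty(n, dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            mask = (group == g) & (label == y)
            if mask.any():
                expected = (group == g).mean() * (label == y).mean()
                observed = mask.mean()
                weights[mask] = expected / observed
    return weights

# The weights can then be passed to most learners, for example:
#   LogisticRegression().fit(X, label, sample_weight=reweighing_weights(group, label))
```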
In-processing debiasing techniques introduce fairness constraints during model training. In-processing techniques that may reduce disparate impact include dual optimization for predictive power and fairness; regularization of the model’s parameters or of its sensitivity to specific features; reweighting model features; dropping explanatory variables that have low importance and a high contribution to disparate impact; and adversarial debiasing. Meta-algorithmic approaches that use a model suite and dynamically adapt the model to increase fairness are another in-processing debiasing method. Cost-sensitive learning, which involves assigning different misclassification costs to various groups and optimizing for lowest cost instead of greatest accuracy, is also an in-processing technique.
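The sketch below illustrates one simple in-processing approach along the lines of the regularization and dual-optimization ideas above: a logistic regression trained by gradient descent with an added penalty on the covariance between the model’s predictions and a protected-class indicator. The penalty form, hyperparameters, and use of a protected-class indicator during training are illustrative assumptions only; as the next paragraph notes, incorporating protected attributes in development raises its own ECOA considerations.

```python
# Illustrative in-processing debiasing: logistic regression trained by gradient
# descent with a penalty on the covariance between predicted probabilities and a
# protected-class indicator. The penalty form and hyperparameters are assumptions
# for illustration only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fair_logit(X, y, group, fairness_weight=1.0, lr=0.1, epochs=500):
    """X: (n, d) features; y: (n,) 0/1 labels; group: (n,) 0/1 protected-class indicator."""
    n, d = X.shape
    w = np.zeros(d)
    g_centered = group - group.mean()
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad_loss = X.T @ (p - y) / n          # gradient of the mean log-loss
        cov = np.mean(g_centered * p)          # covariance of predictions with the group indicator
        # Gradient of the squared-covariance penalty cov(p, group)^2.
        grad_fair = 2.0 * cov * (X.T @ (g_centered * p * (1.0 - p))) / n
        w -= lr * (grad_loss + fairness_weight * grad_fair)
    return w
```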
Reducing bias during model development is costly, however, and may require many iterations of retraining. It also requires access to the complete algorithm and training data, and, for some techniques, may require incorporation of protected attributes into the model. These techniques are unlikely to be available in the case of vended models. In addition, incorporation of protected attributes, even for the purposes of model debiasing, raises ECOA concerns.
Post-processing debiasing may involve evaluating cut-off scores and reconsidering knock-out rules and model overlays. Model-predicted probabilities can also be recalibrated after scoring to reduce bias. In addition, Reject Option Classification, also called prediction abstention, allows the model to refuse to make a prediction for observations where model confidence is low. These methods do not require rebuilding the model or understanding the details of complex algorithms and can be used on vended models. Other techniques include equalized odds post-processing and individual plus group debiasing.
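The sketch below illustrates the prediction-abstention idea in its simplest form: applications whose predicted default probability falls within a low-confidence band around the cutoff are routed to manual review rather than auto-decisioned. The cutoff and band width are hypothetical values chosen only for illustration.

```python
# Illustrative prediction abstention (reject option): decisions whose predicted
# default probability falls inside a low-confidence band around the cutoff are
# routed to manual review rather than auto-decisioned. Thresholds are hypothetical.
import numpy as np

def decide_with_abstention(p_default: np.ndarray, cutoff: float = 0.10,
                           band: float = 0.03) -> np.ndarray:
    """Return 'approve', 'deny', or 'review' for each application."""
    decisions = np.where(p_default < cutoff, "approve", "deny").astype(object)
    low_confidence = np.abs(p_default - cutoff) < band
    decisions[low_confidence] = "review"
    return decisions
```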
Conclusion
The Joint Statement emphasized the signatory agencies’ resolve to monitor the development and use of AI and to use their full authority to protect individuals’ rights.[xviii] Given the regulatory focus on both model risk and fairness, lenders are on notice that fair lending testing of models and the search for LDAs are mandatory components of a robust fair lending CMS. Regardless of the bias-reduction approach used, data scientists, model developers, and fair lending specialists must consider fairness as well as predictive power when developing and deploying credit models.
ABOUT THE AUTHOR
Lynn Woosley is a Managing Director with Asurity Advisors. She has more than 30 years’ risk management experience in both financial services and regulatory environments. She is an expert in consumer protection, including fair lending, fair servicing, community reinvestment, and UDAAP.
Before joining Asurity, Lynn led the fair banking practice for an advisory firm. She has also held multiple leadership positions, including Senior Vice President and Fair and Responsible Banking Officer, within the Enterprise Risk Management division of a top 10 bank. Prior to joining the private sector, Lynn served as Senior Examiner and Fair Lending Advisory Economist at the Federal Reserve Bank of Atlanta. Reach her at lwoosley@asurity.com.
[i] https://www.ftc.gov/system/files/ftc_gov/pdf/EEOC-CRT-FTC-CFPB-AI-Joint-Statement%28final%29.pdf
[ii] https://www.consumerfinance.gov/compliance/circulars/circular-2022-03-adverse-action-notification-requirements-in-connection-with-credit-decisions-based-on-complex-algorithms/.
[iii] https://www.ecfr.gov/current/title-12/chapter-X/part-1002/appendix-Supplement%20I%20to%20Part%201002
[iv] 12 CFR Part 1002 (Supp. I), sec. 1002.9, para. 9(b)(1)-4
[v] 12 CFR Part 1002 §1002.2(p)
[vi] https://www.ftc.gov/system/files/ftc_gov/pdf/EEOC-CRT-FTC-CFPB-AI-Joint-Statement%28final%29.pdf
[vii] See, for example, Federal Reserve SR 11-7 and the Comptroller’s Handbook on Model Risk Management.
[viii] Age may be included as a variable in an empirically derived, demonstrably and statistically sound credit scoring system, provided the age of an elderly applicant (62 or older) is not assigned a negative factor or value.
[ix] In general, the weaker the nexus with creditworthiness, the greater the fair lending risk. See Carol A. Evans, “Keeping Fintech Fair: Thinking About Fair Lending And UDAP Risks,” Consumer Compliance Outlook, 2017 (second issue).
[x] For example, in 2021 the FDIC referred a bank to the Department of Justice for potential fair lending violations related to the use of CDR in originating and refinancing private student loans. CDR is typically higher at Historically Black Colleges and Universities (HBCUs), so the FDIC determined use of CDR had a disparate impact on the basis of race, given that graduates of HBCUs are disproportionately black. See Consumer Compliance Supervisory Highlights, March 2022, page 9.
[xi] High-side overrides decline applicants that were approved by the model, while low-side overrides approve applicants that were declined by the model.
[xii] https://www.hud.gov/sites/dfiles/FHEO/documents/6251-F-02_Discriminatory_Effects_Final_Rule_3-17-23.pdf
[xiii] See, for example, the discussion of Disproportionate Adverse Impact Violations in the Appendix to the Federal Financial Institutions Examination Council’s Interagency Fair Lending Examination Procedures.
[xiv] https://www.hud.gov/sites/dfiles/FHEO/documents/DE_Final_Rule_Fact_Sheet.pdf
[xv] Kamiran, F., Calders, T. “Data preprocessing techniques for classification without discrimination.” Knowledge and Information Systems, 33(1):1–33 (2012). https://doi.org/10.1007/s10115-011-0463-8
[xvi] Kamiran and Calders.
[xvii] Feldman, M., Friedler, S., Moeller, J., Scheidegger, C., and Venkatasubramanian, S. “Certifying and Removing Disparate Impact.” Proceedings of Knowledge Discovery and Data Mining ’15, 259-268 (2015). https://dl.acm.org/doi/10.1145/2783258.2783311
[xviii] https://www.ftc.gov/system/files/ftc_gov/pdf/EEOC-CRT-FTC-CFPB-AI-Joint-Statement%28final%29.pdf