Fair and Transparent AI for Hiring: Evaluating Resume-Job Matching, Bias Mitigation, and Human-in-the-Loop Auditing
Keywords:
algorithmic hiring, resume-job matching, fairness, bias mitigation, human-in-the-loop, CareerBERT, AIF360

Abstract
Automated hiring systems promise efficiency but risk perpetuating bias. We examine resume-job matching with embedding models and systematic fairness audits. We reproduce a CareerBERT-style shared-embedding model trained on the ESCO taxonomy and a large corpus of job advertisements, and compare it to TF-IDF and SBERT baselines, measuring retrieval utility (MRR, Recall@k) on an ESCO/EURES-derived dataset. We then apply IBM's AIF360 fairness toolkit to evaluate demographic parity, equalized odds, and error-rate metrics across protected groups, and we experiment with pre-processing (Reweighing), in-processing (Adversarial Debiasing), and post-processing (Reject Option Classification) interventions. In simulation, we observe the typical trade-offs: for example, Reweighing reduces selection-rate disparities by roughly 10-20 percentage points at the cost of a few percentage points of Recall@10, while adversarial training maintains utility but only partially closes the gap. We implement a prototype HR auditing dashboard that surfaces group-level metrics and example rationales. In a human evaluation, auditors preferred the fairness-re-ranked shortlist in X% of cases (p<0.05) and reported higher perceived equity. All code, data splits, and figures are released for reproducibility.
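To make the retrieval evaluation concrete, the sketch below computes MRR and Recall@k for the TF-IDF baseline; the toy resumes and jobs, and the single-relevant-job-per-resume setup, are illustrative assumptions rather than our actual ESCO/EURES splits.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy stand-ins for the real corpora (assumption: one relevant job per resume).
    resumes = ["machine learning engineer with python experience",
               "icu nurse with critical care background"]
    jobs = ["registered nurse, intensive care unit",
            "machine learning engineer",
            "staff accountant"]
    relevant = [1, 0]  # index of the correct job for each resume

    vec = TfidfVectorizer().fit(resumes + jobs)
    sims = cosine_similarity(vec.transform(resumes), vec.transform(jobs))

    def mrr_and_recall_at_k(sims, relevant, k=10):
        """Mean reciprocal rank and Recall@k when each query has one relevant item."""
        rr, hits = [], []
        for scores, rel in zip(sims, relevant):
            ranking = np.argsort(-scores)                 # job indices, best first
            rank = int(np.where(ranking == rel)[0][0]) + 1
            rr.append(1.0 / rank)
            hits.append(1.0 if rank <= k else 0.0)
        return float(np.mean(rr)), float(np.mean(hits))

    mrr, recall = mrr_and_recall_at_k(sims, relevant, k=2)
    print(f"MRR={mrr:.3f}  Recall@2={recall:.3f}")

The same harness scores the embedding models: the TF-IDF similarity matrix is simply swapped for cosine similarities between learned resume and job vectors.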
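The fairness audit follows the standard AIF360 pattern of wrapping tabular shortlisting decisions in a BinaryLabelDataset and reading off group metrics. Below is a minimal sketch of the pre-processing (Reweighing) intervention; the selected label, the binary gender attribute, and the toy rows are illustrative assumptions, not our data.

    import pandas as pd
    from aif360.datasets import BinaryLabelDataset
    from aif360.metrics import BinaryLabelDatasetMetric
    from aif360.algorithms.preprocessing import Reweighing

    # Illustrative shortlist: selected=1 means shortlisted; gender=1 is the
    # privileged group (column names are assumptions for this sketch).
    df = pd.DataFrame({"score":    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
                       "gender":   [1,   1,   0,   1,   0,   0],
                       "selected": [1,   1,   1,   0,   0,   0]})

    dataset = BinaryLabelDataset(df=df, label_names=["selected"],
                                 protected_attribute_names=["gender"])
    priv, unpriv = [{"gender": 1}], [{"gender": 0}]

    # Demographic-parity gap before mitigation (selection-rate difference).
    metric = BinaryLabelDatasetMetric(dataset, privileged_groups=priv,
                                      unprivileged_groups=unpriv)
    print("Statistical parity difference before:",
          metric.statistical_parity_difference())

    # Reweighing assigns instance weights that equalize expected selection
    # rates across groups; downstream models must train with these weights.
    rw = Reweighing(unprivileged_groups=unpriv, privileged_groups=priv)
    dataset_rw = rw.fit_transform(dataset)
    metric_rw = BinaryLabelDatasetMetric(dataset_rw, privileged_groups=priv,
                                         unprivileged_groups=unpriv)
    print("Statistical parity difference after:",
          metric_rw.statistical_parity_difference())

Equalized-odds and error-rate metrics come from aif360.metrics.ClassificationMetric once model predictions are available, using the same privileged/unprivileged grouping.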
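For the dashboard's fairness-re-ranked shortlist, a greedy re-ranker in the spirit of Geyik et al. (2019) is enough to illustrate the idea; the proportional-representation targets and the tie-breaking rule below are our simplifying assumptions, not the deployed LinkedIn algorithm.

    from collections import defaultdict

    def fair_rerank(candidates, k, target_props):
        """Greedily build a top-k list, preferring the group furthest below its
        target share and breaking ties by model score (cf. Geyik et al., 2019).

        candidates: list of (id, score, group), sorted by score descending.
        target_props: dict mapping group -> desired share of the top-k.
        """
        picked, counts, pool = [], defaultdict(int), list(candidates)
        while pool and len(picked) < k:
            def deficit(c):
                share = counts[c[2]] / max(len(picked), 1)
                return target_props.get(c[2], 0.0) - share
            best = max(pool, key=lambda c: (deficit(c), c[1]))
            pool.remove(best)
            picked.append(best)
            counts[best[2]] += 1
        return picked

    ranked = [("a", 0.95, "M"), ("b", 0.90, "M"), ("c", 0.88, "F"),
              ("d", 0.80, "M"), ("e", 0.75, "F")]
    print(fair_rerank(ranked, k=4, target_props={"M": 0.5, "F": 0.5}))
    # -> roughly alternates groups: a, c, b, e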
References
Bellamy, R. K. E., Dey, K., Hind, M., Hoffman, S., Houde, S., Kannan, K., Lohia, P., Martino, J., Mehta, S., Mojsilović, A., et al. (2018). AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv:1810.01943. (See their Fig. 1 for the toolkit pipeline.)
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. (Introduced deep bidirectional transformer embeddings.)
Fabris, A., Baranowska, N., Dennis, M. J., Graus, D., Hacker, P., Saldivar, J., Zuiderveen Borgesius, F., & Biega, A. J. (2023). Fairness and Bias in Algorithmic Hiring: A Multidisciplinary Survey. arXiv:2309.13933. (Surveys hiring AI, its hazards, and its opportunities.)
Geyik, S. C., Ambler, S., & Kenthapadi, K. (2019). Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search. KDD. (Proposes re-ranking for diversity; deployed at scale with a 3× improvement in fairness metrics.)
Kaya, M., & Bogers, T. (2025). Mapping Stakeholder Needs to Multi-Sided Fairness in Candidate Recommendation for Algorithmic Hiring. CHI. (Discusses multi-stakeholder fairness; observes that much hiring-fairness work remains conceptual.)
Li, D., Raymond, L. R., & Bergman, P. (2020). Hiring as Exploration. NBER Working Paper No. 27736. (Frames hiring as an exploration-exploitation problem; finds that exploration increases both diversity and candidate quality.)
Raghavan, M., Barocas, S., Kleinberg, J., & Levy, K. (2020). Mitigating bias in algorithmic hiring: Evaluating claims and practices. FAccT. (Audits real hiring tools; notes their use of formal fairness metrics; cites decades of audit studies on bias.)
Rosenberger, J., Wolfrum, L., Weinzierl, S., Kraus, M., & Zschech, P. (2025). CareerBERT: Matching Resumes to ESCO Jobs in a Shared Embedding Space. Expert Systems with Applications. (Combines ESCO and EURES into a job corpus; reports superior recommendation accuracy.)
Yu, X., Xu, R., Xue, C., Zhang, J., & Yu, Z. (2025). ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining. arXiv:2502.12361. (Introduces LLM-generated “reference resumes” and hard-negative mining; achieves +13.8% Recall and +17.5% nDCG).
IBM AI Fairness 360 Team. (2024). AI Fairness 360 (AIF360) Toolkit [software]. GitHub. https://github.com/Trusted-AI/AIF360 (Implements reweighing, adversarial debiasing, reject-option classification, and other interventions.)