FairGBM: Feedzai’s Experts Discuss the Breakthrough

Feedzai recently announced that we are making our groundbreaking FairGBM algorithm available via open source. In this vlog, experts from Feedzai’s Research team discuss the algorithm’s importance, why it represents a significant breakthrough in machine learning fairness beyond financial services, and why we decided to release it via open source.

Why is FairGBM a major breakthrough?

Pedro Saleiro: In many high-stakes domains, like financial services or healthcare, we train machine learning models with millions of events – truly big data. These machine learning models are used to complement humans and help them make better decisions.

Pedro Saleiro: In fraud detection, sacrificing a few percentage points of model accuracy can have huge implications and cause severe monetary losses. Responsible AI is still in its early stages. There are still no “go to” tools that are simple to use and give good results. Either you have to sacrifice a lot of model performance to guarantee fairness, or the tools cannot scale to millions of data points. They become very cumbersome and very hard to use.

Pedro Saleiro: With FairGBM, Responsible AI can be integrated into any machine learning operation. It optimizes for fairness, not just for performance. We leveraged the capabilities of LightGBM, meaning it can scale to millions of data points, is very fast to train, and has performance guarantees. We really expect that FairGBM can become a standard in Responsible AI. From now on, there are no excuses to not optimize for fairness when developing machine learning models.

What was the problem you were originally trying to solve when developing FairGBM from scratch?

Catarina Belém: When we first thought about FairGBM, we realized that fairness was not being used in practice in machine learning, especially in industry settings. This is of utmost importance because there are currently machine learning systems that help humans make decisions about other humans like you and me.

Catarina Belém: For example, there are some machine learning systems that will dictate whether or not I get the mortgage I need for my house or whether you get the loan you need. There’s no guarantee these systems will not discriminate against certain groups. This is very harmful, and we really wanted to change that, particularly in industry.

Catarina Belém: Several fairness algorithms already existed (and believe me, there are quite a few), but they were typically inefficient and slow. They also came with buggy frameworks. And they degraded performance so much that companies had no confidence in deploying those models.

Catarina Belém: That’s exactly the reason why we developed FairGBM. It’s built on top of LightGBM, so it’s blazing fast. FairGBM also guarantees the resulting machine learning model will make less biased predictions, while also keeping good performance. It also works for several groups, so we are pretty excited. We are making it open source, and we hope this helps boost the application of fairness systems in practice.

How would you describe FairGBM?

André Cruz: FairGBM is a lightweight model that enables you to achieve state-of-the-art performance, with very high levels of fairness. For a slightly more technical view, it is based on the popular LightGBM algorithm and essentially adds fairness constraints to the training process. In other words, the model cannot discriminate based on gender, race, or any other sensitive attribute. We know that starting off with Fair ML can be daunting, but with FairGBM, we are aiming for an easy-to-use, plug-and-play approach to improve machine learning fairness on any machine learning pipeline.
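
To make the idea of a fairness constraint concrete, here is a minimal sketch of the kind of quantity such a constraint might bound. It is not FairGBM's implementation or API: FairGBM applies its constraints during training, while this snippet only measures the outcome afterwards. The choice of metric (the gap in false positive rates between groups), the tolerance value, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: FPR}, computed over examples whose true label is 0."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 0:
            negatives[group] += 1
            if pred == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


def fpr_gap(y_true, y_pred, groups):
    """Largest difference in false positive rate between any two groups."""
    rates = false_positive_rate_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


def satisfies_constraint(y_true, y_pred, groups, tolerance=0.05):
    """Hypothetical check: is the FPR gap within the allowed tolerance?"""
    return fpr_gap(y_true, y_pred, groups) <= tolerance


# Toy data: 1 = flagged as fraud, 0 = legitimate; two groups "a" and "b".
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
groups = ["a"] * 5 + ["b"] * 5

print(false_positive_rate_by_group(y_true, y_pred, groups))  # {'a': 0.25, 'b': 0.5}
print(fpr_gap(y_true, y_pred, groups))                       # 0.25
print(satisfies_constraint(y_true, y_pred, groups))          # False
```

In this toy example, legitimate customers in group "b" are wrongly flagged twice as often as those in group "a". A fairness-constrained training process aims to keep a gap like this small while preserving predictive performance.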

How does it compare with other Fair ML methods?

André Cruz: Other state-of-the-art Fair ML methods can be excellent at achieving high fairness, but they do so with steep drops in predictive performance or take exceedingly long to train. FairGBM is actually over 10 times faster than the most popular Fair ML method available. What this means in practice is that even though other state-of-the-art methods may lead to reasonably good results, they were seldom used in the real world, because they were either too slow or sacrificed too much performance to be of use.

Personally, how was your experience working on this innovation?

Catarina Belém: It was definitely a long and complex project, but it certainly paid off. It’s very rewarding to see such an endeavor reaching production, getting deployed, and having an impact in the real world. Not only that, we are also open sourcing this fundamental research that we started on a piece of paper. It can help millions of people. We are super excited and we invite you to use it in your applications to change the world for the better.

Why are we making the source code available?

Pedro Saleiro: When we started this project, our goal was to develop an in-processing Fair ML algorithm that gives you performance guarantees, is easy to use, and can scale to millions of data points. In the process of researching it, we realized that we were developing a general purpose solution, not something specific just for financial services.

Pedro Saleiro: When we started evaluating FairGBM in real-world situations, the results were consistently good across all datasets. We realized this could really become a standard algorithm, a reference that everyone could try when developing machine learning systems that are also optimized for fairness. We realized that we had to share these capabilities with the world. That’s why we made the FairGBM source code available on GitHub. So we invite others – nonprofits, research groups, universities, other organizations – to try it, to contribute to it, and to contact us if they’re interested in using this algorithm in their own products and services.
