Aiming when you can't see the target, with Clare McCaffery and Jacobus Eksteen
When you want a model, you typically start with a target variable in mind. But sometimes that's not possible. Perhaps you're entering a new market or expanding into a new niche. You could start lending on a small scale and wait to see what happens. We used to call that a deep risk test, often somewhat euphemistically when a cohort was accidentally approved through a gap in the system. But even if you're happy to take a chance, that takes time - there's no way to shortcut a 12-month outcome.
Enter the u-score, Matogen AI's novel approach to applying an analytical layer to the old-fashioned expert model. In today's episode, I'm speaking to Jacobus Eksteen to learn more about how they did this with Open Banking data from DirectID. This provides a way to jumpstart an Open Banking modelling project, but Open Banking has many arrows in its quiver, so I also speak to Clare McCaffery about the state of the landscape.
Clare is Chief Commercial Officer at DirectID and can be found at https://www.direct.id/
Jacobus is CEO at Matogen Applied Insights and they can be found at https://ai.matogen.com/
And of course, they're both on LinkedIn at https://www.linkedin.com/in/clare-mccaffery-045a7113/ and https://www.linkedin.com/in/jacobuseksteen/
And while you're there, come and find and connect with me at https://www.linkedin.com/in/brendanlegrange
As mentioned more than once in this episode, my action-adventure novels are on Amazon, some versions even for free, and my work with ConfirmU and our gamified psychometric scores is at https://confirmu.com/ and on episode 24 of this very show https://www.howtolendmoneytostrangers.show/episodes/episode-24
If you have any feedback or questions, or if you would like to participate in the show, please feel free to reach out to me via the contact page on this site.
Keep well, Brendan
The full written transcript, with timestamps, is below:
Clare McCaffery 0:00
I have a relative who is just going off to university and he says to me, "I don't understand why I have to take credit out to get credit. Why don't you just look at my bank statement, you can see how I'm behaving?"
Well, that's great. That's exactly what we do. At DirectID we focus on categorising transactions that are of particular interest to risk decision makers.
Jacobus Eksteen 0:21
So what we were able to do is to bridge the gap from unsupervised learning, where we don't have any labels, to supervised learning, where we do have a label.
So the idea is lenders can use this from today already, and make a decision and set their cut-offs, but we are also busy getting more and more performance data. And with performance data, we can build the next iteration of the model.
Brendan Le Grange 0:49
I'm going to be honest with you, it's the end of the year, and I'm pretty tired. I produced this show alongside my real job, alongside speaking and consulting and training work, alongside my wife's new job and her new master's degree, and alongside the seemingly ever growing pile of admin that accumulates.
But there is a silver lining, at least for me. And that's that, as you listen to this, I will already be in South Africa enjoying a warm Christmas holiday for the first time in a while. The kids will be horse riding on the beach and swimming with penguins, we will be drinking nice wine and eating in the sort of restaurants we couldn't afford in the UK. And soon we'll be sitting with my younger brother in his backyard, watching the giraffes walk by while we braai.
It's going to be a fantastic break. And it's going to be a real break as well. Previously, I've stockpiled episodes and released them while I was away, but not this time. It's all pencils down. So we have an episode today, of course, and it's a good one. We're looking at open banking and how to create credit scores in an environment with very little outcome data. And next week, we've got a great episode, we'll be rounding out 2023 with a look at agile decision systems. But then I'm taking all of January off.
Please don't unsubscribe from the show. We are going to be back strong on the 1st of February with a look at lending in Japan, the first of another exciting series of interviews that I am already lining up. So do stay for those! Welcome to How to Lend Money to Strangers with Brendan le Grange.
Clare McCaffery, Chief Commercial Officer at DirectID, welcome to the show. Maybe as a start to our discussion here, could you give us an overview of what DirectID is doing?
Clare McCaffery 2:48
DirectID is an open banking platform that specialises in credit and risk decisions. It's been established about 12 years, and actually DirectID built the first AISP licence in the world and petitioned for open banking regulation. So it's highly regulated in the UK. This means that most banks have to connect the consumer to the AISP if the consumer, the end user, gives consent.
Brendan Le Grange 3:14
In terms of open banking, when I describe it, I probably don't do it justice. We can think of it in the very simplest way as just providing permission for somebody to look directly at your bank account, so you don't need to be a middleman printing out statements and handing them over. But in doing so, there's also a lot more data than the statement would hold, and a lot fresher data. So it's more than just replacing statements from the old days. What are we seeing coming out of open banking, when we talk about the data products that it enables?
Clare McCaffery 3:46
Yeah, pulling bank statements is a very basic use case, just to speed up an application process, for example, where that manual intervention by the end user is removed. So the consumer can go through a journey with an AISP and click on consent, and then all of that bank statement data is pulled into our ecosystem. They give consent generally for 90 days, and then they can extend that consent through a prompt, and they can revoke consent at any time. But actually, I think raw data has become a little bit of a commodity, so it's about what we do with it. We create something that's meaningful to a risk decision maker: predicting the likelihood of future financial impairment, so that the lender can take early intervention. You can see whether a portfolio is declining over time, and then you can take early intervention to prevent that. And then of course, there are some basic insights, if you like, such as categorisation: where the consumer, or the SME in fact, is spending their money and how they're spending it. So there are lots of insight-driven aspects to what we do at DirectID.
And then ultimately, our mission is to create a credit score. We're starting with a credit risk score at the point of application, but we have an aspiration to build scores throughout the credit lifecycle. So we can build something for account management or collections, a score to predict whether the consumer or the SME will self-cure, for example, so you don't need to spend any time and money in the collections process. So right throughout the credit lifecycle, we can use bank statement data to predict an intention, or a future action, from the consumer.
Brendan Le Grange 5:32
Yeah, and we're going to talk a lot about credit scoring, because we're at the Credit Scoring and Credit Control Conference here in Edinburgh. But for me, one of the big things with open banking is that credit cards developed as a credit product by necessity: it took 30 days to get the details of the transactions bundled up, send them as a paper statement to the customer, and have them send a cheque back. We needed that time to get everything done, and therefore it was a credit product. But actually, for many users it was just a spending tool. And now we're seeing e-wallets, debit cards and such take over. A lot of the potential value, particularly on good customers who don't miss payments, is at risk, in some markets more than others, of moving off credit cards and disappearing from the traditional credit world. So I think it is the future anyway. But there are also more opportunities with it, and it can be more systematic too: you don't need a middle party that your bank reports the data to and makes agreements with; you give permission, and the data is there. And if we bring it to the topics that were presented here at the conference, we're looking at other ways to use this data to fill in gaps that would traditionally have prevented us from modelling. I'll turn over to you, Jacobus Eksteen, CEO at Matogen, who have got a wide range of experience in traditional and new ways of credit scoring.
Jacobus Eksteen 6:59
Thanks. Yeah, my background is address and I did my MBA. So I see myself as kind of a bridge between the business world and the technical credit scoring world. We do general decision science consulting across different industries like financial services, agriculture, healthcare, industrials, but my bread and butter and passion has always been credit scoring, and trying to see how we can make a difference using that. So I mean, it's amazing to have partners like DirectID that have the vision to do that as well. The big challenge we had was the lack of outcome data. We understand what the variables are for a person, but we don't understand how they will repay a loan. So we had to develop a technique to try and determine how to score this person if we don't have any outcome. Generally, what people do is have experts look at both the variables and how the variables should be weighted. But that brings bias into the process, because it's very subjective. So in this technique, which we call the u-score, we were able to tell the expert that they should just identify a variable, and just identify the direction in which it should rank. For example, take the ratio of debits to credits: if that ratio is higher, that is bad. High income is good. And that is all the expert needs to do. This technique then ranks the entire population and gives them a score that you can really use in your decisioning. But more than that, if you really want to, you could say, I expect a bad rate of 10%, so you flag the bottom 10% as bad. Then you have a binary outcome for a traditional credit risk model approach, a so-called supervised model. It also works for things like fraud using the open banking data, because you can use this tool on any data. So it is really contributing to the larger credit scoring journey that the world is on.
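Editor's note: the step Jacobus describes, turning a ranking into a binary label by flagging the bottom 10% as bad, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Matogen's implementation: the function name `flag_bottom_fraction`, the example scores, and the convention that a lower score means higher risk are all hypothetical.

```python
def flag_bottom_fraction(scores, expected_bad_rate):
    """Label the lowest-scoring share of the population as bad (1), the rest as good (0).

    Assumption for this sketch: a lower score means higher risk.
    """
    n_bad = round(len(scores) * expected_bad_rate)
    # Positions of the applicants, ordered from lowest score to highest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for i in order[:n_bad]:
        labels[i] = 1
    return labels


# Hypothetical scores for ten applicants and an expected bad rate of 10%.
scores = [12, -3, 7, 0, -9, 4, 15, -1, 2, 9]
labels = flag_bottom_fraction(scores, 0.10)
print(labels)
```

The resulting labels could then serve as the target variable for a conventional supervised scorecard build, with the caveat that they encode expert judgement rather than observed repayment.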
Brendan Le Grange 8:56
In the old way of doing things, there were just built-in time delays you could not escape. If you wanted to build a score, you would need a year to get outcomes. And you would need maybe a bit more than that, because you want to get the volumes up, you want to look for a few months. This is a way to hit the ground running, which answers the question that they have: how do I build that first scorecard? And what's interesting to me is that mix of experts and statistics. So talk to me a bit, if you don't mind, without scaring people off with the statistics, about what this u-score is.
Jacobus Eksteen 9:27
We take all the variables that we know from expert experience will be predictive, like income, like the ratio of debits to credits, like variance in income and diversity in income sources, and then we rank the entire population and say, person A is strictly better than person B on all counts. I think we could potentially add more detail to the podcast link. But the idea is that you assign a score to the entire population in that way. You know this person is better than that person given these variables, or has a lower probability of default, and then you can rank the entire population and use that score directly or choose a certain cut-off. But you can also link it to something else. What we have done at DirectID was to link it to a negative account balance in 90 days as a form of validation, and it worked really well in predicting that. Even though that outcome of a negative balance in 90 days is not performance on a loan, it's still an outcome that you might say is a bad. So it just allows you to go from absolutely no labels to a label, and you can still use your judgement to determine what that label linked to the u-score should be.
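Editor's note: one way to picture the "person A is better than person B on all counts" ranking is a dominance count, sketched below in Python. Everything here is an assumption made for illustration: the exact comparison rule (at least as good on every variable and strictly better on at least one), the score as "people I dominate minus people who dominate me", and the names `dominates` and `dominance_scores` are not taken from the episode, and the real u-score may well differ.

```python
def dominates(a, b):
    """True if a is at least as good as b on every variable and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_scores(rows, directions):
    """Score each row as (number of rows it dominates) minus (number that dominate it).

    directions holds +1 where a higher value is good and -1 where a higher value is bad,
    which is the only input the expert has to provide in this sketch.
    """
    oriented = [tuple(v * d for v, d in zip(row, directions)) for row in rows]
    scores = []
    for a in oriented:
        beats = sum(dominates(a, b) for b in oriented)
        beaten_by = sum(dominates(b, a) for b in oriented)
        scores.append(beats - beaten_by)
    return scores


# Hypothetical applicants: (monthly income, ratio of debits to credits).
applicants = [(3000, 0.8), (2000, 0.9), (2500, 1.1), (1500, 1.2)]
# Expert input: high income is good (+1), a high debit-to-credit ratio is bad (-1).
scores = dominance_scores(applicants, directions=(+1, -1))
print(scores)
```

Note that the second and third applicants cannot be ordered against each other (one has the higher income, the other the better ratio), so they receive the same score. No weights are chosen anywhere, which is the point of the approach described above.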
Brendan Le Grange 10:41
But what happens when you start rolling out that first model, and you start gathering actual performance data?
Jacobus Eksteen 10:48
So this model will rank really well. And we know it'll predict, for example, negative balance in 90 days really well. But first prize is actual performance data, to be able to link it to a probability of default. So the idea is lenders can use this from today already, and make a decision and set their cut-offs. But we are also busy getting more and more performance data, and with the performance data, we can build the next iteration of the model that is linked to a specific probability of default. So this first version of the model, we've seen, ranks well, and it is predicting a negative balance in 90 days very well. But the next step is to use actual performance data and calibrate it to an actual probability of default. So the current model is usable, but if there is a certain lender that wants the accurate probability of default in the portfolio, that is the next step. There will be a generic model that does that, but bespoke models that focus on the risk appetite of a certain lender could be done as well. Absolutely flexible. The point is, we can be flexible, to be able to provide descriptive, predictive and prescriptive solutions to anybody that can gain from open banking data. And to be honest, I cannot think of any institution that won't be able to gain. We have worked with a corporate bank in Africa on transactional fraud, but they could only identify 80 accounts where they knew it was fraudulent. So we were able, with the u-score, to build a model, and then validate how well it works to identify the 80 accounts, even though there weren't enough to build a model. But I think generally, once you have the open banking data, you can do anything that you would be able to do with something like bureau data. A lot of the value that you see at the end from your model is just based on the upstream journey: making sure you have good clean data, making sure you have good categorization, making sure you have good features that were built, and making sure the model doesn't overfit.
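Editor's note: once real performance data arrives, a first step towards the calibration Jacobus mentions is simply to check the observed bad rate in each score band. The Python sketch below is a hedged illustration only: the band edges, the example data and the function name `bad_rate_by_band` are invented, and a production calibration would more likely fit something like a logistic regression of the outcome on the score.

```python
def bad_rate_by_band(scores, outcomes, band_edges):
    """Observed bad rate per score band.

    Band i covers band_edges[i] <= score < band_edges[i + 1].
    outcomes holds 1 for a bad (defaulted) account and 0 for a good one.
    Returns None for a band with no accounts.
    """
    n_bands = len(band_edges) - 1
    counts = [0] * n_bands
    bads = [0] * n_bands
    for score, bad in zip(scores, outcomes):
        for i in range(n_bands):
            if band_edges[i] <= score < band_edges[i + 1]:
                counts[i] += 1
                bads[i] += bad
                break
    return [b / c if c else None for b, c in zip(bads, counts)]


# Hypothetical scores with later-observed outcomes (1 = bad).
scores = [-3, -2, -1, 0, 1, 2, 3, 4]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
rates = bad_rate_by_band(scores, outcomes, band_edges=[-4, 0, 2, 5])
print(rates)
```

If the score ranks well, the bad rate should fall steadily from the lowest band to the highest, and those observed rates give a first mapping from score to probability of default.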
Brendan Le Grange 12:51
You don't want to label a customer as fraud, right? So there's a consumer who took a loan out, and there's no evidence they ever intended to pay it. But to label that account as fraud would introduce a whole lot of risk from a reputational and a legal point of view. So we would know in-house that this is, in all likelihood, a customer who acted fraudulently, but perhaps we haven't taken it all the way to court and proven it legally. So we wouldn't mark it in a system, but we may want to model for that. And having a tool like this is great for that, because we don't then need to flag and say, you are definitely fraud in our opinion. But we could still use that internal expertise and say, yeah, let's have a look at these cases that are suspicious, what do we think makes them suspicious, and start modelling. So there are many ways, beyond just building the first scorecard in an emerging market, that this approach could help.
Jacobus Eksteen 13:49
Yeah, so what you often see is that fraud alerts are rules based, and there are many flaws in that approach. Yes, you need black and white rules. But often the rules become too complicated, and then what many banks end up with is volumes flagged as high risk for anti-money laundering or fraud that are so high that the team just can't investigate them. So this allows you to almost have a risk-of-fraud score, and you can set your cut-off so that your team can investigate the flagged cases and the volume is acceptable. You can have different cut-offs for different use cases, and not have a one-size-fits-all approach. And that is why, for example, the probabilities and the confidences in the open banking classification are so important as well, because for one use case you might be happy with 50% confidence, and for another one you need 90%. So it's not just about the classification, or not just about the fraud flag, but also about the level of confidence, and allowing the client to be able to choose it and select it and have that level of control. We are at a credit control conference as well.
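Editor's note: setting the cut-off so that the flagged volume matches what the team can actually investigate can be sketched as follows. This is a minimal Python illustration with made-up numbers; the function name `cutoff_for_capacity` and the assumption that a higher score means higher fraud risk are hypothetical.

```python
def cutoff_for_capacity(risk_scores, capacity):
    """Return the lowest risk score that still gets investigated.

    Assumes a higher score means higher risk and that the team can review
    `capacity` cases. Returns None if nothing can be reviewed.
    """
    if capacity <= 0 or not risk_scores:
        return None
    ranked = sorted(risk_scores, reverse=True)
    return ranked[min(capacity, len(ranked)) - 1]


# Hypothetical fraud-risk scores for six accounts; the team can review two.
risk_scores = [0.91, 0.15, 0.62, 0.78, 0.33, 0.55]
cutoff = cutoff_for_capacity(risk_scores, capacity=2)
flagged = [s for s in risk_scores if s >= cutoff]
print(cutoff, flagged)
```

A different use case would simply call the same function with a different capacity, giving its own cut-off. Tied scores at the cut-off can push the flagged volume slightly above capacity, so a real process would need a tie-breaking rule.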
Brendan Le Grange 15:01
I think everybody's got some experience either of a new market where they have no data, or of a change of strategy where the data we wanted just wasn't captured. Clare, if I move back to you quickly: there's a lot of potential in the open banking space, so where are we seeing the biggest take-up in the market?
Clare McCaffery 15:21
So I think it's an addition to improve models. You know, the more data the better, so this is an additional data source. But there is additional information that you can find in the bank statement data that you can't find on a credit bureau file, apart from just thin files. So for example, not all BNPL lenders contribute data to a credit bureau. And yet, you know, the young population is relying more and more on those facilities, and so they become over-leveraged. It is also in real time, and lenders can refresh this data four times a day, so they can see the delta from nine o'clock till 12 o'clock. You can also see movements that perhaps are an indication that somebody's moving into a distressed situation. Are they spending less money on going out? Are they having less disposable income? Are they moving supermarkets, or moving products around brands? So there's lots of information that I think you can get in real time. But clearly, while you can infer missed payments from bank statements, you're not getting that information directly. It's an inference rather than something solid from the credit bureau. So I think they work hand in hand.
Brendan Le Grange 16:26
Somebody has missed a payment: I mean, that's core, standard credit risk data to say this is a risky person. But if we imagine a typical credit card situation, I might be paying off my full balance, I go into the store, I make that one last purchase, and that was the one that took me over. I then change to the minimum payment, and I keep paying minimum payments until that becomes unbearable. Finally I miss a payment, and then it gets rolled up into a cycle and finally reported to the credit bureau and incorporated into a score. There could be six months, 12 months between when my behaviour changed and when that's reflected in my credit score. I think what's also interesting is the categorising of the spend. It was a dream when I was at a credit card issuer: oh, if only we could see what the person had spent on. We knew the merchant code, but groceries could be many things. Actually being able to see exactly what's happening...
Clare McCaffery 17:19
Yeah, and at DirectID we focus on categorising transactions that are of particular interest to risk decision makers. Is someone's spend on gambling as a proportion of their income increasing? That's a red flag. So is spend on collections agencies. We also provide a confidence score for each transaction and each category. So we say, actually, we're 80% confident that that transaction fits into this category. And all of that data can feed into a predictive model, right? So you could weight the category by the confidence level.
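Editor's note: Clare's suggestion of weighting a category by its confidence level could look something like the Python sketch below when building a feature such as gambling spend as a proportion of income. The field names (`amount`, `category`, `confidence`), the figures and the function name are all hypothetical and do not reflect DirectID's actual data format.

```python
def weighted_category_share(transactions, category, income):
    """Confidence-weighted spend in one category as a proportion of income.

    Each transaction's amount is scaled by the classifier's confidence that it
    belongs to the category, so uncertain classifications count for less.
    """
    weighted_spend = sum(
        t["amount"] * t["confidence"]
        for t in transactions
        if t["category"] == category
    )
    return weighted_spend / income


# Hypothetical categorised transactions for one month.
transactions = [
    {"amount": 100.0, "category": "gambling", "confidence": 0.8},
    {"amount": 50.0, "category": "gambling", "confidence": 0.5},
    {"amount": 200.0, "category": "groceries", "confidence": 0.9},
]
share = weighted_category_share(transactions, "gambling", income=2000.0)
print(round(share, 4))
```

Tracking this share month by month would show whether gambling spend is growing relative to income, which is the red flag described above.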
Brendan Le Grange 17:53
Yeah, and the UK is kind of one of the homes of open banking. But actually, if you look at it from a credit bureau point of view, it's got one of the richest credit bureau datasets: there are already current accounts on the bureau, telcos on the bureau, BNPL, some of them at least, on the bureau, and even then we see an uplift. Most markets around the world have much thinner bureau data, and thinner bureau coverage as well. So DirectID is doing work with open banking around the world. Are you seeing a similar approach around the world, or how does this change?
Clare McCaffery 18:28
We're really strong in the UK and in the US. We have direct connections, and we do have the ability to connect data in 46 countries across the world. And there are different use cases. For the consumer or the SME, it's about the value exchange: the price reduction on the insurance product, or the speed to get hold of a mortgage. And then in other markets like India or South Africa, for example, there's a large thin-file population, so it's about access to credit more than the uplift.
Brendan Le Grange 19:01
It's just a great way in those markets to leapfrog some of the structural issues, where we can bring people in from outside, and I think that makes it really interesting. And maybe, Jacobus, back to the modelling side: how do you look at open banking data? Is it a different problem to traditional data, or the same thing once you start modelling?
Jacobus Eksteen 19:22
So generally, when I look at any data source, I look at value, cost and friction. Some you can control, some you can't. In South Africa, for example, the banking data has more friction because there are fewer APIs, so more is done with screen scraping. There you might have to pull the cost lever to make it more valuable for lenders. That also determines where you're using it in your decisioning. In general, any data set that is predictive and has a low degree of correlation with another data set should be useful in modelling. So it's very much dependent on the use case, on the bad rate, on the lift, and on the friction. But I'm very excited about the potential for open banking data in conjunction with a credit bureau, because the credit bureaus tell you what someone is not doing, and open banking data tells you what they are doing. And you need to be able to see that entire picture to make a fair decision.
Brendan Le Grange 20:19
And Clare, I just want to finish with one last thing that I probably should have started with. Jacobus talked about friction, and I guess what we've seen in the UK is that people have been very comfortable. But do we have any numbers that speak to how comfortable consumers have been in adopting open banking and allowing lenders, or institutions in general, to see their data via open banking?
Clare McCaffery 20:40
We spent a lifetime telling people, don't give your bank details away. And then here's DirectID saying, log in with your bank details. Actually, we don't see the bank details. Of course, we don't see how they log in, we just see the transactions. But I think it does come back to that value exchange: what's in it for the consumer? It varies from geo to geo, and it varies on factors like how reliable the connection with the bank is. But in the UK, we can see conversion rates around 80%. Actually, we work with one global lender, and in the UK the conversion rate is 80%.
Brendan Le Grange 21:15
Clare, if people are listening and wanting to learn a little bit more about open banking and DirectID and the work you're doing in the UK but also around the world, where's a good place for them to go to learn more and get in touch?
Clare McCaffery 21:28
DirectID. You can come see me on LinkedIn, and we're at those credit risk events. Or just register on the website and one of our guys will get in touch.
Brendan Le Grange 21:39
Great. And Jacobus, same thing. If people want to speak to Matogen, how do they do that? And I guess in particular, if they want to look at this research you've done on the u-score and creating a scorecard without performance data, where can they go to do that?
Jacobus Eksteen 21:54
Thank you. We're calling this u-score approach supervising learning, and there is an abstract that we sent to the Credit Scoring and Credit Control Conference, and a whole presentation. They can reach out to DirectID if they want to know more about the credit score, and they can go to our website, ai.matogen.com, for more information, or to reach out to us, or just to have a discussion, or add us on LinkedIn. I've been in the industry for 16 years, thinking about good people and bad people, but it's actually all good people; some are just in bad situations, like they say.
Brendan Le Grange 22:29
Great. Well, thank you both for joining me. It's been really interesting. And I think for me, the more I've got to know about open banking, the more I love it, but this is the first time I've thought about scoring, or building a score portfolio, without performance data, but also in a structured mathematical way.
And thank you all for listening.
Please do look for and follow the show on your favourite podcast platform and share the updates widely on LinkedIn where lending nerds are found in our largest concentration. Plus, send me a connection request while you're there.
This show is written and recorded by myself Brendan le Grange in Brighton, England and edited by Fina Charleson of FC Productions.
Show music is by Iam_wake, and you can find show notes and written transcripts at www.HowtoLendMoneytoStrangers.show and I'll see you again next Thursday.