Advanced Analytical Models, With Joseph Breeden

Dr Joseph Breeden is an entrepreneur and a researcher. Or perhaps I should describe him as an entrepreneur and a discoverer, so keen is his curiosity. And, with 30 years spent at the sharp point of credit analytics innovation, he is also a fantastic speaker on some of the most important topics of the day: machine learning, AI, and managing economic cycles, among others. In fact, just yesterday, he delivered a phenomenal keynote address on those topics at the Credit Scoring and Credit Control Conference in Edinburgh.

I was there, but if you weren't, don't worry, we have those insights and more on today's episode of HTLMTS.

Deep Future Analytics is at home on the internet at: https://deepfutureanalytics.com/

Joseph himself can be found on LinkedIn, https://www.linkedin.com/in/josephlbreeden/, or, if it is his research you're after, the route to all of that starts here: https://www.researchgate.net/profile/Joseph-Breeden

Not content with just that, Joseph is also the associate editor of two journals you may want to add to your reading list: https://www.risk.net/journal-of-risk-model-validation and https://www.risk.net/journal-of-credit-risk

Oh, and his work on forecasting the auction prices of fine wine can be found here: https://auctionforecast.com/

As mentioned, yesterday saw Joseph on the stage at a packed Credit Scoring and Credit Control Conference in Edinburgh, delivering the opening keynote. It's an exceptional event for those in the industry and, while you're busy missing this year's edition, it will be back in two years: https://www.crc.business-school.ed.ac.uk/conferences

You can also find and follow me on LinkedIn (or connect) at: www.linkedin.com/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=brendanlegrange

Otherwise, my action-adventure novels are on Amazon, some versions even for free, and my work with ConfirmU and our gamified psychometric scores is at https://confirmu.com/ and on episode 24 of this very show https://www.howtolendmoneytostrangers.show/episodes/episode-24

If you have any feedback or questions, or if you would like to participate in the show, please feel free to reach out to me via the contact page on this site.

Regards, Brendan

The full written transcript, with timestamps, is below:

Joseph Breeden 0:00

There's a reason people make fun of the ability of economists to predict the future: it's practically impossible.

Because anytime you predict a recession, there are government agencies working very hard not just to prove your forecast is wrong, but to make your forecast wrong! The Fed's job is to prevent that recession you might be predicting.

But stress testing models can still be very valuable. And in fact, in 2010 and 2011, especially the smaller lenders were still setting their loss reserves based on the losses they experienced in 2009. And the economy was already improving, the arrow on unemployment was pointing down. They should have been pricing for an improving economy, but they were bracing for the economy they'd just lived through.

Model risk management was basically created after the 2009 crisis. Everybody said, well, we must have been using a bunch of bad models, we've got to have a process, right? So you're playing catch-up, and you put in a basic process, and then machine learning models start to get used, and there again, we're trying to figure out how we're going to catch up from a model validation perspective.

Now we're getting into generative AI, and we can dismiss them as not very good yet, but before we know it, there'll be one that shows up that is good. And so I think there's an opportunity for model risk management to act first, so that banks can be pre-approved, in a sense, when these new technologies come out.

Brendan Le Grange 1:27

Max Lorenz was an American economist known for his work on income distribution through a population, and known subsequently to us in the credit world for the application of that work to measuring risk distribution through a population - the Lorenz curve being what we use to determine the Gini coefficient, of course.
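(An aside from the editing desk: for anyone who wants to see that connection in code, here is a minimal sketch - my own illustration, not something from the episode - of how the Lorenz curve of captured defaults yields the Gini coefficient we use to compare credit scores. The synthetic data and parameters are invented for the example.)

```python
# Minimal sketch: sort accounts from riskiest to safest score, trace the Lorenz
# curve of cumulative defaults captured vs. accounts reviewed, then Gini = 2*AUC - 1.
import numpy as np

def gini_from_scores(scores: np.ndarray, defaults: np.ndarray) -> float:
    """Gini coefficient of a score's ability to rank-order default risk."""
    order = np.argsort(scores)                        # lowest (riskiest) scores first
    d = defaults[order]
    cum_accounts = np.arange(1, len(d) + 1) / len(d)  # x-axis of the Lorenz curve
    cum_defaults = np.cumsum(d) / d.sum()             # y-axis: share of bads captured
    auc = np.trapz(cum_defaults, cum_accounts)        # area under the Lorenz curve
    return 2 * auc - 1

# Synthetic example: lower scores carry more of the defaults, so the Gini is positive.
rng = np.random.default_rng(0)
scores = rng.normal(650, 60, 10_000)
defaults = (rng.random(10_000) < 1 / (1 + np.exp((scores - 600) / 25))).astype(int)
print(f"Gini ~ {gini_from_scores(scores, defaults):.2f}")
```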

Edward Norton Lorenz, no relation to either the actor or the aforementioned economist, was an American mathematician and meteorologist, who is best known as the founder of modern chaos theory. The old butterfly flapping its wings in Beijing, precipitating a tornado in Texas.

You know it. And if you're of a similar vintage to me, you probably know it because of James Gleick's book Chaos: Making a New Science, first released in the late 1980s.

I have always been a 'well, let's just start doing it and see what happens' kind of guy. I mean, I can't go shopping with more than two days' worth of meals in my head, so I couldn't hope to keep track of all the interactions that a single gust of wind might have on its 11,000 kilometre journey. Luckily, there are more organised minds than mine in the industry.

I was too young to be influenced by it at the time, but I remember discovering it in high school a decade later, and being convinced that understanding fractals would somehow change the world. But that's about where my understanding of the underlying concepts ran out.

Welcome to How to Lend Money to Strangers with Brendan le Grange.

Dr Joseph Breeden, co-founder and CEO at Deep Future Analytics and, as I just read while researching this, the inventor of vintage modelling, welcome to the show.

Joseph, your background before Deep Future looks to be an interesting one, with a foundation of eight years studying physics all the way up to your PhD, followed not by a career in academia, or in a lab underground at somewhere like CERN, but by a string of venture building. So let's start there with that early career.

What was the path that you followed to get to where you are today?

Joseph Breeden 3:42

Well, when I was in graduate school, one of the hot topics of the day was chaos theory. So I joined a group that was focused on that. And my particular interest, given my background, was seeing if we could identify chaotic systems from experimental observations.

And so the methods that we were using: state space reconstructions, genetic algorithms, tree based methods to identify pockets of predictability, all of these things now get rolled into what we call machine learning.

I would say I am a researcher at heart. I don't know if that's the same as being an academic, but I love the discovery process. But I got hired into a think tank that Citibank had created jointly with Los Alamos National Lab to explore these things.

Brendan Le Grange 4:30

And then yeah, 10 years ago, and really with a number of successful business experiments under your belt, successful ventures, you founded Deep Future Analytics. What was the market gap there that you noticed and wanted to fill with this particular business?

Joseph Breeden 4:44

It came from what we saw that we couldn't do in my previous company. I had founded Strategic Analytics based on a technology that honestly we thought we had invented at the time for doing forecasting and stress testing based on vintage analysis.

I learned some years later that a German psychology professor back in 1898 came up with the idea. So I'll just say that we innovated on using this in banking.

But, you know, that method worked very well and has continued to work through many crises around the world for doing long range forecasting and stress testing. But in 2009 our models, our vintage based models, were predicting that there would be a crisis in lending, even in a flat economy, just because of the credit quality and the maturing of the loans.

And so back in 2006 and 2007 our models were saying some of the biggest lenders would have 10 times their normal loss rate. And that prediction turned out to be quite accurate, actually. But if you're going to deliver news that bad, you have to be prepared for some pushback, right? A couple of lenders immediately said, who are these accounts that are gonna go bad? You know, it's not enough to tell them disaster is coming. They want to know who and why.

First of all, we needed something that went to account level so that our clients could drill in. And secondly, what I saw through all that is that I've worked with a lot of finance companies of all sorts, as subprime as you can get, there's no such thing as a loss rate that you can't price for. The problem is if you don't price for the loss rate you're going to have, and so what was completely obvious was that they simply hadn't priced for those losses.

Most people would just take FICO score, you know, rank order their risk, and say, given this, we'll set a cut-off, and then we'll estimate what price we want, usually based on market competition, and it was all very approximate and heuristic. What we felt was missing was actually predicting an account-level probability of default, given a FICO score as an input, and an economic scenario, and then you can optimise price.

So really, that's where we started
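(A hedged sketch of the idea Joseph describes here: an account-level probability of default driven by a bureau score and an economic scenario, feeding into a risk-based price. The model form and every coefficient below are invented for illustration; this is not Deep Future Analytics' actual model.)

```python
# Toy account-level PD model (logistic in score and unemployment) plus a simple
# one-period risk-based price that covers expected loss, funding cost, and margin.
import math

def account_pd(fico: float, unemployment_rate: float,
               intercept: float = 4.0, b_fico: float = -0.012, b_unemp: float = 0.15) -> float:
    """Toy logistic PD; all coefficients are made up for illustration."""
    z = intercept + b_fico * fico + b_unemp * unemployment_rate
    return 1 / (1 + math.exp(-z))

def price_for_target_return(pd: float, lgd: float = 0.6,
                            funding_cost: float = 0.05, target_margin: float = 0.02) -> float:
    """Indicative APR = funding cost + expected loss + target margin."""
    return funding_cost + pd * lgd + target_margin

for scenario_unemployment in (4.0, 8.0):        # baseline vs. stressed unemployment
    pd = account_pd(fico=680, unemployment_rate=scenario_unemployment)
    apr = price_for_target_return(pd)
    print(f"Unemployment {scenario_unemployment:.0f}%: PD {pd:.2%}, indicative APR {apr:.2%}")
```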

Brendan Le Grange 6:58

At www.deepfutureanalytics.com, I see you say that 'our one simple software suite handles all your loan model needs' - how does that promise come to life? When we think of the nuts and bolts and tools on our computer, what does that look like?

Joseph Breeden 7:12

Yes, to get beyond the marketing taglines, right.

When we created this technology - and one way to explain this would be to say that what we did was to merge the vintage modelling view of the world with behaviour scoring. Behaviour scoring is usually one observation per account: did it go bad within a certain window? Now we're setting this up as a panel with an observation every month: what are the attributes? What's the economic environment? Where is it in the lifecycle? We did that so we could generate the cash flows and get to yield.
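(To make that panel structure concrete, here is a minimal sketch of my own, using hypothetical column names: one row per account per observation month, carrying origination attributes, months on book, and the economic environment for that calendar month.)

```python
# Build an account-month panel: classic behaviour scoring has one row per account,
# this has one row per account per month, joined to the economy at that date.
import pandas as pd

accounts = pd.DataFrame({
    "account_id": ["A1", "A2"],
    "origination": pd.to_datetime(["2022-01-01", "2022-06-01"]),
    "fico_at_origination": [700, 640],
})
economy = pd.DataFrame({
    "month": pd.date_range("2022-01-01", periods=18, freq="MS"),
    "unemployment_rate": 5.0,              # placeholder scenario values
})

panel = accounts.merge(economy, how="cross")
panel = panel[panel["month"] >= panel["origination"]]
panel["months_on_book"] = (
    (panel["month"].dt.year - panel["origination"].dt.year) * 12
    + (panel["month"].dt.month - panel["origination"].dt.month)
)
print(panel[["account_id", "month", "months_on_book",
             "fico_at_origination", "unemployment_rate"]].head())
```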

But when we had just finished creating this technology, the new guidelines came out for IFRS 9 and CECL, and these are about predicting losses... and predicting losses at the account level with economic scenarios and doing it for the life of a loan. We looked at that and said, well, we just did that. If I take out the revenue part, that's what we just built.

That's when we realised we had built more than just a solution for pricing, that it could then fold into economic capital and campaign tracking. And so it doesn't do everything a bank would do. It's not fraud. It's not anti money laundering. But if you look in the realm of credit risk, it is loss forecasting, stress testing, pricing optimisation, economic capital, quite a range of things in an integrated solution.

Brendan Le Grange 8:34

We've had a period of stable interest rates pretty much year after year, a nice straight line to follow, and now we're in a world where that's completely different. So I'm sure the current environment just underlines again this need to have models that don't just say the past perfectly reflects the future. Yeah, it's really interesting.

I see that you're also not just talking, you know, for big banks, with hundreds of analysts and teams of coders in there. You've got, you know, a deep connection with smaller community banks and credit unions as well. So talk to me about who you see, using this tool. Is this the cutting edge just for those companies that can afford the best of the best modellers? Or is this something anybody who's in lending can and should be using?

Joseph Breeden 9:18

You hit on a very good point, which is that this has to move beyond the heroic lift by the one superstar modeller, it has to turn into a reproducible library where these things can be created in volume.

What that means is robustness: robustness to noisy data, to thin data, to thin segments. That's where we put years of effort: creating libraries that had that kind of robustness and reproducibility.

So this is simultaneously scoring and stress testing, and how do you do those things in combination? There were some different lessons we had to learn there. The end result is a library of algorithms that we use. For larger lenders, we often work together in consulting projects, and we can even licence those libraries.

But smaller lenders often have no analysts at all. And even if they have enough data, they would want something that isn't just a modelling tool. They want a user interface where they've got some buttons and dials to optimise price or to compute their CECL reserve, not just a toolkit. Rather than the consulting project we'd do for the larger lenders, they want a system it drops into.

We also offer web hosted software, where we are the analysts for them and roll this together. And for the smallest lenders who don't have enough data to build models. If you're a credit union, you may have some commercial lending, but not much. If you're a community bank, you may have some credit cards, but not much. And so there we also have a shared data pool, where we've got 300 clients contributing their data. And we can leverage that to build models for all.

Brendan Le Grange 11:03

I also just want to - before we talk on some other topics - understand what good stress testing looks like. And I want to focus on stress testing just because, on the show previously, in the olden days, you know, a year or two ago, before interest rates went crazy, I was speaking to some mortgage lenders, and there was a fixed +3% onto APR stress test that needed to be done in the affordability checks. And that was really making it hard for first-time buyers, or it was one of the many things making it hard for first-time buyers to get onto the property ladder. And so there were lenders there that were pushing back against the stress test. Then, a year later, interest rates have gone well beyond that 3%, so it almost seems to fail on two fronts: while times were stable and good, it kept people away from loans they could have afforded and probably should have been able to take, and now that times have turned, it hasn't really helped anyone because it wasn't enough.

And I guess that that is the big challenge that many of us lenders face. What does the world look like where I can do something more accurate, more sophisticated when it comes to stress testing?

Joseph Breeden 12:08

Any stress testing activity is really two parts: it's the model that takes economic scenarios and it's the selection of an economic scenario or set of them.

And the models that predict what your portfolio will do given an economic scenario, those can be quite accurate, they can also be quite inaccurate, because it requires economic cycles. And we often don't have a lot of economic cycles in our data. Many clients show up with less than one cycle. So that can be a challenge. But that is a separate challenge from choosing an economic scenario.
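(A small illustration of that two-part split, with a toy portfolio loss model and a hand-picked set of scenarios. The sensitivities and numbers are made up; the point is only that the choice of scenario is separate from the model that consumes it.)

```python
# Part 1: a model that maps an economic scenario to a portfolio loss rate (a stand-in
# here, not any real model). Part 2: the scenarios you choose to run through it.
from typing import Dict

def portfolio_loss_rate(scenario: Dict[str, float], baseline_loss: float = 0.02) -> float:
    """Toy sensitivity: losses rise with unemployment and with falling house prices."""
    return baseline_loss * (1 + 0.25 * (scenario["unemployment"] - 5.0)
                              - 0.5 * scenario["hpi_growth"])

scenarios = {
    "baseline": {"unemployment": 5.0,  "hpi_growth": 0.03},
    "adverse":  {"unemployment": 8.0,  "hpi_growth": -0.05},
    "severe":   {"unemployment": 11.0, "hpi_growth": -0.15},
}
for name, s in scenarios.items():
    print(f"{name:9s}: projected loss rate {portfolio_loss_rate(s):.2%}")
```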

And there's a reason people make fun of the ability of economists to predict the future: it's practically impossible.

Because anytime you predict a recession, there are government agencies working very hard to make you wrong. Not just to prove your forecast is wrong, but to make your forecast wrong - that's the Fed's job, to prevent that recession you might be predicting. Or if you're predicting things are going well, they say, well, they're going too well, I'm gonna make a little recession to slow you down.

They're always working against what your model says is gonna happen.

So too often, stress testing is assumed to be loss forecasting with a severe scenario. Really, everything we do should have a stress test element to it. Whether it's predicting prepayments or profitability on loans or any activity, you need to consider what the economy will do to that. And then you have to make an intelligent choice of, well, where do I think the economy is going? I don't want to price for the wrong economy.

And in fact, one of our use cases is that in 2010 and 2011, especially the smaller lenders were still setting their loss reserves based on the losses they experienced in 2009. And the economy was already improving, the arrow on unemployment was pointing down. They should have been pricing for an improving economy, but they were pricing for the economy they'd just lived through.

This is a case where stress testing would say, drop your rate, generate more volume, grow your business, now is the time.

And right now, we're debating will there be a recession later this year or early next year, and banks are very cautious about that. Well, it's just as important, later next year, to say the recession is behind us, we need to price for being aggressive. So both of those are stress testing.

Brendan Le Grange 14:25

On www.deepfutureanalytics.com there's some great content, but one of the points you've got in there on your pricing is with an agricultural lender and incorporating things like expectations of commodity prices.

Risk-based pricing can be complicated enough, and I don't think I've ever even conceptualised this idea of looking at prices from the futures market, and looking at those in the context of things like, what do I think the customer's business is going to look like?

So I'd firstly encourage everybody to go to the site, have a look, and read through some of the stuff themselves. But when you talk about forward-looking metrics in your price matrix, what sort of data and data sources are you bringing in there, beyond the usual ones we might think of?

Joseph Breeden 15:09

It's another excellent question, because I think there's a lot of discussion about big data and AI and machine learning. And they go together well, big data and machine learning, but they're not the same thing.

A lot of what gets done with machine learning in our industry is applying very nonlinear methods to the same old data. In fact, everything I've talked about so far has been more intelligent use of the data you've always had, building better models of your business and of your product.

If you have unique data, that's great. And often we find unique datasets in finance companies, where they're doing some kind of specialty lending.

You know, one of my favourites for a long time was a group that was looking at point-of-sale loans for cruise ship tickets. And if they have access to your loyalty data from prior travel, well, that's interesting, maybe they can make a model that isn't just based on what's your FICO score. It's not, then, about the losses, it's about the potential revenue. If you can predict that this is a high-revenue passenger, then you may want to offer a discount, or the cruise line may want to buy some of the points on that loan to encourage the customer to go. There's a lot you can do there with that alternative data.

And that's still not about the model.

Where you might need machine learning or other methods is if that data is not structured: text fields of various sorts that need to be pre-processed. I think a lot of the most interesting uses of machine learning in our industry have been creating the factors that go into models that don't really look much different. But getting things into factors is often the challenge.
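(An illustrative aside: a tiny sketch of that pattern, where the machine-learning effort sits in turning a free-text field into factors while the scoring model itself stays quite conventional. The field name and labels below are hypothetical.)

```python
# Unstructured text -> factors -> an otherwise ordinary (linear) scoring model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

loan_purpose = ["debt consolidation", "home improvement", "new car",
                "consolidate credit cards", "kitchen renovation", "used car"]
defaulted = [1, 0, 0, 1, 0, 1]     # toy labels, for illustration only

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(loan_purpose, defaulted)
print(pipeline.predict_proba(["consolidating my cards"])[:, 1])
```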

Brendan Le Grange 16:49

Yeah, and I think there's many an analyst who's had to struggle to explain that to their boss, when the end result looks fairly similar and they've spent all their time building those back-end pieces.

And, Joseph, I want to bring you back around, I guess, more to you. I mentioned earlier in the introduction that, with a PhD in your pocket, one path would have been towards academia, but instead you went into the business world. But it seems that's not entirely true: you've kept a foot in what I would consider the world of academia when it comes to journals, particularly the Journal of Risk Model Validation and the Journal of Credit Risk, where you are associate editor.

Joseph Breeden 17:28

I think where I'm perhaps less of a business person is that they say the classic way to start a company is to find an existing business process and optimise it. I like to find interesting questions and solve them, which I realise is more of a research mentality, but then try to turn that into a solution.

So there's a kind of academic research that I call butterfly collecting: it's cataloguing lots of information with no clear intended purpose. And it's not my kind of interesting. I like solving problems that have value, and as soon as you do that, if nobody else is going to make a product out of it, you have to. So I feel like I live in between those two worlds, solving problems from a research perspective, and then using them for something.

And maybe this comes from my background in chaos theory, but I'm somewhat allergic to hype. There was an awful lot of hype with chaos. And there was a book I saw at one point, kind of at the height of the chaos cycle, where it said 'leveraging chaos in business', which pretended to be using chaos theory, but really it was just stealing a word; it had nothing to do with it.

Obviously, there's a fair amount of AI and machine learning hype right now. And we get to do a lot of model validation, and I've seen plenty of models which could just as well have been done in a linear regression context. You know, we need to get past the hype and use the model that's right for the context.

There are things we do that can integrate with, you know, the vintage modelling integrated with behaviour scoring, where we also know how to integrate that with stochastic gradient boosted trees or neural networks. That's interesting research; we would do it if we needed it, but we don't often need it. And I think a lot of the business gain comes from understanding your cash flows and pricing loans right, as opposed to nuances in an AI model. But there are still plenty of cases where AI is important.

So I don't bill us as an AI company, even when we use it because we want to solve problems.

Brendan Le Grange 19:37

Would you say that you're seeing good implementations of AI in that world at the moment?

Joseph Breeden 19:43

I'll go back to the specialty finance area, where I think when you have unusual data, then you can make interesting models. If what you're doing is an auto loan to a prime customer, it's a commodity, to be quite frank. You know, having an AI model on the same bureau data for the same product, if there's lift, it's probably negligible.

Really, it's about understanding your business and your cash flows. And in fact, one of the areas that I'm most interested in related to AI is not its use on the individual account decision, but a strategic perspective.

You know, think back to something most of us hated: Microsoft's Clippy. You remember that? The paperclip, who told you how to do things you had been doing for 10 years, and was always sitting on top of what you needed to get to. That was the right concept, but it wasn't smart enough.

And AI will get to the point where it can be an advisor to an executive, a strategic adviser instead of just a better scoring model. And in fact, you can go to any large language model today, give it your business problem, and ask for a recommendation. You'll get an answer, but what you'll get is the industry-average answer it found on the internet. That's not the same as knowing your business and giving you the best answer for your business.

But someday it will exist. And in fact, one of my goals is to make that exist!

But if you do that - well, we've been talking a lot about AI ethics, and usually the AI ethics conversation is around how to not discriminate, how to not be biased against protected classes, which is very important. But if my recommendations are strategic, then there's no individual in the loop. There's nobody to assess that kind of bias against; really, I'm looking collectively at the impact on my customers, my community, my world.

And this gets into population ethics and a number of other things. And yeah, one of my most recently accepted papers is on scoring AI policy recommendations for their ethical implications. I think it's a fascinating topic, and I hope we'll get to lead in that area as well.

Brendan Le Grange 22:01

Yeah, it's one of those topics where the closer you look at it, almost the more complicated it becomes, as you realise there are more and more of these pieces to think about. But speaking of fascinating topics...

One of the reasons I reached out today was that you are a keynote speaker at the credit scoring and credit control conference that will be happening in Edinburgh as this episode is released.

So without giving too much away, what is it that you'll be speaking about at that event?

Joseph Breeden 22:29

Anybody who sees this will have already missed it, but I would give a plug for this conference: it's my favourite. And the topic of my talk is around model risk management and challenges with AI. If you build an AI model, what can go wrong? And how do we address this from a validation and risk management perspective?

There's a lot to say, in that area.

What I'm trying to create is a collection of a dozen small insights, a dozen small things to think about. Explainability using local linear approximations, LIME and SHAP and stuff like that - everybody talks about that. And people talk about ethical bias against protected classes, like we just mentioned. So therefore, I won't talk about either of those two. But I want to talk about some of the more subtle things, like: nonlinear models are really about finding pockets of highly predictive data, and then bigger regions that are less predictive, and so my confidence intervals are rapidly changing as I go through this multi-dimensional space.

So what am I doing with that? It's like assuming that the confidence interval around a FICO score is a constant everywhere. Well, maybe it is, it's been kind of linearised; but in my nonlinear model, it's definitely not. And so I need to look at you and say, well, yeah, you've applied for a loan...

But I don't know anything about you. I may say that you're average, but my confidence interval is this wide; whereas somebody else applied, and I know a lot about them, and I can say this is it. And so now I need to put that into decisioning. We don't consider uncertainty in our underwriting and pricing, and that's going to bite us before long. Or, to put it differently, whoever solves that has a market opportunity. And then there are a bunch of little things that I'll leave for somebody to watch the talk.
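(For readers who want to see what 'confidence intervals that change across the space' might look like in practice, here is a hedged sketch of my own: the spread of predictions across a bagged ensemble serves as a rough per-applicant uncertainty measure, and the decision rule reacts to that spread as well as to the point estimate. The data, thresholds, and rule are all invented for illustration.)

```python
# Per-applicant uncertainty from the spread of tree-level predictions in a random
# forest; wide spread means the model has seen little data like this applicant.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(5_000, 4))                                  # synthetic features
y = (rng.random(5_000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # synthetic defaults

model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

applicant = rng.normal(size=(1, 4))
per_tree_pd = np.array([tree.predict_proba(applicant)[0, 1] for tree in model.estimators_])
point_estimate, spread = per_tree_pd.mean(), per_tree_pd.std()

# A decision rule that reacts to uncertainty, not just the point estimate
decision = "approve" if point_estimate + 2 * spread < 0.10 else "refer"
print(f"PD ~ {point_estimate:.2%} +/- {spread:.2%} -> {decision}")
```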

But I'm going to conclude that talk by saying that model risk management was basically created after the 2009 crisis. Everybody said, well, we must have been using a bunch of bad models, we've got to have a process, right? Okay, so you're playing catch-up, and you put in a basic process, and then machine learning models start to get used. And there again, we're trying to figure out how we're going to catch up from a model validation perspective.

Now we're getting into generative AI and a next generation of AI models. They're really beyond the kind of scoring machine learning models we've used so far, and these models usually are not very good yet. And we can dismiss them as not very good, but before we know it, there'll be one that shows up that is good.

And so the finance companies - they may be, they're not always, but they may be less regulated - may be able to be first movers in this space. They're gonna go ahead and use that latest technology, while the banks are stuck trying to fight their way through model validation and audit and regulation and everything else.

I think there's an opportunity for model risk management to act first.

If model risk managers are going to really help their organisations, then they need to figure out what the validation, monitoring, audit, etc., is that's going to go on top of anything that uses these large language models and these other technologies. How do we generalise what we're learning about machine learning beyond that, so that banks can be pre-approved, in a sense, when these new technologies come out, instead of letting the finance companies eat their lunch?

Brendan Le Grange 25:58

Yeah, I've worked on a number of projects where not knowing how to get the regulator to sign off on something stopped us from trying new techniques. So clients would specifically say, I don't want machine learning, I don't want AI, I need to explain to the regulator how the model works, and the regulator understands regression models, so please only use those. And you're right, that can hold you back for years, as regulators slowly get up to speed and read through the research, and it becomes a chicken and egg: they want to see things in practice, and you don't want to do it.

And it also got me thinking about a second problem. As these tools become easier and easier to use, there's a new risk of the tool being in the hands of somebody who doesn't quite understand the context of it. If I think back to my own career, SAS had just brought in a module where building decision trees was basically point-and-click, and I was building some CHAID decision trees for a collections process. I had done some statistics, I had a vague idea of what was happening, but a very, very vague idea. And that sped things up, that helped us build better processes, but, you know, I wasn't really fully capable of controlling that tool.

But it wasn't a very powerful tool. Now somebody like me might be given something far more powerful to build something far more complex. Joseph, you've clearly created some great analytical products, and you put out great analytical content.

If anyone listening here would like to read some of that, see your work in more detail, or maybe hear you speak somewhere, whether that's in the form of Deep Future Analytics, or in the form of Dr. Joseph Breeden in your own right. Where should they go to learn more?

Joseph Breeden 27:38

Well, where I'd go to find people is Google Scholar; you'll find almost all of my stuff there. Or ResearchGate.net, just look for my name. Or you can write to me at breeden@deepfutureanalytics.com

And my usual line is that advice is free; I charge if you ask me to do it for you. So yeah, happy to share ideas with anyone interested.

Brendan Le Grange 28:04

Perfect.

I'll put those in the show notes as well. But just before I let you go completely, I think it's worth discussing one last quiver in your bow there. I've previously used my limited modelling skills on little things like trying to build a predictive model for rugby results, to beat my friends in fantasy rugby leagues around World Cup time, with very little success.

But you've taken it a step further: you've taken all this knowledge you've got of predictive modelling and turned it towards the problem of fine wine pricing. So tell me about AuctionForecast and what turned your eye in that particular direction.

Joseph Breeden 28:42

Back when I was formulating the idea of Deep Future Analytics, I was also looking for data to test the algorithms on, and there was wine auction data out there. And so I started creating scrapers and downloading data and building models and gathering more data and building more models. And eventually it turned into a website, and we started working with some of the auction houses. And yeah, www.auctionforecast.com has long-range forecasts of prices for fine wine.

We've got about two and a half million auctions. And it's just fun, you know, like the things you mentioned you've done.

Before I was hired into that think tank Citibank created, I actually worked for a few years as a consultant for some professional gamblers in Vegas. So basically, I was creating models to predict the outcome of pro basketball, college basketball, hockey, golf, things like that.

And I have to say, I learned more about the practical use of data analysis and statistics from the gamblers than I did anywhere else. It was a great education.

Brendan Le Grange 29:50

I'd have to get a bit smarter before I try that in Vegas myself. But, Joseph, it's been fantastic speaking to you, especially after all the technical problems; thank you for putting up with those.

Joseph Breeden 30:00

Thank you. Thanks for your time.

Brendan Le Grange 30:02

And thank you all for listening.

Please do look for and follow the show on your favourite podcast platform and share the updates widely on LinkedIn where lending nerds are found in our largest concentration. Plus, send me a connection request while you're there.

This show is written and recorded by myself Brendan le Grange in Brighton, England and edited by Fina Charleson of FC Productions.

Show music is by Iam_wake, and you can find show notes and written transcripts at www.HowtoLendMoneytoStrangers.show and I'll see you again next Thursday.

