Unleashing CreditPy, with Ayhan Diş
In the world of data science, there are people like me, who have signed up for an online Python course but just not got around to starting it yet, and then there are people like Ayhan Diş, who, given a spare month, create a Python package to succeed the highly successful CreditR by simplifying tasks such as variable analysis, model development, and calibration and validation with unparalleled efficiency 😂
Ayhan Diş is a Quant Professional renowned for shaping sophisticated data-driven solutions across diverse sectors, and we discuss what AI and ML can bring to the world of credit risk modelling, a future with Generative AI tools further improving our toolbox, and what regulators have to consider in their rulings.
You can explore CreditPy on PyPI (https://pypi.org/project/creditpy/) and GitHub (https://github.com/ayhandis/creditpy)
You can reach out to Ayhan directly on LinkedIn at https://www.linkedin.com/in/ayhandis/
I'm on LinkedIn and always open to new genuine connections - https://www.linkedin.com/in/brendanlegrange - please do reach out, and follow the show's page, too
Meanwhile, my action-adventure novels are on Amazon, some versions even for free, and my work with ConfirmU and our gamified psychometric scores is discussed at https://confirmu.com/ and on episode 24 of this show https://www.howtolendmoneytostrangers.show/episodes/episode-24
If you have any feedback or questions, or if you would like to participate in the show, please feel free to reach out to me via the contact page on this site.
Keep well, Brendan
The full written transcript, with timestamps, is below:
Ayhan Diş 0:00
Last month, I had some free time and so I decided to convert CreditR into Python, including some functionalities to develop credit risk scorecard, a PD model, basic data analysis, and to determine which features are going to be passed to the predictive model.
It also generates an automated model framework that is actually searching for the best predictive model across the different feature sets that potentially can be used during the model.
Brendan Le Grange 0:42
Python is a genus of constricting snakes native to tropical and subtropical regions of Asia and Africa, and, if you're in the same corners of the Instagram algorithm as I am, found in Florida, too, where they're the top target of the barefooted 'yoink' guy.
When I lived in Hong Kong, I always wanted to see a Burmese python in the wild. Alas, the closest I got was seeing a neighbourhood Facebook post showing one slipping into Henry Chamberlain's unattended garden.
But Python is also a high level general purpose programming language. First released in 1991, it has really come to prominence in the machine learning community and, I'm claiming based on gut feel and a single response from Google, is now the official coding language of modern data science. Welcome to How to Lend Money to Strangers with Brendan le Grange.
Ayhan Diş, quant professional, welcome to the show. Ayhan, I copied your LinkedIn headline and on one level, 'quant professional' doesn't seem to do enough to display the wonderful array of skills that you bring, but on another level, it really does capture very well that career of yours.
So before we get too deep into the real topic of today, which is CreditPy, let's start by taking a step back and, if you don't mind, talking to me a little bit about your route into data science and from there into credit risk.
Ayhan Diş 2:22
I was studying economics during my Bachelor's, and I was working as an intern in the budgetary reporting area. And I met with a guy who is doing boutique consultancy, and he told me that you might have a future in data science, if you want to work together with me.
We had a good coffee. Afterwards, I actually mainly focused on developing my capabilities in the data science area, and also credit risk.
Brendan Le Grange 2:43
Now our main topic for discussion today is the recently launched CreditPy, which simplifies tasks such as variable analysis, model development, and calibration and validation with unparalleled efficiency.
Now, there's a segment of my audience that really loves these deep technical discussions. I certainly get more LinkedIn messages and connection requests after them than after any other topic, but I feel a little trepidation because these days, my hands are soft and uncalloused by any hard modelling work, so I'm stepping a little beyond my comfort zone.
I started in credit risk analytics in 2002, just as the bank was realising that Access databases probably weren't good enough anymore, and we were learning SQL, and if you were a scorecard builder, SAS as well. A few years later, I changed roles and for the first time got to work with a drag and drop SQL assistant and a one click CHAID analysis type decision tree builder in SAS - my peak in terms of technical know how.
A year or two after that is when I first heard of R - I was managing a team and for the first time ever, we were recruiting graduates who had studied credit risk analytics, and they had been trained on R because it was free. We used to train it out of them, the first few months being spent teaching them how to code in SAS instead. 10 years later, by which time I was living in Hong Kong, we did the reverse exercise where we trained all our SAS analysts how to do R and gave up the expensive licences, but really the point here is that I've lost track of what the state of the art is in terms of credit modelling.
So let's start with a little history lesson. I know that CreditPy is underpinned by CreditR, so let's start there at the beginning. What was CreditR, and what happened that made you decide you needed to update it with a CreditPy?
Ayhan Diş 4:49
Five years ago, I was working on a project related to credit risk model validation, where the need was to validate around 70 different predictive models. And at that project, we were using a commercially licensed tool, which was not allowing us to do parallel processing, and CreditR was an answer to that need. And I started to develop all the functions to validate a model appropriately.
And after developing it, I also decided to publish it on GitHub. And it has been welcomed by the credit risk professionals - I actually saw that many companies, from Australia, the United States, Turkiye, Dubai, Belgium and also the Netherlands, used the package or used small parts of the package to improve their businesses. R is a good programming language, but it is also not answering the implementation needs or the advanced needs of the companies.
And last month, I had some free time. And I actually decided to convert CreditR into Python.
Brendan Le Grange 7:32
I love to hear that a whim and a month is all it takes, if you've got the skills, to do such a big piece of work!
So set the scene there: what is CreditPy, and what all is included in there?
Ayhan Diş 6:22
Actually, CreditPy includes some functionalities to develop a credit risk scorecard, a PD model, and basic data analysis, and it checks the informative variables in an automated way to determine which features are going to be passed to the predictive model. It also generates an automated model framework that searches for the best predictive model across the different feature sets that potentially can be used during the model development.
And after this, there are actually many functions that have been defined to create the rating scale. And also, after creating the rating scale, it offers some validation, like univariate Gini checks, information value checks, basic multicollinearity checks, and stability checks on the features to see if there will be any drift in the predictions on the out-of-sample set. It applies a basic weight of evidence transformation on the data.
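The information value check mentioned here is built on a weight of evidence transformation, and the arithmetic is short enough to sketch. The example below is an illustration only, not CreditPy's own code: the function name `woe_iv`, the smoothing constant and the toy data are all assumptions, and it uses the common convention WoE = ln(% of goods / % of bads) per bin.

```python
import math
from collections import defaultdict


def woe_iv(bins, defaults, smoothing=0.5):
    """Return ({bin: woe}, information_value) for one binned feature.

    bins      : list of bin labels, one per customer
    defaults  : list of 0/1 flags (1 = defaulted), same length as bins
    smoothing : small count added to every cell so an empty cell
                cannot produce log(0) (an assumption of this sketch)
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for label, flag in zip(bins, defaults):
        if flag == 1:
            bad[label] += 1
        else:
            good[label] += 1

    labels = sorted(set(bins))
    total_good = sum(good[b] + smoothing for b in labels)
    total_bad = sum(bad[b] + smoothing for b in labels)

    woe = {}
    iv = 0.0
    for b in labels:
        pct_good = (good[b] + smoothing) / total_good
        pct_bad = (bad[b] + smoothing) / total_bad
        woe[b] = math.log(pct_good / pct_bad)
        # Each bin contributes (share of goods - share of bads) * WoE.
        iv += (pct_good - pct_bad) * woe[b]
    return woe, iv


# Hypothetical toy data: an income band per customer and a default flag.
income_band = ["low"] * 10 + ["mid"] * 10 + ["high"] * 10
defaulted = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7 + [1] * 1 + [0] * 9

woe, iv = woe_iv(income_band, defaulted)
for band in ("low", "mid", "high"):
    print(f"{band:>4}: WoE = {woe[band]:+.3f}")
print(f"Information value = {iv:.3f}")
```

In this toy data the "low" band has the most defaults, so it gets a negative WoE and the "high" band a positive one. A feature whose bins all had the same default rate would have an information value close to zero.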
And finally, it allows the user to validate the created rating scale, the predictive power of the model and the calibration. So I defined a regression calibration, and also a Bayesian calibration methodology in the package, that is actually doing these tasks in an automated way.
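The transcript does not spell out how the Bayesian calibration works inside the package, so the sketch below shows only one common reading of the term: a prior probability adjustment that moves model PDs from the default rate of the development sample to a target long-run default rate (often called the central tendency). The function name `shift_pd` and the example rates are assumptions made for illustration.

```python
def shift_pd(pd_raw, sample_default_rate, target_default_rate):
    """Rescale one PD from the sample default rate to a target default rate.

    Applies Bayes' rule with the prior swapped: the odds implied by pd_raw
    are multiplied by the ratio of target odds to sample odds.
    """
    bad_part = pd_raw * target_default_rate / sample_default_rate
    good_part = (1 - pd_raw) * (1 - target_default_rate) / (1 - sample_default_rate)
    return bad_part / (bad_part + good_part)


# Hypothetical case: the model was built on a sample with a 10% default
# rate, but the portfolio's long-run default rate is taken to be 4%.
raw_pds = [0.02, 0.10, 0.30, 0.60]
calibrated = [shift_pd(p, 0.10, 0.04) for p in raw_pds]

for raw, cal in zip(raw_pds, calibrated):
    print(f"raw PD {raw:.2f} -> calibrated PD {cal:.4f}")
```

Two properties make this a convenient sanity check: a raw PD equal to the sample default rate maps exactly onto the target rate, and the rank ordering of customers is unchanged, so the Gini of the model is not affected by the calibration step.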
Brendan Le Grange 7:55
It sounds like everything somebody needs for model building, but I don't need to create multiple models, multiple projects, and then see which model is the best for my needs. This can all be run in parallel so I can pick the most suitable model for my needs in any one situation.
Ayhan Diş 8:12
CreditPy has functionality like that, but it basically uses logit models at the back end. And after developing CreditPy, I also decided to develop the CreditPy Pro version, which will actually include many sophisticated methods to determine the best predictive model in an automated way, like neural networks, XGBoost models, Naive Bayes models or a combination of XGBoost and neural networks.
And I think in the Pro version, I will be focusing on adding much more advanced functionality, especially focusing on the machine learning and the deep learning side of practice.
Brendan Le Grange 8:58
Yeah, well, I want to talk to you a little bit about deep learning and machine learning now. But first, this is one of those things I think people can best understand hands on. You say it's on GitHub. If somebody's listening now and already wants to have a look at CreditPy, where can they go to see it and to maybe experience some of that functionality?
Ayhan Diş 9:19
There's an example data set in the Python library, and I also added a function that calls that CSV file. A user can directly check the GitHub link to see the example application of CreditPy.
Brendan Le Grange 9:37
Excellent. I'll put those links in the show notes as well so anybody listening who is on the technical side can jump straight over and see that at work.
But yeah, let's come back to the broader concept there of machine learning and AI. Obviously, two hot topics at the moment. In some situations overestimated, in some situations underestimated, but what is the reality of machine learning and artificial intelligence applications in credit risk today?
Ayhan Diş 10:04
According to my experience, these applications already came to the area, for example, most of the FinTech companies are using the machine learning, which actually gives a very strong advantage to the companies in terms of predictive power.
Because it was also my experience during my master's, where I was comparing the classical model development methodology with the machine learning and deep learning methodology: I recognised that a minimum of 10% more predictive power can be taken from the machine learning applications, which can be a really crucial advantage for a financial institution.
And also ECB has recently published the discussion paper on machine learning applications on the IRB methodologies. I think in the future, we will be discussing how to embed these types of models, and also generative AI parts, into the industry.
Brendan Le Grange 11:10
And if we think about it from the practitioner point of view, you know, somebody who's grown up building traditional credit scorecards knows structured credit data from the Bureau, application form, or the bank's databases, we're now in a world where there's a whole lot more data available of all sorts of different types.
Talk to me a little bit about how, if you were going to be doing a machine learning or an AI model building programme, the sort of data you're incorporating now might be different? Or I guess the same question from the other way around. If you've got data, like social media data, or geolocation data, or gaming or psychometric data, how does having access to machine learning tools help you build a model with that in a way that an old fashioned logistic regression wouldn't be able to do?
Ayhan Diş 11:58
It's a good question, actually. The classical approaches are not capable of handling, let's say, the 1,000s of different features. When you, let's say, try to add psychological questionnaires or social media network data, or the gaming experience of a user to understand how the player is behaving, these types of sophisticated data sources, with machine learning and deep learning models, I think we will be more capable of approaching the different behaviours of the customers to lend the money in a smarter way.
However, the problem is the market is currently using much less sophisticated methods.
Brendan Le Grange 12:43
And I guess this is also where Python comes into its own.
Because when we're moving away from having data extracted from our own internal databases, to a world where we're gathering data from a million different sources, Python is the language that most people would be doing this sort of work in. So incorporating your credit models in that world seems to make a lot of sense today.
Ayhan, if I was to use CreditPy, how does it work with the data flow? Do I load my data up and run it through CreditPy? Or do I host CreditPy on my side? What does that look like from a sort of architecture point of view?
Ayhan Diş 13:29
It's a PyPI package, so the user can directly install it. The user can define the data of an institution, or working data, or study data, and then can pass that data directly to CreditPy to experiment with the functionalities.
Brendan Le Grange 13:49
Ayhan, you mentioned earlier a word that is in all the headlines today: generative AI.
And you know, when we see those headlines, in the stories, we're normally talking about, like, text to image or text to video, something flashy that gets on the TV screens. But if it's not already, how is it going to be able to help us in the world of credit modelling?
Ayhan Diş 14:12
I think generative AI is gonna be a major player on the data generation part. Because the banks are currently faced with a data problem, people are not able to really reach the most valuable data sources that they can use in the predictive modelling part. And not yet, but probably in the future, it will also evolve and the generative AI can help us to actually comment on the information that we got from the client.
Or we can ask generative AI: hey, we have missing values in this data source, how do you fill those missing values? By giving all the other information that we have to generative AI, we can expect it to generate a well designed data module for us to use in the predictive modelling. Or besides that, let's assume there is a generative AI that is already trained with the psychological data, articles, and books that we have. We can ask it to make comments about the psychology of the customer, and then we can feed those comments into the machine learning model to make a prediction about how the customer is going to behave if you're going to lend the money to that customer.
So I think generative AI is really going to be crucial in 5 or 10 years.
Brendan Le Grange 15:48
And it's really interesting, because obviously, there are people working today with synthetic data, sometimes to protect the privacy of customers, sometimes just to bulk out the data they have available. There are some risks in that, that we know of, in terms of being involved on both sides of the equation, but Gen AI would be great in that space, creating data when we need it, and understanding what a customer would do. Because we've sort of got the affordability check in place now, where we look at: do they have the money? But this is taking it a step further: would it actually be beneficial to the customer? Or are they likely to do something that would damage them, even if they can afford it?
It does bring out some concerns or some questions around regulation. The EU is coming out with AI legislation. I was talking to someone recently in Singapore, where the government's got a whole programme on AI and how they sort of sign off and approve AI projects. It is a question in that space: what do you see as the push and pull at the moment in terms of regulating these new approaches to credit risk modelling across the world?
Ayhan Diş 16:57
The regulatory authorities are mainly focusing on the explainability of the machine learning applications in the area. As they already got enough pain during the economic crisis, and from the wrong decisions made during the crisis, I can understand the concerns on the machine learning part, accepting that it's directly a black box application in the decision making.
However, within the last few years, many developments have been made by the professionals working in the area, who have already developed, for example, Shapley values, or different graph methods that show the user how a decision is made via machine learning in an open way.
Of course, compared with the statistical approach, it is not going to be that easy to understand how the machine learning is making a decision. But on that part, regulatory authorities also, I think, have started to understand that a complex and smarter decision is much harder to explain compared to a statistical approach.
So let's say if you are using 10 different features in a model, explaining it with a linear relation using just a logit or probit function is really going to help us to understand why the decision is made. But imagine that it is going to give, let's say, 70% accuracy on the final product, versus, let's say, 90% with 1,000 features, which is not as easy to explain.
I think the trade off for the regulatory authorities is gonna start at this part.
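To make the Shapley value idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over every possible subset of the other features. Everything in it is hypothetical: the three-feature logit scorecard `toy_pd`, the applicant and the "average customer" baseline are invented for illustration. The exact method is only workable for a handful of features, since the number of subsets doubles with each one, which is why libraries use approximations in practice.

```python
import math
from itertools import combinations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is "absent" from a coalition takes its baseline value.
    """
    n = len(x)

    def value(present):
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_pd(z):
    """Hypothetical logit scorecard: intercept plus three weighted features."""
    score = -2.0 + 1.5 * z[0] - 0.8 * z[1] + 0.5 * z[2]
    return 1 / (1 + math.exp(-score))


applicant = [1.0, 0.2, 0.0]         # hypothetical feature values
average_customer = [0.3, 0.5, 0.0]  # hypothetical baseline

phi = shapley_values(toy_pd, applicant, average_customer)
gap = toy_pd(applicant) - toy_pd(average_customer)
print("Shapley values:", [round(p, 4) for p in phi])
print("Sum of values :", round(sum(phi), 4), "| PD gap:", round(gap, 4))
```

The contributions add up to the difference between the applicant's PD and the baseline PD, which is the property that makes Shapley values attractive for explaining an individual decision. The third feature gets a value of zero here because the applicant and the baseline do not differ on it.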
Brendan Le Grange 18:54
You know, the history of AI models and machine learning models is now growing and growing. And the confidence is probably there and well established now, so it is not quite as scary as it was a few years ago to hear the terms. And as you said, you can prove it works.
And I guess the more times we see it working in more different ways, the more confident we are. Ayhan, we've covered a lot there. We did mention it earlier, but just a reminder: if anybody's interested in CreditPy, where can they go to learn more about that and see it in action?
And likewise, I found you on LinkedIn, but where is actually the best place for people to go to get in contact with you, be it about CreditPy or about credit modelling more broadly?
Ayhan Diş 19:36
They can add me via LinkedIn and also they can directly send an email. I already received some emails, which also made me feel excited to develop CreditPy further.
Brendan Le Grange 19:50
Yeah, and I saw a great response when you launched it. Ayhan, earlier I said recently launched; this is very recently launched, and it's had a great response so far.
As you said there's demo data there for people to play around with. So they really can jump right in there and get their hands dirty and see what's involved. You said, CreditPy Pro on the way, do you have any idea of the timings for that?
Ayhan Diş 20:13
My plan is actually to produce CreditPy Pro in two months.
Actually, the AI module is already set, but there are some small issues that I am fixing. And beside the AI module, I actually also want to put in a broader model validation structure. And also I will allow the modeller to use just basic functions to create an end to end credit risk decision model, including also the model deployment.
In the last weeks, I mainly focused on making a much more efficient deployment pipeline, and it's now capable of deploying the models in seconds, which is also kind of an advantage for, let's say, fintechs. For example, if you are working in a very fast environment, then you really need to offer your money to customers in a very fast way.
Brendan Le Grange 21:17
We'll keep an eye out for that and announce it on the page as well when it goes live. And Ayhan, you also have a number of interests outside of this, in the AI and analytics space. So what else are you doing?
Ayhan Diş 21:29
I'm working on my individual projects. One of them is artificial intelligence based stock, commodities and cryptocurrency trading. Besides that, I am also working on developing an artificial intelligence based lawyer assistant, and also, with some of my colleagues, we are thinking about moving into the healthcare analytics part to create some automated doctor assistants first, then afterwards focusing on some other original problems in the area.
Brendan Le Grange 22:04
Yeah, I mean, it's a fascinating world we're in now. And I think it's fascinating to hear you turn your attention to industries like the medical field, like the legal field, where traditionally those would be very different.
But I guess by applying AI and analytics and understanding data, you're able to immediately make an impact. So some fascinating projects on the way, Ayhan. It's been really interesting for me, a reminder that I should refresh on the current tools that are out there and maybe finally learn some of that Python that a few courses keep trying to teach me.
So thank you for making the time. And yeah, I look forward to seeing how this gets picked up and the benefit it provides to listeners of the show and also we'll keep an eye out for CreditPy Pro.
Ayhan Diş 22:53
Thank you so much, Brendan. It truly was very nice to do the podcast with you. I really enjoyed it.
Brendan Le Grange 23:01
And thank you all for listening.
Please do look for and follow the show on your favourite podcast platform and share the updates widely on LinkedIn where lending nerds are found in our largest concentration. Plus, send me a connection request while you're there.
This show is written and recorded by myself Brendan le Grange in Brighton, England and edited by Fina Charleson of FC Productions.
Show music is by Iam_wake, and you can find show notes and written transcripts at www.HowtoLendMoneytoStrangers.show and I'll see you again next Thursday.