Gradient Boosting Machines – Beyond accuracy

Machine Learning

Bringing machine learning techniques to insurance pricing was the original aim of Pricing Frontier. These models train quickly, can be more accurate, and the code can easily be adapted to build multiple models with different responses; the main trade-off is interpretability.

In terms of applications for pricing teams, this is one area of the data science field that hasn’t changed a great deal in the last decade. Given that the majority of modelling is supervised learning on tabular data, Gradient Boosting Machines (GBMs) are still nearly always the most accurate model.

Neural networks are mentioned occasionally; however, they are suited to cases where the individual features don’t carry much information by themselves but are informative in combination. Image data is an example: each pixel contains little information, but together the pixels form a picture. A neural network’s strength lies in its ability to generate new features, and its applications are generally non-tabular data or unsupervised learning.

Accuracy

Whenever I have compared a GBM to a GLM (generalised linear model), the GBM has always performed better, but the gains are usually slight, and never to the extent that the GLM would be viewed as inaccurate. GBMs are also more prone to overfitting.
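As an illustration of what such a comparison might look like (not results from any specific model), the sketch below scores a GLM and a GBM on the same holdout set using mean Poisson deviance; y_holdout, glm_pred and gbm_pred are placeholder arrays of observed claim counts and the two models’ predictions.

```python
# Hypothetical out-of-sample comparison of a GLM and a GBM.
# y_holdout, glm_pred and gbm_pred are placeholders, not real results.
from sklearn.metrics import mean_poisson_deviance

def compare_frequency_models(y_holdout, glm_pred, gbm_pred):
    """Mean Poisson deviance of each model on the holdout set (lower is better)."""
    return {
        "glm": mean_poisson_deviance(y_holdout, glm_pred),
        "gbm": mean_poisson_deviance(y_holdout, gbm_pred),
    }
```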

Pricing analysts generally have a deep understanding of GLMs, the mathematics behind them, and how to adjust them to adhere to regulation, built through plenty of study, experience, and learning from colleagues. Teams of analysts can peer review models, and the time spent building them is not wasted, as it builds familiarity with the data. The model-building process for a GLM is usually rigorous, which results in an accurate model.

In comparison, the depth of knowledge on GBMs in the insurance industry is still quite shallow. A pricing analyst will still get accuracy gains by using a GBM, but is much more likely to overlook part of the process or not understand some of the model behaviors. 

Time Saving

GBMs essentially automate the model training process: as a user, all you need to do is prepare the data, decide on a couple of settings, run the training process, and validate the result. It’s not necessary to pick which factors to fit or how to fit them.
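As a rough sketch of that workflow, assuming a LightGBM frequency model and an illustrative DataFrame df with a claim_count column and factor columns already encoded numerically or as pandas "category" dtype:

```python
# Minimal, illustrative GBM training run: prepare data, pick a couple of
# settings, train, validate. The DataFrame `df` and column names are placeholders.
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_poisson_deviance

features = ["driver_age", "vehicle_group", "postcode_area"]  # illustrative factors
X_train, X_valid, y_train, y_valid = train_test_split(
    df[features], df["claim_count"], test_size=0.25, random_state=42
)

# Decide on a couple of settings: the loss function and the main hyperparameters.
model = lgb.LGBMRegressor(
    objective="poisson",
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=31,
)
model.fit(X_train, y_train)

# Validate the result on data the model has not seen.
print(mean_poisson_deviance(y_valid, model.predict(X_valid)))
```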

GLMs also train fast; the difference is that there is more work up front to decide which factors to fit and how to fit them. However, this is also a good opportunity to assess your data: its quality, the distributions, and so on. When training GBMs it is still important to do this, it just happens outside of the model training process.

An overlooked benefit is that when training a model doesn’t require a large time commitment, models can become a tool used regularly simply for data analysis. For example, if I wanted to assess the impact of a new rating structure, I could model the difference between the current and proposed rates; this would tell me the key factors affected by the price change, and I could have that list very quickly.
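A sketch of that kind of analysis, assuming hypothetical current_rate and proposed_rate columns and reusing the illustrative features list from the earlier sketch:

```python
# Use a GBM purely as an analysis tool: model the price movement between two
# rating structures and list the factors that explain it. Column names are hypothetical.
import lightgbm as lgb
import pandas as pd

price_movement = df["proposed_rate"] / df["current_rate"]

analysis_model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
analysis_model.fit(df[features], price_movement)

# Rank the factors most associated with the rate change.
importance = pd.Series(analysis_model.feature_importances_, index=features)
print(importance.sort_values(ascending=False).head(10))
```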

One of the main use cases I see for GBMs in rating is when we want to refresh models regularly, for example with competitor models. The market changes constantly, so we would expect their performance to deteriorate more quickly than that of other models, and being able to retrain a model quickly and often would improve model performance.

The bottleneck is usually the time to implement the new model, but that could be automated if your rating environment is able to make API requests. You could host your model so that it is accessible through an endpoint (during the rating process, rating factors are sent to the externally hosted model, and the prediction is returned). This means you could have a pipeline that retrains the model, validates performance, and updates the endpoint automatically, returning updated predictions to your rating engine.
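One possible shape for the serving side, assuming the training pipeline saves a LightGBM model to a file and the rating engine can POST rating factors as JSON (the framework, file name, and field handling are all illustrative, not a specific product):

```python
# Minimal prediction endpoint. The retraining pipeline would overwrite the saved
# model file and redeploy; the rating engine calls POST /predict per quote.
import lightgbm as lgb
import pandas as pd
from fastapi import FastAPI

app = FastAPI()
model = lgb.Booster(model_file="frequency_gbm.txt")  # illustrative file name

@app.post("/predict")
def predict(rating_factors: dict):
    # Rating factors arrive as JSON; they must be encoded the same way as in
    # training before being passed to the model.
    X = pd.DataFrame([rating_factors])
    return {"prediction": float(model.predict(X)[0])}
```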

Interpretability

This is the main trade-off considered when comparing a GLM to a GBM. A GBM is more accurate because it is more complex and can therefore capture complex relationships within the data, but it is then harder to understand exactly what is happening. Techniques such as SHAP can reveal the main relationships and interactions, and show roughly how the prediction for a single observation is formed, but this is not as transparent as having the actual relativities.
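For example, with the illustrative LightGBM model and validation data from the earlier sketch, SHAP could be used roughly as follows:

```python
# Inspect a fitted GBM with SHAP: a global summary of the main factors plus a
# breakdown of how a single prediction is formed. `model` and `X_valid` are
# from the earlier illustrative training sketch.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_valid)

# Global view: which factors drive predictions, and in which direction.
shap.summary_plot(shap_values, X_valid)

# Local view: the approximate build-up of one observation's prediction.
shap.force_plot(explainer.expected_value, shap_values[0, :], X_valid.iloc[0])
```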

Anecdotally, optimisation techniques don’t get the same criticism, even though it is much harder to understand the values output from an optimisation algorithm than from a GBM; in fact, I have used GBMs to model optimisation outputs to understand which factors drive them.

For a rating structure that uses GLMs for all of the modelling and then passes the values through an online optimisation algorithm, the overall result is still hard to interpret. In contrast, you could use GBMs for all of the initial models, determine optimal rates offline, and then model these with a GLM to produce interpretable prices.
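A sketch of that final step, assuming the offline optimisation produced a hypothetical optimal_rate column and using illustrative factor bands (statsmodels here, with a log-link Gamma GLM so the fitted coefficients exponentiate into multiplicative relativities):

```python
# Fit an interpretable GLM to offline-optimised rates. Column names are
# hypothetical; the exponentiated coefficients become the rate-table relativities.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

glm = smf.glm(
    "optimal_rate ~ C(driver_age_band) + C(vehicle_group) + C(postcode_area)",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

print(np.exp(glm.params))  # multiplicative relativities for the rating structure
```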

Rating engine support

The main reason I have come across for not implementing GBMs is that, even if there is the desire to use them in rating, it may simply not be feasible with the rating engine used.

Ideally GBMs are built with code rather than software, as this allows for much greater flexibility (this applies to a lot of things). Software options may not support the loss function, hyperparameters, hyperparameter tuning options, or other aspects of a GBM that you wish to use, and some software solutions I have tried have been very slow and inefficient with RAM, forcing the use of a small number of trees. So if your rating engine doesn’t support an externally trained GBM and doesn’t have GBM training functionality, your choices are limited.

Should your team use GBMs?

Being able to build GBMs is a useful skill, particularly for analytical tasks to quickly identify the key factors. However, their use in live rating depends on your team’s expertise, objectives, data pipelines, and rating engine capabilities.

Generalising the skillsets of pricing analysts, GLM skills are typically strong, but it is very common to see complicated, time-consuming, and manual data processes within pricing teams.

If your goal is improved accuracy, GBMs will offer some enhancement, but if the improvements are marginal, the trade-offs may not be worthwhile.

For speed, using GBMs is not likely to accelerate the workflow significantly if it takes several days to put together a new modelling dataset. If that is the case, time is better spent improving your pricing architecture and pipelines rather than modelling techniques. By contrast, if your modelling data process follows repeatable analytical pipeline principles, building a GBM is a very quick process.

If your rating engine doesn’t support externally built GBMs but includes its own GBM functionality, the accuracy and speed gains may not be as substantial. Even if direct implementation isn’t possible, GBMs can still be valuable for indirect uses, such as inputs to offline optimisation processes or the creation of postcode or vehicle files.

Finally, without a deep understanding of GBMs, your team may overlook important aspects or potential pitfalls.
