
Project information
- Category: Regression
- Project date: Dec, 2019
- Kaggle: Kaggle Competition
Auto Insurance Portfolio Loss Ratio Prediction
The goal of this project is to predict the natural logarithm of the loss ratio of a portfolio of auto insurance policies. The testing data contains a set of 330 policy portfolios, each having at least 1,000 auto policies. The training data consists of a set of auto policies, with a number of policy-level attributes as well as Annual Premium and Loss Amount.
Project Requirements
The Loss Ratio of a policy is the Loss Amount divided by the Premium. The Loss Ratio of a portfolio of policies is the sum of the Loss Amounts of all the policies in the portfolio divided by the sum of all the Premiums in the portfolio. The target is the natural logarithm of the Loss Ratio of a portfolio.
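As a minimal sketch of the definition above, the snippet below computes the log loss ratio of a portfolio from (premium, loss) pairs. The function name `portfolio_log_loss_ratio` and the toy numbers are illustrative assumptions, not part of the competition data or the original notebook.

```python
import math


def portfolio_log_loss_ratio(policies):
    """Return ln(sum of losses / sum of premiums) for (premium, loss) pairs."""
    total_premium = sum(premium for premium, _ in policies)
    total_loss = sum(loss for _, loss in policies)
    if total_premium <= 0 or total_loss <= 0:
        # The logarithm is undefined when either total is zero or negative.
        raise ValueError("log loss ratio is undefined for non-positive totals")
    return math.log(total_loss / total_premium)


# Made-up toy portfolio: (annual premium, loss amount) per policy.
toy_portfolio = [(1000.0, 0.0), (500.0, 1500.0), (1500.0, 0.0)]

# Portfolio ratio pools the totals: 1500 / 3000.
portfolio_ratio = sum(l for _, l in toy_portfolio) / sum(p for p, _ in toy_portfolio)

# Averaging per-policy ratios gives a different number: (0 + 3 + 0) / 3.
mean_policy_ratio = sum(l / p for p, l in toy_portfolio) / len(toy_portfolio)

log_ratio = portfolio_log_loss_ratio(toy_portfolio)
```

Note that the portfolio ratio pools losses and premiums before dividing, so it generally differs from the average of the per-policy ratios, as the two toy values show.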
Skills Required
Python
Pandas
NumPy
seaborn
matplotlib
Linear Regression
RandomForestRegressor
Ridge Regression
Lasso Regression
AdaBoostRegressor
DecisionTreeRegressor
winsorize
StandardScaler
Techniques
Major Tasks
- Performing data cleaning, handling missing values, and reviewing summary statistics and correlations.
- Creating visualizations to identify the features most important for prediction accuracy.
- Performing feature transformation and one-hot encoding to convert categorical variables into numerical ones.
- Comparing results of various supervised regression algorithms to achieve the lowest possible Mean Absolute Error.
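The model comparison step could look roughly like the sketch below, which fits the regressors listed under Skills Required and ranks them by Mean Absolute Error on a held-out split. The data here is synthetic and stands in for the real features, and the hyperparameters are arbitrary assumptions; this is not the original notebook's code.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the competition data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([0.5, -0.2, 0.0, 0.1, 0.3]) + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters below are arbitrary choices for illustration.
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoostRegressor": AdaBoostRegressor(random_state=0),
}

mae_by_model = {}
for name, model in models.items():
    # Scale features inside the pipeline so the scaler is fit on training data only.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, pipeline.predict(X_test))

# Print models from lowest to highest held-out MAE.
for name, mae in sorted(mae_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.4f}")
```

Wrapping `StandardScaler` in the pipeline keeps test-set statistics out of the fit; the tree-based models do not need scaling, but sharing one pipeline keeps the comparison uniform.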
This project's Jupyter Notebook is not available on my GitHub.