Portfolio Details

Project Details

Project information

Category: Unsupervised
Project date: April, 2019
Project URL: Github

Photo by Ian Parker on Unsplash

Identify Customer Segments

Apply unsupervised learning techniques to identify the segments of the population that form the core customer base of a mail-order sales company in Germany. These segments can then be used to direct marketing campaigns towards the audiences with the highest expected rate of return. The data and design for this project were provided by Arvato Financial Services and represent a real-life data science task. This project was completed as a requirement of the Udacity Data Scientist Nanodegree.

Project Requirements

Apply the following methods to identify customer segments.

Unsupervised Learning: Use unsupervised techniques to derive value from the given dataset.
Clustering: Learn the basics of clustering and cluster the data with the K-means algorithm.
Principal Component Analysis: Reduce the dimensionality of the data using Principal Component Analysis.
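
As a rough illustration of how these three requirements fit together, here is a minimal sketch using scikit-learn. It is not the project's notebook code: the data is synthetic, and the feature count, the number of components and the number of clusters are placeholder assumptions. It also uses SimpleImputer, because the Imputer class listed under skills was removed from scikit-learn in later releases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the demographic data (assumption): 300 rows, 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of the values

# Impute missing values -> standardize -> reduce dimensionality -> cluster.
segmenter = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=3, random_state=0),              # placeholder component count
    KMeans(n_clusters=4, n_init=10, random_state=0),  # placeholder cluster count
)

# One cluster label per row of X.
labels = segmenter.fit_predict(X)
print(np.bincount(labels, minlength=4))  # rows assigned to each cluster
```

Chaining the steps in a pipeline means the same imputation, scaling and projection fitted on one dataset can later be reused on another through segmenter.predict.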

Skills Required

Python

pandas

Matplotlib

NumPy

Seaborn

PCA

Imputer

StandardScaler

KMeans

Clustering

Techniques

Major Tasks

Preprocessed missing data.
Encoded the categorical features that required it.
Performed feature selection, transformation and data cleaning.
Applied PCA for dimensionality reduction, retaining as much variance as possible with as few components as possible.
After running PCA with 70 components, I found that 52 components are enough to retain 99% of the variance.
Grouped the data into 14 clusters using K-means clustering.
Compared the customer and demographic cluster distributions (across the 14 clusters) to see where the company's strongest customer base is.
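
The last three steps can be sketched as follows. This is an illustrative example on synthetic data, not the notebook's actual code: the array shapes, the shift applied to the "customers" sample and the choice of 5 clusters are assumptions made only to keep the example small (the project itself kept 52 components and used 14 clusters).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins (assumption): general population and a shifted customer sample.
rng = np.random.default_rng(42)
population = rng.normal(size=(1000, 20))
customers = rng.normal(loc=0.5, size=(300, 20))

# Fit the scaler on the population only, then reuse it for the customers.
scaler = StandardScaler().fit(population)
pop_scaled = scaler.transform(population)
cust_scaled = scaler.transform(customers)

# Fit PCA with all components and find the smallest number of
# components whose cumulative explained variance reaches 99%.
cumulative = np.cumsum(PCA().fit(pop_scaled).explained_variance_ratio_)
n_keep = min(int(np.searchsorted(cumulative, 0.99)) + 1, len(cumulative))

# Refit PCA with that many components and cluster the population.
pca = PCA(n_components=n_keep).fit(pop_scaled)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
pop_labels = kmeans.fit_predict(pca.transform(pop_scaled))

# Assign customers to the clusters learned from the population.
cust_labels = kmeans.predict(pca.transform(cust_scaled))

# Share of each group falling into each cluster.
k = kmeans.n_clusters
pop_share = np.bincount(pop_labels, minlength=k) / len(pop_labels)
cust_share = np.bincount(cust_labels, minlength=k) / len(cust_labels)

# Positive difference: the cluster is over-represented among customers.
diff = cust_share - pop_share
for c in range(k):
    print(f"cluster {c}: population {pop_share[c]:.1%}, "
          f"customers {cust_share[c]:.1%}, difference {diff[c]:+.1%}")
```

Clusters with a clearly positive difference are over-represented among customers and are natural candidates for targeted campaigns; clusters with a negative difference are under-represented. scikit-learn can also select the component count directly when PCA is given a fraction, for example PCA(n_components=0.99).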

Completed using Jupyter Notebook.