Project information

  • Category: Unsupervised
  • Project date: April 2019
  • Project URL: GitHub
Photo by Ian Parker on Unsplash

Identify Customer Segments

Apply unsupervised learning techniques to identify segments of the population that form the core customer base for a mail-order sales company in Germany. These segments can then be used to direct marketing campaigns towards the audiences with the highest expected rate of return. The data and design for this project were provided by Arvato Financial Services and represent a real-life data science task. This project was completed as part of the Udacity Data Scientist Nanodegree.

Project Requirements

Apply the following methods to identify customer segments.

  • Unsupervised Learning: Apply unsupervised techniques to derive value from the given dataset.
  • Clustering: Learn the basics of data clustering and cluster the data with the k-means algorithm.
  • Principal Component Analysis: Reduce the dimensionality of the data using Principal Component Analysis.
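The three requirements above fit together as one pipeline: standardize the features, reduce them with PCA, then cluster the reduced data with k-means. The following is a minimal sketch on synthetic data from scikit-learn's `make_blobs`; the data, the 3 components, and the 4 clusters are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in data: 500 rows, 10 features, 4 underlying groups (illustrative only)
X, _ = make_blobs(n_samples=500, n_features=10, centers=4, random_state=0)

# PCA and k-means are both sensitive to feature scale, so standardize each feature first
X_scaled = StandardScaler().fit_transform(X)

# Project onto a smaller number of principal components (3 is an arbitrary choice here)
pca = PCA(n_components=3, random_state=0)
X_reduced = pca.fit_transform(X_scaled)

# Cluster in the reduced space; n_init sets how many random restarts k-means tries
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print(X_reduced.shape)
print(np.bincount(labels))  # number of points assigned to each cluster
```

Clustering in the PCA space rather than on the raw features keeps distance computations cheaper and less dominated by noisy, redundant dimensions.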

Skills Required

Python
pandas
Matplotlib
NumPy
Seaborn
PCA
Imputer
StandardScaler
KMeans
Clustering
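The Imputer and StandardScaler entries above refer to scikit-learn preprocessing tools. In current scikit-learn, the older `sklearn.preprocessing.Imputer` has been removed and `sklearn.impute.SimpleImputer` replaces it. The following is a small sketch of that preprocessing step, assuming a hypothetical frame whose column names and values are invented for illustration: one-hot encode a categorical column, impute the remaining gaps with the median, then standardize.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column names and values are made up for illustration
df = pd.DataFrame({
    "age_group": [1.0, 2.0, np.nan, 4.0],
    "income": [3.0, np.nan, 5.0, 2.0],
    "building_type": ["A", "B", "A", None],
})

# One-hot encode the categorical column; a missing category becomes an all-zero row
df = pd.get_dummies(df, columns=["building_type"], dtype=float)

# Fill the remaining NaNs in each column with that column's median
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(df)

# Standardize every column to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)

print(df.columns.tolist())
print(np.round(X_scaled.mean(axis=0), 6))
```

Median imputation is only one choice; dropping rows or columns with many missing values is another, and which is appropriate depends on how much data is missing.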

Techniques

Major Tasks

  • Preprocessed missing data.
  • Encoded categorical features where required.
  • Performed feature selection, transformation and data cleaning.
  • Applied the dimensionality reduction technique PCA to retain the maximum variance with the fewest possible features.
  • After fitting PCA with 70 components, I found that 52 components are enough to explain 99% of the variance.
  • Segmented the data into 14 clusters using the k-means clustering method.
  • Compared the cluster distributions of the customer data and the general demographic data (across the 14 clusters) to identify where the company's strongest customer base lies.
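The component-selection and cluster-comparison steps above can be sketched as follows. This is a hedged illustration on synthetic data, not the project's code: the "population" and "customer" samples are generated with `make_blobs`, the customer sample deliberately over-represents one group, and 5 clusters stand in for the project's 14. One common approach, assumed here, is to fit the scaler, PCA, and k-means on the general population only and then apply them unchanged to the customers.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-ins (illustrative only): both samples come from the same 5 blobs
X_all, groups = make_blobs(n_samples=3000, n_features=20, centers=5, random_state=0)
population = X_all[:2000]
customer_pool = X_all[2000:]
# Oversample group 0 in the customer sample so the two distributions differ
weights = np.where(groups[2000:] == 0, 5.0, 1.0)
idx = rng.choice(len(customer_pool), size=500, p=weights / weights.sum())
customers = customer_pool[idx]

# Fit the scaler on the general population only
scaler = StandardScaler().fit(population)
pop_scaled = scaler.transform(population)

# Smallest number of components whose cumulative explained variance reaches 99%
full_pca = PCA().fit(pop_scaled)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.99) + 1)

# Refit PCA with that many components and cluster the reduced population data
pca = PCA(n_components=n_components).fit(pop_scaled)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pca.transform(pop_scaled))

# Assign customers to the population's clusters using the same fitted transforms
pop_labels = kmeans.labels_
cust_labels = kmeans.predict(pca.transform(scaler.transform(customers)))

# Share of each sample in each cluster; ratio > 1 means over-represented among customers
k = kmeans.n_clusters
pop_share = np.bincount(pop_labels, minlength=k) / len(pop_labels)
cust_share = np.bincount(cust_labels, minlength=k) / len(cust_labels)
ratio = cust_share / pop_share

print(f"components for 99% variance: {n_components}")
for c in range(k):
    print(f"cluster {c}: population {pop_share[c]:.2%}, "
          f"customers {cust_share[c]:.2%}, ratio {ratio[c]:.2f}")
```

scikit-learn can also pick the component count directly with `PCA(n_components=0.99, svd_solver="full")`; the explicit cumulative sum is shown here because it makes the variance threshold visible.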

Completed using Jupyter Notebook.