Project information

  • Category: NLP
  • Project date: Feb, 2020
  • Project URL: Github
  • Project Website: BigView
Photo of me taken at Pearl Hack Event

Pearl Hack - Annual hackathon at UNC Chapel Hill

Simplifying daily buying decisions by getting insights from online reviews faster.

Example: You are in the mood for butter chicken from an Indian restaurant, but the restaurant's rating is 4.1. Some recent reviews complain about the taste of a few vegetarian dishes. In that moment you only care about the butter chicken reviews. Now imagine that, instead of going through all the reviews, you could see whether butter chicken appears in the positive or the negative reviews and exactly what people are saying about it. This is what we are trying to achieve with the BigView idea: "Making Decisions Faster" for everyone.
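The butter chicken scenario can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample reviews are made up, the helper name `mentions_by_sentiment` is hypothetical, and the star rating stands in for sentiment (4 stars and above counts as positive, 2 and below as negative). This is a simplification, not necessarily how BigView classifies reviews.

```python
import re

# Made-up sample data for illustration: (star rating, review text).
reviews = [
    (5, "The butter chicken was rich and creamy. Service was quick."),
    (2, "Paneer tikka was bland. The veg biryani had no flavor."),
    (1, "Butter chicken arrived cold. Very disappointing."),
    (4, "Loved the naan. Butter chicken is a must try!"),
]

def mentions_by_sentiment(reviews, item, positive_min=4, negative_max=2):
    """Group the sentences that mention `item` by the rating of their review.

    Assumption: star ratings are used as a rough proxy for sentiment.
    """
    groups = {"positive": [], "negative": []}
    for stars, text in reviews:
        if stars >= positive_min:
            label = "positive"
        elif stars <= negative_max:
            label = "negative"
        else:
            continue  # middling ratings are skipped in this sketch
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if item.lower() in sentence.lower():
                groups[label].append(sentence.strip())
    return groups

result = mentions_by_sentiment(reviews, "butter chicken")
for label, sentences in result.items():
    print(label, sentences)
```

The complaints about vegetarian dishes never appear in the result, which is the point: the reader sees only what was said about the dish they care about.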

Project Requirements

Shopping online is part of our daily life, and going through online reviews to decide whether to buy a product is always overwhelming. That's why we came up with an idea to bring insights from textual data. Whether you are reading Google Reviews before going to a restaurant or choosing a product on Amazon.com, reviews either make up your mind or change it. But most of the time, online ratings cannot tell the whole story of "how a product is doing". We are adding a new functionality called BigView, where a user can see insights from reviews and make a decision faster rather than going through hundreds of reviews.

Skills Required

Python
Pandas
Matplotlib
NumPy
Seaborn
NLP
WordCloud

Techniques

We are implementing this idea on Yelp reviews of "Cheesecake Factory". There are 465 reviews, and you can see the output of our implementation in the Python code.

  • Fetch only the Cheesecake Factory reviews from the huge Yelp dataset, then clean and transform the data using pandas.
  • Text cleaning, normalization, tokenization, stopword removal, Bag of Words, etc.
  • Generate word clouds from the review text, bar plots from the Bag of Words, and an overall timeline of average ratings.
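The text-processing steps listed above (cleaning, normalization, tokenization, stopword removal, Bag of Words) can be sketched with only the Python standard library. The stopword list, the sample reviews, and the helper names `tokenize` and `bag_of_words` are assumptions made for illustration; the notebook itself relies on an NLP library for these steps.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption); an NLP library would
# normally supply a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "but", "of", "to", "i", "so", "very"}

def tokenize(text):
    """Lowercase the text, keep runs of letters/apostrophes, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(texts):
    """Count how often each remaining token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

# Made-up sample reviews for illustration.
sample_reviews = [
    "The cheesecake was AMAZING, and the service was great!",
    "Great cheesecake but the wait was very long.",
]

bow = bag_of_words(sample_reviews)
print(bow.most_common(3))
```

The resulting counts are the kind of data that can feed a bar plot of the most frequent words, or a word cloud that takes word frequencies as input.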

We use an NLP library to bring insights from the textual review data, with Python as the backbone coding language, pandas for data cleaning and transformation, the WordCloud library to generate word clouds, and Matplotlib and Seaborn for data visualization.
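As a rough sketch of the numbers behind a timeline of average ratings, the snippet below groups ratings by month and averages them. It uses plain Python rather than pandas so that it runs without extra packages. The dates and ratings are made up, and the helper name `monthly_average` is hypothetical. Plotting is left out; the resulting month-to-average mapping is what a line chart would display.

```python
from collections import defaultdict
from statistics import mean

# Made-up (date, stars) pairs for illustration; in the project these values
# would come from the Yelp review data.
ratings = [
    ("2019-01-05", 5),
    ("2019-01-20", 3),
    ("2019-02-11", 4),
    ("2019-02-15", 2),
    ("2019-02-28", 3),
]

def monthly_average(ratings):
    """Return {"YYYY-MM": average stars}, ordered by month."""
    by_month = defaultdict(list)
    for date, stars in ratings:
        by_month[date[:7]].append(stars)  # first 7 chars are "YYYY-MM"
    return {month: mean(values) for month, values in sorted(by_month.items())}

timeline = monthly_average(ratings)
for month, avg in timeline.items():
    print(month, round(avg, 2))
```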

Challenges

  • Getting real review data: picking a sample with enough reviews from the huge Yelp dataset.
  • Cleaning and transforming text using NLP.
  • Creating a front-end to bring insights from text.

Completed using Jupyter Notebook.