Amazon Fine Food Reviews Classification using Natural Language Processing
Abstract:
Amazon is an e-commerce website that reported net sales of about 178bn USD in 2017. In this blog, I will explain Natural Language Processing step by step and walk through a live implementation on the Amazon Fine Food Reviews dataset. This dataset contains so many reviews that it is practically impossible for a human to read and classify them one by one. That is the problem Natural Language Processing (NLP) helps solve.
Dataset:
This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review. It also includes reviews from all other Amazon categories.
Data includes:
- Reviews from Oct 1999 — Oct 2012
- 568,454 reviews
- 256,059 users
- 74,258 products
- 260 users with > 50 reviews
I got the dataset from kaggle.com. If you want to download it, simply follow the kaggle.com link or search for kaggle.com in Google.
An overview of this blog:
- A brief introduction to Natural Language Processing
- Data Cleaning and Exploratory Data Analysis (EDA)
- Finding the top and most common positive and negative reviews
- Finding relationships between words based only on how often they appear in reviews
- Finally, predicting review sentiment using Natural Language Processing
What is Natural Language Processing?
Natural Language Processing (NLP) is the ability of a computer program to understand human language. NLP is a component of Artificial Intelligence (AI). Building NLP systems is a genuinely challenging problem for programmers. NLP relies on some important mathematical concepts and formulas to make sense of human language. Below I will explain the math behind NLP and how it actually works. You come across NLP often and use it in your daily life: Google Assistant is one of the most popular NLP implementations. When you speak to your phone, Google Assistant interprets what you say with the help of NLP. Syntactic analysis and semantic analysis are the main techniques used to complete Natural Language Processing tasks.
Techniques used in NLP:
Converting Text into Vector:
Converting text into a vector is the first and one of the most important techniques in NLP. Before explaining it, let me give you an example. Suppose I have a dataset with three reviews, like this:
R1 — This food is tasty
R2 — This food is tasty and affordable
R3 — This food is not affordable and not so much tasty
After reading this dataset, you can see that R1 and R2 are more similar to each other than R1 and R3 are. We can write this as similarity(R1, R2) > similarity(R1, R3). Now, if we represent each review as a vector in a d-dimensional space, the vectors for R1 and R2 should be closer together than the vectors for R1 and R3: the more similar two reviews are, the smaller the distance between their vectors. This notion is called similarity, and it is the reason text is converted into vectors.
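To make this concrete, here is a minimal sketch that turns the three example reviews into vectors and measures the distance between them. It assumes a simple word-count (bag-of-words) representation and Euclidean distance, which is only one possible choice and not necessarily the representation used for the real dataset. The helper names (`tokenize`, `to_count_vector`, `euclidean_distance`) are made up for this illustration.

```python
import math
from collections import Counter

# The three example reviews from the text above.
reviews = {
    "R1": "This food is tasty",
    "R2": "This food is tasty and affordable",
    "R3": "This food is not affordable and not so much tasty",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# One dimension per unique word across all reviews (assumption: bag-of-words).
vocabulary = sorted({word for text in reviews.values() for word in tokenize(text)})


def to_count_vector(text):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vectors = {name: to_count_vector(text) for name, text in reviews.items()}

d12 = euclidean_distance(vectors["R1"], vectors["R2"])
d13 = euclidean_distance(vectors["R1"], vectors["R3"])

print(f"distance(R1, R2) = {d12:.3f}")  # 1.414
print(f"distance(R1, R3) = {d13:.3f}")  # 2.828
```

With these nine-dimensional vectors, R1 ends up closer to R2 than to R3, which matches the intuition that R1 and R2 are the more similar pair.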