Wine Quality Prediction

After getting the fundamentals clear from the courses, I knew it was time to jump right into a machine learning project. But what exactly goes into a machine learning project?

After watching the above video by Google, I started on my project right away.

The first step was to gather and clean the data for my project.

Fortunately, the wine dataset is available in the scikit-learn library, so I just had to import it from there.
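A minimal sketch of that import, assuming the dataset in question is the one returned by scikit-learn's `load_wine` (the variable names are my own):

```python
from sklearn.datasets import load_wine

# Load the wine dataset bundled with scikit-learn
wine = load_wine()
X, y = wine.data, wine.target

print(X.shape)             # number of wines and number of characteristics
print(wine.feature_names)  # names of the characteristics
print(wine.target_names)   # names of the wine classes
```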

Thereafter, I wanted to better understand the data, so I put it in a pandas dataframe and looked at the first 5 data points. The wine dataset has many characteristics, including alcohol, malic acid, ash, alkalinity of ash, magnesium and many more. It also has a column called Target that only takes the values 0, 1 and 2. Reading up on the dataset, I learned that each number represents a class of wine with certain characteristics, and that each data point belongs to exactly one of the classes.
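Putting the data into a dataframe could look roughly like this. The column name `Target` follows the description above, and the rest of the naming is an assumption:

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()

# One row per wine, one column per characteristic
df = pd.DataFrame(wine.data, columns=wine.feature_names)
# Add the class label as an extra column (name assumed from the text)
df["Target"] = wine.target

print(df.head())  # first 5 data points
print(df["Target"].value_counts().sort_index())  # wines per class
```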

After looking at graphs of each characteristic of the wines, I realised that the alcohol level separated the wines into 3 classes quite distinctly.
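One way to check this without a plot is to summarise the alcohol level per class. This is only a sketch of the idea, not the graphs I actually drew, and the `Target` column name is assumed as before:

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["Target"] = wine.target

# Mean, spread and range of alcohol level within each class
alcohol_by_class = df.groupby("Target")["alcohol"].describe()
print(alcohol_by_class[["mean", "std", "min", "max"]])
```

If the class means are far apart relative to the spread within each class, alcohol alone is a useful feature; where the ranges overlap, some wines will be hard to tell apart on alcohol only.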

Each of these classes was labelled 0, 1 or 2.

So, the problem had now become predicting whether a wine belonged to class 0, 1 or 2.

Next, I split the data into training and test sets and scaled both sets in the same way.
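A sketch of this step is below. The split ratio, the random seed, the choice of `StandardScaler` and the use of the alcohol column alone are all assumptions for illustration. The scaler is fitted on the training set only and then applied to both sets, so that both end up on the same scale:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = X[:, [0]]  # assumption: keep only the alcohol column (index 0)

# Assumed settings: 25% test set, fixed seed, classes balanced across the split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Learn the mean and standard deviation from the training set only,
# then apply the same transformation to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```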

I then used the k-nearest neighbors classifier with k=6.

Of the values of k I tried, k=6 had the highest accuracy. I fit the model to the training set and tested it on the test set. It turns out this gives an accuracy of 0.83, or 83%.
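The comparison of different k values can be sketched as follows. The range of k, the split settings and the alcohol-only feature are assumptions, so the numbers printed will not necessarily match the 83% above:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = X[:, [0]]  # assumption: alcohol column only

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Try several values of k and record the test accuracy of each
scores = {}
for k in range(1, 11):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)

for k, accuracy in scores.items():
    print(f"k={k}: accuracy={accuracy:.2f}")
```

Picking k by test accuracy, as here, tends to give an optimistic number. Cross-validation on the training set is a common alternative.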

I was now able to predict the classes of wines with pretty good accuracy.

To conclude, I just want to roughly share my understanding of the k-nearest neighbors classifier. In this case, when a new point is given to the classifier, it is placed on a horizontal 1-dimensional graph according to its alcohol level, alongside the training wines that were already plotted there. The classifier then finds the k nearest points to it on that graph, for whatever k you give it, and assigns the class that is in the majority amongst those "neighbor points".
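The idea can be written out in a few lines of plain Python. This is a toy sketch with made-up alcohol values and a hypothetical helper `knn_predict_1d`, not how scikit-learn implements it:

```python
from collections import Counter

def knn_predict_1d(train_values, train_labels, new_value, k):
    """Predict a class by majority vote among the k nearest points on a line."""
    # Sort the training points by their distance to the new point
    by_distance = sorted(
        zip(train_values, train_labels),
        key=lambda point: abs(point[0] - new_value),
    )
    # Keep the labels of the k closest points
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up alcohol levels and classes, for illustration only
alcohol = [12.0, 12.2, 12.4, 13.0, 13.2, 13.8, 14.0, 14.2]
labels = [1, 1, 1, 2, 2, 0, 0, 0]

print(knn_predict_1d(alcohol, labels, 13.9, k=3))  # → 0
print(knn_predict_1d(alcohol, labels, 12.1, k=3))  # → 1
```

With an even k such as 6 and 3 classes, votes can tie. This sketch simply takes the first label that `Counter` returns, and a real library may break ties differently.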

So, that is my first machine learning project. I am eager to try new machine learning algorithms.