Detect fake news headlines with Python
First things first, import these libraries:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
For this machine learning project, I'll be using this dataset for training our model to detect real or fake news headlines.
Now, to start working with the data, load the dataset and define the x and y variables:
data = pd.read_csv("news.csv")
x = np.array(data["title"])
y = np.array(data["label"])
x is the news headlines that we'd like our model to be trained and tested on.
y is the label (Fake or Real) that we are going to predict.
Next, add these lines of code to your script:
cv = CountVectorizer()
x = cv.fit_transform(x)
"WTH?" you might ask
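To put it simply: CountVectorizer builds a vocabulary of every word that appears in the headlines and turns each headline into a vector of word counts, so the model gets numbers to work with instead of raw text.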
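A tiny sketch on two made-up headlines (not from news.csv) shows what comes out of fit_transform: one row per headline and one column per vocabulary word. The names toy_headlines and toy_cv are only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up headlines, only for illustration (not taken from news.csv)
toy_headlines = [
    "aliens land in paris",
    "paris hosts climate summit",
]

toy_cv = CountVectorizer()
counts = toy_cv.fit_transform(toy_headlines)

# vocabulary_ maps each word to its column index; sort the words by that index
vocab = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)
print(vocab)

# One row per headline, one column per vocabulary word
dense = counts.toarray()
print(dense)
```

The word "paris" appears in both headlines, so its column holds a 1 in both rows, while every other column is non-zero in only one row.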
To make, train and test our model, add these lines of code to your script:
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = MultinomialNB()
model.fit(xtrain, ytrain)
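Let me explain: train_test_split holds back 20% of the headlines (test_size=0.2) for testing and uses the rest for training, random_state=42 makes the split reproducible, and MultinomialNB is a Naive Bayes classifier that works well with word counts.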
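To see what test_size and random_state do, here is a small sketch on ten dummy samples. The arrays toy_x and toy_y are placeholders standing in for the vectorized headlines and their labels, not real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten dummy samples standing in for the vectorized headlines (placeholders)
toy_x = np.arange(10).reshape(-1, 1)
toy_y = np.array(["Fake", "Real"] * 5)

# Split twice with the same settings to check reproducibility
a_train, a_test, _, _ = train_test_split(toy_x, toy_y, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(toy_x, toy_y, test_size=0.2, random_state=42)

print(len(a_train), len(a_test))       # 8 training rows, 2 test rows
print(np.array_equal(a_test, b_test))  # True: same seed, same split
```

Without a fixed random_state, the rows that end up in the test set can differ from run to run, and so can the score.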

Now, to predict whether a news headline is real or not, add these lines of code to your script:
news_headline = "Atlantis discovered under the Atlantic Ocean!"
data = cv.transform([news_headline]).toarray()
print(news_headline)
print(model.predict(data))
Now let's take a random news headline from BBC News and see if our model classifies it as real:
news_headline = "Kathy Hochul: Who is New York's first female governor?"
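If you want to check several headlines without repeating the transform and predict lines, one option is a small helper. This is only a sketch: classify_headlines is a hypothetical name, and the four training titles are made up so the snippet runs on its own. In your script you would pass in the cv and model you already trained.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set standing in for news.csv (labels are placeholders)
toy_titles = [
    "aliens secretly control the government",
    "miracle pill cures every disease overnight",
    "city council approves new budget",
    "central bank raises interest rates",
]
toy_labels = ["Fake", "Fake", "Real", "Real"]

toy_cv = CountVectorizer()
toy_model = MultinomialNB()
toy_model.fit(toy_cv.fit_transform(toy_titles), toy_labels)


def classify_headlines(headlines, vectorizer, model):
    """Hypothetical helper: return (headline, predicted label) pairs."""
    # Use transform, not fit_transform, so the fitted vocabulary is reused
    features = vectorizer.transform(headlines)
    return list(zip(headlines, model.predict(features)))


results = classify_headlines(
    [
        "Atlantis discovered under the Atlantic Ocean!",
        "Kathy Hochul: Who is New York's first female governor?",
    ],
    toy_cv,
    toy_model,
)
for headline, label in results:
    print(label, "-", headline)
```

With such a tiny training set the predictions themselves mean very little; the point is only the shape of the helper.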

Now, of course, this model is not perfect.
News headlines change all the time, and even though the dataset we are using to train our model is a whopping 30MB of plain text, the model is only about 50% to 60% accurate on recent headlines.
If you add
print(model.score(xtest, ytest))
to your script, you'll see that the accuracy score is ~80%. However, when I tested 40 news headlines from last week, I got only 50% to 60% accuracy. That's because news headlines, their vocabulary, and their topics change all the time.
If you're a beginner who likes discovering new things about Python, try my weekly Python newsletter.

Byeeeee 👋