Sentiment Analysis With Python. Making Your First Sentiment Analysis Script.
Do you want to perform sentiment analysis with Python but don't know how to get started? Not to worry. In this article, I'll demonstrate and explain how you can make your own sentiment analysis app, even if you are new to Python.
If you've been following programming and data science, you're probably familiar with sentiment analysis. If you're not, here's the definition:
The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.
Sentiment analysis programs have become increasingly popular in the tech world. It's time you make one for yourself!
Before I get on with the article, I'd like to recommend Educative for learners like you.
Why Educative?
It is home to hundreds of development courses, hands-on tutorials, guides, and demonstrations to help you stay ahead of the curve in your development journey.
You can get started with Educative here.
Let's make a simple sentiment analysis script with Python. What will it do?
It will:
- Scrape news headlines from BBC news.
- Get rid of unwanted scraped elements and duplicates.
- Scan every headline for words that may indicate its sentiment.
- Based on the found words, determine each headline's sentiment.
- Aggregate the headlines into different arrays based on their sentiment.
- Print the number of scraped headlines and number of headlines with a positive, negative and neutral sentiment.
Create a new Python file with your favorite text editor. You can name it whatever you want, but I'll name the file main.py for this tutorial.
Before writing the main code, make sure to install (if they're not already installed) and import the following libraries.
import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np
A sentiment analysis script needs a dataset to train on.
Here's the dataset that I made for this script. I've tested it and found it to work well.
To follow along with this tutorial, download the dataset, move it into the same directory as your Python file, and add the following code to your Python file.
df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']
If you take a look at the dataset, you'll notice that it's just over 100 lines long. Each line contains an index number, a word, and a 1 or 0.
The index number just gives the script a way to iterate through each word, the word is what will indicate a headline's sentiment, and the 1 or 0 marks whether the word carries negative (0) or positive (1) sentiment.
This isn't a lot of data, but it's enough to perform reasonably accurate sentiment analysis on news headlines, which are typically only about 6-10 words long.
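To give you an idea of the format, here are a few illustrative rows (these are made-up examples, not the actual contents of my sentiment.csv):
n,word,sentiment
0,win,1
1,crisis,0
2,celebrate,1
3,attack,0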
Here's the code that is going to scrape the news headlines:
url='https://www.bbc.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
As this is not a web scraping tutorial, you don't have to understand what's happening here. If you're interested in how it works, here's a tutorial on how to scrape news headlines with Python in under 10 lines of code.
Before performing sentiment analysis on the scraped headlines, add the following code to your Python file.
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
The unwanted array contains elements that get scraped from BBC News but are not actual news headlines.
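If you want to see which non-headline entries show up on your end, one quick optional check is to print the raw text of every scraped <h3> element and note which ones aren't headlines. Here's a minimal sketch, assuming the headlines variable from the scraping code above:
# Optional: print each scraped <h3> so you can spot non-headline entries
for h in headlines:
    print(h.text.strip())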
Full Code:
import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np
df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']
url='https://www.bbc.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
It's time to write the code which will perform sentiment analysis on the scraped headlines.
Add the following code to your Python file.
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())
Here's what this code does:
- First, it defines the neutral, bad and good arrays.
- While iterating through every scraped headline element, it checks that the headline is not in the unwanted array or the news array.
- If it isn't, it appends the headline to the news array.
The reason it checks the unwanted and news arrays is to exclude non-headline elements and to prevent duplicate headlines from being analyzed more than once.
Full Code:
import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np
df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']
url='https://www.bbc.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())
Now, let's perform sentiment analysis on the news headlines by adding the following code inside the if x.text.strip() not in unwanted and x.text.strip() not in news: condition.
for i in range(len(df['n'])):
    if sen[i] in x.text.strip().lower():
        if cat[i] == 0:
            bad.append(x.text.strip().lower())
        else:
            good.append(x.text.strip().lower())
Here's what this code does:
- The for i in range(len(df['n'])): loop searches each headline for every word in the sentiment.csv dataset.
- If a word from the dataset is found in the headline via the if sen[i] in x.text.strip().lower(): condition, the if cat[i] == 0: condition then checks whether that word has a negative or positive sentiment and adds the headline to either the bad or the good array.
The lower() function converts all the letters in the headline to lowercase. This is done because the word search is case sensitive.
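To see why the lowercase conversion matters, here's a tiny standalone example of how Python's in check behaves on strings (the headline is made up for illustration):
# 'Crash' is capitalized in the headline, so a lowercase search misses it
headline = 'Markets Crash After Bank Collapse'
print('crash' in headline)          # False
print('crash' in headline.lower())  # True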
Full Code:
import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np
df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']
url='https://www.bbc.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())
        for i in range(len(df['n'])):
            if sen[i] in x.text.strip().lower():
                if cat[i] == 0:
                    bad.append(x.text.strip().lower())
                else:
                    good.append(x.text.strip().lower())
Add the following code to the end of your Python file.
badp = len(bad)
goodp = len(good)
nep = len(news) - (badp + goodp)
print('Scraped headlines: '+ str(len(news)))
print('Headlines with negative sentiment: ' + str(badp) + '\nHeadlines with positive sentiment: ' + str(goodp) + '\nHeadlines with neutral sentiment: ' + str(nep))
This will print the number of scraped headlines and the number of headlines with a bad, good and neutral sentiment.
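If you'd also like to see which headlines ended up in each category, you can optionally print the bad and good arrays as well. This isn't part of the script above, just a quick way to inspect the results:
print('\nNegative headlines:')
for h in bad:
    print('- ' + h)
print('\nPositive headlines:')
for h in good:
    print('- ' + h)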
Here's the full sentiment analysis code:
import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np
df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']
url='https://www.bbc.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())
        for i in range(len(df['n'])):
            if sen[i] in x.text.strip().lower():
                if cat[i] == 0:
                    bad.append(x.text.strip().lower())
                else:
                    good.append(x.text.strip().lower())
badp = len(bad)
goodp = len(good)
nep = len(news) - (badp + goodp)
print('Scraped headlines: '+ str(len(news)))
print('Headlines with negative sentiment: ' + str(badp) + '\nHeadlines with positive sentiment: ' + str(goodp) + '\nHeadlines with neutral sentiment: ' + str(nep))
Now, if you run your Python file containing the above code, you'll see the number of scraped headlines along with the negative, positive, and neutral counts printed to your terminal.
I hope that this tutorial has successfully demonstrated how you can perform sentiment analysis with Python.
Byeeee👋