How NLP and ML Can Fight News Bias And Misinformation
Over half a billion people worldwide consume news online, a number that is rising rapidly as more people gain access to the internet. While open access to information has been an incredible breakthrough of the digitized world, the democratization of content creation and distribution has also led to a rapid spread of false and highly biased information. The spread of disinformation via social media has the power to change elections, strengthen conspiracy theories, and sow discord.
Drawing on examples from the 2016 US election, H. Allcott and M. Gentzkow, in their article Social Media and Fake News in the 2016 Election, suggested that ‘one fake news article was about as persuasive as one TV campaign ad’ and could sway close political contests. The ubiquity of social media only makes matters worse.
As mentioned earlier, social media is where fake news multiplies fastest, which is why Facebook, Google, Twitter, and YouTube have come together to limit and eliminate misinformation about the coronavirus pandemic and to push official guidelines on their platforms.
Automatically classifying a text article as misinformation or disinformation is a challenging task. Even an expert in a given domain has to examine multiple aspects of an article before passing a verdict on its truthfulness.
This is where technology comes into play. In recent years, we have seen many natural language processing innovations that will shape how news transparency is maintained. While AI technology can be used to create misinformation, it can also help combat it.
The ongoing challenge is to develop an automated system that can help the general public assess the credibility of web content. The key caveat here is that the AI system must provide explainable results. This highlights the need for hybrid approaches that combine the best of deep learning-based approaches and symbolic knowledge graphs to build a human-like understanding of language at scale. This establishes a necessary level of trust between large platforms, fact-checkers, and the general public — as well as other stakeholders like policymakers, journalists, webmasters, and influencers.
One proven technique, scoring web pages, was pioneered by Google: pages are scored on the accuracy of the facts they present. The approach has grown in significance because it attempts to understand a page's context without relying on third-party signals.
Since the COVID-19 pandemic, startups like NewsRoom and Mavin have been using AI to detect and monitor fake news in real time with the help of Omdena.
In the words of one of Omdena's AI engineers: “To assist us in this process, The NewsRoom provided us with an unlabeled dataset containing ~240K scraped news articles and suggested several published labeled datasets. 240K articles!”
The first goal was to classify those articles into three categories: hate speech, clickbait, and political bias. The second goal was to train machine learning models that assign a “Trust Score” to any given news article.
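The two goals above can be sketched as a simple pipeline: score an article on each of the three categories, then fold the results into a single Trust Score. This is a minimal illustrative sketch, not the project's actual models — the keyword lists, scoring rule, and function names are all assumptions; a real system would use trained classifiers rather than term matching.

```python
# Hypothetical sketch: score an article on the three categories from the
# project (hate speech, clickbait, political bias) and combine the results
# into a single "Trust Score" in [0, 1]. Keyword lists are illustrative.

HATE_TERMS = {"vermin", "subhuman"}
CLICKBAIT_TERMS = {"shocking", "you won't believe", "unbelievable"}
PARTISAN_TERMS = {"radical left", "far-right extremists"}


def category_score(text: str, terms: set) -> float:
    """Fraction of flagged terms found in the text (crude stand-in for a classifier)."""
    text = text.lower()
    hits = sum(1 for term in terms if term in text)
    return hits / len(terms)


def trust_score(text: str) -> float:
    """Average the three category scores and invert: 1.0 means fully trustworthy."""
    scores = [
        category_score(text, HATE_TERMS),
        category_score(text, CLICKBAIT_TERMS),
        category_score(text, PARTISAN_TERMS),
    ]
    return 1.0 - sum(scores) / len(scores)


article = "Shocking! You won't believe what the radical left did next."
print(round(trust_score(article), 2))  # lower than a neutral article's 1.0
```

In practice each `category_score` would be replaced by a fine-tuned text classifier, but the aggregation step — many per-category signals collapsed into one user-facing score — is the same.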
Their final product is a Google Chrome extension that flags the “Trust Score” while reading news articles online.
The Chrome extension provides an article summary: an overall news score, transparency scores for hate speech, clickbait, and political bias, and a claim-verification score (how reliable the information is).
When a user visits a news article, the extension takes that article as input and prepares a report in the back-end. When the user clicks on the extension, it visualizes the report and provides additional options to interact with the extension.
The figure below demonstrates how the extension looks.
Looking at misinformation from another angle: in 2019, financial misinformation resulted in $17 billion in damages. Readers and organizations want to know which content to trust, while publishers want to ban bad bots and trolls and to engage high-profile users who drive discussion and traffic. A further challenge in detecting misinformation at scale is that false claims reappear in countless variations over time.
Identifying such news online is a daunting task.
Fortunately, a number of computational techniques can flag articles as fake on the basis of their textual content.
NLP is key to dealing with this kind of misinformation and to taking AI solutions to a higher technical level. The approach: scrape articles, Google search results, tweets, and other social media posts to build a clean set of unique keywords, then use that keyword data to train machine learning models that assign a “Bias Score” to online articles.
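The keyword-building step above can be sketched in a few lines. This is a hedged illustration of the idea, not the startups' pipeline: the stopword list and biased-term lexicon are tiny placeholders, and a real Bias Score would come from a trained model rather than lexicon overlap.

```python
import re
from collections import Counter

# Hypothetical sketch: extract cleaned, unique keywords from scraped text,
# then score an article's bias as the share of its keywords that appear in
# a small lexicon of loaded/partisan terms. Both word lists are illustrative.

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}
BIASED_LEXICON = {"disastrous", "corrupt", "heroic", "traitor"}


def extract_keywords(text: str) -> list:
    """Lowercase, keep letter runs, drop stopwords, return unique tokens by frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common()]


def bias_score(text: str) -> float:
    """Fraction of the article's unique keywords drawn from the biased lexicon."""
    keywords = extract_keywords(text)
    if not keywords:
        return 0.0
    return sum(1 for k in keywords if k in BIASED_LEXICON) / len(keywords)


print(bias_score("The corrupt traitor made a disastrous decision"))
```

Deduplicating keywords before scoring, as here, keeps a single repeated slur or epithet from dominating the score; frequency weighting is the obvious alternative design choice.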
Another kind of score is a “Perplexity Score”, which predicts whether the sentences in an article make sense (e.g., “a book is on my desk” makes more sense than “a book is in my desk”, and far more sense than “a desk is in my book”).
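Perplexity measures how "surprised" a language model is by a sentence: fluent word sequences get low perplexity, scrambled ones get high perplexity. A minimal sketch, assuming a toy bigram model with add-one smoothing — the three-line corpus is invented for illustration, and a production system would use a large pretrained language model instead:

```python
import math
from collections import Counter

# Toy training corpus (illustrative only).
corpus = [
    "a book is on my desk",
    "my book is on the desk",
    "the book is on a desk",
]

# Build bigram and unigram counts, with <s> marking sentence starts.
tokens = [w for line in corpus for w in ("<s> " + line).split()]
vocab = set(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)


def perplexity(sentence: str) -> float:
    """Perplexity of a sentence under the add-one-smoothed bigram model."""
    words = ("<s> " + sentence).split()
    log_prob = 0.0
    for prev, cur in zip(words, words[1:]):
        # Laplace-smoothed P(cur | prev): unseen bigrams still get mass.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        log_prob += math.log(p)
    n = len(words) - 1
    return math.exp(-log_prob / n)  # geometric-mean inverse probability


print(perplexity("a book is on my desk"))   # low: seen word order
print(perplexity("a desk is in my book"))   # high: unfamiliar word order
```

The article's own example falls out directly: the scrambled sentence scores a higher perplexity than the fluent one, so a threshold on this score can flag garbled, machine-generated, or nonsensical text.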
Fake news is a growing menace in the media world. With artificial intelligence and big data showing the way to tackle it, the belief that truth will reach the reader grows stronger by the day. But this is just the beginning.
We are still exploring the full potential of artificial intelligence in combating fake news. The future promises ever more sophisticated tools that harness artificial intelligence, big data, and machine learning to stop fake news from making ripples in the online world.