Scrape Bing News using Python

Contents: intro, imports, what will be scraped, process, code, links, outro.

Intro

Hello to everyone who came from the previous post in the Bing series! This blog post continues the Bing web scraping series and shows how to scrape Bing News results using Python. An alternative solution is shown after the first block of code.

Imports

import requests
import lxml
from bs4 import BeautifulSoup
from serpapi import GoogleSearch

What will be scraped

Process

The process is straightforward. The SelectorGadget Chrome extension was used to grab CSS selectors.

The following GIF illustrates how to get the CSS selectors for the title, URL, snippet, source website, and the time the news was posted.

Code

from bs4 import BeautifulSoup
import requests, lxml  # lxml is the parser passed to BeautifulSoup below

# Send a browser-like User-Agent header with the request
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Fetch the Bing News results page for the query "faze clan"
html = requests.get('https://www.bing.com/news/search?q=faze+clan', headers=headers)
soup = BeautifulSoup(html.text, 'lxml')

# Each news result is a .card-with-cluster container
for result in soup.select('.card-with-cluster'):
    title = result.select_one('.title').text
    link = result.select_one('.title')['href']
    snippet = result.select_one('.snippet').text
    source = result.select_one('.source a').text
    # Time since the news was posted, e.g. "2h"
    date_posted = result.select_one('#algocore span+ span').text
    print(f'{title}\n{link}\n{source}\n{date_posted}\n{snippet}\n')

# part of the output:
'''
FaZe Clan shows off new execute for Mirage against Furia Esports
https://win.gg/news/8521/faze-clan-shows-off-new-execute-for-mirage-against-furia-esports
WIN.gg
2h
During a match against Team Furia in the Gamers Without Borders Cup, the camera spotted an interesting interaction between ...
'''
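
The code above hardcodes the query in the URL and assumes that every card contains every element. Below is a minimal sketch of two optional helpers, not part of the original code: build_search_url and text_or_none are hypothetical names introduced here for illustration. The first builds the same search URL from a free-text query. The second relies on the fact that select_one() returns None when a selector matches nothing, so a card without a snippet does not raise an AttributeError.

```python
from urllib.parse import urlencode

# Same endpoint as in the code above
BING_NEWS_URL = "https://www.bing.com/news/search"


def build_search_url(query):
    """Return the Bing News search URL for a free-text query (hypothetical helper)."""
    # urlencode escapes the space in "faze clan" as "+", giving "?q=faze+clan"
    return f"{BING_NEWS_URL}?{urlencode({'q': query})}"


def text_or_none(node, selector):
    """Return the stripped text of the first match, or None (hypothetical helper).

    `node` is expected to be a BeautifulSoup Tag, whose select_one()
    returns None when the selector matches nothing.
    """
    element = node.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


print(build_search_url("faze clan"))
# prints: https://www.bing.com/news/search?q=faze+clan
```

Inside the loop, a line such as snippet = result.select_one('.snippet').text could then become snippet = text_or_none(result, '.snippet'), leaving the snippet as None for cards that do not have one.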

As an alternative, the same data can be retrieved with SerpApi, a paid API with a free trial of 5,000 searches.

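from serpapi import GoogleSearch
import json

# Search parameters: your API key, the Bing News engine, and the search query
params = {
  "api_key": "YOUR_API_KEY",
  "engine": "bing_news",
  "q": "faze clan"
}

search = GoogleSearch(params)
results = search.get_dict()  # response as a Python dictionary

# Pretty-print every news result
for result in results['organic_results']:
    print(json.dumps(result, indent=2, ensure_ascii=False))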

# part of the output:
'''
{
  "title": "FaZe Clan shows off new execute for Mirage against Furia Esports",
  "link": "https://win.gg/news/8521/faze-clan-shows-off-new-execute-for-mirage-against-furia-esports",
  "snippet": "During a match against Team Furia in the Gamers Without Borders Cup, the camera spotted an interesting interaction between ...",
  "source": "WIN.gg",
  "date": "2h",
  "thumbnail": "https://serpapi.com/searches/60d82f308ccee022b4ab7525/images/62e054f4209c882415dd75f5245f96d23bd4c1538d707fb513a0918671c831d7.jpeg"
}
'''
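
To get the same plain-text layout as the first example rather than raw JSON, each result dictionary can be reduced to a few fields. This is a minimal sketch that assumes the keys shown in the output above (title, link, snippet, source, date). pick_fields is a hypothetical helper, not part of the serpapi package, and the sample dictionary simply copies the output above, so no API call is made.

```python
def pick_fields(result, fields=("title", "link", "source", "date", "snippet")):
    """Return only the requested keys; keys missing from the result become None."""
    return {field: result.get(field) for field in fields}


# Sample result copied from the output above (thumbnail omitted)
sample = {
    "title": "FaZe Clan shows off new execute for Mirage against Furia Esports",
    "link": "https://win.gg/news/8521/faze-clan-shows-off-new-execute-for-mirage-against-furia-esports",
    "snippet": "During a match against Team Furia in the Gamers Without Borders Cup, the camera spotted an interesting interaction between ...",
    "source": "WIN.gg",
    "date": "2h",
}

row = pick_fields(sample)
print(f"{row['title']}\n{row['link']}\n{row['source']}\n{row['date']}\n{row['snippet']}\n")
```

In the loop above, pick_fields(result) would replace the json.dumps() call. Using .get() means a result without, say, a snippet prints None and does not raise a KeyError.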

Link

Outro

If you have any questions, if something isn't working correctly, or if you want to share anything else, feel free to drop a comment in the comment section or reach out via Twitter at @serp_api.

Yours,
Dimitry, and the rest of the SerpApi Team.
