If you are interested in building a news aggregator with Python, you are at the right place. In this tutorial, we will learn how to create a news aggregator that scrapes news articles from multiple sources and consolidates them into a single feed.
Before we dive into the coding aspect of creating a news aggregator, let us first understand what a news aggregator is.
What is a News Aggregator?
A news aggregator is a tool that collects and consolidates news articles from multiple sources into a single feed. News aggregators are a great way to keep up with the latest news and current events without having to browse multiple websites.
Now that we understand what a news aggregator is, let us look at how to build one using Python.
Step 1: Install Required Libraries
To build a news aggregator, we will be using the following libraries:
– BeautifulSoup: For parsing HTML and XML documents
– Requests: For sending HTTP requests to fetch webpages
– Feedparser: For parsing RSS feeds
To install these libraries, open your terminal and type the following commands:
$ pip install beautifulsoup4
$ pip install requests
$ pip install feedparser
Step 2: Scraping News Articles from Websites
Now that we have installed the required libraries, let us look at how to scrape news articles from websites using Python.
In this tutorial, we will be scraping news articles from the Reuters and BBC News websites. Keep in mind that site markup changes frequently, so the CSS class names used below may need updating. The code snippet below demonstrates how to scrape headlines from the Reuters website using BeautifulSoup and Requests:
import requests
from bs4 import BeautifulSoup

url = "https://www.reuters.com/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

articles = soup.find_all('h3', class_='story-title')
for article in articles:
    print(article.text.strip())
Similarly, we can scrape news articles from the BBC News website using the following code:
import requests
from bs4 import BeautifulSoup

url = "https://www.bbc.com/news"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

articles = soup.find_all('h3', class_='gs-c-promo-heading__title gel-paragon-bold nw-o-link-split__text')
for article in articles:
    print(article.text.strip())
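The two scrapers above each print their own list of headlines. Since different sources often carry the same story, a small helper can merge the results into a single deduplicated list. A minimal sketch (the function name is illustrative, not part of any library):

```python
def merge_headlines(*sources):
    """Merge headline lists from several scrapers, dropping
    duplicates while preserving the original order."""
    seen = set()
    merged = []
    for headlines in sources:
        for title in headlines:
            # Normalize for comparison so "B story" and "b story " match
            key = title.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(title.strip())
    return merged
```

You would call it as merge_headlines(reuters_titles, bbc_titles), passing the lists collected by each scraper instead of printing the headlines directly.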
Step 3: Consolidating Scraped News Articles
Now that we have successfully scraped news articles from multiple websites, let us look at how to consolidate them into a single feed.
In this tutorial, we will be consolidating news from multiple sources by parsing each source's RSS feed with the Feedparser library and combining the entries into one list. The code snippet below demonstrates how to do this:
import feedparser

feed_urls = [
    "https://feeds.reuters.com/reuters/topNews",
    "http://feeds.bbci.co.uk/news/rss.xml"
]

feed_entries = []
for feed_url in feed_urls:
    feed = feedparser.parse(feed_url)
    feed_entries.extend(feed.entries)

for entry in feed_entries:
    print(entry.title)
    print(entry.link)
    print(entry.summary)
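In practice you will usually want the consolidated feed sorted newest-first. Feedparser exposes a published_parsed field (a time.struct_time) on most entries, which works as a sort key. A sketch, using plain dicts in place of real feed entries:

```python
import time

def sort_entries_newest_first(entries):
    """Sort feed entries newest-first; entries without a
    publication date are pushed to the end."""
    def sort_key(entry):
        parsed = entry.get("published_parsed")
        # Negate the timestamp so larger (newer) times sort first;
        # entries with no date get +infinity and land at the end.
        return -time.mktime(parsed) if parsed else float("inf")
    return sorted(entries, key=sort_key)

# Stand-ins for feedparser entries (which are dict-like objects)
entries = [
    {"title": "Older", "published_parsed": time.strptime("2023-01-01", "%Y-%m-%d")},
    {"title": "Newer", "published_parsed": time.strptime("2023-06-01", "%Y-%m-%d")},
    {"title": "Undated"},
]
```

Calling sort_entries_newest_first(entries) here returns the "Newer" entry first, then "Older", with "Undated" last.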
Step 4: Adding User Interface
Now that we have successfully consolidated the scraped news articles into a single feed, let us look at how to add a user interface to display the feed. In this tutorial, we will be using Flask – a web framework for Python – to create a web interface to display the news feed. The code snippet below demonstrates how to create a basic Flask web application:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello World!"

if __name__ == "__main__":
    app.run(debug=True)
Step 5: Displaying Consolidated News Articles on Web Interface
Now that we have successfully created a basic Flask web application, let us look at how to display the consolidated news articles on the web interface.
In this tutorial, we will be using Jinja2 – the templating engine that ships with Flask – to display the consolidated news articles on the web interface. The code snippet below demonstrates how to pass the consolidated entries to a template:
from flask import Flask, render_template
import feedparser

app = Flask(__name__)

feed_urls = [
    "https://feeds.reuters.com/reuters/topNews",
    "http://feeds.bbci.co.uk/news/rss.xml"
]

feed_entries = []
for feed_url in feed_urls:
    feed = feedparser.parse(feed_url)
    feed_entries.extend(feed.entries)

@app.route("/")
def index():
    return render_template("index.html", entries=feed_entries)

if __name__ == "__main__":
    app.run(debug=True)
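The route above renders a template named index.html, which Flask looks for in a templates/ folder next to the application file. One possible minimal version of that file (the markup here is illustrative, not prescribed by Flask):

```html
<!DOCTYPE html>
<html>
<head>
  <title>News Aggregator</title>
</head>
<body>
  <h1>Latest News</h1>
  <ul>
    {% for entry in entries %}
    <li>
      <a href="{{ entry.link }}">{{ entry.title }}</a>
      <p>{{ entry.summary }}</p>
    </li>
    {% endfor %}
  </ul>
</body>
</html>
```

The {% for %} loop iterates over the entries list passed in by render_template, and {{ ... }} expressions insert each entry's link, title, and summary into the page.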
In this tutorial, we have learned how to create a news aggregator using Python. We have learned how to scrape news articles from multiple websites, consolidate the scraped news articles into a single feed, and display the consolidated news articles on a web interface.
If you want to learn more about Python, check out the official Python documentation for details.