News Article Aggregation

Photo by Andrew Neel on Unsplash

News Article Aggregation

How I built Blokfeed: Part 2

ยท

2 min read

Hi Everyone!

Welcome back to part 2 of how I built Blokfeed. In my previous article I covered why I built Blokfeed, but now let's get into the how.

Let's jump right in... ๐Ÿšดโ€โ™‚๏ธ

Before I could start building out the features for Blokfeed I had to get data. Luckily many news sources use RSS (Really Simple Syndication) feeds to syndicate their articles. Which means I simply need to poll the RSS feeds for the latest news articles.

Awesome - I have a way to collect news articles... Next question how often do I want to pull articles and how am I going to schedule this? ๐Ÿ’ก AWS Lambdas. Lambdas have the ability to run on a schedule which allows me to determine the frequency of data collection. Perfect. Let's get coding.

I personally use the Serverless Framework to help spin up lambdas. You can find more information on their site.

Now that I have a framework I started working on a parser for the rss feed... I Found this rss-parser soon after starting to help with the parsing.

Ok - building blocks are in place... let's diagram what I am going to build to start aggregating news articles.

image.png

As you can see it's a fairly straight forward ETL process where we fetch (extract) the data from the RSS feeds, process (transform) and upsert (load) into our DB.

While transforming the RSS feed data into a structure Blokfeed uses, I enriched the data for monitoring and filtering purposes.

Following the data transformation, I upsert the article information meaning the system will either update or insert a record depending on various conditions. An easy way to do this is to create a hashId which would be unique to that article.

There you have it - a simple ETL process to start collecting news articles.

What's next? ๐Ÿค” I have articles, how am I going to view them?

Part 3 - Building a website to view articles.

Let me know in the comments what you think of these overviews?

Interested in code? If there is enough interest I can start working on examples of how you can build a similar ETL process using AWS Lambdas.

ย