Scraping Reddit's API in NodeJS with Snoowrap

📅 August 30, 2019

👷 Chris Power

This is part 2 of a series of posts about scraping various sites/APIs with Node. You can find part 1 here.


I’m still working on my side project, which gathers information from around the web. I’m eventually going to use this information in a weekly aggregate newsletter about Real Estate Investing and Property Management. If you’re curious, the newsletter is here. For this part of the project, I’m going to use Reddit’s API to find interesting real estate and landlord posts.

The Tooling

There is only one package you need to scrape the Reddit API in NodeJS: snoowrap.

Snoowrap is a “fully featured javascript wrapper for the Reddit API” (quote taken from the GitHub repo’s README). Snoowrap is really great: it lets you query posts, comments, scores, and more.

All of the responses are wrapped in their own little objects as well, and it’s all fairly well documented. Also, if you’re using an IDE like WebStorm, you can easily autocomplete the functions and classes thanks to the project’s type definitions.

Installing snoowrap

Install Snoowrap just like any other npm package in NodeJS:

npm install snoowrap --save

and require it:

const snoowrap = require('snoowrap');

Setting up Snoowrap

Before making any calls to the Reddit API, you have to go through an initial OAuth2 setup: you create an app and generate tokens for it. This is fairly straightforward but takes a few steps.
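Snoowrap trades the refresh token for short-lived access tokens on its own, so you never have to do this by hand. Still, it helps to know what the three credentials are for. Here is a minimal sketch, assuming Reddit’s standard OAuth2 token endpoint and the refresh_token grant; buildRefreshRequest is a hypothetical helper that only builds the request and does not send it.

```javascript
// Hypothetical helper, for illustration only: snoowrap performs this exchange for you.
// Builds (but does not send) the request that trades a refresh token
// for a short-lived access token at Reddit's OAuth2 token endpoint.
function buildRefreshRequest({ clientId, clientSecret, refreshToken, userAgent }) {
  // HTTP Basic auth: "clientId:clientSecret", base64-encoded
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString('base64');

  // Form-encoded body for the refresh_token grant
  const body = new URLSearchParams({
    grant_type: 'refresh_token',
    refresh_token: refreshToken
  }).toString();

  return {
    url: 'https://www.reddit.com/api/v1/access_token',
    method: 'POST',
    headers: {
      Authorization: `Basic ${basic}`,
      'Content-Type': 'application/x-www-form-urlencoded',
      'User-Agent': userAgent
    },
    body
  };
}
```

The client ID and secret identify your app, and the refresh token stands in for your user’s permission, which is why all three show up in the snoowrap configuration below.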

  1. Go to https://not-an-aardvark.github.io/reddit-oauth-helper/ and note the redirect URL you must use when creating your Reddit app (the thing you use to call the API). As of this writing, the URL is: https://not-an-aardvark.github.io/reddit-oauth-helper/
  2. Go to https://www.reddit.com/prefs/apps/ and create a new app. It should generally look like this:

(Screenshot: creating a new web app on Reddit. Note the redirect URI.)


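  3. Next, go back to https://not-an-aardvark.github.io/reddit-oauth-helper/, select the permissions you want, and generate your tokens.

  4. Now you can configure the snoowrap object in your script:

  // Replace the placeholder strings with your own values
  const r = new snoowrap({
    // Reddit asks for a unique, descriptive user agent rather than a random string
    userAgent: 'A random string.',
    clientId: 'Client ID from oauth setup',         // from your Reddit app
    clientSecret: 'Client Secret from oauth setup', // from your Reddit app
    refreshToken: 'Token from the oauth setup'      // from the OAuth helper
  });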
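Hardcoding secrets in the script makes them easy to leak, for example by committing them to a public repo. Here is a minimal sketch, assuming you keep the credentials in environment variables; the REDDIT_* names and the redditConfigFromEnv helper are hypothetical, so rename them to suit your setup:

```javascript
// Hypothetical helper: builds the snoowrap options from environment
// variables so the secrets never live in the source file.
// The REDDIT_* variable names are assumptions, not a snoowrap convention.
function redditConfigFromEnv(env = process.env) {
  const required = {
    clientId: 'REDDIT_CLIENT_ID',
    clientSecret: 'REDDIT_CLIENT_SECRET',
    refreshToken: 'REDDIT_REFRESH_TOKEN'
  };

  const config = {
    // Reddit's API rules ask for a user agent shaped like
    // "<platform>:<app ID>:<version> (by /u/<username>)"; this default is a placeholder
    userAgent: env.REDDIT_USER_AGENT || 'node:newsletter-scraper:v0.1.0 (by /u/your-username)'
  };

  // Fail fast with a clear message if any credential is missing
  for (const [option, variable] of Object.entries(required)) {
    if (!env[variable]) {
      throw new Error(`Missing environment variable ${variable}`);
    }
    config[option] = env[variable];
  }
  return config;
}
```

You could then create the client with new snoowrap(redditConfigFromEnv()), and a missing variable gives you a clear error message up front.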

The Script for Querying the RealEstate Subreddit

Now that you’re all set up with snoowrap (great job, you smart developer, you), you can query Reddit’s API in NodeJS with a script similar to the one below:

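import snoowrap from 'snoowrap'; // ES module syntax; with CommonJS, use require('snoowrap')

// Fetches the top 3 posts of the past week from r/RealEstate
// and returns them as plain { link, text, score } objects.
export async function scrapeSubreddit() {
  const r = new snoowrap({
    userAgent: 'A random string.',
    clientId: 'Client ID from oauth setup',
    clientSecret: 'Client Secret from oauth setup',
    refreshToken: 'Token from the oauth setup'
  });

  const subreddit = await r.getSubreddit('realEstate');
  // time: 'week' ranks posts from the past week; limit caps how many are returned
  const topPosts = await subreddit.getTop({time: 'week', limit: 3});

  const data = [];

  // Keep only the fields the newsletter needs
  topPosts.forEach((post) => {
    data.push({
      link: post.url,
      text: post.title,
      score: post.score
    });
  });

  console.log(data);
  return data;
}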
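Since the goal is to find both real estate and landlord posts, you’ll probably want to query more than one subreddit. Here is a minimal sketch, assuming you pass in a fetchTop function that resolves to { link, text, score } objects like the ones scrapeSubreddit builds; topPostsAcross and the 'Landlord' subreddit in the comment are illustrative, not part of snoowrap:

```javascript
// Hypothetical helper: fetches the top posts of several subreddits and
// merges them into one list, dropping duplicate links and sorting by score.
// `fetchTop(name, limit)` must resolve to an array of { link, text, score }.
async function topPostsAcross(subreddits, fetchTop, limit = 3) {
  // Query all subreddits concurrently
  const results = await Promise.all(subreddits.map((name) => fetchTop(name, limit)));

  const seen = new Set();
  const merged = [];
  for (const posts of results) {
    for (const post of posts) {
      // The same link can be cross-posted to more than one subreddit
      if (!seen.has(post.link)) {
        seen.add(post.link);
        merged.push(post);
      }
    }
  }

  // Highest score first
  return merged.sort((a, b) => b.score - a.score);
}

// Example wiring with a snoowrap client `r` configured as above:
// const fetchTop = async (name, limit) => {
//   const posts = await r.getSubreddit(name).getTop({time: 'week', limit});
//   return Array.from(posts, (post) => ({link: post.url, text: post.title, score: post.score}));
// };
// const posts = await topPostsAcross(['realEstate', 'Landlord'], fetchTop);
```

Taking fetchTop as a parameter keeps the merging logic separate from the network calls, so you can try it out with fake data before pointing it at Reddit.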

Conclusion

The ☝️ scrapeSubreddit script above outputs the top 3 posts of the week from Reddit’s RealEstate subreddit. Pretty neat, right? I thought this was a fun experience, and I really love how Snoowrap works. Now I can use this data to flesh out the newsletter I’m making; again, if you’re curious, you can check it out here.

Thank you, have a nice day!
