📅 August 30, 2019
👷 Chris Power
I’m still working on my side project, where I’m gathering information from around the web. I’m eventually going to use this information in a weekly aggregate newsletter for Real Estate Investing and Property Management. If you’re curious, The Newsletter is Here. For this part of the project, I’m going to query Reddit’s API to find interesting Real Estate and Landlord posts.
There is only one package you need to successfully scrape the reddit API in NodeJS: snoowrap.
Snoowrap is a “fully featured javascript wrapper for the Reddit API” (quote taken from the GitHub repo’s index page). Snoowrap is really great, and it allows you to query posts, comments, scores, etc.
All of the responses are wrapped in their own little objects as well, and it’s all fairly well documented. Also, if you’re using an IDE like WebStorm, you can easily auto-complete the functions and classes because of the really great type definitions in the project.
Install Snoowrap just like any other npm package in NodeJS:
npm install snoowrap --save
and require it:
var snoowrap = require('snoowrap');
Before making any calls to the Reddit API, you have to go through an initial OAuth2 setup to create an app and generate tokens. This is fairly straightforward, but requires a few steps.
First, open https://not-an-aardvark.github.io/reddit-oauth-helper/ and read through what it asks for. The helper page lists the app settings it expects, including the redirect URI.
Then go to https://www.reddit.com/prefs/apps/
and create a new app with those settings.
Next, go back to https://not-an-aardvark.github.io/reddit-oauth-helper/, select the permissions you want, and generate your tokens.
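If you’re wondering what the refresh token is actually for: a wrapper has to trade it for a short-lived access token before it can call the API. Below is a minimal sketch of what that request looks like, based on Reddit’s OAuth2 documentation and not on snoowrap’s actual internals. The function name buildTokenRefreshRequest is made up for this example, and it only builds the request; it doesn’t send anything.

```javascript
// Hypothetical helper: describes the HTTP request that exchanges a refresh
// token for an access token. It does not perform any network call.
function buildTokenRefreshRequest({ clientId, clientSecret, refreshToken, userAgent }) {
  // Reddit expects the app's client ID and secret as HTTP Basic credentials.
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString('base64');

  // The form body names the grant type and carries the refresh token.
  const body = new URLSearchParams({
    grant_type: 'refresh_token',
    refresh_token: refreshToken
  });

  return {
    url: 'https://www.reddit.com/api/v1/access_token',
    method: 'POST',
    headers: {
      Authorization: `Basic ${basic}`,
      'Content-Type': 'application/x-www-form-urlencoded',
      'User-Agent': userAgent
    },
    body: body.toString()
  };
}
```

You don’t need to do any of this yourself. Snoowrap takes care of tokens once it has your client ID, client secret, and refresh token.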
Now, you can configure the snoowrap object in your script.
const r = new snoowrap({
userAgent: 'A random string.',
clientId: 'Client ID from oauth setup',
clientSecret: 'Client Secret from oauth setup',
refreshToken: 'Token from the oauth setup'
});
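One thing I’d suggest: don’t paste real secrets into your script. Here is a rough sketch of reading them from environment variables instead. The variable names (REDDIT_CLIENT_ID and friends), the default user agent string, and the loadRedditOptions function are all assumptions made up for this example; use whatever names fit your setup.

```javascript
// Hypothetical environment variable names for each snoowrap option.
const REQUIRED_VARS = {
  clientId: 'REDDIT_CLIENT_ID',
  clientSecret: 'REDDIT_CLIENT_SECRET',
  refreshToken: 'REDDIT_REFRESH_TOKEN'
};

// Builds the options object for `new snoowrap(...)` from an
// environment-like object (defaults to process.env).
function loadRedditOptions(env = process.env) {
  // The fallback user agent is a placeholder, not a required value.
  const options = { userAgent: env.REDDIT_USER_AGENT || 'newsletter-scraper/1.0' };
  const missing = [];

  for (const [key, varName] of Object.entries(REQUIRED_VARS)) {
    if (env[varName]) {
      options[key] = env[varName];
    } else {
      missing.push(varName);
    }
  }

  // Fail early with a readable message instead of a confusing API error later.
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return options;
}

// Usage (assumes snoowrap is installed and the variables are set):
// const snoowrap = require('snoowrap');
// const r = new snoowrap(loadRedditOptions());
```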
Now that you’re all set up with snoowrap (great job, you smart developer you), you can query Reddit’s API in NodeJS with a script similar to the one below:
import snoowrap from 'snoowrap';

export async function scrapeSubreddit() {
  const r = new snoowrap({
    userAgent: 'A random string.',
    clientId: 'Client ID from oauth setup',
    clientSecret: 'Client Secret from oauth setup',
    refreshToken: 'Token from the oauth setup'
  });

  // getSubreddit() returns an unfetched object, so there is nothing to await yet.
  const subreddit = r.getSubreddit('realEstate');
  // Fetch the three highest-scoring posts from the past week.
  const topPosts = await subreddit.getTop({time: 'week', limit: 3});

  // Keep only the fields the newsletter needs.
  const data = [];
  topPosts.forEach((post) => {
    data.push({
      link: post.url,
      text: post.title,
      score: post.score
    });
  });

  console.log(data);
  // Return the results too, so callers can use them.
  return data;
}
The ☝️ script above outputs the top 3 posts of the week from Reddit’s r/RealEstate subreddit. Pretty neat, right? I thought this was a fun experience, and I really love how Snoowrap works. Now I can use this data to flesh out the newsletter I’m making. Again, if you’re curious, you can check it out here.
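To give an idea of where this data could go next, here is a small sketch that turns objects shaped like the ones in the data array (link, text, score) into plain-text lines for a newsletter. The function name formatPostsForNewsletter, the minScore parameter, and the output layout are all made up for this example; it isn’t the code my newsletter actually runs.

```javascript
// Hypothetical helper: formats scraped posts as a numbered plain-text list,
// highest score first, skipping posts below minScore.
function formatPostsForNewsletter(posts, minScore = 0) {
  return posts
    .filter((post) => post.score >= minScore) // filter() returns a new array
    .sort((a, b) => b.score - a.score)        // so sorting leaves the input untouched
    .map((post, i) => `${i + 1}. ${post.text} (${post.score} points)\n   ${post.link}`)
    .join('\n');
}

// Usage with made-up sample data:
const samplePosts = [
  { link: 'https://example.com/a', text: 'First post', score: 5 },
  { link: 'https://example.com/b', text: 'Second post', score: 50 },
  { link: 'https://example.com/c', text: 'Third post', score: 1 }
];
console.log(formatPostsForNewsletter(samplePosts, 2));
```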
Thank you, have a nice day!