Collect Reddit messages using Graylog

This post covers using a Python script to collect Reddit inbox messages and sending them to Graylog for alerting or reporting.

Graylog is an excellent platform for collecting and storing log data from servers, but that's not the only task it is suited for. Practically any form of data can be added to Graylog. Once the data is in, you can query it, alert on it, and report on it.

In this example, I want to show you how you can use Graylog to monitor your Reddit inbox for messages. You can receive alerts for new messages, or even for messages containing particular words.

Setting Up The Script

I'm going to use Python for this script. You'll need a couple of extra libraries to make it work. I used Python 2.7.6 on OS X and it ran without any issues.

PRAW - The Python Reddit API Wrapper
gelfclient - Simple GELF library for Python

Installing the Libraries

You'll need to run the following on the device you'll be running the Python script on.

# praw install
# read more here about the HTTPS over the API, requiring the second install
# https://www.reddit.com/r/redditdev/comments/2gmzqe/praw_https_enabled_praw_testing_needed/
easy_install praw
pip install git+git://github.com/praw-dev/praw.git@praw3

# gelf client
pip install gelfclient

Install The Script

I've uploaded a copy over at GitHub. It should be relatively self-explanatory, but I'll go through the highlights, just in case.

myUsername - well, it's your Reddit username.
myPassword - the password you use to log into Reddit.
myGraylogServer - IP address of your running Graylog server. By default, it will be sending messages via UDP to port 12201. If you need to change that, check out line 17 or so (the UdpClient line). You'll change the port there.
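
For reference, the UDP transport behind that setting is simple enough to sketch with only the standard library. This is not the script's own code (the script uses gelfclient's UdpClient). The helper name send_gelf_udp is hypothetical, and the sketch sends the JSON uncompressed and unchunked, so it assumes each message fits in a single datagram.

```python
import json
import socket

GRAYLOG_PORT = 12201  # default GELF UDP port mentioned above


def send_gelf_udp(payload, host, port=GRAYLOG_PORT):
    """Send one GELF message (a dict) to host:port as a single UDP datagram.

    Hypothetical helper for illustration only: no compression and no
    chunking, so it only suits payloads that fit in one datagram.
    Returns the number of bytes handed to the socket.
    """
    data = json.dumps(payload).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(data, (host, port))
    finally:
        sock.close()
    return len(data)
```

Because UDP is fire-and-forget, a successful return only means the datagram left the machine. If nothing shows up in Graylog, check that a GELF UDP input is running on that port.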

Running the Script

When the script runs, it will look for any new messages in your inbox. If there are no new messages, then the script quits.

If there are new messages, it will split the message up into a number of fields:

message_id - the ID that Reddit's API reports back for the message
author - who sent you the message
subject - message subject
short_message/full_message - basically, the full text of the message (to my understanding, short_message cuts off at 200 characters)
link - script-created link directly back to the individual comment
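
As a rough illustration of that field mapping, here is a minimal sketch of a GELF 1.1 payload built as a plain dictionary. The function name build_gelf_message is hypothetical and this is not the script's actual code. The underscore prefix on custom fields follows the GELF convention for additional fields, and the 200-character cut for short_message simply mirrors the description above, so treat it as an assumption.

```python
import socket
import time


def build_gelf_message(message_id, author, subject, body, link):
    """Map one inbox message onto a GELF 1.1 payload (illustrative only)."""
    return {
        "version": "1.1",
        "host": socket.gethostname(),
        # Assumption: cut at 200 characters, as described in the list above.
        "short_message": body[:200],
        "full_message": body,
        "timestamp": time.time(),
        # Custom (additional) GELF fields carry a leading underscore.
        "_message_id": message_id,
        "_author": author,
        "_subject": subject,
        "_link": link,
    }
```

A call such as build_gelf_message("abc", "alice", "hello", "message text", link) returns a dictionary that can be serialized to JSON and sent to a GELF input.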

The script marks all messages as read after it finishes reading them. Just a heads up.
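
The overall flow (read unread messages, ship each one, mark it as read, and do nothing if the inbox is empty) can be sketched as below. This is an approximation, not the uploaded script. The function forward_unread is hypothetical, and the attribute names (id, author, subject, body, mark_as_read) are the PRAW 3 names as far as I recall. In PRAW 3 the unread iterable would, to my knowledge, come from get_unread() on a logged-in session, and the sketch omits the link field because its exact format is not given here.

```python
def forward_unread(messages, send):
    """Send each unread message via send(fields), then mark it as read.

    messages: any iterable of objects exposing id, author, subject, body
              and mark_as_read() (assumed PRAW 3 style names).
    send:     any callable that ships a dict of fields to Graylog.
    Returns the number of messages forwarded; 0 means nothing to do.
    """
    count = 0
    for msg in messages:
        send({
            "message_id": msg.id,
            "author": str(msg.author),
            "subject": msg.subject,
            "short_message": msg.body[:200],  # assumed 200-character cut
            "full_message": msg.body,
        })
        # Mark as read only after the send call returned without raising.
        msg.mark_as_read()
        count += 1
    return count
```

Marking each message right after its send call means a message stays unread if send raises, which differs slightly from marking everything at the end of the run.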

Viewing in Graylog

Here's an example view from Graylog showing a received message. Looks good!

Graylog Alerting

Let's say you're happy to get your messages into the logging server, but are only interested in certain keywords. Perhaps someone's a little upset and starts ranting? Perhaps cursing a bit? No worries. Graylog can handle that, too.

Set yourself up a message stream. Then set up a regex that looks only for something we know about... How about the seven dirty words?

Since we're unsure which field those dirty words will show up in, we'll need to make two entries, one for the subject field and another for the full_message field, just to make sure we have our bases covered. It will look something like this once you're done.
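
Before entering the two rules, it can help to try the pattern locally. The sketch below uses Python's re module with placeholder words (WATCH_WORDS and matches_stream are hypothetical names). Graylog evaluates stream rules with its own regex engine, so treat this as an approximation of the rule behaviour rather than an exact replica.

```python
import re

# Placeholder words; substitute the terms you actually care about.
WATCH_WORDS = ["darn", "heck", "drat"]

# (?i) ignores case; \b keeps "heck" from matching inside "checked".
PATTERN = re.compile(
    r"(?i)\b(" + "|".join(re.escape(w) for w in WATCH_WORDS) + r")\b"
)


def matches_stream(fields):
    """Return True if subject or full_message contains a watched word."""
    return any(
        PATTERN.search(fields.get(name, ""))
        for name in ("subject", "full_message")
    )
```

The compiled expression is available as PATTERN.pattern and can be used as a starting point for both stream rule entries.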

Once you have a stream set up, it's trivial to create an alert that fires whenever matching messages arrive.

In Closing

So, this is just a basic overview of how you can use Graylog to capture actionable information, and even alert on its contents. Anything that exposes an API (or is easy to scrape) can be a source for your logging needs.
