File Scanner has been released!

It's been a pretty long time since I've messed with Python, so I decided to pull out the old books and give it a shot.

So, after a few hours I ended up with File Scanner. From the script…


#######################################################################
# This script iterates over all files on a system (or in a directory- check out the variables), calculates a sha1 hash, and
# injects the value into a local sqlite 3 database. I was inspired somewhat by security applications like Tripwire,
# which effectively perform this to check for file system changes on a computer.
#
# While not tested, this code should compile and work on either a Windows or Linux system with no changes.
# Developed on on OSX, 10.4.8 and Python 2.5.2
#
#######################################################################

It was really the first time I've messed around with Python's SQLite capabilities, so it was more of a test to see how well I can implement it. So far, so good.

If I'm not mistaken, I believe that Python is the first programming language with its own built-in database. This saves a whole bunch of time in implementing configuration files, since you can dump all the really important stuff to its own RDBMS.

I may end up working on this some more, and adding in additional functionality (like communicating with an external web server and shipping the logs over to it for further processing, etc.). At the moment it's just a standalone, one-trick-pony sort of application.

as for performance, pretty dismal. ;)

On my current system (a 1.2Ghz G2 Mac Mini with 1GB of ram), a scan and catalog of just over 20,000 files took about 21 minutes to accomplish:


real 21m14.046s
user 1m33.620s
sys 1m53.687s

sqlite3 database.db
SQLite version 3.1.3
Enter “.help” for instructions
sqlite> select count(*) from system;
20063

While that's pretty slow, this is a pretty slow system, especially when it comes to disk reads / writes. So, a little more modern system can probably scream through it. However, it will take a bit until I migrate the code to something better to see if it does on a non-2-year-old system.

Another thought that came to me while coding it would be to call os.walk() first and grab just the top-level directories. Afterwards, the directory list can be push scans out piece-meal like across multiple threads.

Anyway, all of that is an exercise to the reader. I'm calling this one done for the moment.

Download a copy here: http://idlethreat.com/code/runner.gold.py

Cheers,

tom

Comments are closed.