Turning Your Mac Into a Search Engine
Spotlight is one of the more interesting and (for me) useful features of Apple's Mac OS X. While it's superb at locating documents, programs, and the like, it really shines when it comes to the number of built-in and freely available plugins.
So, basically, you have a built-in system on Mac OS X that lets you search text files, HTML files, images, Word documents, PDFs, and the list goes on and on. The really interesting bit (to me, at least) is that there's nothing to configure or tweak. If you are running a Mac, then everything copied over to the file system is automatically indexed and made quickly available for searching, even from a web browser.
Setup
All I am using here is Apache, PHP 5.2, and the command-line Spotlight search application “mdfind”. All of these are already built into OS X 10.5 and ready for use. All I ended up doing was creating a web interface to them.
Since Spotlight maintains its own index, there is no need for any database connection. Also, since all of the importing is handled in the background, there's nothing to write or tweak. It just works. However, if you want a little more control over how Spotlight searches your data, I'd recommend checking out the relevant Apple documentation or this helpful page with more information on Spotlight from the command line.
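To give a feel for what "a web interface to mdfind" boils down to, here is a minimal sketch of calling mdfind from code. It is written in Python rather than the PHP the actual script uses, and the folder path and function names are my own assumptions for illustration, not something taken from the download. The only mdfind option used is -onlyin, which limits the search to one directory.

```python
import subprocess

# Assumed location of the documents folder; adjust to your own setup.
DOCUMENTS_DIR = "/Library/WebServer/Documents/search/documents"


def build_mdfind_command(query, directory=DOCUMENTS_DIR):
    """Return the argument list for an mdfind search limited to one folder."""
    # -onlyin restricts the search to the given directory tree.
    return ["mdfind", "-onlyin", directory, query]


def run_search(query, directory=DOCUMENTS_DIR):
    """Run mdfind (macOS only) and return the matching file paths."""
    # Passing a list (no shell) keeps user input from being interpreted
    # as shell syntax.
    completed = subprocess.run(
        build_mdfind_command(query, directory),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in completed.stdout.splitlines() if line]
```

On a Mac, something like run_search("invoice") should hand back a list of matching file paths, one per result, which is all the web page really needs.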
Web-ifying Search
Well, enough gushing about the search capabilities. How can we turn this puppy into a document search repository?
Basically, the PHP script runs a command-line query and then outputs the results back to the screen. The configuration file (conf.php) lets you set the relevant paths so that all the links go to the right places.
The only caveat is that if you search for something silly (like the word “the” as the only search term), the script will take a long time to return any results. There should be a way to tie in an array of “stop words” to reject those searches, but adding that functionality is left to the discretion of the reader.
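The path settings matter because mdfind returns file system paths, while the browser needs URLs. Here is a rough Python sketch of that mapping (again, not the PHP from the download). DOC_ROOT and BASE_URL are hypothetical stand-ins for whatever conf.php actually defines, and the function names are made up for this example.

```python
import html
import os
from urllib.parse import quote

# Hypothetical settings standing in for the values in conf.php.
DOC_ROOT = "/Library/WebServer/Documents"
BASE_URL = "http://localhost"


def path_to_url(path, doc_root=DOC_ROOT, base_url=BASE_URL):
    """Map a file path under doc_root to the URL Apache would serve it at."""
    relative = os.path.relpath(path, doc_root)
    # Refuse anything that is not inside the web directory.
    if relative == ".." or relative.startswith(".." + os.sep):
        raise ValueError(f"{path} is outside {doc_root}")
    # Percent-encode spaces and other special characters, keeping slashes.
    return base_url + "/" + quote(relative.replace(os.sep, "/"))


def render_results(paths):
    """Turn a list of file paths into an HTML list of links."""
    items = []
    for path in paths:
        url = html.escape(path_to_url(path), quote=True)
        name = html.escape(os.path.basename(path))
        items.append(f'<li><a href="{url}">{name}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping the file name and encoding the URL is worth the extra two lines, since document names tend to be full of spaces and ampersands.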
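For anyone who wants a starting point on the stop words idea, here is one way it might look, sketched in Python rather than PHP. The word list is just an illustrative handful and the function name is my own. The idea is to reject a query before it ever reaches mdfind if every word in it is a stop word.

```python
# A small, illustrative stop-word list; extend it to suit your documents.
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


def is_searchable(query, stop_words=STOP_WORDS):
    """Return True if the query has at least one word that is not a stop word."""
    words = query.lower().split()
    # An empty query has no words at all, so it is rejected too.
    return any(word not in stop_words for word in words)
```

With this list, is_searchable("the") is False, while is_searchable("the report") is True because "report" is a real search term.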
Install
Anyway, download a copy of it over here and extract it.
Copy the search/ directory to the default web directory on the Mac (normally, it's /Library/WebServer/Documents/) so that the full path looks like:
/Library/WebServer/Documents/search
Drop some documents into the /Library/WebServer/Documents/search/documents/ directory. These can be .PDF files, Word documents, etc. Whatever you want to peek at.
Open your web browser, browse to http://localhost/search/index.php, and run a search.
That really should be it. If you run into any issues, let me know.
Caveats
I seem to remember having trouble a while back getting search to work correctly under 10.4. This ended up being a permissions issue tied to the httpd.conf file. 10.4's Apache runs as the 'nobody' user. Switching to the 'www' user fixed the issue. If that does not do it for you, try running Apache as a standard user and see whether the problem persists.
Enjoy!
tom