I use search tools a lot in my classes and during my own research.
When my classmates and I discuss an issue, for example, we sometimes come up with wild ideas. We ponder them and ask whether anybody has thought of them before us. We google the idea to see whether anyone has already explained it on the web. But the fact of the matter is that Google returns a lot of noise for the same keywords and has little to offer on very specific information in our field.
That is where an internal database comes to the rescue. I collect as many books and articles as I can on my disk so that I can dig into them whenever I want to learn about a specific idea. The concept is known as “text mining” in a different camp of linguistics.
Right now, I have over 2,000 books and articles on my disk, all of which deal with theoretical linguistics.
If you have a collection of books and articles like mine and have tried to search it for a specific phrase using Alfred, Spotlight, or DEVONthink, you will immediately notice that the big books always rank at the top. The reason is word count: the larger the book, the more likely it is to contain the queried word multiple times. Huge encyclopedias in particular act like a virus on my database, because they always top the search results.
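The effect is easy to reproduce with nothing but grep and two made-up files (the filenames and contents below are toy examples): a naive rank-by-match-count always favors the longer document.

```shell
#!/bin/sh
# Toy demonstration: raw match counts favor big documents.
cd "$(mktemp -d)"

# A short article entirely about the topic...
printf 'syntax in focus\n' > article.txt

# ...and a long "encyclopedia" that merely mentions it on every page.
for i in $(seq 1 200); do
    printf 'filler text that mentions syntax in passing\n'
done > encyclopedia.txt

grep -c syntax article.txt encyclopedia.txt
# article.txt:1
# encyclopedia.txt:200
```

Ranked by raw count, the encyclopedia "wins" 200 to 1, even though the short article is the better match.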
Therefore, to balance the results and make smaller documents as visible as any other material in the database, I split the large documents into smaller chunks.
I have experimented with different splitting tools, from Apple’s own Automator to a number of Python and shell scripts. Most of them work by bursting the document into single pages.
Bursting a book into single pages can be feasible when you have a small number of books. As the collection grows, though, bursting floods your drive with documents. In addition, a single page won’t contain enough material to read from the search result itself (in FoxTrot’s preview window, for example). That is where splitting into roughly 50-page chunks turns out to be very useful.
Right now, I have a shell script that breaks my books down into 15-page chunks.
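A minimal version of such a splitter can be sketched in a few lines of shell around qpdf (assuming qpdf is installed; the folder names are placeholders, and the chunk size is left as a parameter):

```shell
#!/bin/sh
# Split every PDF in a folder into fixed-size page chunks using qpdf.
# Folder names are examples; chunk size defaults to 50 pages.

split_books() {
    in_dir=$1; out_dir=$2; chunk=${3:-50}
    mkdir -p "$out_dir"
    for pdf in "$in_dir"/*.pdf; do
        [ -e "$pdf" ] || continue          # skip if the folder is empty
        base=$(basename "$pdf" .pdf)
        # qpdf writes one output file per page group; %d in the output
        # name becomes the page range, e.g. mybook-01-50.pdf.
        qpdf --split-pages="$chunk" "$pdf" "$out_dir/$base-%d.pdf"
    done
}

# Example invocation:
#   split_books "$HOME/Books/incoming" "$HOME/Books/chunks" 50
```

The same loop works with other splitters (pdftk, mutool) by swapping the one command line inside the loop.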
Any book or article I add to my Bookends library gets copied directly to another folder (using Hazel) and split into 50-page chunks. The chunks then move to a final folder for archival, where my search tools, DEVONthink and FoxTrot, index them.
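Hazel can drive the split step itself through a “Run shell script” action, which hands the matched file to the embedded script as `$1`. A minimal sketch of that action, assuming qpdf and an example archive path:

```shell
#!/bin/sh
# Hazel "Run shell script" action: "$1" is the PDF Hazel just matched.
# Split it into 50-page chunks in the archive folder that the search
# tools index, then remove the intermediate copy. Paths are examples.

archive_pdf() {
    pdf=$1
    archive="$HOME/Library/Archive"     # hypothetical archive folder
    mkdir -p "$archive"
    base=$(basename "$pdf" .pdf)
    # Only delete the original if the split succeeded.
    qpdf --split-pages=50 "$pdf" "$archive/$base-%d.pdf" && rm "$pdf"
}

# In Hazel's embedded-script field, the body would simply be:
#   archive_pdf "$1"
```

Keeping the intermediate copy out of the indexed folder matters; otherwise the unsplit original would compete with its own chunks in the search results.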
Once you have done the splitting and indexing, every small article is as visible as the large documents. You will be able to find the most relevant article for the searched term regardless of its size.