I have one secret tool that lets me outdo all my classmates when it comes to digging up the nitty-gritty details of small pieces of information.
When we discuss some issue with professors or classmates, we sometimes come up with wild ideas. We ponder them and ask whether anybody else has thought of them before us. What they usually do is google. I also sometimes google an idea to see whether anybody thought of it before us (me). But the fact of the matter is that Google returns a lot of noise matching the same keywords while offering little of the very specific information I am looking for.
That is where an internal database comes to the rescue. I collect as many books and articles onto my disk as I can, so that I can dig through them whenever I want to track down a specific idea. In a different camp of linguistics, the concept is known as “text mining”.
Right now, I have over 2000 books and articles on my disk, all of which deal with Theoretical Linguistics.
If you have a collection of books and articles like mine and try searching it for a specific phrase using Alfred, Spotlight, or DEVONthink, you will immediately learn that the biggest book always comes out on top, regardless of the quality of the material in it. The reason is word count: the larger the book, the more likely it is to contain the queried word multiple times. If your collection happens to contain gigantic encyclopedia volumes, a short article has no chance of coming out on top of your search results, however relevant it may be.
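To see the length bias concretely, here is a toy illustration in Python. The documents and the scoring are made up for the example (real search tools rank more cleverly than a raw count), but the underlying bias is the same: a long book that mentions a term in passing beats a short article that is actually about it.

```python
# Toy illustration: ranking by raw term frequency favors the longer document.
def raw_tf(term, text):
    """Count how many times `term` occurs as a word in `text` (case-insensitive)."""
    return text.lower().split().count(term.lower())

# A huge encyclopedia entry mentions "ergativity" in passing several times...
encyclopedia = ("ergativity " + "filler " * 200) * 5   # ~1000 words, shallow
# ...while a short, focused article mentions it only a few times.
article = "ergativity " * 3 + "in-depth analysis of ergativity alignment"

print(raw_tf("ergativity", encyclopedia))  # 5 hits in ~1000 words
print(raw_tf("ergativity", article))       # 4 hits in 8 words
# Ranked by raw count alone, the shallow encyclopedia still wins.
```

Normalizing by length would flip the ranking, but naive desktop search often doesn't, which is exactly the problem.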
Therefore, to make each small article as competitive as any other material, and so that your search tools can surface the small articles whenever they are relevant, you need to split the books into article-sized chunks.
I have experimented with different tools for splitting my books, from Apple’s own Automator to a number of Python and shell scripts. Most of them work by bursting the book into single pages.
Bursting a book into single pages is feasible when you have fewer than 1000 books. As your collection grows, bursting creates too many files to manage. In addition, single pages don’t contain enough material to read within the search results (FoxTrot, in my case). That is where splitting into 10–15 page (article-sized) ranges becomes crucial.
Right now, I have a shell script that breaks my books down into 10-page ranges, a script I adapted from a South African researcher (I forget his name; I met him on Academia.edu).
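I'll share the script itself in a later post, but the core arithmetic is simple. Here is a minimal Python sketch of the same idea; the file name `book.pdf` and the use of qpdf as the splitter are my illustrative assumptions, not the original script:

```python
# Sketch: compute 10-page ranges for a book and print the qpdf commands
# that would extract each range into its own file. (qpdf and "book.pdf"
# are assumptions for illustration; any PDF tool with a page-range
# syntax, such as pdftk, would work the same way.)
def chunk_ranges(total_pages, size=10):
    """Yield (first, last) page pairs covering 1..total_pages in size-page chunks."""
    for first in range(1, total_pages + 1, size):
        yield first, min(first + size - 1, total_pages)

for first, last in chunk_ranges(37):
    # qpdf's --pages syntax copies a page range into a new PDF.
    print(f"qpdf book.pdf --pages book.pdf {first}-{last} -- "
          f"book_{first:03d}-{last:03d}.pdf")
```

A 37-page book thus yields four chunks (1–10, 11–20, 21–30, 31–37), with the last chunk simply shorter than the rest.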
Any book or article I add to my Sente library gets copied to another folder (using Hazel) and split into article-sized chunks. The chunks finally move to another folder for ultimate archival, where my search tools, DEVONthink and FoxTrot, index them.
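In my setup Hazel plays the watcher role, but the final archival step can be sketched in a few lines of Python. The folder names below are hypothetical stand-ins, not my actual paths:

```python
# Sketch of the archival step: move every split chunk from a staging
# folder into the archive folder that the search tools index.
# Folder names are hypothetical; Hazel does the watching in practice.
import shutil
from pathlib import Path

def archive_chunks(staging: Path, archive: Path) -> list:
    """Move every PDF chunk from `staging` into `archive`; return moved names."""
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(staging.glob("*.pdf")):
        shutil.move(str(pdf), str(archive / pdf.name))
        moved.append(pdf.name)
    return moved
```

Pointing DEVONthink or FoxTrot at the archive folder once is then enough; they pick up new chunks on their next index pass.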
I will come back to the full workflow and the scripts I use to achieve this in another post.