Powerful search tools on Windows & Mac

If you are an information worker (an academic, say), having the support of a powerful search tool is crucial. Unless you have a sharp searching tool, you will have trouble picking that grain of information out of the gigantic jungle of information coded in the form of data, sentences, or books.

There are great tools everywhere, but some stand out from the others in their capabilities.

The giants in the Windows and Mac environments you might want to check out are:

  1. dtSearch (Windows)
  2. X1 Search (Windows)
  3. Copernic Desktop Search (Windows)
  4. FoxTrot Professional Search (Mac)
  5. DEVONthink (Mac)


Personally, I am not that fond of Copernic, mainly because it has no internal preview tool, and it seems to consume too many of my machine's resources.

My number 1 pick is dtSearch. It is the best in its class at digging up the tiniest pieces of information. Its proximity search is an invaluable tool for finding associated ideas.

X1 comes close. It is more of a document manager, just like DEVONthink on the Mac, than a dedicated search tool. X1 is also a wonderful application, and it is cheaper than dtSearch.


As for FoxTrot, it is quite comparable to dtSearch, but I like the preview system in FoxTrot even more.

The proximity search in dtSearch requires you to write the distance between the words (or phrases) explicitly, as in Mary w/5 John ("search for Mary and John within a distance of 5 words"), while FoxTrot has a little scrolling widget that lets you search within a paragraph, within a sentence, or at even closer distances.

One might put DT up as a competitor to FoxTrot on the Mac, but I think FT is much superior on the search side, while DT rocks for its AI and other organizational tools.

(Note: I don't like giving links to the products because I don't want to sound as if I am trying to earn a penny by associating them with my small, free blog. I am dropping these notes because I believe they might help somebody out there, not because I have some other agenda. I used to keep these notes in my internal system; I am putting them out now in case somebody gets something useful out of them.)

Why you need to split your big PDF books

I have one secret tool with which I beat all my classmates when it comes to digging down into the nitty-gritty of small pieces of information.

When we discuss some issue with professors or classmates, we sometimes come up with wild ideas. We ponder them and ask whether anybody else has thought of them before us. What my classmates usually do is google. I also sometimes google an idea to see whether anybody else has thought of it before me. But the fact of the matter is, Google returns a lot of noise for the same keywords and has little to offer for the very specific information I am looking for.

That is where an internal database comes to the rescue. I collect as many books and articles onto my disk as I can so that I can dig through them whenever I want to find a specific idea. The concept is known as "text mining" in a different camp of linguistics.

Right now, I have over 2000 books and articles on my disk, all of which deal with Theoretical Linguistics.

If you have a collection of books and articles like mine and have tried searching it for a specific phrase using Alfred, Spotlight, or DEVONthink, you will immediately learn that the biggest book always comes out on top regardless of the quality of the material in it. The reason is word count: the larger the book, the more likely it is to contain the queried word multiple times. If your collection contains gigantic encyclopedias in particular, there is no chance that a short article comes out on top of your search results, however relevant that article may be.

Therefore, to make each small article as competitive as any other material, so that your search tools can pick the small articles up whenever they are relevant, you need to split the books into article-sized pieces.

I have experimented with different tools for splitting my books, beginning with Apple's own Automator and moving on to a number of Python and shell scripts. Most of them work by bursting the book into individual pages.
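
To give an idea, bursting into single pages can be as simple as the one-liner below; pdftk is shown purely as an illustration, not necessarily what Automator or those scripts actually use.

```sh
# Burst book.pdf into one PDF per page: book_page_0001.pdf, book_page_0002.pdf, ...
pdftk book.pdf burst output book_page_%04d.pdf
```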

Bursting a book into single pages can be feasible when you have fewer than 1000 books. As your collection grows, bursting creates too many files to manage. In addition, single pages do not contain enough material to read within the search result (FoxTrot, in my case). That is where splitting into 10-15 page (article-size) chunks becomes crucial.

Right now, I have a shell script that breaks my books down into 10-page chunks, a script I adapted from a South African guy (I forget his name; I met him on Academia.edu).
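
Since I am saving the script itself for another post, here is only a rough sketch of what such a splitter can look like; the file name split_book.sh, the 10-page chunk size, and the use of qpdf are my own placeholders, not the original script.

```sh
#!/bin/sh
# split_book.sh -- a sketch of a 10-page splitter, not the original script.
# Usage: sh split_book.sh book.pdf
CHUNK=10
book="$1"
base="${book%.pdf}"
pages=$(qpdf --show-npages "$book")        # total page count of the book
start=1
while [ "$start" -le "$pages" ]; do
  end=$((start + CHUNK - 1))
  [ "$end" -gt "$pages" ] && end="$pages"  # do not run past the last page
  # copy pages start..end into their own small PDF next to the book
  qpdf "$book" --pages . "$start-$end" -- "${base}_p${start}-${end}.pdf"
  start=$((end + 1))
done
```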

Any book or article I add to my Sente library immediately gets copied to another folder (using Hazel) and split into article-sized pieces. All the pieces finally move to another folder for ultimate archival, where my search tools, such as DEVONthink and FoxTrot, index them.
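
In Hazel terms, the rule that watches the Sente library folder can run a small shell script on each new PDF; Hazel passes the matched file as $1. The folder paths below and the split_book.sh helper from the sketch above are placeholders for illustration, not my actual setup.

```sh
#!/bin/sh
# A sketch of what a Hazel "Run shell script" action could call; $1 is the
# file Hazel matched. All paths here are placeholders.
WORK="$HOME/SplitQueue"
ARCHIVE="$HOME/SearchArchive"    # the folder DEVONthink and FoxTrot index

cp "$1" "$WORK/"                                        # 1. copy the book out of the Sente library
sh "$HOME/bin/split_book.sh" "$WORK/$(basename "$1")"   # 2. split it into ~10-page pieces
mv "$WORK"/*_p[0-9]*.pdf "$ARCHIVE/"                    # 3. move the pieces to the archive for indexing
rm "$WORK/$(basename "$1")"                             # 4. drop the unsplit working copy
```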

I will come back to the full workflow and the scripts I use to achieve this task in another post.