Powerful search tools in Windows & Mac

If you are an information worker (an academic, for instance), having a powerful search tool is crucial. Without that sharp search tool, you will have trouble picking out the one grain of information you need from the gigantic jungle of information coded in the form of data, sentences, or books.

There are great tools everywhere, but some stand out in their capabilities more than others.

The tools you might want to check out, three giants on Windows and two strong options on the Mac, are:

  1. DtSearch (Windows)
  2. X1 search (Windows)
  3. Copernic desktop (Windows)
  4. FoxTrot Professional search (Mac)
  5. Devonthink (Mac)

 

Personally, I am not very fond of Copernic, mainly because it has no built-in preview tool, and it seems to consume too much of my machine’s resources.

My number one pick is DtSearch. It is the best in its class at digging up the tiniest pieces of information. Its proximity search is an invaluable tool for finding associated ideas.

X1 comes close. It is more of a document manager, much like Devonthink on the Mac, than a dedicated search tool. X1 is also a wonderful application, and it is cheaper than DtSearch.

 

As for FoxTrot, it is quite comparable to DtSearch, but I like FoxTrot’s preview system even more.

The proximity search in DtSearch requires you to write the distance between words (or phrases) explicitly, as in Mary w/5 John (‘find Mary and John within 5 words of each other’), while FoxTrot has a small scrolling window that lets you search within a paragraph, within a sentence, or at even closer distances.
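To make the w/N idea concrete, here is a minimal Python sketch of how a query like Mary w/5 John could be evaluated over plain text. It is not DtSearch’s implementation: the simple tokenizer and the function name within_distance are assumptions for illustration, and “within 5 words” is taken to mean that the two word positions differ by at most 5, in either order.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_distance(text: str, a: str, b: str, n: int) -> bool:
    """Return True if words a and b occur within n word positions of each other.

    Hypothetical helper mimicking a query like `a w/n b`; word order does
    not matter and distance is counted in token positions.
    """
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


sentence = "Mary went to the market, and later John followed her."
print(within_distance(sentence, "Mary", "John", 5))  # Mary and John are 7 words apart
print(within_distance(sentence, "Mary", "John", 7))
```

FoxTrot’s sentence or paragraph scope could be approximated in the same spirit by splitting the text into sentences or paragraphs first and checking whether both words fall inside the same unit.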

One might see DT (Devonthink) as FoxTrot’s competitor on the Mac. But I think FT is far superior on the search side, while DT rocks for its AI and other organizational tools.

(Note: I don’t link to the products because I don’t want it to look as if I am trying to earn a penny by associating them with my small, free blog. I am posting these notes because I believe they might help somebody out there, not because I have some other agenda. I used to keep them in my internal system; I am putting them out now in case somebody finds something useful in them.)

Why you need to split your big PDF books

I have one secret tool that lets me beat all my classmates when it comes to digging up the nitty-gritty of small pieces of information.

When we discuss an issue with professors or classmates, we sometimes come up with wild ideas. We ponder them and ask whether anybody else has thought of them before us. What people usually do is google. I also sometimes google an idea to see whether anybody thought of it before us (me). But the fact of the matter is that Google returns a lot of noise with the same keywords and has little to offer on the very specific information I am looking for.

That is where an internal database comes to the rescue. I collect as many books and articles on my disk as I can, so that I can dig through them whenever I want to track down a specific idea. In another camp of linguistics, the concept is known as “text mining”.

Right now, I have over 2000 books and articles on my disk, all of which deal with theoretical linguistics.

If you have a collection of books and articles like mine and try searching it for a specific phrase using Alfred, Spotlight or Devonthink, you will immediately learn that the biggest book always comes out on top, regardless of the quality of the material in it. The reason is word count: the larger the book, the more likely it is to contain the queried word many times. If your collection contains gigantic encyclopedias, there is no chance that a short article will come out on top of your search results, however relevant it may be.
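The word-count effect is easy to see with numbers. This Python sketch compares a raw term-count score with a simple length-normalized score (term occurrences divided by document length). It only illustrates the bias described above and is not how Spotlight, Alfred or Devonthink actually rank results; the two documents and their word counts are invented.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    total_words: int  # length of the document in words
    hits: int         # how many times the query term occurs


def raw_score(doc: Doc) -> float:
    """Rank purely by how often the term occurs (this favors big books)."""
    return float(doc.hits)


def normalized_score(doc: Doc) -> float:
    """Rank by term frequency relative to document length."""
    return doc.hits / doc.total_words


# Hypothetical collection: one huge encyclopedia and one short, focused article.
docs = [
    Doc("Encyclopedia of Linguistics", total_words=2_000_000, hits=120),
    Doc("Short article on object shift", total_words=8_000, hits=40),
]

by_raw = sorted(docs, key=raw_score, reverse=True)
by_norm = sorted(docs, key=normalized_score, reverse=True)
print("raw:", [d.title for d in by_raw])
print("normalized:", [d.title for d in by_norm])
```

By raw count the encyclopedia wins (120 hits against 40); normalized by length, the article wins (0.005 against 0.00006). Splitting books into article-sized files gets a similar effect without changing the search tool, because every indexed file ends up with a comparable length.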

Therefore, to make each small article as competitive as any other material, so that your search tools pick up the small articles whenever they are relevant, you need to split the books into article-sized pieces.

I have experimented with different tools for splitting my books, from Apple’s own Automator to a number of Python and shell scripts. Most of them work by bursting the book into single pages.

Bursting books into single pages can be feasible when you have fewer than 1000 books. As your collection grows, bursting creates too many files to manage. In addition, single pages don’t contain enough material to read within the search results (in FoxTrot, in my case). That is why splitting into 10-15 page (article-sized) ranges becomes crucial.

Right now, I have a shell script that breaks my books down into 10-page ranges, a script I adapted from a South African guy (I forget his name; I met him on Academia.edu).
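The shell script itself isn’t reproduced in this post, so the following is only a hypothetical Python sketch of its core arithmetic: computing consecutive 10-page ranges for a book of a given length and naming each chunk. The function names page_ranges and chunk_name and the file-naming pattern are assumptions; the resulting ranges would still need to be passed to a PDF tool that actually extracts the pages.

```python
from __future__ import annotations


def page_ranges(total_pages: int, chunk: int = 10) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into consecutive, inclusive (start, end) ranges.

    Every range has `chunk` pages except possibly the last, which holds the rest.
    """
    if total_pages < 1 or chunk < 1:
        raise ValueError("total_pages and chunk must both be positive")
    return [
        (start, min(start + chunk - 1, total_pages))
        for start in range(1, total_pages + 1, chunk)
    ]


def chunk_name(stem: str, start: int, end: int) -> str:
    """Hypothetical naming scheme: zero-padded page numbers keep chunks in order."""
    return f"{stem}_p{start:03d}-{end:03d}.pdf"


for start, end in page_ranges(23):
    print(chunk_name("Book", start, end))
```

For a 23-page book this yields the ranges (1, 10), (11, 20) and (21, 23). Zero-padding the page numbers keeps the chunks sorted correctly in Finder and in search results.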

Any book or article I add to my Sente library is immediately copied to another folder (using Hazel) and split into article-sized chunks. All the chunks finally move to another folder for long-term archiving, which my search tools, such as Devonthink and FoxTrot, index.

I will come back to the full workflow and the scripts I use to achieve this in another post.

Workflow with Sente, Devonthink, Scrivener using Hazel and Dropbox as glue: part 2

On Mirroring

In this second post, I am going to talk about a method rather than a tool (software). I call the method “mirroring”. It is a complement to syncing. I generally like syncing files across my Macs and iOS devices. The problem is that syncing is possible only when the app developers offer it. Sente, for example, lets you sync your Sente library to Sente on iOS, but you cannot sync it to other applications such as Devonthink or Scrivener. The tags in Sente are not visible in Finder, and the notes and annotations are all specific to the application. It is a locked application in that sense. Most reference managers are locked-down applications, unfortunately. It would be wise to avoid them, but they facilitate the workflow.

Since I rely on Sente and other locked applications for my workflow, mirroring is my way around their lock-in. What do I mirror? I mirror my projects.

My work is project-based. I move from one project to another; writing small articles and developing small pieces of work for my dissertation is what I am doing now and will be doing for the next two years. I have already talked about how I organize my PDF files by project. How do I mirror it? I mirror each project inside Sente to Finder by creating a folder. For example, if I am working on a project called “Object Shift”, I will have a tag in Sente with the same name. All the PDF files I will need to read are tagged “Object Shift”. Look at the following picture (ppic82). When I double-click the tag, Sente puts me in what I call the project mode. Project mode is my favorite mode for reading in Sente. It also helps me see the relationships and differences between the papers (picture: ppic83).

Now I have all the papers I believe are important for the project. I then read and annotate them as fast as I can and export the annotations to a folder in Finder. That folder, which I create inside Dropbox, is a mirrored folder with the same name. It sits inside a big folder called “Projects”, which is itself inside Dropbox. The mirrored folder (“Object Shift”) is where I keep all the notes I export from Sente, as well as the TeX file I will finally compile into a finished paper. The “Projects” folder is indexed in Devonthink, so anything I add to “Object Shift” is automatically available in DT. Now, you see, I am in good shape. My project files are in a separate folder inside Dropbox, but they are still in communication with the rest of my files inside Devonthink. The next step is to build a dozen search algorithms (smart groups) inside DT that will hunt down all the files relevant to my topic. File selection and grouping in Sente is manual; grouping inside DT is automatic. Both manual and automatic approaches to grouping files for a project have pros and cons, so I combine the two to get the best results.

 

As I mentioned, I have “Object Shift” in Sente (a tag), Dropbox (a folder) and Devonthink (indexed). I also open a project under the same name in Scrivener (I use it for some projects), plus a paper folder labeled with the same name, where I keep all the papers relevant to the project. That is mirroring.
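Part of this mirroring can be scripted. Here is a minimal Python sketch that creates the mirrored project folder (for example “Object Shift”) inside the “Projects” folder if it doesn’t exist yet. The function name mirror_project and the Dropbox path are assumptions; the Sente tag, the Scrivener project and the Devonthink indexing still have to be set up by hand in each application.

```python
from pathlib import Path


def mirror_project(name: str, projects_root: Path) -> Path:
    """Create (if needed) the mirrored folder for a project and return its path.

    projects_root is the "Projects" folder inside Dropbox; its exact
    location is an assumption and will differ from machine to machine.
    """
    folder = projects_root / name
    folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
    return folder


# Example call (hypothetical location of the "Projects" folder):
# mirror_project("Object Shift", Path.home() / "Dropbox" / "Projects")
```

Because the folder lives under the already-indexed “Projects” folder, anything saved into it would show up in Devonthink without an extra step.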

It is a way of organizing myself wherever syncing is not available globally.

I mirror not only projects and folders but also statuses. The statuses I assign in Sente, demonstrated in the first post, are used across the board: in Devonthink, Finder (Path Finder), Scrivener and even printed papers and books. Their application to printed material is actually quite interesting. I was reading a book titled “How to Read a Book”. In it, the authors have a notion called x-raying a book. X-raying a book means going through its major sections and evaluating how its topics are organized, to judge them for your purpose. It is a very effective method. I have developed the habit of examining the table of contents, sections and subsections of books before I read them. As soon as I finish examining a book, which takes just two minutes, I assign my statuses to the sections, with small notes, by attaching small stickers to them. That way, I make sure I read the “Must Read” sections and skip the “Repelling” sections (too much detail, or digressions), etc. As one can see from its multiple applications (on folders, projects, books and articles), mirroring is really a habit, and a useful one for getting things done.

You can make it your habit too.

 

Devonthink is now slow

It is unfortunate when great software you once swore by turns into a resource hog overnight. As anybody who has read my previous posts would easily understand, I love Devonthink. It is one of the best apps Mac OS has ever had. I throw all my data into it, and getting it back out is as easy as putting it in. It is such a powerful application that I converted from a big PC fan to the Mac mainly because of Devonthink.

The feature Devontechnologies introduced in the 2.6 update, automatic updating of indexed folders, is an unfortunate one. It turned an agile, wonderful application into resource-hogging crap overnight. Try it yourself if you have an indexed folder in your DT database. Click the indexed folder and see what happens: DT will try to update the index. OK, great the first time. Now it is time to work in the folder, to read some of the files stored in it, to click around and enjoy. At this point, every time you click the folder, DT fires up the update again. Jeez, you have to wait five minutes again. Well, that is what was “added” in the new update. At first I thought it was a bug; then they said it is a feature. If you try to work on other, internally stored files, you face the ugly spinning ball. Shit, I hate the spinning ball. I used to love Devonthink for its speed and how easily I could move from one folder to another. Now it is a pain to click any folder, because it brings back the nasty spinning ball while it tries to update the index.

I am totally infuriated about this new addition. I hope Devontechnologies will fix it soon. I am downgrading to 2.5 for now.

DEVONagent vs Google

DEVONagent is a specialized search tool designed to provide more accurate results than regular search engines. I have been trying it for a few days now. So far, my experience is that its search results are not better or more accurate than a regular Google search. I have tried a few queries using the Web Deepest field in DEVONagent. In many instances, the results are even weaker than Google’s, in my view. Since DEVONagent’s tools seem to work by analyzing the keywords of web pages, they are easily fooled by wiki pages that stuff in a lot of keywords with little actual content. Google has a way of killing sites that put in too many keywords (tags) without valid content, since it receives feedback from users’ experience.

Synchronize Devonthink and Scrivener

There is a small window of opportunity to make the two applications work together. The opportunity comes from Scrivener’s Sync feature.

Even though this feature opens the door, its limited ability to sync files complicates the relationship between the two applications.

The weakness of the Sync feature is that it doesn’t support multiple folders. This means that the hierarchical organization by author or topic you use inside the Scrivener Binder (the Research or Draft folder) will not be available in the sync folder. The sync folder will have only two folders: one for the Draft and another called Note for the rest of the files (all the files inside the Research binder). All the items, in whatever hierarchy you put them inside the Research binder, will be put into a single flat folder. This makes it hard to identify which note belongs to which group (folder) of the Binder. In its current form, the Sync feature is almost useless, especially if you have built a complex system inside the Research binder. Assume that you have collected your materials and grouped them by author inside the Research binder. Say you have collected 50 notes, 5 PDFs and 10 web clippings under AuthorX, and 5 notes, 2 PDFs and 5 clippings under AuthorY. Each author has their own folder inside the Research binder. When you sync, all the notes, clippings and PDFs will be mixed together in a single folder called Note in Finder. You can then no longer tell which note belongs to which author.

The inability to maintain folder (binder) hierarchies is the main issue with Scrivener’s sync. This in turn makes it hard to index these notes into Devonthink, because you don’t know which note belongs to which author (folder, group). To tackle this problem, I have developed the following steps. It looks complex, but there is no better way, as far as I know.

1. Start a new project in Scrivener. (I am assuming you start from scratch, to make things simpler. If you already have a lot of folders, you will have to figure out a way of dealing with them yourself.)

2. Don’t import anything into Scrivener yet. First, sort your files into folders in Finder, one folder per author, named after the author:
AuthorX
AuthorY
3. Drag only the AuthorX folder into Scrivener’s Research binder.
4. Sync the Scrivener project (go to File > Sync). When you sync, Scrivener will ask you to choose a folder. Create a new folder, perhaps inside Dropbox. I call the new folder ScrivenerX.
5. Go to ScrivenerX and have a look at the files. Scrivener has created two folders during the sync: Draft and Note. Draft contains all the items in the Draft binder, while Note contains all the items in the Research binder. You should now see all of AuthorX’s notes inside the Note folder.
At this point, you can tag them all with the name AuthorX. This also makes them searchable in Spotlight later. (I have Tagger in my Finder toolbar; I use it to tag all the files at once.)

6. Index all the files inside the Note folder into Devonthink (hold Alt+Cmd and drag them into a group called AuthorX inside Devonthink). This step is essential: if you import and sync AuthorY before indexing AuthorX’s files, you will no longer be able to tell the files apart inside Devonthink.

You have now finished syncing and tagging AuthorX. Repeat the steps for the other authors (drag the AuthorY folder into Scrivener > sync Scrivener > tag the newly imported files in the Note folder > drag them into Devonthink).
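This one-author-at-a-time routine works because, after each sync, whatever is new in the Note folder must belong to the author you just dragged in. If you would rather not eyeball that, here is a hypothetical Python sketch that snapshots the Note folder’s file names before a sync and reports the new ones afterwards. The function names and the folder path are assumptions; it does not touch Scrivener, Tagger or Devonthink, and it only notices new file names, not edits to existing files.

```python
from __future__ import annotations

from pathlib import Path


def snapshot(folder: Path) -> set[str]:
    """Return the names of all files currently in the folder."""
    return {p.name for p in folder.iterdir() if p.is_file()}


def new_since(before: set[str], folder: Path) -> list[str]:
    """List, in sorted order, the files that appeared after the `before` snapshot."""
    return sorted(snapshot(folder) - before)


# Example workflow (hypothetical path):
# note = Path.home() / "Dropbox" / "ScrivenerX" / "Note"
# before = snapshot(note)
# ...drag AuthorY into the Research binder and sync in Scrivener...
# for name in new_since(before, note):
#     print("AuthorY:", name)
```

The printed list is exactly the set of files to tag with the author’s name and drag into that author’s group in Devonthink.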

There you go: your Scrivener and Devonthink are finally in sync. Anything you edit in Devonthink will appear in Scrivener, and anything you edit in Scrivener will appear in Devonthink, as long as you don’t forget to sync inside Scrivener.

Scientific research workflow, mac

I am now starting my PhD in linguistics. I have already collected more than 1500 PDF articles and books (I also did my MA in linguistics). So I am trying to build as perfect a workflow as possible for my future research. The university has given me a MacBook Pro, so I am no longer using Windows. Even though there doesn’t seem to be any application on the Mac comparable to MS OneNote, I am discovering quite powerful apps on Mac OS too. I have already learned a lot about Devonthink, Circus Ponies NoteBook, Curio, Tinderbox and similar great apps. I will be recording my experiences with each app I try until I come up with the final, perfect system for my workflow. I will write a detailed review of each application here in the future. For now, I will just give a short summary of my experiences with them.

1. File Organizer

My first task is to set up properly organized files in a specific folder, to make them easily accessible via Spotlight (or Alfred; I prefer the latter). For file organization, I use two tools: a Dropbox folder and Mendeley. Dropbox doesn’t require explanation. I use Mendeley not only to collect references but also to rename and organize my PDFs. It is a powerful application for these tasks. Here is how I use it:

a. I set up a folder in Finder, which I call “agglomeration”, meaning a folder where I drop all newly downloaded PDFs. All the PDFs I download from the internet go directly there. I use a download manager called Folx to force all PDF files into this folder. You can google it.

b. I have another folder in Dropbox, called “AllLing”. This is where I keep the properly organized files.

c. Then I set up Mendeley to pull all the PDFs in the “agglomeration” folder into its library, rename them and move them all into the “AllLing” folder.

As you can see from the screenshot above, Mendeley organizes my PDFs into a folder inside Dropbox. Since the files are renamed to Author-year-title, I can search for them using any of these attributes. I also index the “AllLing” folder into Devonthink (see next). One main reason I want to use Mendeley is that it live-syncs BibTeX files, even if not as elegantly as I would wish. Other reference managers such as Sente and Papers are great in their own right, but weaker in their integration with BibTeX.
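Mendeley does the renaming itself; the following is only a Python sketch of the kind of Author-year-title file name described above, handy if you ever need to rename files outside Mendeley. The function name pdf_name, the separator, the character clean-up rules and the title length limit are all assumptions, not Mendeley’s documented behavior.

```python
import re


def pdf_name(author: str, year: int, title: str, max_title_len: int = 60) -> str:
    """Build an Author-Year-Title.pdf file name from basic metadata.

    Characters that are awkward in file names are removed, runs of
    whitespace collapse to single spaces, and long titles are truncated.
    """
    def clean(s: str) -> str:
        s = re.sub(r'[\\/:*?"<>|]', "", s)  # drop characters unsafe in file names
        return re.sub(r"\s+", " ", s).strip()

    short_title = clean(title)[:max_title_len].rstrip()
    return f"{clean(author)}-{year}-{short_title}.pdf"


print(pdf_name("Smith", 2010, "Object shift: a survey?"))
```

Because author, year and title all land in the file name, a Spotlight or Alfred search for any one of them can find the file without opening a reference manager.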

2. Database manager:

Database managers are tools for organizing files and information in a manageable way. I use Devonthink for this purpose. Devonthink is one of the most powerful apps I have ever seen in the Mac environment. It has an artificial intelligence that looks inside PDF files and establishes content-based relations among them. That means that if I am reading an article on “Definiteness”, the software can scan its database and suggest relevant articles for me: articles that contain the word “definiteness” and/or other related words. It is also packed with many other interesting features, such as a tagging system, a note-taking tool, folders, smart folders, duplicate detection, replication (aliases), etc. If you are starting out with the app, the learning curve can be a bit steep. I definitely recommend watching the screencast on a website called screencastonline (it is behind a paywall, unfortunately). Their screencast gave me a good grounding in Devonthink. (Note: I don’t have any affiliation with any of the sites or products I mention here.) Devonthink will be an indispensable part of my workflow. I have tried some of the other database apps, and I don’t think any other app is as good as Devonthink for managing scientific papers. So my database agenda is closed. The challenge I face is making other applications work with Devonthink.
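Devonthink doesn’t spell out exactly how its “See Also” suggestions are computed, so the following is not its algorithm. It is a generic Python sketch of one common way to relate documents by content: cosine similarity between word-count vectors. The function names and the sample texts are made up for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def vector(text: str) -> Counter:
    """Count lowercase word tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def see_also(query: str, library: dict[str, str], top: int = 3) -> list[tuple[str, float]]:
    """Rank library documents by similarity to the query document, best first."""
    q = vector(query)
    scores = [(title, cosine(q, vector(text))) for title, text in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


library = {
    "definiteness": "definiteness and the definite article in noun phrases",
    "phonology": "vowel harmony and stress in phonology",
}
print(see_also("the definite article and definiteness marking", library, top=2))
```

Plain counts let common words such as “the” and “and” inflate the scores; practical tools usually give rare, topic-specific words more weight (TF-IDF is the classic scheme).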

So Mendeley renames the files and puts them in “AllLing”, and Devonthink indexes them. I then group, replicate, organize and tag the files in Devonthink so that I can organize them for my specific projects. I am currently writing a paper about nominalization. So I search for the related papers in Devonthink, use “See Also” to find more, group them in one folder, and then drag them to Sente for reading and note-taking.

3. PDF annotation and note-taking:

Macademic has already written a great post about Sente. I recommend visiting his post; I am not going to repeat the whole story here. I will just briefly reflect on my own experience with the application and its place in my workflow.

I am considering leaving Mendeley entirely and migrating to Sente, because Sente has more elegant tools for annotating PDFs. It can directly quote, snapshot and highlight, and insert all of these into the Notes panel. That is brilliant. It can also rename files, just like Mendeley. The notes can then be exported to Devonthink or Scrivener using some AppleScripts. Brilliant!

Unfortunately, Sente has some fundamental flaws that make me nervous about migrating all my data from Mendeley:

a. It fails to import PDFs from other applications.

b. The link between a note and its PDF can also break. Some people have experienced this problem, and I had the same issue with a few PDF annotations. Right now, I am using Sente only for PDF annotation, not as a reference manager. My references and PDFs stay in Mendeley, while I temporarily import the PDFs I want to read into Sente (I just search in Alfred and drag the file to Sente, since Mendeley has renamed the files properly, or I go to the “AllLing” folder and drag the file from there; but I usually drag them from Devonthink).

c. It is also bad at LaTeX integration.

d. Annotations are not stored in the PDF. This is the problem with almost all note-taking tools on the Mac: they store annotations in their own database. If you open a PDF from Dropbox in another PDF reader or browser, you can’t see the annotations made in Sente (or Mendeley or Papers), while annotations made in Acrobat, Foxit or PDF-XChange are there wherever you have the PDF. Storing annotations in the PDF is good for long-term use, as these applications could break. Unfortunately, Sente can’t do it.

4. Drafting

Here, the choice is clear. Since all the notes are exported from Sente in either OPML or RTFD format, I just import them into Scrivener.

5. Final Polishing and Publishing:

I export my draft from Scrivener in LaTeX format and import the text into TeXstudio, the LaTeX editor I use for the final polish. TeXstudio, and also TeXShop, can automatically detect and insert my references, which are stored in JabRef (in sync with Mendeley).

Finally, a shiny PDF!

End