I have been trying different tools to extract bibliographic references from the table of contents of a PDF book.
Assume you have the PDF of an edited book that contains 20 articles. If you want the reference data for all of the entries, you have to go to Google Scholar and extract the reference data for each one. The process is tedious. Furthermore, Google Scholar usually offers incomplete data, so you need to go and edit each of these references. It is a lot of work.
Wouldn't it be easier if you could just pick the reference data directly from the table of contents of the PDF book?
Yes, in principle.
But in practice, you need to know a lot of programming and understand how PDF files work under the hood. I know neither. Therefore, I came up with a simpler but equally workable solution: using Keyboard Maestro and JabRef.
The process is a bit complex, but the output is much better, and I get it much faster, than with Google Scholar or any of the reference extraction tools I tried.
- Fill in the reference data of the main book in JabRef (from WorldCat).
- Copy the BibTeX of the edited book to a specific clipboard inside Keyboard Maestro. (If you are importing my macro, simply hit CTRL+C; that will copy the BibTeX and make some calculations to get the publication year.)
- Copy the title, author, and page number of each of the articles in the PDF book. Each reference must be copied in that order.
- Hit a shortcut (CMD+ALT+9) that opens a Keyboard Maestro window asking for the number of copied references. I count the number of references I copied and answer the question. I typically copy 8 references at a time.
- Click OK. KM turns the clipboards into references, calculates the page numbers for each entry, and crossrefs them with the parent book.
I magically get perfectly formatted references from the copied clipboards. Once you get how it works, it is a very powerful script.
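For readers who would rather script the idea than build a macro, here is a minimal Python sketch of what the steps above amount to. It is not the Keyboard Maestro macro itself, only an illustration. The function names `parse_parent` and `build_entries`, the `last_page` parameter, and the citation-key scheme are all made up for this example. It also assumes that an article ends on the page before the next one starts, and that the publication year can be read from a `year` field in the parent entry.

```python
import re


def parse_parent(bibtex):
    """Return the citation key and publication year of the parent book entry."""
    key_match = re.search(r"@\w+\s*\{\s*([^,\s]+)\s*,", bibtex)
    if key_match is None:
        raise ValueError("no citation key found in the parent entry")
    # Assumption: the year is stored in a plain `year = {...}` field.
    year_match = re.search(r'\byear\s*=\s*[{"]?\s*(\d{4})', bibtex, re.IGNORECASE)
    return {
        "key": key_match.group(1),
        "year": year_match.group(1) if year_match else "",
    }


def build_entries(parent_bibtex, chapters, last_page=None):
    """Build one @incollection entry per (title, author, start_page) tuple."""
    parent = parse_parent(parent_bibtex)
    entries = []
    for i, (title, author, start) in enumerate(chapters):
        # Assumption: an article ends on the page before the next one starts.
        if i + 1 < len(chapters):
            end = chapters[i + 1][2] - 1
        else:
            end = last_page  # unknown unless the caller supplies it
        if end is not None and end > start:
            pages = f"{start}--{end}"
        else:
            pages = str(start)
        # Hypothetical key scheme: a surname taken crudely from the author
        # string, followed by the year and the start page.
        words = author.split(",")[0].split()
        surname = re.sub(r"\W", "", words[-1]) if words else "anon"
        key = f"{surname}{parent['year']}_{start}"
        lines = [
            "@incollection{" + key + ",",
            "  author = {" + author + "},",
            "  title = {" + title + "},",
            "  pages = {" + pages + "},",
            "  crossref = {" + parent["key"] + "},",
            "}",
        ]
        entries.append("\n".join(lines))
    return entries


if __name__ == "__main__":
    # Made-up sample data, in the same order as the copied clipboards:
    # title, author, page number.
    book = "@book{Smith2010,\n  editor = {Smith, John},\n  year = {2010},\n}"
    toc = [("Introduction", "Jane Doe", 1), ("Second Chapter", "Roe, Richard", 15)]
    print("\n\n".join(build_entries(book, toc, last_page=40)))
```

The `crossref` field is what lets each article inherit the book title, editor, publisher, and year from the parent entry, so only the title, author, and pages have to be typed or copied per article.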
Feel free to ask if you are interested in the Keyboard Maestro macro.