PSA: Cut the whitespace off the edges of your PDFs to import them into TeX

These days in economics, TeX is more or less unavoidable. There’s really no better way to put equations into your documents, and it can do wonderful things with automatically numbering tables and hyperlinking to them from where they are referenced. Even if you don’t like it, your coauthors probably do. So using it is basically a must.

If you’re like me, though, you make your actual formatted tables in Excel, so you can customize them without spending hours writing and debugging code to get them to look right. This creates a problem when you want to put them into your paper: if you want to use TeX’s powerful automatic numbering features, you need to import each table as an image. You can do this using a normal picture format like PNG, but your tables will look pixelated when people zoom in. The best option is to import your tables as PDFs, which can scale perfectly. But PDFing your Excel tables will print them to entire pages, with tons of white space. TeX will then import all the white space as well, which at best puts your table title oddly far from the content and at worst pushes the table title and page number off the page.

To get around this you need to crop your PDFs from whole pages down to just the content. You can do this manually in Adobe, and I used to use a free tool amusingly titled “BRISS”. This is tedious, though, and adds another chance for human error.

Today I figured out how to use a totally automated option for PDF cropping – a Perl script called pdfcrop, which was written by Heiko Oberdiek, who is probably not an economist. Here’s how to use it in Windows. You can download the script here. Extract it and put it in the root directory of your C: drive or someplace else convenient. pdfcrop also requires a Perl environment like Strawberry Perl (download link) and a TeX implementation like MiKTeX (download link) to run. Once you get all the components installed, you will run it through the command prompt. (You can pull up the command prompt by hitting Windows+R, typing “cmd” and hitting Enter.)

The command you want to run looks like this:

"C:\pdfcropscripts\pdfcrop\" "PATH_TO_ORIGINAL" "PATH_TO_CROPPED_VERSION"

If your original tables are in “C:\Dropbox\PaperTitle\Jason’s_Awesome_Tables.pdf” then your command might look like

"C:\pdfcropscripts\pdfcrop\" "C:\Dropbox\PaperTitle\Jason’s_Awesome_Tables.pdf" "C:\Dropbox\PaperTitle\Jason’s_Awesome_Tables_cropped.pdf"

EDIT: This version of the command randomly stopped working for me. Here is one that I got working, that you might try if you hit the same error:

pdfcrop "C:\Dropbox\PaperTitle\Jason’s_Awesome_Tables.pdf" "C:\Dropbox\PaperTitle\Jason’s_Awesome_Tables_cropped.pdf"
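
If you end up re-cropping every time you tweak a table, it may be worth wrapping the command in a small script. Here is a rough Python sketch of that idea, not something from the pdfcrop documentation. It assumes pdfcrop can be called by name from the command prompt (as in the second version of the command above); the function names and the "_cropped" naming rule are my own choices for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfcrop_command(source, cropped=None, executable="pdfcrop"):
    """Return the argument list for: pdfcrop SOURCE CROPPED."""
    source = Path(source)
    if cropped is None:
        # Assumed naming rule, mirroring the example: Foo.pdf -> Foo_cropped.pdf
        cropped = source.with_name(source.stem + "_cropped.pdf")
    return [executable, str(source), str(cropped)]


def crop_pdf(source, cropped=None):
    """Run pdfcrop if it can be found on the PATH and return the output path."""
    executable = shutil.which("pdfcrop")
    if executable is None:
        raise FileNotFoundError("pdfcrop was not found on the PATH")
    command = build_pdfcrop_command(source, cropped, executable)
    # Passing a list (not one long string) sidesteps quoting problems
    # with spaces or apostrophes in file names.
    subprocess.run(command, check=True)
    return Path(command[2])
```

Calling crop_pdf(r"C:\Dropbox\PaperTitle\Jason’s_Awesome_Tables.pdf") would then write the cropped file next to the original. If pdfcrop is not on your PATH, this will raise an error rather than guess where the script lives.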

This works even on multi-page PDFs that contain many tables – it will spit out a single PDF file that has all the cropped tables in the same order. You can import the individual pages of the PDF into your TeX document using \includegraphics with its page option. Doing something like this:

\includegraphics[page=4]{Jason’s_Awesome_Tables_cropped.pdf}

will give you just the 4th table, and you can change the page number to get the other ones.
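
To get the automatic numbering, each imported page still needs to sit inside a table float with a caption and a label. If you have a lot of tables, you could generate that boilerplate rather than typing it. The sketch below is one way to do it in Python; the function name, the file name tables_cropped.pdf, and the caption and label text are all made up for illustration. It assumes your document loads the graphicx package and is compiled with an engine that reads PDFs directly, such as pdfLaTeX.

```python
def table_float(pdf_name, page, caption, label):
    """Return LaTeX for one table float that pulls a single page of a cropped PDF."""
    lines = [
        r"\begin{table}[htbp]",
        r"  \centering",
        rf"  \caption{{{caption}}}",
        rf"  \label{{{label}}}",
        # graphicx's page key selects which page of a multi-page PDF to include
        rf"  \includegraphics[page={page}]{{{pdf_name}}}",
        r"\end{table}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example: the 4th table in the cropped file
    print(table_float("tables_cropped.pdf", 4, "Summary statistics", "tab:summary"))
```

Paste the printed text into your .tex file, and \ref{tab:summary} will pick up whatever number TeX assigns to that table.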

Bonus font tip: if you’re making tables in Excel and want to match the default font used in TeX, it is called CMU Serif. The CMU fonts can be found on SourceForge here. Just download the fonts and drag them into the Fonts window, which you can find under Control Panel.

EDIT: A friend pointed out that I had misattributed credit for pdfcrop – this is now fixed. It turns out there are two identically-named tools to do this. The one I linked to appears to be the better of the two since it takes off nearly 100% of the margins.

I also got some reports of errors running the command to call pdfcrop, and encountered issues myself. I have inserted an edited command that works as well (or instead, depending on your situation).

EDIT #2: When I imported this post to my new blog it dropped all the backslashes for some reason. I’ve edited the post to fix them and also correct some typos in the example commands.
