Some pdftk users process hundreds of files. Performing this work on a Windows machine can yield unexpected results. The problem arises from the Windows command-prompt shell, not pdftk: for every long filename, Windows creates a short, DOS-compatible (8.3) filename. This short filename might match a wildcard expression even when the long filename does not. When using pdftk, the result is more input files than you wanted.
This article offers a couple of workarounds and then describes the case where this problem arose.
One workaround is to use a wildcard expression that couldn't possibly match a short, DOS-style filename. DOS-style filenames have a maximum length of eight characters and an optional, maximum extension of three characters. They look something like this: 343990~1.PDF. In the case below, using the wildcard expression 343990_* solved the problem.
Another workaround is to use a shell other than the Windows command prompt. I use bash as packaged by MSYS.
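The same idea, expanding the wildcard somewhere other than the Windows command prompt, can also be sketched in Python. This is only an illustration under stated assumptions, not part of pdftk: the helper names `select_long_names` and `build_pdftk_command` are hypothetical, and the sketch assumes that Python's `os.listdir` reports long filenames. It uses only the `cat` and `output` syntax shown in this article.

```python
import os
from fnmatch import fnmatchcase


def select_long_names(directory, pattern):
    """Return the sorted names in `directory` whose long name matches `pattern`.

    Matching is done in Python, case-insensitively, against the names that
    os.listdir reports, so short 8.3 aliases never take part in the match.
    """
    wanted = pattern.lower()
    return sorted(
        name for name in os.listdir(directory)
        if fnmatchcase(name.lower(), wanted)
    )


def build_pdftk_command(directory, pattern, output_path):
    """Build an argument list of the form: pdftk <inputs> cat output <output>."""
    inputs = [
        os.path.join(directory, name)
        for name in select_long_names(directory, pattern)
    ]
    if not inputs:
        raise ValueError("no files match " + pattern)
    return ["pdftk", *inputs, "cat", "output", output_path]
```

Passing the returned list to `subprocess.run(command, check=True)` would then run pdftk, assuming it is installed and on the PATH. Choose an output name that the pattern does not match if the output is written to the same directory.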
This problem arose in a case where a directory of input files contained 448 PDFs. Their numerical names had incrementing prefixes and suffixes, such as:
343959_0011.pdf
343959_0021.pdf
343959_0031.pdf
343990_0011.pdf
343990_0021.pdf
343990_0031.pdf
343990_0041.pdf
343991_0011.pdf
343991_0021.pdf
343991_0031.pdf
343992_0011.pdf
343992_0021.pdf
343992_0031.pdf
343993_0011.pdf
343993_0021.pdf
343993_0031.pdf
343994_0011.pdf
343994_0021.pdf
343994_0031.pdf
...
When using pdftk to combine these PDF files, extra files were showing up in the output PDF. For example, running:
pdftk 343990* cat output 343990.PDF
yields 343990.PDF, which includes these files in this order:
343990_0011.pdf
343990_0021.pdf
343990_0031.pdf
343990_0041.pdf
345089_0131.pdf
345688_1121.pdf
Is this a pdftk error or a shell error? Using dir shows that the shell is passing these unwanted files to pdftk:
dir 343990*
06/20/2005  03:58p  1,825  343990_0011.pdf
06/20/2005  03:58p  1,825  343990_0021.pdf
06/20/2005  03:58p  1,825  343990_0031.pdf
06/20/2005  03:58p  1,825  343990_0041.pdf
06/20/2005  03:58p  1,828  345089_0131.pdf
06/20/2005  03:58p  1,828  345688_1121.pdf
This mystery is solved by using the /X switch. This switch shows the DOS-compatible name on the left and the original, long filename on the right:
dir /X 343990*
06/20/2005  03:58p  1,825  343990~1.PDF  343990_0011.pdf
06/20/2005  03:58p  1,825  343990~2.PDF  343990_0021.pdf
06/20/2005  03:58p  1,825  343990~3.PDF  343990_0031.pdf
06/20/2005  03:58p  1,825  343990~4.PDF  343990_0041.pdf
06/20/2005  03:58p  1,828  343990~5.PDF  345089_0131.pdf
06/20/2005  03:58p  1,828  343990~6.PDF  345688_1121.pdf
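The behavior seen in this listing can be imitated with a short Python sketch. It does not reproduce how Windows implements wildcard matching; it only assumes that a pattern is tested, case-insensitively, against both the long name and the short name of each file, and that a hit on either one selects the file. The function name `windows_style_matches` is hypothetical, and the name pairs are copied from the dir /X output above.

```python
from fnmatch import fnmatchcase

# (short 8.3 name, long name) pairs copied from the dir /X listing.
NAME_PAIRS = [
    ("343990~1.PDF", "343990_0011.pdf"),
    ("343990~2.PDF", "343990_0021.pdf"),
    ("343990~3.PDF", "343990_0031.pdf"),
    ("343990~4.PDF", "343990_0041.pdf"),
    ("343990~5.PDF", "345089_0131.pdf"),
    ("343990~6.PDF", "345688_1121.pdf"),
]


def windows_style_matches(pattern, name_pairs):
    """Return the long names whose long OR short name matches `pattern`.

    Assumption: matching is case-insensitive and either name may match.
    This imitates the observed behavior only.
    """
    wanted = pattern.upper()
    return [
        long_name
        for short_name, long_name in name_pairs
        if fnmatchcase(long_name.upper(), wanted)
        or fnmatchcase(short_name.upper(), wanted)
    ]


if __name__ == "__main__":
    # The original pattern picks up all six files via their short names.
    print(windows_style_matches("343990*", NAME_PAIRS))
    # Adding the underscore leaves only the four intended files.
    print(windows_style_matches("343990_*", NAME_PAIRS))
```

Under these assumptions, 343990* selects all six files because every short name begins with 343990, while 343990_* selects only the four intended files because no short name contains an underscore in that position.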
Thanks to Josh Gray at Daktronics who identified this problem and worked with me to solve it.