在文件中进行调试时,我认识到多次引用相同字体的字体文件。因此,用已经查看过的字体文件项替换字典中的实际字体文件项,即可删除引用并进行压缩。到此为止,我能够将30 MB的文件缩小到6 MB左右。
File file = new File("test.pdf");
PDDocument doc = PDDocument.load(file);
Map<String, COSBase> fontFileCache = new HashMap<>();
for (int pageNumber = 0; pageNumber < doc.getNumberOfPages(); pageNumber++) {
final PDPage page = doc.getPage(pageNumber);
COSDictionary pageDictionary = (COSDictionary) page.getResources().getCOSObject().getDictionaryObject(COSName.FONT);
for (COSName currentFont : pageDictionary.keySet()) {
COSDictionary fontDictionary = (COSDictionary) pageDictionary.getDictionaryObject(currentFont);
for (COSName actualFont : fontDictionary.keySet()) {
COSBase actualFontDictionaryObject = fontDictionary.getDictionaryObject(actualFont);
if (actualFontDictionaryObject instanceof COSDictionary) {
COSDictionary fontFile = (COSDictionary) actualFontDictionaryObject;
if (fontFile.getItem(COSName.FONT_NAME) instanceof COSName) {
COSName fontName = (COSName) fontFile.getItem(COSName.FONT_NAME);
fontFileCache.computeIfAbsent(fontName.getName(), key -> fontFile.getItem(COSName.FONT_FILE2));
fontFile.setItem(COSName.FONT_FILE2, fontFileCache.get(fontName.getName()));
}
}
}
}
}
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
doc.save(baos);
final File compressed = new File("test_compressed.pdf");
baos.writeTo(new FileOutputStream(compressed));
也许这不是最优雅的方法,但是它可以工作并保持PDF / A-1b兼容性。