Java PDF Blog

Java and PDF development - our personal experiences and discoveries

Learning about PDF

Posted by Mark Stephens on Tue, Aug 11, 2009 @ 06:34 AM

The PDF file format is very useful and well documented, but it is also quite complicated, and it does not work the way most people imagine. It is structured very differently from a Word or Excel document.

Most of the time this is not an issue: you can use PDF files without knowing anything about their internals and simply enjoy the benefits. There comes a time, though, when you may need to start to dabble, so this article is designed to give you some starting points.

It is worth first getting to grips with the basic idea that a PDF file is essentially a set of linked objects: each page has a page object, which may refer to font objects defining the fonts, XObjects storing image data, and so on. Then you can look at all the different types of objects. The PDF file contains all these objects, which point to each other through references, plus a cross-reference (xref) table recording where each object starts in the file, so that any object can be located and read as needed.
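To make the idea of linked objects and their recorded locations concrete, here is a minimal sketch in Java that writes a one-page PDF by hand. It is illustrative only, not JPedal or iText code, and the class name `MinimalPdf`, the text and the page size are arbitrary choices. Objects point to one another with indirect references such as `2 0 R`, and the xref table stores the byte offset at which each numbered object begins.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MinimalPdf {

    // Builds a tiny one-page PDF. Objects are numbered 1..5 and link to each
    // other with indirect references ("2 0 R" means object 2, generation 0).
    public static byte[] build() {
        String content = "BT /F1 24 Tf 72 720 Td (Hello PDF) Tj ET"; // page drawing operators
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                + "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte position where "N 0 obj" starts
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        // The xref table: one fixed-width 20-byte entry per object.
        int xrefStart = out.size();
        write(out, "xref\n0 " + (objects.length + 1) + "\n");
        write(out, "0000000000 65535 f \n"); // entry 0 is always the free-list head
        for (int offset : offsets) {
            write(out, String.format("%010d 00000 n \n", offset));
        }
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n");
        // startxref tells a reader where the xref table itself begins.
        write(out, "startxref\n" + xrefStart + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        System.out.print(new String(build(), StandardCharsets.US_ASCII));
    }
}
```

Saving the bytes from `build()` to a `.pdf` file should give a single page showing "Hello PDF" in Helvetica, one of the standard fonts that need not be embedded. Real files usually compress their content streams, and from PDF 1.5 onwards they may use cross-reference streams instead of a plain-text xref table.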

The definitive guide to the PDF file format is Adobe's PDF Reference. It is a very complete and comprehensive (and equally dull) volume which explains most of the internal workings of the format. It is not designed to tell you how to create or modify a PDF file, just to provide all the details. It is not an easy read, but the first two chapters do provide an excellent introduction to the PDF file format.

A slightly less technical introduction to the internals of a PDF file can be found on Wikipedia. It also gives you a detailed insight into the structure of the file.

Once you have started to explore the internal guts of the PDF file format, you can open up a few PDF files. Editing a file directly is not recommended (even adding a space can break it, because it shifts the byte positions recorded in the cross-reference table), but you can open it in a text editor and view it. Much of the data is encrypted or compressed, so a more useful tool is Acrobat 9. I explained how you can use this to examine the internals of a PDF file in my first posting.
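Since much of what you see in a text editor is compressed stream data, it can help to decode it yourself. The most common filter is `/FlateDecode`, which is zlib compression, so the JDK's `java.util.zip.Inflater` can unpack it. This is a minimal sketch assuming the stream is not encrypted and has no `/DecodeParms` predictor; extracting the raw bytes between `stream` and `endstream` is left out, and the class name `FlateStreams` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateStreams {

    // Decodes the raw bytes of a stream whose dictionary says /Filter /FlateDecode.
    // Assumes no predictor (/DecodeParms) and no encryption.
    public static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // default mode expects the zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }

    // The reverse operation, for re-compressing an edited stream.
    public static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] content = "BT /F1 24 Tf 72 720 Td (Hello PDF) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(content);
        // Prints the readable page operators recovered from the compressed bytes.
        System.out.println(new String(inflate(packed), StandardCharsets.US_ASCII));
    }
}
```

If you re-compress an edited content stream with `deflate`, its byte count changes, so the stream's `/Length` entry must be updated. Every object that follows it also moves, so the xref offsets and the `startxref` value have to be recomputed as well.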

To really do much with PDF files, you will need a third-party library to manipulate them. We always recommend iText as a good starting point, as it's free and well documented, with lots of examples.

So if you have reached the point where you want to start exploring the PDF file format, I hope this has provided some useful starting points. Do feel free to post your experiences.

COMMENTS

Thank you very much for this article. There were definitely some things I had no idea about. 
 
Generally, PDFs are very useful and, for many purposes, much better than a text document. The trouble comes with editing, though.
 
I can recommend a program that lets you edit the PDF in Word and then convert it back to PDF:
 
http://www.pdftodocconverterpro.com

posted @ Tuesday, August 11, 2009 9:38 AM by Jenny


There are loads of free conversion tools out there, and they work well on different files, so I just try a few.
 
There is actually a good article on PDF editing on this blog at http://pdf.jpedal.org/java-pdf-blog/bid/17370/Problems-editing-PDF-files

posted @ Tuesday, August 11, 2009 9:50 AM by dave


I want to find out how we are supposed to work out the offsets. When I try to change the contents of a PDF that uses a filter (for example FlateDecode) so that it displays simple plain text using Tj, the file does not open. I want to know what changes should be made to the xref table when we make changes.

posted @ Wednesday, November 04, 2009 5:59 AM by khushboo

