Java PDF Blog

PDF solutions for big and small customers

Java and PDF development - our personal experiences and discoveries


The Java PDF blog is moving (action required)

Posted by Mark Stephens on Fri, Aug 27, 2010 @ 10:07 AM

We have been writing the Java PDF blog for over a year now and have decided to move it to WordPress. WordPress is a much better tool and will allow us to do far more.

So, if you would like to continue to follow the blog (which I hope you will!), you will now find it at http://www.jpedal.org/PDFblog/

You can subscribe to the RSS feed at http://www.jpedal.org/PDFblog/?feed=rss2

We will be leaving the old site as an archive but also updating and republishing the technical articles on the new site, along with lots of new material. I hope you will join us there.


Eclipse PDF viewer plugin and the marketplace

Posted by Mark Stephens on Fri, Aug 20, 2010 @ 10:24 AM

We have always offered a free PDF viewer plugin for Eclipse. It is a nifty little plugin that not only lets you view and search PDF files but also bookmark them, so that you can easily access them from inside Eclipse.

Every year there is a major release of Eclipse, and we always like to update the plugin shortly afterwards.

Each Eclipse release has a name as well as a number, so the latest version is Helios (Eclipse 3.6). As well as the usual collection of improvements and bug fixes, Helios has an additional rather cool feature: the Eclipse Marketplace. It provides a plugin store and allows Eclipse users to search and browse a database full of software.

Here is what came up when I searched for PDF.

[Screenshot: Eclipse Marketplace search results for "PDF"]

And the best bit is that you can then install the software just by clicking on the install button. The days of having to understand update sites are long gone!

So I hope you will give Helios a try and I also hope you will download and try our free plugin - let us know what you think...


Annoying Java Bugs - who broke right aligned text fields

Posted by Mark Stephens on Tue, Jul 27, 2010 @ 02:12 AM

Bugs are an unfortunate part of a coder's life. PDF has quite an 'elastic' specification, and the real world often does not match what is supposedly allowed (Sam yelled at me last week that another PDF file we were looking at was missing supposedly mandatory values), so there is plenty of scope for coding and logic errors. We have three machines in the office which constantly run regression tests on Windows and Linux, so hopefully we only have to fix a bug once, and we can see whether a fix breaks anything else. Our own bugs are therefore a controllable annoyance, where we can keep trying to raise our game and improve.

No, the most annoying bugs are the ones we do not write. Our own bugs we can put our hands up to and fix easily...

Last week we had an issue with right-aligned text values not appearing correctly. It took some time to hunt this one down, as we simply could not reproduce it until we made sure that we were using the same version of the JVM as our client. It turns out that right-aligned text values were broken in JDK 1.6 update 10, and the bug does not appear to have been fixed yet.

The fix is a hack: it adds spaces to the right of every text value so that all the values are the same length. Then they align nicely. Tomorrow's release of JPedal has a boolean flag to enable this 'hack' so that fields work in all JVMs. Hopefully Sun (or rather Oracle, now) will fix it, as it is a big deal for anyone writing financial software in Java. In the meantime, you may need to tie customers down to a specific version of Java to avoid a whole nightmare of issues and workarounds.

Do you have a particularly annoying Java bug?


Java Performance tuning

Posted by Mark Stephens on Thu, Jun 10, 2010 @ 11:30 AM

One of my favourite coding activities is profiling - taking a Java application and making it run faster. Every so often we set aside some time to just focus on making our code run faster.

Don't optimise code where speed does not matter. Not only does it give no benefit, it probably makes that code harder to support and may introduce bugs.  

Before you can do this, you need to find the bottlenecks: the parts of the code that take up most of the time. These are the sections worth improving, and you can find them with a profiler.

In the Java world we are lucky to have lots of profiling tools. The two we use are the profiler built into NetBeans and JProfiler. Here is what JProfiler shows:

[Screenshot: JProfiler output]

Methods that are called frequently or account for a lot of the run time are worth looking at; other methods are not worth bothering with. If you take a method which takes 2% of the time and make it twice as fast, the code will be a paltry 1% faster. If you take a routine used 30% of the time and make it 10% faster (a much easier task), you will get 3 times the benefit.
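The arithmetic above, written out (treating "x% faster" as cutting the routine's own time by x%): if a routine accounts for a fraction f of the total run time and you reduce that routine's time by a fraction r, the whole program saves f × r of its run time. This is a simple special case of what is usually called Amdahl's law.

```latex
\begin{aligned}
\Delta T &= f \times r \\
0.02 \times 0.50 &= 0.01 \quad (1\%,\ \text{2\% routine made twice as fast}) \\
0.30 \times 0.10 &= 0.03 \quad (3\%,\ \text{30\% routine made 10\% faster})
\end{aligned}
```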

Profiling also tells you where it might be worth caching values. If you can reduce the number of calls to an expensive routine, the program gets faster without having to change the routine itself.
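A minimal memoisation sketch, assuming the expensive routine always returns the same result for the same input. WidthCache and slowWidth are hypothetical names used for illustration, not JPedal code; the point is that the routine itself stays untouched and only the number of calls drops.

```java
import java.util.HashMap;
import java.util.Map;

/** Memoisation sketch: slowWidth is a hypothetical stand-in for any expensive, repeatable calculation. */
public class WidthCache {

    private final Map<Character, Integer> cache = new HashMap<>();
    int slowCalls = 0; // counts how often the expensive routine really runs

    /** Hypothetical expensive routine (for example, measuring a glyph). */
    private int slowWidth(char c) {
        slowCalls++;
        return c * 3 % 17; // placeholder result
    }

    /** Returns the cached value, calling slowWidth only the first time each key is seen. */
    public int width(char c) {
        Integer cached = cache.get(c);
        if (cached == null) {
            cached = slowWidth(c);
            cache.put(c, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        WidthCache widths = new WidthCache();
        int total = 0;
        for (char c : "hello world".toCharArray()) {
            total += widths.width(c);
        }
        System.out.println("total=" + total + ", slow calls=" + widths.slowCalls);
    }
}
```

A cache only pays off if the same inputs really do recur, which is exactly what the profiler's call counts tell you.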

You also find some surprises with Java. In one routine, we generated some data and wrote it to a ByteArrayOutputStream. It turned out to be 5 times faster to call the routine twice instead: first to count how many bytes it creates, and then again with a byte[] array of exactly that size to store the data in...
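A sketch of the two-pass pattern, using a made-up encoding (values below 128 take one byte, larger values up to 32767 take two). encode, encodeTwoPass and encodeWithStream are hypothetical names; passing null for the output array means "just count". Whether the two-pass version is actually faster for your own data is something only the profiler can tell you.

```java
import java.io.ByteArrayOutputStream;

public class TwoPassEncoder {

    /**
     * Writes values into out (if out is not null) and returns the number of bytes needed.
     * Made-up format: values below 128 take one byte, others (up to 32767) take two.
     */
    static int encode(int[] values, byte[] out) {
        int pos = 0;
        for (int v : values) {
            if (v < 128) {
                if (out != null) out[pos] = (byte) v;
                pos++;
            } else {
                if (out != null) {
                    out[pos] = (byte) (0x80 | ((v >> 8) & 0x7F));
                    out[pos + 1] = (byte) v;
                }
                pos += 2;
            }
        }
        return pos;
    }

    /** Pass 1 counts the bytes, pass 2 fills an exactly sized array. */
    static byte[] encodeTwoPass(int[] values) {
        byte[] out = new byte[encode(values, null)];
        encode(values, out);
        return out;
    }

    /** Single-pass alternative using a growing ByteArrayOutputStream, for comparison. */
    static byte[] encodeWithStream(int[] values) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (int v : values) {
            if (v < 128) {
                bos.write(v);
            } else {
                bos.write(0x80 | ((v >> 8) & 0x7F));
                bos.write(v); // write(int) keeps only the low 8 bits
            }
        }
        return bos.toByteArray();
    }
}
```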

So if you have not tried profiling your code, try NetBeans or the JProfiler demo on some code and be amazed at what is going on.

And if you want some more speed in your PDF Viewer, try our new 4.21 release here


Search in continuous mode and future plans

Posted by kieran france on Thu, Feb 18, 2010 @ 04:59 AM
For some time, the JPedal library has only been able to search in single page mode. For JPedal 4.0 we have begun to extend this functionality to the other view modes. As a start, we have added search to the continuous single page view mode, and we plan to extend it to the other view modes.

To allow for this new functionality, we have needed to alter a few of our existing public methods so that highlights can be assigned to, or retrieved from, a particular page.


On top of this, the highlights are no longer stored in a Rectangle array. Instead they are stored in a HashMap, with the page number as the key and a Vector_Rectangle (org.jpedal.utils.repositories.Vector_Rectangle) as the associated value.
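A minimal sketch of per-page highlight storage, using java.util.List&lt;Rectangle&gt; as a stand-in for JPedal's Vector_Rectangle. The class and method names below are illustrative only and are not JPedal's actual signatures; they simply show why each method now needs a page argument.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of per-page highlight storage; List<Rectangle> stands in for Vector_Rectangle. */
public class PageHighlights {

    private final Map<Integer, List<Rectangle>> highlightsByPage = new HashMap<>();

    /** Adds highlight areas to one page, creating that page's list on first use. */
    public void addHighlights(Rectangle[] areas, int page) {
        List<Rectangle> list = highlightsByPage.get(page);
        if (list == null) {
            list = new ArrayList<>();
            highlightsByPage.put(page, list);
        }
        for (Rectangle r : areas) {
            list.add(r);
        }
    }

    /** Returns the highlights for one page, or an empty array if it has none. */
    public Rectangle[] getHighlightAreas(int page) {
        List<Rectangle> list = highlightsByPage.get(page);
        return list == null ? new Rectangle[0] : list.toArray(new Rectangle[0]);
    }

    /** Removes a single highlight from one page only; other pages are untouched. */
    public void removeFoundTextArea(Rectangle area, int page) {
        List<Rectangle> list = highlightsByPage.get(page);
        if (list != null) {
            list.remove(area); // Rectangle.equals compares position and size
        }
    }
}
```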


We have also moved the page text areas and text orientation into hash maps. To store this information, you must retrieve it from PdfStreamDecoder after calling decodePageContent(PdfObject pdfObject, int minX, int minY, GraphicsState newGS, byte[] pageStream), because each call to this method overwrites the locally stored data for the previous page.
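The "copy it out after every decode" rule can be sketched as follows. PageDecoder, decodePage and getTextAreas are hypothetical stand-ins for PdfStreamDecoder and its methods, not JPedal's API; the key step is taking a copy of the per-page data before the next decode overwrites it.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextAreaCollector {

    /** Hypothetical decoder that only keeps data for the LAST page it decoded. */
    interface PageDecoder {
        void decodePage(int page);
        List<Rectangle> getTextAreas(); // overwritten by every decodePage call
    }

    /** Decodes each page and copies its text areas out before the next call overwrites them. */
    static Map<Integer, List<Rectangle>> collect(PageDecoder decoder, int pageCount) {
        Map<Integer, List<Rectangle>> byPage = new HashMap<>();
        for (int page = 1; page <= pageCount; page++) {
            decoder.decodePage(page);
            // take a copy: the decoder may reuse or clear its own list for the next page
            byPage.put(page, new ArrayList<>(decoder.getTextAreas()));
        }
        return byPage;
    }
}
```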

The following methods have changed in version 4.0 to allow highlights for multiple pages to be stored.

Commands.ExecuteCommands(Commands.HIGHLIGHT, new Rectangle[]{})
has become
Commands.ExecuteCommands(Commands.HIGHLIGHT, new Object[]{Rectangle[] areas, int page})

GethighlightAreas()
has become
GethighlightAreas(int page)

setFoundParagraph(int x, int y)
has become
setFoundParagraph(int x, int y, int page)

addHighlights(Rectangle[], boolean)
has become
addHighlights(Rectangle[], boolean, int page)

RemoveFoundTextArea(Rectangle)
has become
RemoveFoundTextArea(Rectangle, int page)

RemoveFoundTextAreas(Rectangle[])
has become
RemoveFoundTextAreas(Rectangle[], int page)

 

As you will notice, each of the above methods now takes an extra int parameter called page. This is the number of the page you want the method to act on.


As well as the above methods, the following method has also changed.

Display.initRenderer(Rectangle[] areas, Graphics2D g2,Border myBorder,int indent)
has become
Display.initRenderer(Map areas, Graphics2D g2,Border myBorder,int indent)

This method originally received the Rectangle array we used for highlighting. We have updated it to accept a Map, as this is how the highlights are now stored.

 

Earlier in this article we mentioned that PdfStreamDecoder holds a local copy of the text areas and orientation when a page's content is decoded. To retrieve this data, we have added the following two methods.

Vector_Rectangle getTextAreas()

Vector_Int getTextDirections()

In the releases to follow, we will move more functionality, such as highlighting with the mouse, extraction and the right-click menu, into the continuous single page view mode, and then into the other view modes.


Java File handling - when is a file actually saved

Posted by Mark Stephens on Fri, Feb 05, 2010 @ 04:20 AM

Consider the following code:

//serialise the page and write it to a temporary file
File ff=File.createTempFile("page",".bin", new File(ObjectStore.temp_dir));
BufferedOutputStream to = new BufferedOutputStream(new FileOutputStream(ff));
to.write(currentDisplay.serializeToByteArray(null));
to.flush();
to.close();

//remember where the page was stored so it can be reloaded later
pagesOnDisk.put(key,ff.getAbsolutePath());

It stores a serialised Java Object (currentDisplay) on disk and then stores the file location so we can reuse it. So in theory, if the value is in the Map pagesOnDisk, we should be able to retrieve the data and reuse it...

Unfortunately, that is not always the case. While Java may think the file has been written out, an attempt to reuse it immediately can result in all sorts of errors arising from trying to read a File which has not been fully written to disk. At the OS level, the file has been buffered and is still being written out.

So be aware of this 'gotcha' in Java and either ensure that there is a sufficient time delay to allow the data to be written out, or include some check to make sure the data is valid.
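One possible check, sketched under the assumption that the problem is data still sitting in OS buffers: FileDescriptor.sync() (a standard java.io method) blocks until the operating system reports that the bytes have reached the storage device, and a length check catches a short write. writeAndVerify is a hypothetical helper, and it has not been tested against the exact symptom described above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SafeWrite {

    /**
     * Writes data to a new temp file in dir, forces it out to the storage device,
     * and checks the length on disk before handing back the path.
     */
    static String writeAndVerify(byte[] data, File dir) throws IOException {
        File ff = File.createTempFile("page", ".bin", dir);
        try (FileOutputStream out = new FileOutputStream(ff)) {
            out.write(data);
            out.getFD().sync(); // blocks until the OS reports the bytes are on the device
        }
        if (ff.length() != data.length) {
            throw new IOException("Expected " + data.length + " bytes in " + ff
                    + " but found " + ff.length());
        }
        return ff.getAbsolutePath();
    }
}
```

Only once writeAndVerify returns would the path be put into a map such as pagesOnDisk.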


JPedalFX - a JavaFX PDF Viewer

Posted by sam howard on Fri, Dec 11, 2009 @ 01:44 PM

During a recent trip to Sun Microsystems' customer briefing centre in London I sat through an enjoyable overview of JavaFX and its short-term future. While I had heard of it before, it was interesting to see how easily exciting interfaces could be put together. I couldn't help but think how several things I have implemented in Java could have been implemented much more quickly in JavaFX, so I decided to spend a few hours trying it out.

There's lots of great example code at javafx.com, which provides a good starting point. As you would expect, it is very similar to Java, and any Java developer should be able to start writing code almost straight away. The obvious choice was to write a simple PDF viewer, which I then did using spare moments over the past week or so. The result is JPedalFX - an attempt at a clean and simple viewer.

 

 

My only complaint with JavaFX is the lack of ability to compile to a standard jar file which will run on the JVM. While you can do so by including the JavaFX jars in your product, it is apparently not allowed under JavaFX's licensing. In any case, deployment via Webstart is the preferred option, which (hopefully) suits JPedalFX well.

Overall, I am impressed by how quickly you can learn and develop in JavaFX, and by the quality of the resulting applications. JavaFX is a much needed step towards putting UI design in the hands of designers rather than programmers, but it also provides a framework in which programmers can more easily create exciting interfaces.

Give JPedalFX a go, and see what you think. I'd love to hear your thoughts.


Corrupt PDFs? Maybe this is your problem.

Posted by sam howard on Mon, Nov 30, 2009 @ 11:18 AM

On more than one occasion we have had clients coming to us reporting that JPedal tells them that their files are corrupt. This is due to a common misconception about the PDF format - unlike HTML, whitespace and special characters matter.

Different platforms, by default, use different character encodings. Because of this, when Java reads in a file as text (for example with a FileReader), it converts the bytes into characters using the platform's default encoding. This is fantastic if you actually are dealing with text, but not so good if you're dealing with binary data.

Because PDFs often contain raw image data, and the file records the exact byte length of each stream and the byte offset of each object, if any characters are changed or removed JPedal starts reading from the wrong place in the file. This is what causes the error message saying the file is corrupt, because by the time it gets to JPedal it is!

Unfortunately there's nothing we can do about this slightly counter-intuitive way of doing things - it doesn't help that the Java class which converts characters is called FileReader, while the class which doesn't is called FileInputStream. Not the clearest of names!
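To see the problem for yourself, here is a small demonstration of what decoding binary bytes as text does to them. It uses UTF-8 for illustration; FileReader uses the platform's default encoding, and some single-byte encodings such as ISO-8859-1 happen to map every byte to a character and back, which is one reason the problem can appear on one platform and not another. TextVsBinary and roundTripAsText are illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextVsBinary {

    /** Simulates treating binary data as text: decode bytes to chars, then encode back to bytes. */
    static byte[] roundTripAsText(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8); // invalid sequences become U+FFFD
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // a few bytes of the kind found in compressed PDF streams or image data
        byte[] raw = {(byte) 0x89, 0x50, (byte) 0xFF, 0x0D, 0x0A, (byte) 0xC3};
        byte[] back = roundTripAsText(raw);
        System.out.println(raw.length + " bytes in, " + back.length
                + " bytes out, identical: " + Arrays.equals(raw, back));
    }
}
```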

So, for the record, the correct way of reading a PDF file into a byte array is the following:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

//Set up stream (FileInputStream reads raw bytes, with no character conversion)
File file = new File(filename);
byte[] pdf = new byte[(int) file.length()];
DataInputStream stream = new DataInputStream(new FileInputStream(file));

//Read the whole file into the byte array, closing the stream even if reading fails
try {
    stream.readFully(pdf);
} finally {
    stream.close();
}
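On Java 7 and later, the standard library can read the whole file as raw bytes in one call, which avoids hand-written loops altogether. readPdf is just an illustrative wrapper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadPdfBytes {

    /** Reads the whole file as raw bytes; no character decoding takes place. */
    static byte[] readPdf(String filename) throws IOException {
        return Files.readAllBytes(Paths.get(filename));
    }
}
```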


Converting Java BufferedImage between Colorspaces

Posted by Mark Stephens on Thu, Oct 22, 2009 @ 02:15 AM

The Java BufferedImage class provides a very powerful 'abstraction' of images in Java. It lets you create a wide range of image types which can all be accessed seamlessly. One of its key features is that you can decide what type of image you have: black and white, grayscale or full ARGB.

When we create an image from a PDF in JPedal, we use an ARGB BufferedImage because it is the only mode which supports the color range and transparency found in many PDFs.

Sometimes you want to convert a BufferedImage from one type to another, and Java makes this very easy. Our ColorSpaceConvertor class has a method convertColorspace(image, type) which does the conversion; the code is very simple and is reproduced below. The type is a constant from BufferedImage, so converting a page to grayscale would need:

image=ColorSpaceConvertor.convertColorspace(image, BufferedImage.TYPE_BYTE_GRAY);

    /**
     * convert a BufferedImage to the colourspace given by newType
     * (one of the BufferedImage.TYPE_* constants)
     */
    public static final BufferedImage convertColorspace(
        BufferedImage image,
        int newType) {

        try {
            BufferedImage raw_image = image;
            image =
                new BufferedImage(
                    raw_image.getWidth(),
                    raw_image.getHeight(),
                    newType);
            //null RenderingHints: use the default conversion settings
            ColorConvertOp xformOp = new ColorConvertOp(null);
            xformOp.filter(raw_image, image);
        } catch (Exception e) {
            LogWriter.writeLog("Exception " + e + " converting image");
        }

        return image;
    }

 

One issue is that detail can be lost in the conversion, so an alternative is to create an image in the format you need and then draw the original image onto it. Here is how you can do this:

BufferedImage image_to_save2 = new BufferedImage(image_to_save.getWidth(), image_to_save.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
Graphics2D g2 = image_to_save2.createGraphics();
g2.drawImage(image_to_save, 0, 0, null);
g2.dispose(); //release the graphics context once drawing is finished
image_to_save = image_to_save2;
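Both approaches can be put side by side in a self-contained sketch. One design choice here is not in the snippets above: the drawing version fills the new image with white first, because a freshly created TYPE_BYTE_GRAY image starts out black, so transparent areas of an ARGB page would otherwise come out black. GrayConversion and its method names are illustrative.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

public class GrayConversion {

    /** ColorConvertOp approach: colour-managed conversion into a new image of the requested type. */
    static BufferedImage viaColorConvertOp(BufferedImage src, int newType) {
        BufferedImage dest = new BufferedImage(src.getWidth(), src.getHeight(), newType);
        new ColorConvertOp(null).filter(src, dest);
        return dest;
    }

    /** Drawing approach: paint the source onto a new image over a white background. */
    static BufferedImage viaDrawing(BufferedImage src, int newType) {
        BufferedImage dest = new BufferedImage(src.getWidth(), src.getHeight(), newType);
        Graphics2D g = dest.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, src.getWidth(), src.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dest;
    }
}
```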

 


BufferedImage raster data in Java

Posted by Mark Stephens on Thu, Aug 27, 2009 @ 10:55 AM

Most of the time, the abstraction you get in Java is brilliant. It hides the complexity and lets you get on with real-life problem solving. This is especially true of the Image classes, where you can forget about TIFFs, PNGs and JPEGs and just work with images.

Occasionally, though, you need to dig deeper, and there you can find things are more complex. For example, you sometimes want to access the actual data inside an image (the raster data). We use this to downsample images in our PDF library: if the image in the PDF file is 5000x5000 pixels, you can shrink it down to a more manageable size. This saves a lot of memory and makes things much faster without any noticeable loss of image quality. Indeed, with black and white images, you can improve image quality!
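A minimal sketch of downsampling through the raster, assuming a single-band 8-bit grayscale image. It simply averages each 2x2 block of samples; this is one common approach, not necessarily the algorithm JPedal uses, and it does not attempt the black and white quality improvement mentioned above. Both dimensions must be at least 2, since BufferedImage does not accept a zero width or height. Downsample and halve are illustrative names.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

public class Downsample {

    /**
     * Halves a single-band (e.g. TYPE_BYTE_GRAY) image by averaging each 2x2 block of samples.
     * An odd trailing row or column is dropped.
     */
    static BufferedImage halve(BufferedImage src) {
        int newW = src.getWidth() / 2;
        int newH = src.getHeight() / 2;
        Raster in = src.getRaster();
        BufferedImage dest = new BufferedImage(newW, newH, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster out = dest.getRaster();
        for (int y = 0; y < newH; y++) {
            for (int x = 0; x < newW; x++) {
                int sum = in.getSample(2 * x, 2 * y, 0) + in.getSample(2 * x + 1, 2 * y, 0)
                        + in.getSample(2 * x, 2 * y + 1, 0) + in.getSample(2 * x + 1, 2 * y + 1, 0);
                out.setSample(x, y, 0, sum / 4);
            }
        }
        return dest;
    }
}
```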

A BufferedImage contains the actual pixel and color data. You can access these directly, but then you need to start thinking more concretely in terms of actual physical data structures. The pixel data is stored in a Raster object, which in turn keeps the samples in a DataBuffer (an abstraction over the real storage, as you will see below). Obtaining a Raster and the data it contains is easy in Java...

BufferedImage image = myImage;//or whatever

//getData() returns a copy of the pixels; image.getRaster() gives the image's own raster
Raster ras=image.getData();

DataBuffer data=ras.getDataBuffer();

The interesting thing here is that we are now accessing the underlying data which may be stored in different ways, depending on the type of image and even Java implementation. In particular, it may be a set of 8 bit byte values or 32 bit integer values - the DataBuffer is just an abstraction.

So the next stage is to see which is being used and then handle the data accordingly:

int type;
if(data instanceof DataBufferInt)
    type=1; //it's a set of ints
else
    type=0; //bytes (other subclasses such as DataBufferUShort also exist)
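Once you know the DataBuffer type, you can cast it to reach the backing array directly. This sketch uses getRaster() rather than getData(), because getData() returns a copy; be aware that grabbing the backing array this way can stop Java2D from hardware-accelerating that image. RasterAccess and describe are illustrative names.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.DataBufferInt;

public class RasterAccess {

    /** Returns a short description of how the image's pixels are physically stored. */
    static String describe(BufferedImage image) {
        DataBuffer data = image.getRaster().getDataBuffer(); // the image's own storage, not a copy
        if (data instanceof DataBufferInt) {
            int[] pixels = ((DataBufferInt) data).getData(); // one packed pixel per int
            return "ints: " + pixels.length;
        } else if (data instanceof DataBufferByte) {
            byte[] samples = ((DataBufferByte) data).getData(); // one or more bytes per pixel
            return "bytes: " + samples.length;
        }
        return "other: " + data.getClass().getSimpleName();
    }
}
```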

 

You get a similar issue when directly creating a Raster - here is an example....

    ras=Raster.createInterleavedRaster(new DataBufferByte(newData,newData.length), newW, newH, newW*comp, comp, bands, null);
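For example, here is a self-contained sketch that wraps raw 8-bit grayscale samples (one byte per pixel, row by row) in a BufferedImage using the same Raster.createInterleavedRaster call. The variable names follow the line above; the single-band gray layout is an assumption, and RasterFromBytes and grayFromBytes are illustrative names.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;

public class RasterFromBytes {

    /** Wraps raw 8-bit grayscale samples (one byte per pixel, row by row) in a BufferedImage. */
    static BufferedImage grayFromBytes(byte[] newData, int newW, int newH) {
        int comp = 1;      // bytes per pixel
        int[] bands = {0}; // offset of each band within a pixel
        Raster ras = Raster.createInterleavedRaster(
                new DataBufferByte(newData, newData.length),
                newW, newH, newW * comp, comp, bands, null);
        BufferedImage image = new BufferedImage(newW, newH, BufferedImage.TYPE_BYTE_GRAY);
        image.setData(ras); // copies the samples into the image's own raster
        return image;
    }
}
```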

So a BufferedImage is a very useful generic object in Java. But if you want to manipulate the raw data, you need to take some care depending on what sort of image you are using.

                

 

