Wednesday, October 25, 2006

Grabbing XMP Data with XPAAJ

During MAX 2006, I am helping out Gunar Penekis with a talk on XMP and demonstrating our XMP SDK toolkit, which is written in C++. The Adobe SDK uses James Clark's Expat parser and has some custom classes to grab XMP and manipulate it. The samples directory also has some great examples to get anyone up and running.

However....

Being a bit more of a Java head, I felt like being productive. During Matt Butler's excellent three-hour hands-on tutorial on LiveCycle, I got inspired to write an extension to the XPAAJ sample I posted earlier for getting XMP out of a PDF document. The source code is here (sorry about the formatting - email me if you want to get the real file via dnickull (at) adobe (dot) com):

import java.io.*;
import com.adobe.pdf.*;

/* XMPExtractSample
 * by Duane Nickull, Adobe Systems Inc. dnickull@adobe.com
 * Copyright (c) 2006 - all rights reserved
 *
 * Use this at your own risk and don't whine to me if it doesn't work.
 * You will need to have XPAAJ.jar from Adobe.com. Written and tested
 * with JDK 1.5 on a Mac running OS X 10.4.7.
 */

public class XMPExtractSample {

    public static void main(String[] args)
            throws FileNotFoundException, IOException
    {
        // Make sure we were given exactly one argument, then call PDFExtract().
        if (args.length != 1)
        {
            System.out.println("\nCommand line format: java XMPExtractSample pdf-file");
            return;
        }
        PDFExtract(args[0]);
    }

    public static void PDFExtract(String inPdfName)
            throws FileNotFoundException, IOException
    {
        System.out.println("\nOpening PDF with XMPExtractSample...");
        PDFDocument doc = null;
        FileInputStream inPdfFile = new FileInputStream(inPdfName);
        try {
            doc = PDFFactory.openDocument(inPdfFile);
        } catch (IOException e) {
            System.out.println("Error opening PDF file: " + inPdfName);
            System.out.println(e);
        }

        if (doc == null)
        {
            // Nothing to extract from, so stop here rather than hit a
            // NullPointerException on doc.exportXMP() below.
            System.out.println("Cannot open PDF file: " + inPdfName);
            inPdfFile.close();
            return;
        }
        System.out.println("\n" + inPdfName + " was successfully opened.");

        // Export the XMP metadata from the document.
        try {
            // Call the PDFDocument object's exportXMP method.
            InputStream myXMPStream = doc.exportXMP();

            // Read the stream to its end. available() only reports how many
            // bytes can be read without blocking, so it is not a reliable size.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = myXMPStream.read(chunk)) != -1)
            {
                buffer.write(chunk, 0, n);
            }
            byte[] mdBytes = buffer.toByteArray();
            System.out.println("\nNumber of XMP bytes found is " + mdBytes.length + "\n");

            // Print the whole XMP packet. The bytes are assumed to be UTF-8 here.
            System.out.println(new String(mdBytes, "UTF-8"));
        } catch (IOException e) {
            System.out.println("XMP extraction failed: " + e);
        } finally {
            inPdfFile.close();
        }
        System.out.println("\nXMP Extraction has finished.");
    }
}
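
Once the packet is out of the PDF, it is plain RDF/XML, so the JDK's own XML parser can pull individual values from it. What follows is only a rough sketch of that next step, not part of XPAAJ or the XMP SDK. The class name XmpFieldSketch, its helper methods, and the sample packet are all made up for illustration, and it assumes the title is stored the usual way, as a dc:title element holding an rdf:Alt list.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmpFieldSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Made-up packet used only to exercise the code below.
    static final String SAMPLE =
          "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
        + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:dc=\"" + DC_NS + "\">"
        + "<dc:title><rdf:Alt>"
        + "<rdf:li xml:lang=\"x-default\">Sample Title</rdf:li>"
        + "</rdf:Alt></dc:title>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>"
        + "<?xpacket end=\"w\"?>";

    /** Parses an XMP packet with a namespace-aware DOM parser. */
    static Document parse(InputStream xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        return factory.newDocumentBuilder().parse(xmp);
    }

    /** Returns the first rdf:li under the first dc:title, or null if absent. */
    static String title(Document doc) {
        NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
        if (titles.getLength() == 0) {
            return null;
        }
        Element first = (Element) titles.item(0);
        NodeList items = first.getElementsByTagNameNS(RDF_NS, "li");
        if (items.getLength() == 0) {
            return null;
        }
        return items.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = SAMPLE.getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println("dc:title = " + title(doc));
    }
}
```

In the sample above, the stream returned by doc.exportXMP() could be handed to parse() in place of the ByteArrayInputStream. Other properties can be looked up the same way by swapping in their namespace URI and local name.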

3 comments:

  1. Would this work on image XMP data?

  2. Yes - it should. It will extract and print out the entire XMP data set. You would have to apply some post-retrieval logic to determine which part is the image XMP.

  3. UPDATE:

    Please be aware that XPAAJ is no longer supported or available from Adobe Systems.

    Those customers familiar with XPAAJ will appreciate the new Java libraries which are now part of LiveCycle ES. These Java libraries provide more functionality than before, and allow Java developers to more easily build Java applications that work with PDF documents. Because we are now offering the entire LiveCycle ES suite in a trial version, XPAAJ is no longer available.

    http://www.adobe.com/devnet/livecycle/trial/
