Access different file formats in Java

Posted by Vinayak On 12:13 AM 0 comments

How do I access the XYZ file format in Java?

Specifications for many file formats can be found at Wotsit. A large database of file types can be found at www.file-extensions.org. Marco Schmidt maintains several very useful lists of links about processing a multitude of document formats.

An interesting article about Microsoft's binary file formats, especially DOC and XLS, is "Why are the Microsoft Office file formats so complicated? (And some workarounds)". It also mentions some alternatives to dealing with those formats directly.

Access

  • JDBC/ODBC bridge - JDBC driver for ODBC databases; it shipped as part of the JDK up to Java 7 and was removed in Java 8. On Linux, you'll have to get ODBC up and running first: http://www.unixodbc.org/
  • Jackcess - library to read and write MDB files
  • HXTT Access - commercial pure Java JDBC driver for MS Access

CGM

  • cgmva - an applet to display CGM files; comes with source code

CHM

  • JChm - library to read CHM files

Excel

  • Ostermiller Utils, CSVObjects, CSVBeans and opencsv - libraries to read and write CSV files. CSV is not as easy to read and write as it first looks - once all the special cases are considered, one might as well use a library.
  • POI - library to read and write XLS files
  • JExcelAPI - library to read and write XLS files
  • Opinions on jExcelApi vs. POI: here and here
  • jXLS - library for writing XLS files based on templates
  • Java2Excel - library for creating Excel files based on Collections
  • It is possible to use JDBC to read Excel files
  • Obba works with Excel spreadsheets on Windows
  • OpenXLS - "OpenXLS is the open source version of ExtenXLS - a Java spreadsheet SDK that allows you to read, modify and create Java Excel spreadsheets from your Java applications."
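
To illustrate the remark above that CSV is harder than it first looks, here is a minimal sketch. The class and both helper methods are hypothetical and not taken from any of the libraries listed. It only shows why a plain split on commas breaks once a field is quoted. It does not handle fields that span several lines, which is one more reason to prefer a library.

```java
import java.util.ArrayList;
import java.util.List;

class CsvQuotedFieldDemo {

    // Naive approach: breaks as soon as a quoted field contains a comma.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Minimal quote-aware splitter (hypothetical helper, single line only).
    // Understands "..." fields and "" as an escaped quote inside them.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",\"said \"\"hi\"\"\"";
        System.out.println(naiveSplit(line).length);      // 4 pieces instead of 3 fields
        System.out.println(quoteAwareSplit(line).size()); // 3 fields
        System.out.println(quoteAwareSplit(line).get(1)); // Smith, John
    }
}
```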

HDF (Hierarchical Data Format)

Image files

Matlab

OpenDocument (ODF)

  • basic Java code for reading ODF files is here
  • ODFDOM is a Java library for accessing ODF files.
  • jDocument.org has an open-source library for accessing all Open Document file types.
  • Obba works with OpenOffice spreadsheets

Office Open XML

  • These are the new XML-based Microsoft Office formats.
  • OpenXML4J
  • docx4j - create and edit docx documents using a JAXB content model matching the WordML schema
  • Apache POI 3.5 (which is in beta, but is usable already) implements these formats.

OpenOffice Java API

  • OpenOffice can read a number of file formats, and makes them accessible through its API. A starting point might be this article, this article and of course the OO developer site
  • Some introductory information about the OO file format can be found here and here
  • oooview is an OO Viewer written in Java.
  • JODConverter is a Java library that uses the OO Java API to perform document conversions between any formats supported by OO

Outlook MSG

  • The Jakarta POI project developed some code that can read the textual contents of Outlook's MSG files. This page talks about that.

PDF

  • PDF is a hard-to-read format. The best one can do is try to extract the text contained in a PDF file.
  • iText - library to create PDFs
  • FOP - library to create PDFs (and other formats) from XML by using XSL-FO transformations
  • FlyingSaucer - library to convert CSS-styled XHTML to PDF
  • PDFBox - library to create PDFs; can also extract text
  • PDF Clown - general-purpose library to read/create/modify PDF files. It features a rich multi-layered object model that allows access even to each single content stream instruction. wikipedia article
  • JPedal - library to extract text from PDFs
  • PDFTextStream - commercial library to extract text from PDFs
  • Adobe AcrobatViewer for JavaBean - freeware, library to display and print PDFs; introductory article ; this library hasn't been updated in a long time and has problems displaying files that were created with recent PDF versions.
  • PDF Renderer is a more up-to-date PDF viewer that renders using Java2D. It can also be used to print PDFs.

PowerPoint

  • The Jakarta POI project developed some code that can open and (to a limited extent) edit PPT files. This page talks about it.

Project

  • The MPXJ library can work with several Project file formats.

QIF (used by Microsoft Money and Quicken)

  • Buddi and Eurobudget are Java applications that can import and export QIF files (and thus contain code you may be able to use in your application). Both are licensed under the GPL.

RTF

  • iText - library to create RTFs
  • JavaCC - a lexer/parser generator for which an RTF grammar is available. From that an RTF reader can be constructed.

Visio

  • The Jakarta POI project developed some code that can read Visio files. This page talks about that.

Word

  • POI - library to read and write DOC files. (Note that according to the POI-Dev mailing list this is unmaintained code. If it works for you - great, if not, then it will likely not be fixed soon.) Slow progress is being made, though, and it can be used for extracting the text of a document.
  • docx4j - for docx files (as opposed to doc files)
  • WordApi.exe is a native Windows component with a Java interface, which lets you create Word documents and alter Word templates. Some impressions about it can be found here.

Something else?

If you encounter an obscure format for which no library is available, it may be feasible to create a reader for it if you have a file format description (which may be available on Wotsit, see link above). Several libraries, so-called lexers and parsers, are available that help in creating a reader, especially if the file format is ASCII, and not binary. You will need knowledge of regular expressions, though. Some file formats that have been tackled using this approach include RTF, CSV, HPGL and PBM/PGM/PPM. Lexers are easier to start with, but parsers can do more of the work for you. All these have ready-to-use examples on their web sites.


Check if this is correct:

1) All text content must be between a start and an end tag
2) The document must have 1 root element
3) Every start tag must have a matching end tag


The directory structure of a web application consists of two parts:
A private directory called WEB-INF, whose contents are not served directly to clients
A public resource area that holds the files clients may request directly

WEB-INF folder consists of
1. web.xml
2. classes directory
3. lib directory


Life cycle methods of a Servlet

Posted by Vinayak On 3:46 AM 0 comments

The javax.servlet.Servlet interface defines three methods known as the life-cycle methods:
public void init(ServletConfig config) throws ServletException
public void service(ServletRequest req, ServletResponse res) throws ServletException, IOException
public void destroy()
First the servlet is constructed, then initialized with the init() method.
Any request from a client is handled first by the service() method, which in the case of HttpServlet delegates to the doXxx() methods.

The servlet is then removed from service, destroyed with the destroy() method, and finally garbage collected and finalized.


init() This method is called by the servlet container to indicate to the servlet
that it must initialize itself and get ready for service. The container
passes an object of type ServletConfig as a parameter.

service() This method is called by the servlet container for each request from the
client to allow the servlet to respond to the request.

destroy() This method is called by the servlet container to indicate to the servlet
that it must clean up after itself, release any resources it holds, and get ready
to go out of service.

getServletConfig() Returns the ServletConfig object that the container passed to
init(); it holds the servlet's initialization parameters.

getServletInfo() The implementation class must return information about the servlet,
such as the author, the version, and copyright information.


The wait() method makes the calling thread release the object's lock and join that object's waiting pool, where it stays until another thread notifies it (or it is interrupted or times out).
The notify() method is used to send a signal to one and only one of the threads that are waiting in that same object's waiting pool.
The notify() method can NOT specify which waiting thread to notify.
The method notifyAll() works in the same way as notify(), only it sends the signal to all of the threads waiting on the object.
All three methods—wait(), notify(), and notifyAll()—must be called from within a synchronized context! A thread invokes wait() or notify() on a particular object, and the thread must currently hold the lock on that object.
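
The rules above can be sketched as a one-slot mailbox. This is a minimal illustration; the class and method names are hypothetical. It waits inside a loop, which is the usual defensive idiom, and it also shows that calling wait() without holding the lock fails with IllegalMonitorStateException.

```java
class WaitNotifyDemo {
    private final Object lock = new Object();
    private String message; // guarded by lock

    // Blocks until another thread has called put().
    String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) { // re-check the condition after every wakeup
                lock.wait();          // releases the lock while waiting
            }
            String result = message;
            message = null;
            return result;
        }
    }

    void put(String value) {
        synchronized (lock) {
            message = value;
            lock.notifyAll(); // wakes every thread waiting on lock
        }
    }

    // One producer thread hands a value to the calling thread.
    static String runOnce() {
        WaitNotifyDemo box = new WaitNotifyDemo();
        Thread producer = new Thread(() -> box.put("hello"));
        producer.start();
        try {
            String received = box.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    // wait() outside a synchronized context is rejected at run time.
    static boolean waitWithoutLockFails() {
        try {
            new Object().wait(10);
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
        System.out.println(waitWithoutLockFails());
    }
}
```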


synchronized methods prevent more than one thread from accessing an object's critical method code simultaneously.
You can use the synchronized keyword as a method modifier, or to start a synchronized block of code.
To synchronize a block of code (in other words, a scope smaller than the whole method), you must specify an argument that is the object whose lock you want to synchronize on.
While only one thread can be accessing synchronized code of a particular instance, multiple threads can still access the same object's unsynchronized code.
When a thread goes to sleep, its locks will be unavailable to other threads.
static methods can be synchronized, using the lock from the java.lang.Class instance representing that class.
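
A minimal sketch of the three forms described above: a synchronized instance method, a synchronized static method, and a synchronized block on an explicit lock object. The class and its counters are hypothetical names chosen for illustration.

```java
class SynchronizedCounterDemo {
    private int instanceCount;                      // guarded by "this"
    private static int staticCount;                 // guarded by SynchronizedCounterDemo.class
    private final Object blockLock = new Object();
    private int blockCount;                         // guarded by blockLock

    // Method modifier: locks the instance the method is called on.
    synchronized void incrementInstance() {
        instanceCount++;
    }

    // Static method: locks the java.lang.Class object of this class.
    static synchronized void incrementStatic() {
        staticCount++;
    }

    // Block: locks only the object named in the parentheses.
    void incrementInBlock() {
        synchronized (blockLock) {
            blockCount++;
        }
    }

    // Runs several threads against one shared instance and returns
    // { instanceCount, staticCount, blockCount } once all have finished.
    static int[] run(int threadCount, int perThread) {
        synchronized (SynchronizedCounterDemo.class) {
            staticCount = 0;
        }
        SynchronizedCounterDemo shared = new SynchronizedCounterDemo();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.incrementInstance();
                    incrementStatic();
                    shared.incrementInBlock();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { shared.instanceCount, staticCount, shared.blockCount };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println(totals[0] + " " + totals[1] + " " + totals[2]);
    }
}
```

Because every increment happens while the matching lock is held, no updates are lost and each counter ends at threadCount * perThread.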


Sleep, Yield, and Join

Posted by Vinayak On 3:18 AM 0 comments

Sleeping is used to delay execution for a period of time, and no locks are released when a thread goes to sleep.
A sleeping thread is guaranteed to sleep for at least the time specified in the argument to the sleep() method (unless it's interrupted), but there is no guarantee as to when the newly awakened thread will actually return to running.
The sleep() method is a static method that puts the currently executing thread to sleep. One thread cannot tell another thread to sleep.
The setPriority() method is used on Thread objects to give threads a priority of between 1 (low) and 10 (high), although priorities are not guaranteed, and not all JVMs recognize 10 distinct priority levels—some levels may be treated as effectively equal.
If not explicitly set, a thread's priority will be the same as the priority of the thread that created it.
The yield() method may cause a running thread to back out if there are runnable threads of the same priority. There is no guarantee that this will happen, and there is no guarantee that when the thread backs out there will be a different thread selected to run. A thread might yield and then immediately reenter the running state.
The closest thing to a guarantee is that at any given time, when a thread is running it will usually not have a lower priority than any thread in the runnable state. If a low-priority thread is running when a high-priority thread enters runnable, the JVM will usually preempt the running low-priority thread and put the high-priority thread in.
When one thread calls the join() method of another thread, the currently running thread will wait until the thread it joins with has completed.
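
A minimal sketch of sleep(), join() and priority inheritance as described above. The class and method names are hypothetical, and the 50 ms delay is an arbitrary value picked for the example.

```java
class SleepJoinDemo {

    // The caller blocks in join() until the worker's run() has completed.
    static boolean joinWaitsForWorker() {
        final boolean[] done = { false };
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50); // static: always sleeps the thread that calls it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done[0] = true;
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return done[0] && !worker.isAlive();
    }

    // A new thread starts out with the priority of the thread that created it.
    static boolean inheritsCreatorPriority() {
        Thread child = new Thread(() -> { });
        return child.getPriority() == Thread.currentThread().getPriority();
    }

    public static void main(String[] args) {
        System.out.println(joinWaitsForWorker());
        System.out.println(inheritsCreatorPriority());
    }
}
```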


Once a new thread is started, it will always enter the runnable state.
The thread scheduler can move a thread back and forth between the runnable state and the running state.
For a typical single-processor machine, only one thread can be running at a time, although many threads may be in the runnable state.
There is no guarantee that the order in which threads were started determines the order in which they'll run.
There's no guarantee that threads will take turns in any fair way. It's up to the thread scheduler, as determined by the particular virtual machine implementation. If you want a guarantee that your threads will take turns regardless of the underlying JVM, you can use the sleep() method. This prevents one thread from hogging the running process while another thread starves. (In most cases, though, yield() works well enough to encourage your threads to play together nicely.)
A running thread may enter a blocked/waiting state by a wait(), sleep(), or join() call.
A running thread may enter a blocked/waiting state because it can't acquire the lock for a synchronized block of code.
When the sleep or wait is over, or an object's lock becomes available, the thread can only reenter the runnable state. It does not go directly from waiting to running; the scheduler has to select it again.
A dead thread cannot be started again.


Threads can be created by extending Thread and overriding the public void run() method.
Thread objects can also be created by calling the Thread constructor that takes a Runnable argument. The Runnable object is said to be the target of the thread.
You can call start() on a Thread object only once. If start() is called more than once on a Thread object, it will throw an IllegalThreadStateException, which is a RuntimeException.
It is legal to create many Thread objects using the same Runnable object as the target.
When a Thread object is created, it does not become a thread of execution until its start() method is invoked. When a Thread object exists but hasn't been started, it is in the new state and is not considered alive.
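
The points above can be sketched as follows. This is a minimal example with hypothetical class names. It creates threads both ways, shares one Runnable target between two Thread objects, and checks the new state and the start-only-once rule.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ThreadCreationDemo {

    // Style 1: extend Thread and override run().
    static class Greeter extends Thread {
        private final AtomicInteger runs;

        Greeter(AtomicInteger runs) {
            this.runs = runs;
        }

        @Override
        public void run() {
            runs.incrementAndGet();
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns how many times run() executed across the three threads.
    static int runBothStyles() {
        AtomicInteger runs = new AtomicInteger();
        Runnable target = () -> runs.incrementAndGet();
        Thread subclassed = new Greeter(runs);
        Thread first = new Thread(target);   // Style 2: pass a Runnable target
        Thread second = new Thread(target);  // same target again, which is legal
        subclassed.start();
        first.start();
        second.start();
        joinQuietly(subclassed);
        joinQuietly(first);
        joinQuietly(second);
        return runs.get();
    }

    // Before start() the Thread object exists but is not alive.
    static boolean isNewBeforeStart() {
        Thread t = new Thread(() -> { });
        return t.getState() == Thread.State.NEW && !t.isAlive();
    }

    // A second start() is rejected, even after the thread has finished.
    static boolean secondStartThrows() {
        Thread t = new Thread(() -> { });
        t.start();
        joinQuietly(t);
        try {
            t.start();
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runBothStyles());
        System.out.println(isNewBeforeStart());
        System.out.println(secondStartThrows());
    }
}
```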


Both the java.util.Arrays and java.util.Collections classes provide:
A sort() method. Sort using a Comparator or sort using natural order.
A binarySearch() method. Search a pre-sorted array or List.
Arrays.asList() creates a List from an array and links them together.
Collections.reverse() reverses the order of elements in a List.
Collections.reverseOrder() returns a Comparator that sorts in reverse.
Lists and Sets have a toArray() method to create arrays.
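
A minimal sketch of the utility methods listed above. The class name and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ArraysCollectionsDemo {

    // Sort first, then search: binarySearch() needs sorted input.
    static int searchSortedArray() {
        int[] nums = { 5, 1, 4, 2 };
        Arrays.sort(nums);                   // 1, 2, 4, 5
        return Arrays.binarySearch(nums, 4); // index of 4 in the sorted array
    }

    // reverseOrder() supplies a Comparator that inverts natural order.
    static List<String> sortDescending() {
        List<String> names = new ArrayList<>(Arrays.asList("bob", "alice", "carol"));
        Collections.sort(names, Collections.reverseOrder());
        return names;
    }

    // reverse() flips the current order; it does not sort.
    static List<String> reverseInPlace() {
        List<String> names = new ArrayList<>(Arrays.asList("bob", "alice", "carol"));
        Collections.reverse(names);
        return names;
    }

    // The List returned by asList() is backed by the array: a write to one
    // is visible through the other.
    static String asListIsLinkedToArray() {
        String[] letters = { "a", "b" };
        List<String> view = Arrays.asList(letters);
        view.set(0, "z");
        return letters[0];
    }

    // toArray() copies a List into a new array.
    static String[] listToArray() {
        List<String> names = Arrays.asList("x", "y");
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(searchSortedArray());
        System.out.println(sortDescending());
        System.out.println(reverseInPlace());
        System.out.println(asListIsLinkedToArray());
        System.out.println(listToArray().length);
    }
}
```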


Sorting and Searching Arrays and Lists

Posted by Vinayak On 3:06 AM 0 comments

Sorting can be in natural order, or via a Comparable or many Comparators.
Implement Comparable using compareTo(); provides only one sort order.
Create many Comparators to sort a class many ways; implement compare().
To be sorted and searched, a List's elements must be comparable.
To be searched, an array or List must first be sorted.
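
As a minimal sketch of the difference: the hypothetical Book class below has one natural order through compareTo() (by title) and a separate Comparator for a second order (by year). When a list was sorted with a Comparator, the search must use the same Comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortOrderDemo {

    // Hypothetical element type; natural order is by title.
    static final class Book implements Comparable<Book> {
        final String title;
        final int year;

        Book(String title, int year) {
            this.title = title;
            this.year = year;
        }

        @Override
        public int compareTo(Book other) {
            return title.compareTo(other.title);
        }
    }

    // An additional sort order, defined outside the class being sorted.
    static final Comparator<Book> BY_YEAR = Comparator.comparingInt(b -> b.year);

    static List<Book> sample() {
        return new ArrayList<>(Arrays.asList(
                new Book("Ulysses", 1922),
                new Book("Dune", 1965),
                new Book("Emma", 1815)));
    }

    static String titles(List<Book> books) {
        StringBuilder sb = new StringBuilder();
        for (Book b : books) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(b.title);
        }
        return sb.toString();
    }

    static String byTitle() {
        List<Book> books = sample();
        Collections.sort(books);          // uses compareTo()
        return titles(books);
    }

    static String byYear() {
        List<Book> books = sample();
        Collections.sort(books, BY_YEAR); // uses compare()
        return titles(books);
    }

    // Search with the same Comparator the list was sorted with.
    static int indexOfYear(int year) {
        List<Book> books = sample();
        Collections.sort(books, BY_YEAR);
        return Collections.binarySearch(books, new Book("", year), BY_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(byTitle());         // Dune,Emma,Ulysses
        System.out.println(byYear());          // Emma,Ulysses,Dune
        System.out.println(indexOfYear(1922)); // 1
    }
}
```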


ArrayList: Fast iteration and fast random access.
Vector: It's like a slower ArrayList, but it has synchronized methods.
LinkedList: Good for adding elements to the ends, i.e., stacks and queues.
HashSet: Fast access, assures no duplicates, provides no ordering.
LinkedHashSet: No duplicates; iterates by insertion order.
TreeSet: No duplicates; iterates in sorted order.
HashMap: Fastest updates (key/value pairs); allows one null key, many null values.
Hashtable: Like a slower HashMap (as with Vector, due to its synchronized methods). No null values or null keys allowed.
LinkedHashMap: Faster iterations; iterates by insertion order or last accessed; allows one null key, many null values.
TreeMap: A sorted map.
PriorityQueue: A to-do list ordered by the elements' priority.
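
A few of the properties listed above, sketched with made-up sample data: iteration order of the Set implementations, null handling in HashMap versus Hashtable, and the ordering of a PriorityQueue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeSet;

class CollectionClassesDemo {
    static final List<String> INPUT = Arrays.asList("pear", "apple", "pear", "fig");

    // LinkedHashSet: duplicates dropped, insertion order kept.
    static List<String> insertionOrder() {
        return new ArrayList<>(new LinkedHashSet<>(INPUT));
    }

    // TreeSet: duplicates dropped, sorted order.
    static List<String> sortedOrder() {
        return new ArrayList<>(new TreeSet<>(INPUT));
    }

    // HashMap accepts a null key.
    static String hashMapNullKey() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null");
        return map.get(null);
    }

    // Hashtable rejects null keys (and null values).
    static boolean hashtableRejectsNull() {
        try {
            new Hashtable<String, String>().put(null, "x");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // PriorityQueue hands out elements by priority, not by insertion order.
    static int firstFromQueue() {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        queue.add(5);
        queue.add(1);
        queue.add(3);
        return queue.poll();
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder());       // [pear, apple, fig]
        System.out.println(sortedOrder());          // [apple, fig, pear]
        System.out.println(hashMapNullKey());
        System.out.println(hashtableRejectsNull());
        System.out.println(firstFromQueue());       // 1
    }
}
```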


Collections

Posted by Vinayak On 2:45 AM 0 comments

Common collection activities include adding objects, removing objects, verifying object inclusion, retrieving objects, and iterating.
Three meanings for "collection":
* collection: Represents the data structure in which objects are stored
* Collection: java.util interface from which Set and List extend
* Collections: A class that holds static collection utility methods

Four basic flavors of collections: Lists, Sets, Maps, Queues.
* Lists of things: ordered, duplicates allowed, with an index.
* Sets of things: may or may not be ordered and/or sorted; duplicates not allowed.
* Maps of things with keys: may or may not be ordered and/or sorted; duplicate keys are not allowed.
* Queues of things to process: ordered by FIFO or by priority.

Four basic sub-flavors of collections: Sorted, Unsorted, Ordered, Unordered.
* Ordered: iterating through a collection in a specific, non-random order.
* Sorted: iterating through a collection in a sorted order.
* Sorting can be alphabetic, numeric, or programmer-defined.


Overriding hashCode() and equals()

Posted by Vinayak On 2:37 AM 0 comments

* equals(), hashCode(), and toString() are public.
* Override toString() so that System.out.println() or other methods can see something useful, like your object's state.
* Use == to determine if two reference variables refer to the same object.
* Use equals() to determine if two objects are meaningfully equivalent.
* If you don't override equals(), your objects won't be useful hashing keys.
* If you don't override equals(), different objects can't be considered equal.
* Strings and wrappers override equals() and make good hashing keys.
* When overriding equals(), use the instanceof operator to be sure you're evaluating an appropriate class.
* When overriding equals(), compare the objects' significant attributes.
* Highlights of the equals() contract:
Reflexive: x.equals(x) is true.
Symmetric: If x.equals(y) is true, then y.equals(x) must be true.
Transitive: If x.equals(y) is true, and y.equals(z) is true, then x.equals(z) is true.
Consistent: Multiple calls to x.equals(y) will return the same result.
Null: If x is not null, then x.equals(null) is false.
If x.equals(y) is true, then x.hashCode() == y.hashCode() is true.
If you override equals(), override hashCode().

* HashMap, HashSet, Hashtable, LinkedHashMap, & LinkedHashSet use hashing.
* An efficient hashCode() override distributes keys evenly across its buckets.
* An overridden equals() must be at least as precise as its hashCode() mate.
* To reiterate: if two objects are equal, their hashcodes must be equal.
* It's legal for a hashCode() method to return the same value for all instances (although in practice it's very inefficient).

Highlights of the hashCode() contract:
* Consistent: multiple calls to x.hashCode() return the same integer.
* If x.equals(y) is true, x.hashCode() == y.hashCode() is true.
* If x.equals(y) is false, then x.hashCode() == y.hashCode() can be either true or false, but false will tend to create better efficiency.
* transient variables aren't appropriate for equals() and hashCode().
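
A minimal sketch of the contract above. Point is a hypothetical value class that overrides equals(), hashCode() and toString(). NoOverride is the same shape without the overrides, to show why such objects are not useful as hashing keys.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {

    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Point)) { // also false when o is null
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y; // compare significant attributes
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points produce equal hash codes
        }

        @Override
        public String toString() {
            return "Point(" + x + ", " + y + ")";
        }
    }

    // Same state, but equals() and hashCode() are inherited from Object.
    static final class NoOverride {
        final int x;

        NoOverride(int x) {
            this.x = x;
        }
    }

    static boolean foundWithOverride() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        return points.contains(new Point(1, 2));
    }

    static boolean foundWithoutOverride() {
        Set<NoOverride> items = new HashSet<>();
        items.add(new NoOverride(1));
        return items.contains(new NoOverride(1));
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2));        // Point(1, 2)
        System.out.println(foundWithOverride());    // true
        System.out.println(foundWithoutOverride()); // false
    }
}
```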


Serialization

Posted by Vinayak On 2:14 AM 0 comments

The classes you need to understand are all in the java.io package; they include: ObjectOutputStream and ObjectInputStream primarily, and FileOutputStream and FileInputStream because you will use them to create the low-level streams that the ObjectXxxStream classes will use.
* A class must implement the Serializable interface before its objects can be serialized.
* The ObjectOutputStream.writeObject() method serializes objects, and the ObjectInputStream.readObject() method deserializes objects.
* If you mark an instance variable transient, it will not be serialized even though the rest of the object's state will be.
* You can supplement a class's automatic serialization process by implementing the writeObject() and readObject() methods. If you do this, embedding calls to defaultWriteObject() and defaultReadObject(), respectively, will handle the part of serialization that happens normally.
* If a superclass implements Serializable, then its subclasses do automatically.
* If a superclass doesn't implement Serializable, then when a subclass object is deserialized, the superclass constructor will run.
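
A minimal round-trip sketch of the rules above. The Base and Account classes are hypothetical, and the example writes to in-memory byte streams rather than the FileOutputStream and FileInputStream mentioned in the text, so that it does not touch the disk. The object streams are used the same way in both cases.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {

    // Not Serializable: its constructor runs again during deserialization.
    static class Base {
        int baseValue;

        Base() {
            baseValue = 7;
        }
    }

    static class Account extends Base implements Serializable {
        private static final long serialVersionUID = 1L;

        String owner;
        transient String password; // skipped by serialization

        Account(String owner, String password) {
            this.owner = owner;
            this.password = password;
        }
    }

    // Serializes the account to bytes and reads it back as a new object.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account original = new Account("ann", "secret");
        original.baseValue = 99;
        Account copy = roundTrip(original);
        System.out.println(copy.owner);     // ann
        System.out.println(copy.password);  // null, the field is transient
        System.out.println(copy.baseValue); // 7, reset by the Base constructor
    }
}
```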


String objects are immutable, and String reference variables are not.
If you create a new String without assigning it, it will be lost to your program.
If you redirect a String reference to a new String, the old String can be lost.
String methods use zero-based indexes, except for the second argument of substring(), which is the index one past the last character to include.
The String class is final—its methods can't be overridden.
When the JVM finds a String literal, it is added to the String literal pool.
Strings have a method: length(), arrays have an attribute named length.
The StringBuffer's API is the same as the new StringBuilder's API, except that StringBuilder's methods are not synchronized for thread safety.
StringBuilder methods should run faster than StringBuffer methods.
All of the following bullets apply to both StringBuffer and StringBuilder:
* They are mutable—they can change without creating a new object.
* StringBuffer methods act on the invoking object, and objects can change without an explicit assignment in the statement.
* StringBuffer equals() is not overridden; it doesn't compare values.
* Remember that chained methods are evaluated from left to right.
* String methods to remember: charAt(), concat(), equalsIgnoreCase(), length(), replace(), substring(), toLowerCase(), toString(), toUpperCase(), and trim().
* StringBuffer methods to remember: append(), delete(), insert(), reverse(), and toString()
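
A minimal sketch of several of the points above, using made-up sample strings: String immutability, the substring() end index, left-to-right chaining on a StringBuilder, and the fact that equals() is not overridden for builders.

```java
class StringDemo {

    // concat() returns a new String; the original is untouched.
    static String unassignedResultIsLost() {
        String s = "java";
        s.concat(" rules"); // result discarded
        return s;
    }

    static String reassignedReference() {
        String s = "java";
        s = s.concat(" rules"); // the reference now points at the new String
        return s;
    }

    // Begin index is inclusive, end index is exclusive.
    static String substringRange() {
        return "0123456789".substring(2, 5);
    }

    // Chained calls run left to right and all act on the same object.
    static String chainedBuilder() {
        StringBuilder sb = new StringBuilder("abc");
        sb.append("def").reverse().insert(3, "---");
        return sb.toString();
    }

    // StringBuilder inherits equals() from Object: identity, not content.
    static boolean buildersWithSameTextAreEqual() {
        return new StringBuilder("a").equals(new StringBuilder("a"));
    }

    public static void main(String[] args) {
        System.out.println(unassignedResultIsLost());       // java
        System.out.println(reassignedReference());          // java rules
        System.out.println(substringRange());               // 234
        System.out.println(chainedBuilder());               // fed---cba
        System.out.println(buildersWithSameTextAreEqual()); // false
    }
}
```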


Exceptions come in two flavors: checked and unchecked.

Checked exceptions include all subtypes of Exception, excluding classes that extend RuntimeException.

Checked exceptions are subject to the handle or declare rule; any method that might throw a checked exception (including methods that invoke methods that can throw a checked exception) must either declare the exception using throws, or handle the exception with an appropriate try/catch.
Subtypes of Error or RuntimeException are unchecked, so the compiler doesn't enforce the handle or declare rule. You're free to handle them, or to declare them, but the compiler doesn't care one way or the other.
If you use an optional finally block, it will always be invoked, regardless of whether an exception in the corresponding try is thrown or not, and regardless of whether a thrown exception is caught or not.
The only exception to the finally-will-always-be-called rule is that a finally will not be invoked if the JVM shuts down. That could happen if code from the try or catch blocks calls System.exit().
Just because finally is invoked does not mean it will complete. Code in the finally block could itself raise an exception or issue a System.exit().
Uncaught exceptions propagate back through the call stack, starting from the method where the exception is thrown and ending with either the first method that has a corresponding catch for that exception type or a JVM shutdown (which happens if the exception gets to main(), and main() is "ducking" the exception by declaring it).
You can create your own exceptions, normally by extending Exception or one of its subtypes. Unless it extends RuntimeException, your exception will then be considered a checked exception, and the compiler will enforce the handle or declare rule for that exception.
All catch blocks must be ordered from most specific to most general.
If you have a catch clause for both IOException and Exception, you must put the catch for IOException first in your code. Otherwise, the IOException would be caught by catch(Exception e), because a catch argument can catch the specified exception or any of its subtypes! The compiler will stop you from defining catch clauses that can never be reached.
Some exceptions are created by programmers, some by the JVM.
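
The handle or declare rule and the ordering of catch blocks can be sketched as follows. InsufficientFundsException, withdraw() and open() are hypothetical names invented for this example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class ExceptionDemo {

    // Extends Exception, so it is a checked exception.
    static class InsufficientFundsException extends Exception {
        InsufficientFundsException(String message) {
            super(message);
        }
    }

    // "Declare": the method announces the checked exception with throws.
    static int withdraw(int balance, int amount) throws InsufficientFundsException {
        if (amount > balance) {
            throw new InsufficientFundsException("short by " + (amount - balance));
        }
        return balance - amount;
    }

    // "Handle": the caller catches it, so it needs no throws clause.
    static String tryWithdraw(int balance, int amount) {
        try {
            return "new balance " + withdraw(balance, amount);
        } catch (InsufficientFundsException e) {
            return "declined: " + e.getMessage();
        }
    }

    // Throws a different exception type depending on the argument.
    static void open(String kind) throws IOException {
        switch (kind) {
            case "missing":
                throw new FileNotFoundException("no such file");
            case "io":
                throw new IOException("disk error");
            case "bug":
                throw new IllegalStateException("programming error"); // unchecked
            default:
                break;
        }
    }

    // Catch blocks go from most specific to most general. Putting
    // catch (Exception e) first would make the others unreachable,
    // and the compiler would reject the code.
    static String classify(String kind) {
        try {
            open(kind);
            return "no exception";
        } catch (FileNotFoundException e) {
            return "FileNotFoundException";
        } catch (IOException e) {
            return "IOException";
        } catch (Exception e) {
            return "Exception";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryWithdraw(100, 30)); // new balance 70
        System.out.println(tryWithdraw(10, 30));  // declined: short by 20
        System.out.println(classify("missing"));  // FileNotFoundException
        System.out.println(classify("io"));       // IOException
        System.out.println(classify("bug"));      // Exception
    }
}
```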


Unchecked exceptions:

  • represent defects in the program (bugs) - often invalid arguments passed to a non-private method. To quote from The Java Programming Language, by Gosling, Arnold, and Holmes : "Unchecked runtime exceptions represent conditions that, generally speaking, reflect errors in your program's logic and cannot be reasonably recovered from at run time."
  • are subclasses of RuntimeException, and are usually implemented using IllegalArgumentException, NullPointerException, or IllegalStateException
  • a method is not obliged to establish a policy for the unchecked exceptions thrown by its implementation (and they almost always do not do so)
Checked exceptions:
  • represent invalid conditions in areas outside the immediate control of the program (invalid user input, database problems, network outages, absent files)
  • are subclasses of Exception (but not of RuntimeException)
  • a method is obliged to establish a policy for all checked exceptions thrown by its implementation (either pass the checked exception further up the stack, or handle it somehow)


Try, Catch , Finally

Posted by Vinayak On 12:55 AM 0 comments

It is illegal to use a try clause without either a catch clause or a finally
clause. A try clause by itself will result in a compiler error. Any catch clauses must
immediately follow the try block. Any finally clause must immediately follow the last
catch clause (or it must immediately follow the try block if there is no catch). It is legal
to omit either the catch clause or the finally clause, but not both.
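
The legal combinations can be sketched like this. The class and the log strings are invented for the example; each method records the order in which its blocks run.

```java
class TryFinallyDemo {

    // try + catch + finally: finally runs whether or not an exception occurred.
    static String tryCatchFinally(boolean fail) {
        StringBuilder log = new StringBuilder("try");
        try {
            if (fail) {
                throw new IllegalArgumentException("boom");
            }
        } catch (IllegalArgumentException e) {
            log.append(",catch");
        } finally {
            log.append(",finally");
        }
        return log.toString();
    }

    // try + finally with no catch is legal.
    static String tryFinallyOnly() {
        StringBuilder log = new StringBuilder();
        try {
            log.append("try");
        } finally {
            log.append(",finally");
        }
        return log.toString();
    }

    // try + catch with no finally is legal too.
    static String tryCatchOnly() {
        try {
            return String.valueOf(Integer.parseInt("not a number"));
        } catch (NumberFormatException e) {
            return "catch";
        }
    }

    // A try block with neither catch nor finally does not compile:
    // try { doWork(); }

    public static void main(String[] args) {
        System.out.println(tryCatchFinally(true));  // try,catch,finally
        System.out.println(tryCatchFinally(false)); // try,finally
        System.out.println(tryFinallyOnly());       // try,finally
        System.out.println(tryCatchOnly());         // catch
    }
}
```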


Initialization Blocks

Posted by Vinayak On 12:45 AM 0 comments

Static initialization blocks run once, when the class is first loaded.
Instance initialization blocks run every time a new instance is created. They run after all super-constructors and before the constructor's code has run.
If multiple init blocks exist in a class, they follow the rules stated above, AND they run in the order in which they appear in the source file.
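
The ordering rules above can be checked with a small logging sketch. Parent, Child and the log strings are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        Parent() {
            LOG.add("Parent constructor");
        }
    }

    static class Child extends Parent {
        static {
            LOG.add("Child static init"); // once, when Child is first initialized
        }

        {
            LOG.add("Child instance init 1"); // every instance, after super()
        }

        Child() {
            LOG.add("Child constructor");
        }

        {
            LOG.add("Child instance init 2"); // source order: after init 1
        }
    }

    // Creates one Child and returns the four log entries it produced.
    static String traceOneInstance() {
        new Child();
        int size = LOG.size();
        return String.join(" > ", LOG.subList(size - 4, size));
    }

    // However many instances are created, the static block runs only once.
    static int staticInitRuns() {
        new Child();
        new Child();
        return Collections.frequency(LOG, "Child static init");
    }

    public static void main(String[] args) {
        System.out.println(traceOneInstance());
        System.out.println(staticInitRuns());
    }
}
```

Note that both instance blocks run before the body of the Child constructor, even though the second one appears after the constructor in the source.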


Passing Variables into Methods

Posted by Vinayak On 12:43 AM 0 comments

Methods can take primitives and/or object references as arguments.
Method arguments are always copies.
Method arguments are never actual objects (they can be references to objects).
A primitive argument is an unattached copy of the original primitive.
A reference argument is another copy of a reference to the original object.
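
A minimal sketch of what "arguments are always copies" means in practice. The method names are hypothetical.

```java
class PassByValueDemo {

    // Changes only the local copy of the primitive.
    static void bump(int n) {
        n++;
    }

    // The copied reference still points at the caller's object,
    // so changes made through it are visible to the caller.
    static void appendBang(StringBuilder sb) {
        sb.append("!");
    }

    // Reassigning the parameter redirects only the local copy of the reference.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("other");
        sb.append("?");
    }

    static int primitiveAfterCall() {
        int value = 5;
        bump(value);
        return value;
    }

    static String objectAfterMutation() {
        StringBuilder text = new StringBuilder("hi");
        appendBang(text);
        return text.toString();
    }

    static String objectAfterReassignment() {
        StringBuilder text = new StringBuilder("hi");
        reassign(text);
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(primitiveAfterCall());      // 5
        System.out.println(objectAfterMutation());     // hi!
        System.out.println(objectAfterReassignment()); // hi
    }
}
```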


When an array of objects is instantiated, objects within the array are not instantiated automatically, but all the references get the default value of null.
When an array of primitives is instantiated, elements get default values.
Instance variables are always initialized with a default value.
Local/automatic/method variables are never given a default value. If you attempt to use one before initializing it, you'll get a compiler error.
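
The default values can be observed with a short sketch. The class and field names are made up for the example, and the uninitialized local variable is left commented out because it would not compile.

```java
class DefaultValuesDemo {
    // Instance variables receive defaults automatically.
    int count;     // 0
    boolean flag;  // false
    String name;   // null

    static String describe() {
        int[] numbers = new int[2];       // primitive elements default to 0
        String[] names = new String[2];   // references default to null; no String objects are created
        DefaultValuesDemo d = new DefaultValuesDemo();

        // int local;
        // System.out.println(local); // compiler error: variable might not have been initialized

        return numbers[0] + "," + names[0] + "," + d.count + "," + d.flag + "," + d.name;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // 0,null,0,false,null
    }
}
```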


Object Creation

Posted by Vinayak On 12:38 AM 0 comments

When creating a new object, e.g., Button b = new Button();, three things happen:
Make a reference variable named b, of type Button
Create a new Button object
Assign the Button object to the reference variable b


Scope

Posted by Vinayak On 12:36 AM 0 comments

Scope refers to the lifetime of a variable.
There are four basic scopes:
Static variables live basically as long as their class lives.
Instance variables live as long as their object lives.
Local variables live as long as their method is on the stack; however, if their method invokes another method, they are temporarily unavailable.
Block variables (e.g., in a for or an if) live until the block completes.


Stack and Heap

Posted by Vinayak On 12:32 AM 0 comments

Local variables (method variables) live on the stack.
Objects and their instance variables live on the heap.
