DOM and SAX Parsers


In this tutorial we will look at the DOM and SAX parsers, along with simple Java code that uses each of them to parse an XML document.

The following XML document is used for both the DOM and SAX examples:


SAMPLE XML:
<?xml version="1.0" encoding="UTF-8"?>
<organization>
 <employee mode="permanent">
  <name>John</name>
  <empid>1234</empid>
  <designation>SE</designation>
  <technology>Java</technology>
 </employee>
 <employee mode="contract">
  <name>David</name>
  <empid>4545</empid>
  <designation>Manager</designation>
  <technology>.NET</technology>
 </employee>
</organization>


DOM Parser:

A Document Object Model (DOM) parser creates a complete tree structure in memory from the XML provided; node values can then be read from that tree as and when required.
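
To make the in-memory tree more concrete, here is a minimal sketch that parses an XML string and lists every element indented by its depth. The class name DomTreeWalk and the method outline are hypothetical names chosen for this illustration, and the XML is passed as a string only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the tutorial's own listings.
class DomTreeWalk {

    // Returns one entry per element, indented two spaces per tree level.
    static List<String> outline(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    // Depth-first walk over the tree that the parser built in memory.
    private static void walk(Node node, int depth, List<String> lines) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text nodes, comments and so on
        }
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        lines.add(indent + node.getNodeName());
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        String xml = "<organization><employee><name>John</name></employee></organization>";
        for (String line : outline(xml)) {
            System.out.println(line);
        }
    }
}
```

Because the whole tree is already in memory, the walk can visit nodes in any order and as many times as needed.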

The advantage of a DOM parser is its rich functionality, which developers can use without any additional coding. Also, once the document is loaded into memory, developers can access any part of the DOM tree and modify it.
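
As a rough illustration of modifying the tree, the sketch below changes the mode attribute of one employee and writes the tree back out as text. The names DomEditSketch and changeMode are hypothetical, and serializing with the standard javax.xml.transform API is just one possible way to turn the tree back into XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the tutorial's own listings.
class DomEditSketch {

    // Sets mode=newMode on every employee whose <name> equals the given name
    // and returns the modified document as XML text.
    static String changeMode(String xml, String name, String newMode) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList employees = doc.getElementsByTagName("employee");
            for (int i = 0; i < employees.getLength(); i++) {
                Element employee = (Element) employees.item(i);
                String current = employee.getElementsByTagName("name")
                        .item(0).getTextContent().trim();
                if (current.equals(name)) {
                    employee.setAttribute("mode", newMode); // edit the tree in place
                }
            }

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<organization><employee mode=\"contract\">"
                + "<name>David</name></employee></organization>";
        System.out.println(changeMode(xml, "David", "permanent"));
    }
}
```

This sketch assumes every employee element has a name child, as in the sample document; a missing name would need an extra null check.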

The disadvantages are that it needs more memory when the document is large, and that it takes a little longer to learn how to work with it.



DOM Parser for above XML document:



import java.io.File;
import java.io.IOException;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class DOMParser {

 public static void main(String[] args) {
  String file = "document.xml";
  
  parseXMLUsingDOM(file);
 }
 
 public static void parseXMLUsingDOM(String file){
  try {
   // Build the complete document tree in memory
   DocumentBuilderFactory builderFac = DocumentBuilderFactory.newInstance();
   DocumentBuilder builder = builderFac.newDocumentBuilder();
   Document doc = builder.parse(new File(file));

   System.out.println("ROOT: " + doc.getDocumentElement().getNodeName());
   NodeList list = doc.getElementsByTagName("employee");
   System.out.println("No. Of Employees: " + list.getLength());

   for(int i = 0; i < list.getLength(); i++){
    Node node = list.item(i);
    if(node.getNodeType() == Node.ELEMENT_NODE) {
     Element element = (Element)node;
     System.out.println("\nEMP MODE: " + element.getAttribute("mode"));

     // For each field, take the first matching child element
     // and print the value of its first text node
     NodeList nList = element.getElementsByTagName("name");
     Element nElement = (Element)nList.item(0);
     NodeList tList = nElement.getChildNodes();
     System.out.println("NAME: " + tList.item(0).getNodeValue().trim());

     nList = element.getElementsByTagName("empid");
     nElement = (Element)nList.item(0);
     tList = nElement.getChildNodes();
     System.out.println("EMP_ID: " + tList.item(0).getNodeValue().trim());

     nList = element.getElementsByTagName("designation");
     nElement = (Element)nList.item(0);
     tList = nElement.getChildNodes();
     System.out.println("DESIGNATION: " + tList.item(0).getNodeValue().trim());

     nList = element.getElementsByTagName("technology");
     nElement = (Element)nList.item(0);
     tList = nElement.getChildNodes();
     System.out.println("TECHNOLOGY: " + tList.item(0).getNodeValue().trim());
    }
   }
  } catch (ParserConfigurationException e) {
   e.printStackTrace();
  } catch (SAXException e) {
   e.printStackTrace();
  } catch (IOException e) {
   e.printStackTrace();
  }
 }
}


OUTPUT:

ROOT: organization
No. Of Employees: 2

EMP MODE: permanent
NAME: John
EMP_ID: 1234
DESIGNATION: SE
TECHNOLOGY: Java

EMP MODE: contract
NAME: David
EMP_ID: 4545
DESIGNATION: Manager
TECHNOLOGY: .NET



SAX Parser:

A Simple API for XML (SAX) parser does not create any internal tree structure the way DOM does. It reads the document sequentially from start to end and reports each component it encounters (element start, element end, character data) to a handler through callbacks. The application keeps only the values it is interested in and ignores the rest.
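
To show the order in which a SAX parser reports a document, here is a minimal sketch that records each callback as a line of text. The class name SaxEventTrace and the method trace are hypothetical names for this illustration, and the XML is passed as a string only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the tutorial's own listings.
class SaxEventTrace {

    // Returns the callbacks in the order the parser fired them.
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Text may arrive in several characters() calls, so buffer it.
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0);
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    events.add("text " + value);
                }
                text.setLength(0);
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<employee mode=\"contract\"><name>David</name></employee>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

For the input in main, the recorded events are start employee, start name, text David, end name, end employee. Nothing is retained by the parser after each callback returns, so anything the application needs later must be stored by the handler itself.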

The advantage of a SAX parser is that it is much more space efficient for a huge document, because it does not create a complete tree structure like DOM. It is also generally faster, and easy to implement for basic needs.
On the functionality side, it offers less than DOM: the document can only be read forwards, and it cannot be modified in place.


SAX Parser for above XML document:

import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo {
 
 public static void main(String[] args) {
  String file = "document.xml";
  
  parseXMLUsingSAX(file);
 }
 
 public static void parseXMLUsingSAX(String file) {
  try{
   DefaultHandler handler = createHandler();
   
   SAXParserFactory factory = SAXParserFactory.newInstance();
   SAXParser parser = factory.newSAXParser();
   
   parser.parse(file, handler);
   
  }catch (ParserConfigurationException  e) {
   e.printStackTrace();
  }catch (SAXException e) {
   e.printStackTrace();
  }catch (IOException e) {
   e.printStackTrace();
  }
 }
 
 public static DefaultHandler createHandler(){
  return new DefaultHandler(){
   // Label to print for the element being read, or null when the
   // current element is not one of the four employee fields
   String label = null;
   // The parser may deliver the text of one element in several
   // characters() calls, so collect it here until the end tag
   StringBuilder text = new StringBuilder();

   @Override
   public void startElement(String uri, String vName, String tagName, Attributes attri){
    if(tagName.equalsIgnoreCase("employee")){
     System.out.println("\nMODE: " + attri.getValue("mode"));
    }
    if(tagName.equalsIgnoreCase("name")) label = "NAME";
    if(tagName.equalsIgnoreCase("empid")) label = "EMP_ID";
    if(tagName.equalsIgnoreCase("designation")) label = "DESIGNATION";
    if(tagName.equalsIgnoreCase("technology")) label = "TECHNOLOGY";
    text.setLength(0);
   }

   @Override
   public void characters(char chars[], int id, int size) throws SAXException {
    if(label != null) text.append(chars, id, size);
   }

   @Override
   public void endElement(String uri, String vName, String tagName){
    // Print the collected value once the element is complete
    if(label != null){
     System.out.println(label + ": " + text);
     label = null;
    }
   }
  };
 }
}


OUTPUT:

MODE: permanent
NAME: John
EMP_ID: 1234
DESIGNATION: SE
TECHNOLOGY: Java

MODE: contract
NAME: David
EMP_ID: 4545
DESIGNATION: Manager
TECHNOLOGY: .NET