How to read webpage source code through Java?

 

Every web browser offers a "View page source" option: right-click a page, select it, and the browser shows the complete client-side source code of that page. So how can we do the same thing from Java code? We can use the URLConnection class, which represents the connection to the server, and then read the complete page content as bytes through the InputStream it returns.

By reading from the InputStream byte by byte until it signals the end of the stream, we get the complete page source, which we can store in a file for reference. Let's see a simple example that reads a web page's source code through Java.


import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public class URLReader {

    public static void main(String[] args) {
        try {
            URL url = new URL("http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html");
            URLConnection urlCon = url.openConnection();

            // FileOutputStream creates the file if it is missing and overwrites it otherwise.
            File myFile = new File("C:/URLConnection.html");

            // try-with-resources closes both streams even if reading fails.
            try (InputStream is = new BufferedInputStream(urlCon.getInputStream());
                 OutputStream os = new BufferedOutputStream(new FileOutputStream(myFile))) {
                int i;
                // read() returns the next byte (0-255), or -1 at the end of the stream.
                while ((i = is.read()) != -1) {
                    os.write(i); // write the raw byte so the page's encoding is preserved
                }
            }

            System.out.println("Web page reading completed...");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
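
The same idea can also be written with fewer moving parts. The following is only a sketch under some assumptions: the class name PageSourceSketch and the helper names saveSource and readSource are made up for illustration, the page is assumed to be UTF-8 when it is read as text, and the output file name URLConnection.html in the working directory is arbitrary. Files.copy does the byte loop for us, and a BufferedReader is used when we want the source as a String instead of a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class PageSourceSketch {

    // Hypothetical helper: copies the raw response bytes to a file, unchanged.
    // Returns the number of bytes written.
    static long saveSource(URL url, Path target) throws IOException {
        URLConnection con = url.openConnection();
        try (InputStream in = con.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical helper: reads the page as text.
    // Assumption: the content is UTF-8; a real page may declare another charset.
    static String readSource(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        URLConnection con = url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // line endings are normalized to \n
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Uses the URL from the example above unless another one is passed in.
        String address = args.length > 0
                ? args[0]
                : "http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html";
        URL url = new URL(address);
        long bytes = saveSource(url, Paths.get("URLConnection.html"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

Use saveSource when you want an exact copy of what the server sent, and readSource when you want to search or print the source as text. Neither helper is tied to HTTP; a file: URL goes through the same URLConnection calls, which is handy for trying the code without a network connection.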


OUTPUT:
[Image: read webpage source code through Java]


