27 May 2016

ADM

How to read CSV file in Java


A CSV file is a comma-separated values file, which stores tabular data as plain text. To read and parse a CSV file in Java, you can read it as a simple text file; the rest is just string manipulation. But first, let's have a look at an example CSV file.

Name,Address,Phone
Deidre Haider,"631 Grand Avenue Glendora, CA 91740",202-555-0150
Annette Sharrock,"230 Railroad Avenue Myrtle Beach, SC 29577",202-555-0149
Ebonie Skowron,"762 Deerfield Drive Romeoville, IL 60446",202-555-0155
Devon Huynh,"573 Hudson Street Wooster, OH 44691",202-555-0196
Cristine Riddle,"858 2nd Avenue Prior Lake, MN 55372",202-555-0182
Kristeen Ellman,"169 Creekside Drive Front Royal, VA 22630",202-555-0198
Ocie Blansett,"8 Grant Street Dracut, MA 01826",202-555-0135
Ami Feucht,"783 4th Street Leland, NC 28451",202-555-0105
Elroy Geers,"856 Grant Avenue Richmond, VA 23223",202-555-0134
Shaunte Brockwell,"1000 Park Place Mooresville, NC 28115",202-555-0140
Evonne Kellar,"309 Briarwood Drive Stow, OH 44224",202-555-0155
Gladis Schwalb,"407 13th Street Hobart, IN 46342",202-555-0109
Terina Fukuda,"25 Primrose Lane"" High Point, NC 27265",202-555-0151
Annetta Knicely,"647 Fieldstone Drive Dalton, GA 30721",202-555-0187
Rozanne Westmoreland,"36 9th Street West Voorhees, NJ 08043",202-555-0156
Louella Hutchens,"63 Route 41 Helotes, TX 78023",202-555-0113
Alesha Ennis,"505 Bank Street"" Morganton, NC 28655",202-555-0133
Carisa Motton,"114 Orchard Avenue Fort Mill, SC 29708",202-555-0153
Zane Gard,"678 Spruce Avenue Milford, MA 01757",202-555-0124
Marya Patchett,"868 2nd Street Canonsburg, PA 15317",202-555-0189

If a value contains the delimiter, the value is enclosed in double quotes. If the value itself contains a double quote character, that character is written twice.
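
The two quoting rules can be illustrated from the writer's side. The snippet below is only a minimal sketch and is not part of the parser in this article: the class name `CsvField` and the method `escape` are hypothetical, and a comma is assumed to be the only delimiter.

```java
// Hypothetical helper, not part of the article's parser: shows how a writer
// would apply the two quoting rules (assumes comma is the only delimiter).
class CsvField {
	/** Quote the value when it contains the delimiter or a double quote. */
	static String escape(String value) {
		boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0;
		if (!needsQuotes) {
			return value;
		}
		// Each double quote inside the value is written twice
		return "\"" + value.replace("\"", "\"\"") + "\"";
	}

	public static void main(String[] args) {
		System.out.println(escape("Deidre Haider"));
		System.out.println(escape("631 Grand Avenue Glendora, CA 91740"));
		System.out.println(escape("25 Primrose Lane\" High Point, NC 27265"));
	}
}
```

A value without a comma or double quote is returned unchanged, so only the fields that need protection are quoted.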

With this in mind, I chose a CSV file with commas and double quote characters in one of the fields. This complicates the parser a little, but such characters are very common in CSV files.

Requirements

Now that we have defined the problem, let's define the requirements our CSV file parser must meet:

  1. Parse files with multiple separators, e.g. comma or tab.
  2. Handle separators inside the text.
  3. Handle double quotes inside the text.

Example

You can easily adjust the parser to support more delimiters by adding them to the separators array. For more details about the solution, follow the comments in the code.

package com.admfactory;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class CSVParser {
	/** characters used as delimiters */
	private char[] separators = {',', '\t'};
	
	/** when a delimiter appears in the text, the value is enclosed in double quotes */
	private char specialChars = '"';
	
	/**
	 * Method used to split each line into values
	 * 
	 * @param line
	 *           the line to split
	 * @return the array of values
	 */
	private String[] lineParser(String line) {
		String[] result = null;
		
		/** Using ArrayList as the number of values is unknown at this stage */
		ArrayList<String> parsedLine = new ArrayList<String>();
		
		int len = line.length();
		int i = 0;
		
		/** iterate through all the chars in the line */
		while (i < len) {
			int nextSep = len;
			/** Get the next separator */
			for (int j = 0; j < separators.length; ++j) {
				int temp = line.indexOf(separators[j], i);
				if ((temp == -1) || (temp >= nextSep))
					continue;
				nextSep = temp;
			}
			
			/** Place the special separator at the end of the string */
			int nextSpecialSep = len;
			
			/** Check if there is any special separator */
			int temp = line.indexOf(specialChars, i);
			if ((temp == -1) || (temp >= nextSpecialSep))
				nextSpecialSep = len;
			else
				nextSpecialSep = temp;
			
			/** if we are at the special separator get the text until the closing special separator */
			if (nextSpecialSep == i) {
				char c = line.charAt(i);
				StringBuilder toAdd = new StringBuilder();
				int end = i + 1;
				/** scan for the closing double quote; a doubled double quote is part of the text */
				while (end < len) {
					char ch = line.charAt(end);
					if (ch == c) {
						if (end + 1 < len && line.charAt(end + 1) == c) {
							/** Replace two double quotes with one double quote */
							toAdd.append(c);
							end += 2;
							continue;
						}
						/** a single double quote closes the value */
						break;
					}
					toAdd.append(ch);
					++end;
				}
				
				parsedLine.add(toAdd.toString());
				i = end + 1;
			}
			/** if we are at a normal separator, ignore the separator and jump to the next char (empty values are therefore skipped) */
			else if (nextSep == i) {
				++i;
			}
			/** Copy the value in the result string */
			else {
				parsedLine.add(line.substring(i, nextSep));
				i = nextSep;
			}
		}
		
		/** Convert the result to String[] */
		result = parsedLine.toArray(new String[parsedLine.size()]);
		return result;
	}
	
	/**
	 * 
	 * Method used to parse the file
	 * 
	 * @param path
	 *           to the file
	 * @return array of all lines
	 */
	public ArrayList<String[]> parser(String path) {
		BufferedReader br = null;
		ArrayList<String[]> result = new ArrayList<String[]>();
		try {
			
			br = new BufferedReader(new FileReader(path));
			
			/** Parsing each line in the file */
			String line = "";
			while ((line = br.readLine()) != null) {
				
				/** Parse each line into values */
				String[] values = lineParser(line);
				
				/** Adding the lines to the array list */
				result.add(values);
			}
		}
		catch (Exception e) {
			/** Just display the error */
			e.printStackTrace();
		}
		finally {
			/** Closing the stream */
			if (br != null) {
				try {
					br.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		return result;
	}
	
	/**
	 * main method for testing
	 * 
	 * @param args
	 */
	public static void main(String[] args)	{
		String path = "names.csv";
		System.out.println("CSV Parser Example");
		System.out.println("Parsing file " + path);
		CSVParser parser = new CSVParser();
		ArrayList<String[]> lines = parser.parser(path);
		
		System.out.println("File Content");
		for (int i = 0; i < lines.size(); i++) {
			String[] line = lines.get(i);
			for (int j = 0; j < line.length; j++) {
				String print = String.format("%-45s", line[j]);
				System.out.print(print);
			}
			System.out.println();
		}
	}
}

Output

CSV Parser Example
Parsing file names.csv
File Content
Name                                         Address                                      Phone                                        
Deidre Haider                                631 Grand Avenue Glendora, CA 91740          202-555-0150                                 
Annette Sharrock                             230 Railroad Avenue Myrtle Beach, SC 29577   202-555-0149                                 
Ebonie Skowron                               762 Deerfield Drive Romeoville, IL 60446     202-555-0155                                 
Devon Huynh                                  573 Hudson Street Wooster, OH 44691          202-555-0196                                 
Cristine Riddle                              858 2nd Avenue Prior Lake, MN 55372          202-555-0182                                 
Kristeen Ellman                              169 Creekside Drive Front Royal, VA 22630    202-555-0198                                 
Ocie Blansett                                8 Grant Street Dracut, MA 01826              202-555-0135                                 
Ami Feucht                                   783 4th Street Leland, NC 28451              202-555-0105                                 
Elroy Geers                                  856 Grant Avenue Richmond, VA 23223          202-555-0134                                 
Shaunte Brockwell                            1000 Park Place Mooresville, NC 28115        202-555-0140                                 
Evonne Kellar                                309 Briarwood Drive Stow, OH 44224           202-555-0155                                 
Gladis Schwalb                               407 13th Street Hobart, IN 46342             202-555-0109                                 
Terina Fukuda                                25 Primrose Lane" High Point, NC 27265       202-555-0151                                 
Annetta Knicely                              647 Fieldstone Drive Dalton, GA 30721        202-555-0187                                 
Rozanne Westmoreland                         36 9th Street West Voorhees, NJ 08043        202-555-0156                                 
Louella Hutchens                             63 Route 41 Helotes, TX 78023                202-555-0113                                 
Alesha Ennis                                 505 Bank Street" Morganton, NC 28655         202-555-0133                                 
Carisa Motton                                114 Orchard Avenue Fort Mill, SC 29708       202-555-0153                                 
Zane Gard                                    678 Spruce Avenue Milford, MA 01757          202-555-0124                                 
Marya Patchett                               868 2nd Street Canonsburg, PA 15317          202-555-0189                                 

Analyzing the output, we can see that the parser meets all the requirements established earlier.
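
The requirements can also be checked on a single line, without reading a file. The following is a minimal sketch under assumptions, not the `CSVParser` class above: the class `CsvLineSplitter` and its `split` method are hypothetical names. It walks the line character by character and keeps track of whether it is inside a quoted value. Unlike the index-based approach, it also keeps empty values between consecutive separators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the article's CSVParser: a character-by-character
// splitter that also keeps empty values.
class CsvLineSplitter {
	/** Split one line; every char in separators acts as a delimiter. */
	static List<String> split(String line, String separators) {
		List<String> values = new ArrayList<>();
		StringBuilder current = new StringBuilder();
		boolean inQuotes = false;
		for (int i = 0; i < line.length(); i++) {
			char ch = line.charAt(i);
			if (inQuotes) {
				if (ch != '"') {
					current.append(ch);
				} else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
					// doubled quote inside a quoted value: keep one quote
					current.append('"');
					i++;
				} else {
					inQuotes = false; // closing quote
				}
			} else if (ch == '"') {
				inQuotes = true; // opening quote
			} else if (separators.indexOf(ch) >= 0) {
				// separator outside quotes ends the current value
				values.add(current.toString());
				current.setLength(0);
			} else {
				current.append(ch);
			}
		}
		values.add(current.toString()); // last value on the line
		return values;
	}

	public static void main(String[] args) {
		String line = "Ami Feucht,\"783 4th Street\"\" Leland, NC 28451\",,202-555-0105";
		for (String value : split(line, ",\t")) {
			System.out.println("[" + value + "]");
		}
	}
}
```

Passing the separators as a string keeps the sketch short; adding another delimiter only means adding one more character to that argument.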

 

 

 
