How to read a CSV file in Java
A CSV file is a comma-separated values file, which stores data in a table-structured format. To read and parse a CSV file in Java, you can read the file as a simple text file; the rest is just string manipulation. But first, let's have a look at an example CSV file.
Name,Address,Phone
Deidre Haider,"631 Grand Avenue Glendora, CA 91740",202-555-0150
Annette Sharrock,"230 Railroad Avenue Myrtle Beach, SC 29577",202-555-0149
Ebonie Skowron,"762 Deerfield Drive Romeoville, IL 60446",202-555-0155
Devon Huynh,"573 Hudson Street Wooster, OH 44691",202-555-0196
Cristine Riddle,"858 2nd Avenue Prior Lake, MN 55372",202-555-0182
Kristeen Ellman,"169 Creekside Drive Front Royal, VA 22630",202-555-0198
Ocie Blansett,"8 Grant Street Dracut, MA 01826",202-555-0135
Ami Feucht,"783 4th Street Leland, NC 28451",202-555-0105
Elroy Geers,"856 Grant Avenue Richmond, VA 23223",202-555-0134
Shaunte Brockwell,"1000 Park Place Mooresville, NC 28115",202-555-0140
Evonne Kellar,"309 Briarwood Drive Stow, OH 44224",202-555-0155
Gladis Schwalb,"407 13th Street Hobart, IN 46342",202-555-0109
Terina Fukuda,"25 Primrose Lane"" High Point, NC 27265",202-555-0151
Annetta Knicely,"647 Fieldstone Drive Dalton, GA 30721",202-555-0187
Rozanne Westmoreland,"36 9th Street West Voorhees, NJ 08043",202-555-0156
Louella Hutchens,"63 Route 41 Helotes, TX 78023",202-555-0113
Alesha Ennis,"505 Bank Street"" Morganton, NC 28655",202-555-0133
Carisa Motton,"114 Orchard Avenue Fort Mill, SC 29708",202-555-0153
Zane Gard,"678 Spruce Avenue Milford, MA 01757",202-555-0124
Marya Patchett,"868 2nd Street Canonsburg, PA 15317",202-555-0189
If a column contains the delimiter, the column value is enclosed in double quotes. Also, if the column contains a double quote character, that character is written twice.
With this in mind, I chose a CSV file with commas and double quote characters in one of the fields. This complicates the parser a little, but such characters are very common in CSV files.
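The two quoting rules can also be sketched from the writer's side. The following is a minimal illustration and not part of the parser in this article: the class `CsvFieldEscaper` and its `escape` method are hypothetical names, and quoting values that contain line breaks is an added assumption based on common CSV practice.

```java
class CsvFieldEscaper {

    /** Quotes a value only when it contains the separator, a double quote or a line break. */
    static String escape(String value, char separator) {
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0   // assumption: line breaks are quoted too
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Double every quote inside the value, then wrap the whole value in quotes
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // A plain value is written unchanged
        System.out.println(escape("202-555-0150", ','));
        // A value containing the separator is enclosed in double quotes
        System.out.println(escape("631 Grand Avenue Glendora, CA 91740", ','));
        // A value containing a double quote gets that quote written twice
        System.out.println(escape("25 Primrose Lane\" High Point, NC 27265", ','));
    }
}
```

The last call produces the same text as the address field of the Terina Fukuda row in the sample file.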
Requirements
Now that we have defined the problem, let's define the requirements our CSV file parser must meet to address all aspects of it:
- parse files with different separators, e.g. comma or tab.
- handle separators inside the text.
- handle double quotes inside the text.
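To see why the second requirement matters, here is a minimal sketch that applies a plain `String.split(",")` to the first data row of the sample file. The class name `NaiveSplitDemo` and the method `naiveSplit` are made up for this illustration.

```java
import java.util.Arrays;

class NaiveSplitDemo {

    /** Splits on every comma and ignores quoting, which is the behaviour we want to avoid. */
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "Deidre Haider,\"631 Grand Avenue Glendora, CA 91740\",202-555-0150";
        String[] parts = naiveSplit(line);
        // The quoted address is cut at its inner comma, so there are 4 parts instead of 3
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```

The address ends up in two pieces and the double quotes stay in the values, so the parser has to track quoting itself.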
Example
You can easily adjust the parser to support more delimiters. For more details about the solution, please follow the comments within the code.
package com.admfactory;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class CSVParser {

    /** characters used as delimiters */
    private char[] separators = {',', '\t'};

    /** when a delimiter appears in the text, the value is enclosed in double quotes */
    private char specialChars = '"';

    /**
     * Method used to split each line into values
     *
     * @param line the line to split
     * @return the array of values
     */
    private String[] lineParser(String line) {
        String[] result = null;

        /** Using ArrayList as the number of values is unknown at this stage */
        ArrayList<String> parsedLine = new ArrayList<String>();

        int len = line.length();
        int i = 0;

        /** iterate through all the chars in the line */
        while (i < len) {
            int nextSep = len;

            /** Get the next separator */
            for (int j = 0; j < separators.length; ++j) {
                int temp = line.indexOf(separators[j], i);
                if ((temp == -1) || (temp >= nextSep))
                    continue;
                nextSep = temp;
            }

            /** Place the special separator at the end of the string */
            int nextSpecialSep = len;

            /** Check if there is any special separator */
            int temp = line.indexOf(specialChars, i);
            if (temp != -1)
                nextSpecialSep = temp;

            /** if we are at the special separator get the text until the closing special separator */
            if (nextSpecialSep == i) {
                char c = line.charAt(i);

                /** look for the closing double quote; two double quotes in a row are part of the text, so jump over them */
                int end = i + 1;
                while (end < len) {
                    if (line.charAt(end) == c) {
                        if ((end + 1 < len) && (line.charAt(end + 1) == c)) {
                            end += 2;
                            continue;
                        }
                        break;
                    }
                    ++end;
                }

                String toAdd = line.substring(i + 1, end);

                /** Replace two double quotes with one double quote */
                toAdd = toAdd.replace((c + "") + (c + ""), c + "");
                parsedLine.add(toAdd);
                i = end + 1;
            }
            /** if we are at a normal separator, ignore the separator and jump to the next char */
            else if (nextSep == i) {
                ++i;
            }
            /** Copy the value in the result string */
            else {
                parsedLine.add(line.substring(i, nextSep));
                i = nextSep;
            }
        }

        /** Convert the result to String[] */
        result = parsedLine.toArray(new String[parsedLine.size()]);
        return result;
    }

    /**
     * Method used to parse the file
     *
     * @param path to the file
     * @return list with the values of each line
     */
    public ArrayList<String[]> parser(String path) {
        ArrayList<String[]> result = new ArrayList<String[]>();

        /** try-with-resources closes the stream even if reading fails */
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;

            /** Parsing each line in the file */
            while ((line = br.readLine()) != null) {
                /** Parse each line into values */
                String[] values = lineParser(line);

                /** Adding the line to the array list */
                result.add(values);
            }
        } catch (IOException e) {
            /** Just display the error */
            e.printStackTrace();
        }
        return result;
    }

    /**
     * main method for testing
     *
     * @param args not used
     */
    public static void main(String[] args) {
        String path = "names.csv";

        System.out.println("CSV Parser Example");
        System.out.println("Parsing file " + path);

        CSVParser parser = new CSVParser();
        ArrayList<String[]> lines = parser.parser(path);

        System.out.println("File Content");
        for (int i = 0; i < lines.size(); i++) {
            String[] line = lines.get(i);
            for (int j = 0; j < line.length; j++) {
                /** left-align each value in a 45 character wide column */
                String print = String.format("%-45s", line[j]);
                System.out.print(print);
            }
            System.out.println();
        }
    }
}
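One thing to keep in mind: the parser above skips empty values. Two consecutive separators produce no entry, so a row such as `a,,b` yields two values instead of three. If column positions matter, one possible alternative is a single pass over the characters that tracks whether the current position is inside quotes. The sketch below is not the method used in this article; the class name `QuotedLineSplitter` and the method `split` are hypothetical, and passing the separators as a string is just one way to keep them configurable.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedLineSplitter {

    /** Splits one line into values; every character in separators is accepted as a delimiter. */
    static List<String> split(String line, String separators) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(ch);      // separators inside quotes are plain text
                }
            } else if (ch == '"') {
                inQuotes = true;             // opening quote
            } else if (separators.indexOf(ch) >= 0) {
                values.add(current.toString()); // end of a value, possibly empty
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        values.add(current.toString());      // last value on the line
        return values;
    }

    public static void main(String[] args) {
        // Empty value between the two commas is kept
        System.out.println(split("a,,b", ",\t").size());
        // Quoted value with an inner comma and a doubled quote
        System.out.println(split(
                "Terina Fukuda,\"25 Primrose Lane\"\" High Point, NC 27265\",202-555-0151", ",\t"));
    }
}
```

Note that this sketch returns a single empty value for an empty line, whereas the parser above returns no values for it; which behaviour is preferable depends on how the rows are used afterwards.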
Output
CSV Parser Example
Parsing file names.csv
File Content
Name                                         Address                                      Phone
Deidre Haider                                631 Grand Avenue Glendora, CA 91740          202-555-0150
Annette Sharrock                             230 Railroad Avenue Myrtle Beach, SC 29577   202-555-0149
Ebonie Skowron                               762 Deerfield Drive Romeoville, IL 60446     202-555-0155
Devon Huynh                                  573 Hudson Street Wooster, OH 44691          202-555-0196
Cristine Riddle                              858 2nd Avenue Prior Lake, MN 55372          202-555-0182
Kristeen Ellman                              169 Creekside Drive Front Royal, VA 22630    202-555-0198
Ocie Blansett                                8 Grant Street Dracut, MA 01826              202-555-0135
Ami Feucht                                   783 4th Street Leland, NC 28451              202-555-0105
Elroy Geers                                  856 Grant Avenue Richmond, VA 23223          202-555-0134
Shaunte Brockwell                            1000 Park Place Mooresville, NC 28115        202-555-0140
Evonne Kellar                                309 Briarwood Drive Stow, OH 44224           202-555-0155
Gladis Schwalb                               407 13th Street Hobart, IN 46342             202-555-0109
Terina Fukuda                                25 Primrose Lane" High Point, NC 27265       202-555-0151
Annetta Knicely                              647 Fieldstone Drive Dalton, GA 30721        202-555-0187
Rozanne Westmoreland                         36 9th Street West Voorhees, NJ 08043        202-555-0156
Louella Hutchens                             63 Route 41 Helotes, TX 78023                202-555-0113
Alesha Ennis                                 505 Bank Street" Morganton, NC 28655         202-555-0133
Carisa Motton                                114 Orchard Avenue Fort Mill, SC 29708       202-555-0153
Zane Gard                                    678 Spruce Avenue Milford, MA 01757          202-555-0124
Marya Patchett                               868 2nd Street Canonsburg, PA 15317          202-555-0189
Analyzing the output, we can see that the parser meets all the requirements established earlier.