Monday, 7 August 2017

How to tokenize an input file in Java



I'm tokenizing a text file in Java. I want to read an input file, tokenize it, and write certain tokens from it to an output file. This is what I've done so far:



 

package org.apache.lucene.analysis;

import java.io.*;

class StringProcessing
{
    // Create BufferedReader class instance
    public static void main(String[] args) throws IOException
    {
        InputStreamReader input = new InputStreamReader(System.in);
        BufferedReader keyboardInput = new BufferedReader(input);
        System.out.print("Please enter a java file name: ");
        String filename = keyboardInput.readLine();
        if(!filename.endsWith(".DAT"))
        {
            System.out.println("This is not a DAT file.");
            System.exit(0);
        }
        File File = new File(filename);
        if(File.exists())
        {
            FileReader file = new FileReader(filename);
            StreamTokenizer streamTokenizer = new StreamTokenizer(file);
            int i=0;
            int numberOfTokensGenerated = 0;
            while(i != StreamTokenizer.TT_EOF)
            {
                i = streamTokenizer.nextToken();
                numberOfTokensGenerated++;
            }

            // Output number of characters in the line
            System.out.println("Number of tokens = " + numberOfTokensGenerated);
            // Output tokens
            for (int counter=0; counter < numberOfTokensGenerated; counter++)
            {
                char character = file.toString().charAt(counter);
                if (character == ' ') System.out.println();
                else System.out.print(character);
            }
        }
        else
        {
            System.out.println("File does not exist!");
            System.exit(0);
        }

        System.out.println("\n");
    }//end main
}//end class



When I run this code, this is what I get:



Please enter a java file name: D://eclipse-java-helios-SR1-win32/LexractData.DAT
Number of tokens = 129
java.io.FileReader@19821fException in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
at java.lang.String.charAt(Unknown Source)
at org.apache.lucene.analysis.StringProcessing.main(StringProcessing.java:40)



The input file will look like this:




-K1 Account
--Op1 withdraw
---Param1 an
----Type Int
---Param2 amount
----Type Int
--Op2 deposit
---Param1 an
----Type Int
---Param2 Amount
----Type Int
--CA1 acNo
---Type Int
-K2 CheckAccount
--SC Account
--CA1 credit_limit
---Type Int
-K3 Customer
--CA1 name
---Type String
-K4 Transaction
--CA1 date
---Type Date
--CA2 time
---Type Time
-K5 CheckBook
-K6 Check
-K7 BalanceAccount
--SC Account



I just want to read the lines that start with -K1, -K2, -K3, and so on. Can anyone help me?


Answer




The problem is with this line:



char character = file.toString().charAt(counter);


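file is a reference to a FileReader, which does not override toString(), so the call falls through to Object.toString(). That returns the class name followed by a hash code (java.io.FileReader@19821f in your output), a string only 25 characters long. Your loop runs up to the token count (129), so charAt(25), the 26th character, throws the StringIndexOutOfBoundsException.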
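A minimal sketch of that behaviour, using a StringReader in place of the FileReader so it runs without a file on disk. The class name ReaderToStringDemo and the helper describe are made up for illustration; neither reader class overrides toString(), so both behave the same way here.

```java
import java.io.Reader;
import java.io.StringReader;

class ReaderToStringDemo {

    // Illustrative helper: returns whatever toString() gives for a reader.
    static String describe(Reader reader) {
        return reader.toString();
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("-K1 Account");
        String text = describe(reader);

        // Prints the class name and a hash code, not the reader's contents.
        // The part after '@' varies from run to run.
        System.out.println(text);
        System.out.println("length = " + text.length());
    }
}
```

The printed string never contains "-K1 Account", which is why indexing into it with charAt() cannot give you the characters of the file.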



To read the file's contents, wrap the FileReader in a BufferedReader and append each line returned by readLine() to a StringBuilder.



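FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
StringBuilder sb = new StringBuilder();
String s;
while ((s = br.readLine()) != null) {
    // readLine() strips the line terminator, so add it back
    sb.append(s).append('\n');
}
br.close(); // also closes the underlying FileReader

// Now use sb.toString() instead of file.toString()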
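If the goal is only the lines that start with -K1, -K2, and so on, one possible approach is to filter line by line instead of going through StreamTokenizer. The sketch below assumes each entry sits on its own line, as in the sample input; the class name KLineFilter and the helper keepKLines are hypothetical names introduced for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class KLineFilter {

    // Hypothetical helper: keeps only the lines that start with "-K".
    // Lines such as "--Op1 withdraw" start with "--", so they are skipped.
    static List<String> keepKLines(BufferedReader reader) {
        return reader.lines()
                .map(String::trim)
                .filter(line -> line.startsWith("-K"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // A few lines from the sample input, joined with newlines.
        String sample = String.join("\n",
                "-K1 Account",
                "--Op1 withdraw",
                "---Param1 an",
                "-K2 CheckAccount",
                "--SC Account");

        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (String line : keepKLines(reader)) {
                System.out.println(line); // prints "-K1 Account", then "-K2 CheckAccount"
            }
        }
    }
}
```

To run it against the real file, pass new BufferedReader(new FileReader(filename)) instead of the StringReader. The matching lines could then be written out with a PrintWriter rather than printed.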

