Well, the approach I would take is to open the file with a BufferedReader and
go through it line by line. On each line, I would use Apache's regexp package
and the RE.matchAt method to find words using regular expressions (you can
match on word boundaries). I'd add each word to an ArrayList. Once the whole
file has been processed, I would write all the words to the new file and
output the size of the ArrayList, which is the number of words found.
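
Here is a rough sketch of those steps. Note that it swaps in the standard java.util.regex classes for Apache's regexp package so that it runs without any extra jars; the overall flow is the same. The class name WordExtractor, the helper wordsIn, and the default file names input.txt and words.txt are placeholders I made up for illustration, and I'm assuming a "word" is a run of letters, digits, or underscores (\w+) and that the file is UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // \b is a word boundary; \w+ is one or more letters, digits, or underscores.
    // This definition of a "word" is an assumption; adjust the pattern as needed.
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // Collects every word found on a single line, in order of appearance.
    static List<String> wordsIn(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default paths; pass real ones as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path out = Paths.get(args.length > 1 ? args[1] : "words.txt");

        // Read the input line by line and accumulate the words.
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.addAll(wordsIn(line));
            }
        }

        // Write one word per line to the new file.
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String word : words) {
                writer.write(word);
                writer.newLine();
            }
        }

        // The size of the list is the number of words found.
        System.out.println(words.size());
    }
}
```

With this pattern an apostrophe splits a word, so "It's" comes out as "It" and "s". If that matters for your input, the pattern is the only thing that needs changing.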