By the way, I didn't have much time to work on stemming back then, so the main change was replacing the English stop-word set with a Persian one. You can find this set in the StandardAnalyzer class and replace it there, but that is only a hasty solution, since it means changing Lucene's own source code. You could probably find a better way to do this, e.g. by passing the stop words as a parameter to the StandardAnalyzer constructor.
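As a rough illustration of the constructor approach, here is a plain-Java sketch with no Lucene dependency. StopWordAnalyzer is a hypothetical class, not part of Lucene, and it only splits on whitespace (unlike StandardAnalyzer's tokenizer). The Persian words are a small illustrative subset, not a complete stop list. In Lucene itself, StandardAnalyzer already has constructors that accept a stop-word set (for example StandardAnalyzer(CharArraySet stopWords) in recent versions, while older versions also take a Version argument), so check the signature for the version you are using.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical minimal analyzer (NOT Lucene's StandardAnalyzer) showing the
 * design: the stop-word set is injected through the constructor instead of
 * being hard-coded inside the class.
 */
public class StopWordAnalyzer {
    private final Set<String> stopWords;

    public StopWordAnalyzer(Set<String> stopWords) {
        // Normalize case and take an immutable copy so callers cannot
        // change the set after the analyzer is built.
        Set<String> normalized = new HashSet<>();
        for (String word : stopWords) {
            normalized.add(word.toLowerCase(Locale.ROOT));
        }
        this.stopWords = Set.copyOf(normalized);
    }

    /** Splits on whitespace, lower-cases, and drops tokens in the stop set. */
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Small illustrative subset of Persian stop words, not a full list.
        Set<String> persianStopWords =
                Set.of("و", "در", "به", "از", "که", "این", "را", "با");
        StopWordAnalyzer analyzer = new StopWordAnalyzer(persianStopWords);
        System.out.println(analyzer.analyze("کتاب را به دوستم دادم"));
    }
}
```

Here "را" and "به" are dropped, leaving کتاب, دوستم and دادم. Because the caller supplies the set, switching languages means building a different set rather than editing and recompiling the analyzer class.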