Question asked by Nicole Hughes on Mar 17 in the Java category.

Answered by Heidi Larson on Mar 17

The .prx file contains the lists of positions at which each term occurs within documents.

ProxFile (.prx) --> <TermPositions>^TermCount
TermPositions --> <Positions>^DocFreq
Positions --> <PositionDelta>^Freq
PositionDelta --> VInt

(Here <X>^N means X repeated N times.)
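Every value in the file is a VInt, so it helps to see that primitive on its own. The following is a minimal sketch, assuming the usual VInt layout: 7 data bits per byte, lowest-order group first, with the high bit set on every byte except the last. The class and method names are hypothetical and this is not Lucene's own reader/writer code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating the VInt layout (non-negative ints only):
// 7 data bits per byte, low-order group first, high bit = "more bytes follow".
public class VIntCodec {

    // Append the VInt encoding of a non-negative value to out.
    public static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("VInt must be non-negative: " + value);
        }
        while ((value & ~0x7F) != 0) {          // more than 7 significant bits remain
            out.write((value & 0x7F) | 0x80);   // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value);                       // final byte, high bit clear
    }

    // Read one VInt starting at pos[0]; advances pos[0] past the bytes consumed.
    public static int readVInt(byte[] buf, int[] pos) {
        int b = buf[pos[0]++] & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;       // later bytes are more significant
        }
        return value;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, 4);      // fits in one byte: 0x04
        writeVInt(out, 300);    // needs two bytes: 0xAC, 0x02
        byte[] bytes = out.toByteArray();
        int[] pos = {0};
        System.out.println(readVInt(bytes, pos));   // 4
        System.out.println(readVInt(bytes, pos));   // 300
    }
}
```

Small values, which is what position deltas usually are, take a single byte; that is the point of storing deltas rather than absolute positions.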

TermPositions are ordered by term (the term is implicit, from the .tis file).

Positions entries are ordered by increasing document number (the document number is implicit, from the .frq file).

PositionDelta is the difference between the position of the current occurrence in the document and the position of the previous occurrence (or zero, if this is the first occurrence in this document, so the first delta in each document is the position itself).
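The delta rule can be sketched in a few lines. This assumes, purely for illustration, that a term's positions are already grouped per document in increasing document order as an int[][]; the class and method names are hypothetical. The key detail is that the "previous position" resets to zero at the start of every document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compute the PositionDelta sequence of one TermPositions entry.
// Input: absolute positions of a single term, one int[] per document, documents in
// increasing doc-number order, positions ascending within each document.
public class PositionDeltas {

    public static List<Integer> toDeltas(int[][] positionsPerDoc) {
        List<Integer> deltas = new ArrayList<>();
        for (int[] positions : positionsPerDoc) {
            int previous = 0;                   // delta base restarts for every document
            for (int p : positions) {
                deltas.add(p - previous);       // first delta in a doc is p itself
                previous = p;
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Term at position 4 in one document, at 5 and 9 in a later document.
        System.out.println(toDeltas(new int[][] {{4}, {5, 9}}));   // [4, 5, 4]
    }
}
```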

For example, the TermPositions for a term which occurs as the fourth term in one document, and as the fifth and ninth terms in a subsequent document, would be the following sequence of VInts:

4, 5, 4
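Reading the example back shows why the .prx file depends on the .frq file: the bytes 4, 5, 4 alone do not say where one document's positions end. The sketch below assumes the per-document Freq values (normally found in the .frq file) are handed in as a plain array; `decode` and `freqs` are hypothetical names, and the VInt reader is repeated so the block stands alone.

```java
import java.util.Arrays;

// Hypothetical sketch: decode one TermPositions entry back into absolute positions.
// freqs[d] is the number of occurrences in the d-th document (Freq, from .frq).
public class TermPositionsDecoder {

    // Read one VInt at pos[0] (7 bits per byte, high bit = continuation).
    static int readVInt(byte[] buf, int[] pos) {
        int b = buf[pos[0]++] & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static int[][] decode(byte[] prx, int[] freqs) {
        int[] pos = {0};
        int[][] result = new int[freqs.length][];
        for (int d = 0; d < freqs.length; d++) {
            result[d] = new int[freqs[d]];
            int position = 0;                     // delta base restarts per document
            for (int i = 0; i < freqs[d]; i++) {
                position += readVInt(prx, pos);   // add PositionDelta to previous position
                result[d][i] = position;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] prx = {4, 5, 4};                   // each value < 128, so one byte each
        int[] freqs = {1, 2};                     // 1 occurrence, then 2 occurrences
        System.out.println(Arrays.deepToString(decode(prx, freqs)));   // [[4], [5, 9]]
    }
}
```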

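If you want to implement a search engine, I suggest using Nutch instead of Lucene. Nutch is a complete web search engine (crawler included) built on top of Lucene, so it gives you a Google-like system rather than just an indexing library. If your goal is text summarization, MEAD is the best solution.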
