Java Answers

Question Asked By: Nicole Hughes on Mar 17 In Java Category.

Problem in Lucene

Question Answered By: Heidi Larson on Mar 17

The .prx file contains the lists of positions that each term occurs at within documents.

ProxFile (.prx) --> <TermPositions>^TermCount

TermPositions --> <Positions>^DocFreq

Positions --> <PositionDelta>^Freq

PositionDelta --> VInt
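
Since every PositionDelta is stored as a VInt, here is a rough sketch of how that variable-length encoding works: 7 data bits per byte, low-order bits first, with the high bit of each byte set when another byte follows. The class and method names below (VIntSketch, writeVInt, readVInt) are made up for illustration and are not Lucene's own classes.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    // Encodes a non-negative int as a VInt: 7 data bits per byte, low-order
    // group first; the high bit of a byte is set when more bytes follow.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 bits of data plus continuation flag
            value >>>= 7;
        }
        out.write(value); // last byte, high bit clear
        return out.toByteArray();
    }

    // Decodes a single VInt from the start of the given byte array.
    static int readVInt(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(writeVInt(4).length);        // small values fit in one byte
        System.out.println(writeVInt(128).length);      // 128 needs two bytes
        System.out.println(readVInt(writeVInt(16384))); // round trip
    }
}
```

Small deltas, which are the common case for positions, take a single byte each. That is the reason the file stores differences rather than absolute positions.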

TermPositions are ordered by term (the term is implicit, from the .tis file).

Positions entries are ordered by increasing document number (the document number is implicit from the .frq file).

PositionDelta is the difference between the position of the current occurrence in the document and the position of the previous occurrence. For the first occurrence in a document, the previous position is taken to be zero, so the delta is simply the position itself.

For example, the TermPositions for a term which occurs as the fourth term in one document, and as the fifth and ninth term in a subsequent document, would be the following sequence of VInts:

4, 5, 4
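
To make the example concrete, here is a minimal sketch of the delta computation in Java. It is only an illustration of the rule described above, not Lucene's actual writer code. The class name PositionDeltaSketch and its methods are hypothetical, and the per-document frequencies needed for decoding are passed in by hand because the .prx file itself does not store them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PositionDeltaSketch {
    // Turns the positions of one term, grouped by document, into the flat
    // PositionDelta sequence. Each inner array holds one document's
    // positions in increasing order.
    static List<Integer> encode(int[][] positionsPerDoc) {
        List<Integer> deltas = new ArrayList<>();
        for (int[] positions : positionsPerDoc) {
            int previous = 0; // the previous position restarts at zero for every document
            for (int position : positions) {
                deltas.add(position - previous);
                previous = position;
            }
        }
        return deltas;
    }

    // Rebuilds the positions. freqs[i] is the number of occurrences in the
    // i-th document (assumed here to be known from elsewhere).
    static int[][] decode(List<Integer> deltas, int[] freqs) {
        int[][] result = new int[freqs.length][];
        int next = 0;
        for (int doc = 0; doc < freqs.length; doc++) {
            result[doc] = new int[freqs[doc]];
            int position = 0;
            for (int i = 0; i < freqs[doc]; i++) {
                position += deltas.get(next++);
                result[doc][i] = position;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] positions = { {4}, {5, 9} };
        List<Integer> deltas = encode(positions);
        System.out.println(deltas); // [4, 5, 4]
        System.out.println(Arrays.deepToString(decode(deltas, new int[] {1, 2}))); // [[4], [5, 9]]
    }
}
```

Note that the sequence 4, 5, 4 cannot be split into documents on its own. You need the frequency of the term in each document (1 and 2 here) to know where one document's positions end and the next begin.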

If you want to implement a search engine, I suggest you use Nutch instead of Lucene directly. Nutch is a web search engine (crawler plus search) built on top of Lucene. If your purpose is text summarization, MEAD is the best solution.
