Mon 16 Mar 2009
Vertical Search Engine Template - Second Update and Index
Posted by admin under Search
Having completed the second crawl, the next task is to run updatedb, invertlinks and index on the new pages, which are stored in the new segment “20090317171019”. We will run these commands, one after the other -
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch index /home/greg/nutchcrawls/nanaimo/crawl/indexes2 /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
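The three steps can also be wrapped in a small shell script so that a failed step stops the sequence. This is only a sketch, not part of the procedure itself: the variable names NUTCH, CRAWL, SEGMENT and INDEX_DIR and the DRY_RUN switch are made up for illustration. With DRY_RUN left at 1 the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: run updatedb, invertlinks and index for one segment.
# All variable names below are illustrative, not Nutch settings.
NUTCH="${NUTCH:-/home/greg/nutch/bin/nutch}"
CRAWL="${CRAWL:-/home/greg/nutchcrawls/nanaimo/crawl}"
SEGMENT="${SEGMENT:-20090317171019}"
INDEX_DIR="${INDEX_DIR:-indexes2}"
# Hypothetical safety switch: 1 prints each command, 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# The && chain stops at the first step that fails.
update_and_index() {
    run "$NUTCH" updatedb "$CRAWL/crawldb" "$CRAWL/segments/$SEGMENT" &&
    run "$NUTCH" invertlinks "$CRAWL/linkdb" "$CRAWL/segments/$SEGMENT" &&
    run "$NUTCH" index "$CRAWL/$INDEX_DIR" "$CRAWL/crawldb" "$CRAWL/linkdb" "$CRAWL/segments/$SEGMENT"
}

update_and_index
```

Setting DRY_RUN=0 would run the commands for real. For a later crawl, only SEGMENT and INDEX_DIR should need to change.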
The only difference from the first time around is in the last of the three commands, the index command: the index is created in the “indexes2” directory, because we already have an index of the first crawl in “indexes”. So far, each index covers only its own segment. We could make a single index of both segments if we wanted to, by specifying more than one segment to index. Note that this is not the same as merging indexes: the end result can be the same, but it can also be different.
For example, we could use the command -
/home/greg/nutch/bin/nutch index /home/greg/nutchcrawls/nanaimo/crawl/indexes3 /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601 /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317091415
The previous command would make an index of two segments (20090317080601, 20090317091415) and store the index in “indexes3”. The usual way would be to use the wildcard “*” in the command -
/home/greg/nutch/bin/nutch index /home/greg/nutchcrawls/nanaimo/crawl/indexes3 /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/*
That command would make an index in the “indexes3” directory, and it would be an index of all the segments (*) in the “segments” directory. Next post we’ll do the third (and final for our purposes now) crawl, index, etc.
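A footnote on the wildcard: it is the shell, not Nutch, that turns “segments/*” into a list of directories, so the index command simply receives one argument per segment. The sketch below shows this on a throwaway directory. The temporary directory and the variable names are made up for the demonstration, and the segment names are just the ones used in this post.

```shell
#!/bin/sh
# Demonstration only: build a throwaway tree that mimics crawl/segments.
demo=$(mktemp -d)
mkdir -p "$demo/segments/20090317080601" \
         "$demo/segments/20090317091415" \
         "$demo/segments/20090317171019"

# The shell replaces * with one argument per matching directory,
# sorted by name (oldest first for timestamp-style names).
set -- "$demo"/segments/*
segment_count=$#

echo "an index command would receive $segment_count segment arguments:"
for seg in "$@"; do
    basename "$seg"
done

# Clean up the throwaway tree.
rm -rf "$demo"
```

Running “echo” in front of the real command is a quick way to see the expanded argument list before starting a long indexing job.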