Tue 10 Mar 2009
Vertical Search Engine Template - Invert Links
Posted by admin under Search
At this point we have injected the initial URLs to crawl, generated the fetchlist, run the fetch and updated the database. The four commands that were used are -
/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
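The segment directory name in the last two commands is a timestamp created by the generate step, so it changes on every run. A minimal sketch of a helper that looks it up instead of typing it by hand follows. It assumes the segment names are the all-digit timestamps seen above, so the last name in sorted order is the newest. The function name latest_segment and the variable names are my own, not part of Nutch.

```shell
#!/bin/sh
# Sketch: print the newest segment directory under a segments/ directory.
# Assumption: segment names are fixed-width digit timestamps (as in this
# post), so the last directory in glob (sorted) order is the newest one.
latest_segment() {
    segments_dir=$1
    newest=""
    for dir in "$segments_dir"/*/; do
        # If nothing matches, the glob stays literal and is skipped here.
        [ -d "$dir" ] || continue
        newest=${dir%/}   # strip the trailing slash left by the glob
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    fi
}

# Example use with the paths from this post (CRAWL and SEGMENT are
# hypothetical variable names):
#   CRAWL=/home/greg/nutchcrawls/nanaimo/crawl
#   SEGMENT=$(latest_segment "$CRAWL/segments")
#   /home/greg/nutch/bin/nutch fetch "$SEGMENT"
#   /home/greg/nutch/bin/nutch updatedb "$CRAWL/crawldb" "$SEGMENT"
```

The function prints nothing if the segments directory is empty, so it is worth checking that SEGMENT is non-empty before passing it to nutch.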
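Next we have to “invert the links”. This builds the link database (linkdb), which records for each URL the pages that link to it along with their anchor text, so that the indexing step can use it. The command is -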
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
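That gives the following output (this listing appears to come from a different run, so the home directory and segment timestamp differ from the command above) -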
LinkDb: starting
LinkDb: linkdb: /home/ronpaul/nutchcrawls/nanaimo/crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: /home/ronpaul/nutchcrawls/nanaimo/crawl/segments/20090317091415
LinkDb: done
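To make the idea of inverting links a bit more concrete, here is a toy illustration in shell and awk. It is not how Nutch implements the linkdb and it ignores anchor text. It only shows the shape of the transformation: outlinks of the form "source target" go in, and for each target a list of the pages linking to it comes out. The function name invert_links and the one-pair-per-line input format are made up for this example.

```shell
#!/bin/sh
# Toy illustration of link inversion (not the Nutch implementation).
# Input on stdin: one "source target" pair per line (an outlink).
# Output: one line per target, followed by the sources that link to it.
invert_links() {
    awk '
        {
            if (!($2 in inlinks)) order[++n] = $2   # remember first-seen order
            inlinks[$2] = inlinks[$2] " " $1        # add this source to the target
        }
        END {
            for (i = 1; i <= n; i++) print order[i] ":" inlinks[order[i]]
        }
    '
}

# Example use with three made-up pages a, b and c:
#   printf 'a b\nc b\na c\n' | invert_links
```

With that input, page b ends up listed with a and c as its inbound links, and page c with a.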
The next thing to do is to index.