Sun 8 Mar 2009
Vertical Search Engine Template - Update DataBase
Posted by admin under Search
We have completed the first crawl using the following three commands:
/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
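The segment name passed to fetch (20090317080601) is a timestamp that generate assigns when it writes a new segment, so it changes on every run. A small helper can pick the newest segment instead of copying the name by hand. This is a minimal sketch. It assumes the directory layout used in this post and that the only entries in the segments directory starting with "2" are segment directories. The name latest_segment is hypothetical and is not part of Nutch.

```shell
#!/bin/sh
# latest_segment DIR  (hypothetical helper, not part of Nutch)
# Prints the newest segment directory under DIR, or an empty line if none.
# Segment names are fixed-width timestamps (yyyyMMddHHmmss) and the shell
# expands a glob in sorted order, so the last directory matched is the newest.
latest_segment() {
  newest=""
  for dir in "$1"/2*; do
    # When nothing matches, the loop sees the literal pattern; skip it.
    if [ -d "$dir" ]; then
      newest=$dir
    fi
  done
  printf '%s\n' "$newest"
}

# Usage with the paths from this post:
#   SEGMENT=$(latest_segment /home/greg/nutchcrawls/nanaimo/crawl/segments)
#   /home/greg/nutch/bin/nutch fetch "$SEGMENT"
```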
Next we need to update the Nutch database by running the “updatedb” command.
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
The first part instructs Nutch to run the “updatedb” command. The second part tells Nutch which directory (the crawl database, or crawldb) to update. The last part tells Nutch where to find the directory holding the results of the crawl: the “fetched” pages, known as a “segment”.
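To make those parts easier to see, the command can be wrapped in a small function whose arguments are named after them. This is a minimal sketch that assumes the paths used in this post. The function update_crawldb and the NUTCH variable are hypothetical names. The only Nutch call the function makes uses the same updatedb <crawldb> <segment> form as the command shown earlier.

```shell
#!/bin/sh
# Nutch launcher used in this post; set NUTCH beforehand to override it.
NUTCH=${NUTCH:-/home/greg/nutch/bin/nutch}

# update_crawldb CRAWLDB SEGMENT  (hypothetical wrapper, not part of Nutch)
#   CRAWLDB  the crawl database to update (the second part of the command)
#   SEGMENT  the fetched segment to merge into it (the last part)
update_crawldb() {
  crawldb=$1
  segment=$2
  # Fail early with a clear message if the segment path is mistyped.
  if [ ! -d "$segment" ]; then
    echo "update_crawldb: no such segment: $segment" >&2
    return 1
  fi
  "$NUTCH" updatedb "$crawldb" "$segment"
}

# Usage with the paths from this post:
#   update_crawldb /home/greg/nutchcrawls/nanaimo/crawl/crawldb \
#     /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
```

Checking the segment path first is a design choice of this sketch. Because the segment name is a long timestamp, a typo is easy to make, and the check catches it before Nutch is started.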
Running the “updatedb” command produces the following output:
CrawlDb update: starting
CrawlDb update: db: /home/greg/nutchcrawls/nanaimo/crawl/crawldb
CrawlDb update: segments: [/home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: false
CrawlDb update: URL filtering: false
CrawlDb update: Merging segment data into db.
CrawlDb update: done
The Nutch database associated with the crawl is now updated. Next we will “invert the links”.