We have used these commands to “inject” and “generate” -

/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt

/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments

Next we will do the fetch, i.e. run the actual crawl. This is the command that we will use -

/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601


Remember that “20090317080601” is the name of a segment directory, which is based on the time at which the segment was created. To find out what to enter, you’ll need to list the segments directory with a command like

ls /home/greg/nutchcrawls/nanaimo/crawl/segments
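
If you would rather not copy the directory name by hand, a small shell function can pick it out for you. This is only a sketch: the name latest_segment is made up for this post (it is not part of Nutch), and it assumes that every subdirectory of the segments directory is a segment named with a timestamp like the one above, so the name that sorts last is the newest.

```shell
# latest_segment: hypothetical helper, not part of Nutch.
# Prints the subdirectory of $1 whose name sorts last.
latest_segment() {
    local dir latest=""
    # The shell expands the glob in sorted order, so the last match wins.
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
        latest=${dir%/}             # drop the trailing slash
    done
    # Print nothing (and return non-zero) if no segment was found.
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

With that defined, the fetch could be run as /home/greg/nutch/bin/nutch fetch "$(latest_segment /home/greg/nutchcrawls/nanaimo/crawl/segments)".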

Running the fetch command returns this output -

Fetcher: starting
Fetcher: segment: /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
Fetcher: threads: 10
fetching http://www.rdn.bc.ca/
fetching http://www2.canada.com/nanaimodailynews
fetching http://www.nanaimo.ca/
Fetcher: done

The output shows that Nutch fetched three pages. Since the file that was injected into the crawldb contained three URLs, that is what we would expect. Next we’ll need to “updatedb” (update the Nutch database).
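
With only three URLs it is easy to count by eye, but for a longer seed list the same check can be scripted. The sketch below rests on a few assumptions: the fetcher output has been saved to a file (the name fetch.log is made up here), each fetched page shows up as a line starting with “fetching ”, as in the output above, and the seed file holds one URL per non-blank line. The function names are hypothetical too.

```shell
# Hypothetical helpers, not part of Nutch.

# Count the "fetching ..." lines in a saved copy of the fetcher output.
count_fetched() {
    grep -c '^fetching ' "$1"
}

# Count the non-blank lines in the seed URL file.
count_seeds() {
    grep -c -v '^[[:space:]]*$' "$1"
}
```

For example, after saving the output with something like nutch fetch ... 2>&1 | tee fetch.log, comparing "$(count_fetched fetch.log)" with "$(count_seeds /home/greg/search_urls/nanaimo/initial.txt)" shows whether every injected URL was fetched.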