The next step is to build an index. So far we have completed five steps:

/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt

/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments

/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601

/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601

/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
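The five commands repeat the same few paths many times, so it may help to see them written with variables. The following is only a sketch: the variable names (NUTCH, CRAWL, SEEDS, SEGMENT) and the print_steps function are made up for illustration, and the segment name is the one that generate happened to create in this run. The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Paths copied from the commands above; the variable names are illustrative.
NUTCH=/home/greg/nutch/bin/nutch
CRAWL=/home/greg/nutchcrawls/nanaimo/crawl
SEEDS=/home/greg/search_urls/nanaimo/initial.txt
# The segment directory is named by "generate"; this is the one from this run.
SEGMENT=$CRAWL/segments/20090317080601

# Print the five commands in order. Nothing is executed here.
print_steps() {
    echo "$NUTCH inject $CRAWL/crawldb $SEEDS"
    echo "$NUTCH generate $CRAWL/crawldb $CRAWL/segments"
    echo "$NUTCH fetch $SEGMENT"
    echo "$NUTCH updatedb $CRAWL/crawldb $SEGMENT"
    echo "$NUTCH invertlinks $CRAWL/linkdb $SEGMENT"
}

print_steps
```

Written this way, switching to the “music” or “diving” crawl would mostly be a matter of changing CRAWL and SEEDS, assuming those crawls use the same directory layout.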

Next comes the indexing, which is done by issuing this command:

/home/greg/nutch/bin/nutch index /home/greg/nutchcrawls/nanaimo/crawl/indexes /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601

The output of that command is:

Indexer: starting
Indexer: done

The indexing is now finished, so we could already search the pages that we fetched. There were only three pages, but the index has been built and the pages’ content has been stored. We will need many more pages than that. We will get them by extracting the links from the pages we already have, fetching those links, and repeating the process as often as required.
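The repeat-as-often-as-required cycle could be sketched as a small shell function. This is a rough sketch, not a tested crawl script: latest_segment and crawl_round are hypothetical helper names, and it assumes that each generate call creates a new timestamp-named directory under segments, as it did for 20090317080601. It covers only generate, fetch and updatedb; invertlinks and index would still be run afterwards as shown earlier.

```shell
#!/bin/sh
# Paths copied from the commands above; the variable names are illustrative.
NUTCH=/home/greg/nutch/bin/nutch
CRAWL=/home/greg/nutchcrawls/nanaimo/crawl

# Print the newest segment directory under $1. Segment names are
# timestamps, so the last name in sorted order is the most recent one.
latest_segment() {
    ls -d "$1"/2* 2>/dev/null | tail -1
}

# One round: generate a fetch-list, fetch it, then fold the results
# back into the crawldb. Stops early if any step fails.
crawl_round() {
    "$NUTCH" generate "$CRAWL/crawldb" "$CRAWL/segments" || return 1
    segment=$(latest_segment "$CRAWL/segments")
    [ -n "$segment" ] || return 1
    "$NUTCH" fetch "$segment" || return 1
    "$NUTCH" updatedb "$CRAWL/crawldb" "$segment"
}
```

Calling crawl_round once performs one pass; calling it several times in a row would go one level of links deeper each time.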

We will build three of these indexes (“nanaimo”, “music” and “diving”) and then merge them to create what we are calling a search engine with “flavor”: in this case, a Nanaimo-centric search engine with a taste of “music” and a taste of “diving”.

The next step will be to make a new fetch-list for the crawler by extracting the links from the pages that we already have stored (in the segment 20090317080601 in this case).