Archive for March, 2009

Monday, March 23rd, 2009

Shell Scripts - CRON

We are using Linux, specifically an Ubuntu distribution. We intend to use a program called “cron” to execute commands that are in a script file. A file known as a “crontab” holds the commands to execute at certain times. Every minute, cron checks the “crontab” to see if anything is due to run.
Since [...]
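As a rough illustration of what a crontab entry looks like, here is a minimal sketch. The script path and the schedule are hypothetical; the post does not name either. A crontab line has five time fields (minute, hour, day of month, month, day of week) followed by the command to run.

```shell
#!/bin/sh
# Hypothetical script path; assumed for illustration only.
SCRIPT=/home/greg/bin/recrawl.sh

# Fields: minute hour day-of-month month day-of-week command
# This assumed schedule means "at 02:30 every day".
CRON_LINE="30 2 * * * $SCRIPT"

# Print the line; it would be pasted into the crontab via `crontab -e`.
printf '%s\n' "$CRON_LINE"
```

The sketch only prints the line, so it can be run safely before anything is installed.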

No Comments » - Posted in Search by admin

Sunday, March 22nd, 2009

Vertical Search Engine Template - Command Summary

To sum up the commands that we’ve used to go through three iterations of crawling -
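The overall sequence can be sketched as a small script that prints the plan rather than running it. The nutch, crawldb, segments and url file paths come from the earlier posts; the “linkdb” and “indexes” directory names, the crawl_plan function and the <segment> placeholder are assumptions made for this sketch, and the exact arguments may differ between Nutch versions.

```shell
#!/bin/sh
# Paths from the posts in this series.
NUTCH=/home/greg/nutch/bin/nutch
CRAWL=/home/greg/nutchcrawls/nanaimo/crawl
URLS=/home/greg/search_urls/nanaimo/initial.txt

# Print one inject followed by N rounds of the five per-crawl commands.
# <segment> stands for the timestamped directory that generate creates.
# The linkdb and indexes locations are assumed, not taken from the posts.
crawl_plan() {
    iterations=$1
    echo "$NUTCH inject $CRAWL/crawldb $URLS"
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "$NUTCH generate $CRAWL/crawldb $CRAWL/segments"
        echo "$NUTCH fetch $CRAWL/segments/<segment>"
        echo "$NUTCH updatedb $CRAWL/crawldb $CRAWL/segments/<segment>"
        echo "$NUTCH invertlinks $CRAWL/linkdb $CRAWL/segments/<segment>"
        echo "$NUTCH index $CRAWL/indexes $CRAWL/crawldb $CRAWL/linkdb $CRAWL/segments/<segment>"
        i=$((i + 1))
    done
}

crawl_plan 3
```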


Wednesday, March 18th, 2009

Vertical Search Engine Template - Third Crawl

The second crawl has been done and indexed. For this example we are going to do a final crawl. There are no real differences that weren’t noted previously, so we are just going to do this third crawl and index all in this post. We will now run the following commands -


Monday, March 16th, 2009

Vertical Search Engine Template - Second Update and Index

Having completed the second crawl, the task now is to run updatedb, invertlinks and index on the new pages, which are stored in the new segment “20090317171019”. We will run these commands, one after the other -
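A minimal sketch of running the three steps in order for that segment might look like the following. The “linkdb” and “indexes” directory names, the DRY_RUN switch and the run and update_and_index helpers are assumptions for this sketch, and the argument order follows the Nutch tutorials of the time, so it may need adjusting for other versions. By default the commands are only printed.

```shell
#!/bin/sh
NUTCH=/home/greg/nutch/bin/nutch
CRAWL=/home/greg/nutchcrawls/nanaimo/crawl
SEGMENT=$CRAWL/segments/20090317171019   # segment named in the post

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Each step depends on the previous one, so stop at the first failure.
# The linkdb and indexes locations are assumed, not taken from the post.
update_and_index() {
    run "$NUTCH" updatedb "$CRAWL/crawldb" "$1" &&
    run "$NUTCH" invertlinks "$CRAWL/linkdb" "$1" &&
    run "$NUTCH" index "$CRAWL/indexes" "$CRAWL/crawldb" "$CRAWL/linkdb" "$1"
}

update_and_index "$SEGMENT"
```

Setting DRY_RUN=0 would execute the commands instead of printing them.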


Sunday, March 15th, 2009

Vertical Search Engine Template - Second Fetch

We have extracted the links from the pages fetched in the first crawl and are now ready to do the second crawl. So far we have completed these commands -


Friday, March 13th, 2009

Vertical Search Engine Template - Extract Links

So far we have a working index of a database of three pages. We need to extract the links from those pages and then fetch the new pages. We do that with the “generate” command that we used before. In this case though we will add a “limiter” to limit the number of links that [...]
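For illustration only: Nutch's generate command accepts a -topN option that caps how many urls go into the fetchlist. Whether that is the “limiter” meant here is an assumption, since the excerpt is cut off, and the value 50 and the generate_cmd helper are hypothetical. The sketch prints the command rather than running it.

```shell
#!/bin/sh
NUTCH=/home/greg/nutch/bin/nutch
CRAWL=/home/greg/nutchcrawls/nanaimo/crawl
TOPN=50   # hypothetical limit, not taken from the post

# Build the generate command with an upper bound on the fetchlist size.
generate_cmd() {
    echo "$NUTCH generate $CRAWL/crawldb $CRAWL/segments -topN $1"
}

generate_cmd "$TOPN"
```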


Thursday, March 12th, 2009

Vertical Search Engine Template - Index

The next step is to make an index. We have already done 5 steps -


Tuesday, March 10th, 2009

Vertical Search Engine Template - Invert Links

At this point we have injected the initial urls to crawl, generated the fetchlist, done the crawl and updated the database. The four commands that were used are -


Sunday, March 8th, 2009

Vertical Search Engine Template - Update DataBase

We have completed the first crawl by using the following three commands -


Friday, March 6th, 2009

Vertical Search Engine Template - Fetch

We have used these commands to “inject” and “generate” -
/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments
Next we will do the fetch, i.e. make a crawl. This is the command that we will use -
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
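The segment name in the fetch command is a timestamp created by the generate step, so it changes on every run. A minimal sketch of picking up the newest segment instead of typing it by hand is shown below; the latest_segment helper is hypothetical, and the fetch command is printed rather than executed.

```shell
#!/bin/sh
# Print the most recent segment directory under the given directory.
# Segment names are timestamps (e.g. 20090317080601), so the sorted
# order of the glob is also chronological order.
latest_segment() {
    latest=
    for d in "$1"/2*; do
        if [ -d "$d" ]; then
            latest=$d
        fi
    done
    printf '%s\n' "$latest"
}

# Paths taken from the post.
NUTCH=/home/greg/nutch/bin/nutch
SEGMENTS=/home/greg/nutchcrawls/nanaimo/crawl/segments

segment=$(latest_segment "$SEGMENTS")
if [ -n "$segment" ]; then
    # Printed, not executed, so the sketch is safe to run anywhere.
    echo "$NUTCH fetch $segment"
else
    echo "no segments found under $SEGMENTS"
fi
```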


Wednesday, March 4th, 2009

Vertical Search Engine Template - Generate

In the last post we created a file with some urls in it and injected them into a “crawldb” directory within a “crawl” directory within a “nanaimo” directory within a “nutchcrawls” directory within our home directory. The “crawldb” directory was created by the inject command that we used.
/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
The next task is to generate the other resources [...]


Monday, March 2nd, 2009

Vertical Search Engine Template - Inject URLs

In the previous post we outlined the way that we are forming the commands and where we are creating some directories. We will be doing three search indexes and merging them - the main subject is “nanaimo”, with tastes of “diving” and “music”.
