<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Associated Geogenerics Dot Com</title>
	<atom:link href="http://associatedgeogenerics.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://associatedgeogenerics.com</link>
	<description></description>
	<pubDate>Fri, 06 Apr 2012 23:00:26 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>Vertical Search Engine Template - Extract Links</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-extract-links/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-extract-links/#comments</comments>
		<pubDate>Sat, 07 Jan 2012 07:00:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=158</guid>
		<description><![CDATA[So far we have a working index of a database of three pages. We need to extract the links from those pages and then fetch the new pages. We do that with the &#8220;generate&#8221; command that we used before. In this case though we will add a &#8220;limiter&#8221; to limit the number of links that [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-extract-links/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Fetch</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-fetch/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-fetch/#comments</comments>
		<pubDate>Fri, 06 Jan 2012 07:00:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=143</guid>
		<description><![CDATA[We have used these commands to &#8220;inject&#8221; and &#8220;generate&#8221; -
/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments

Next we will do the fetch, i.e. run the crawl. This is the command that we will use -
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601



Remember that the &#8220;20090317080601&#8221; refers to a directory named based on the time at which it was [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-fetch/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Generate</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-generate/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-generate/#comments</comments>
		<pubDate>Fri, 06 Jan 2012 07:00:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=140</guid>
		<description><![CDATA[In the last post we created a file with some URLs in it and injected them into a &#8220;crawldb&#8221; directory within a &#8220;crawl&#8221; directory within a &#8220;nanaimo&#8221; directory within a &#8220;nutchcrawls&#8221; directory within our home directory. The &#8220;crawldb&#8221; directory was created by the inject command that we used.
/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt

The next task is to generate the other resources [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-generate/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Invert Links</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-invert-links/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-invert-links/#comments</comments>
		<pubDate>Fri, 06 Jan 2012 07:00:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=153</guid>
		<description><![CDATA[At this point we have injected the initial URLs to crawl, generated the fetchlist, done the crawl and updated the database. The four commands that were used are -


/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601


Next we have to &#8220;invert the links&#8221;. The command is -
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-invert-links/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Intelligent Link Generation - Initial Thoughts</title>
		<link>http://associatedgeogenerics.com/intelligent-link-generation-thoughts/</link>
		<comments>http://associatedgeogenerics.com/intelligent-link-generation-thoughts/#comments</comments>
		<pubDate>Sat, 04 Apr 2009 06:36:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=203</guid>
		<description><![CDATA[After a fetch or an index (I&#8217;m not sure exactly where or when as I write this, but at some point or points) we&#8217;ll want to rank the links to follow, keep the ones that we deem acceptable, and either discard the rest or store them somewhere with a &#8220;do not index or follow&#8221; instruction.
Then after we have Nutch [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/intelligent-link-generation-thoughts/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Shell Scripts - CRON</title>
		<link>http://associatedgeogenerics.com/shell-scripts-cron/</link>
		<comments>http://associatedgeogenerics.com/shell-scripts-cron/#comments</comments>
		<pubDate>Mon, 23 Mar 2009 19:07:53 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=197</guid>
		<description><![CDATA[We are using Linux, specifically an Ubuntu distribution. We intend to use a program called &#8220;cron&#8221; to execute commands that are in a script file. A file known as a &#8220;crontab&#8221; holds commands to execute at certain times. Every minute the &#8220;crontab&#8221; is checked to see if anything is supposed to happen.
Since [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/shell-scripts-cron/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Command Summary</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-command-summary/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-command-summary/#comments</comments>
		<pubDate>Sun, 22 Mar 2009 18:45:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=166</guid>
		<description><![CDATA[To sum up the commands that we&#8217;ve used to go through three iterations of crawling -


/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601 
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments -topN 1000 
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments -topN 1000 
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317175829
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317175829
/home/greg/nutch/bin/nutch [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-command-summary/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Third Crawl</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-third-crawl/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-third-crawl/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 17:57:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=165</guid>
		<description><![CDATA[The second crawl has been done and indexed. For this example we are going to do a final crawl. There are no real differences from what was noted previously, so we are just going to do this third crawl and index all in this post. We will now run the following commands -


/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-third-crawl/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Second Update and Index</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-second-update-and-index/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-second-update-and-index/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 17:30:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=162</guid>
		<description><![CDATA[Having completed the second crawl, the task now is to run updatedb, invertlinks and index on the new pages, which are stored in the new segment &#8220;20090317171019&#8221;. We will run these commands, one after the other -


/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019
/home/greg/nutch/bin/nutch index /home/greg/nutchcrawls/nanaimo/crawl/indexes2 /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317171019

The only difference from the first time around is [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-second-update-and-index/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Vertical Search Engine Template - Second Fetch</title>
		<link>http://associatedgeogenerics.com/vertical-search-engine-template-second-fetch/</link>
		<comments>http://associatedgeogenerics.com/vertical-search-engine-template-second-fetch/#comments</comments>
		<pubDate>Sun, 15 Mar 2009 17:26:38 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://associatedgeogenerics.com/?p=160</guid>
		<description><![CDATA[We have extracted the links from the pages fetched in the first crawl and are now ready to do the second crawl. So far we have completed these commands -


/home/greg/nutch/bin/nutch inject /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/search_urls/nanaimo/initial.txt
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments
/home/greg/nutch/bin/nutch fetch /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch updatedb /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch invertlinks /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601 
/home/greg/nutch/bin/nutch index /home/greg/nutchcrawls/nanaimo/crawl/indexes /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/linkdb /home/greg/nutchcrawls/nanaimo/crawl/segments/20090317080601
/home/greg/nutch/bin/nutch generate /home/greg/nutchcrawls/nanaimo/crawl/crawldb /home/greg/nutchcrawls/nanaimo/crawl/segments -topN 1000

Now [...]]]></description>
		<wfw:commentRss>http://associatedgeogenerics.com/vertical-search-engine-template-second-fetch/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

