All Robots
Use this generic robot to set rules for all web bots.
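As a minimal illustration (a sketch, not taken from any listed robot's documentation): under the Robots Exclusion Standard, a /robots.txt record whose User-agent field is "*" applies to every robot that has no record of its own. For example:

    # /robots.txt - rules for all web bots
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /private/

A robot that honors /robots.txt would skip /cgi-bin/ and /private/ on such a site; a more specific record naming a particular robot overrides the generic one.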
Ahoy! The Homepage Finder
ahoy
maintenance
Ahoy! is an ongoing research project at the University of Washington for finding personal Homepages.
www.cs.washington.edu/research/ahoy/doc/home.html
Alkaline
AlkalineBOT
indexing
Unix/NT internet/intranet search engine.
http://www.vestris.com/alkaline
ananzi
EMC Spider
indexing
Arachnophilia
Arachnophilia
The purpose of this run (undertaken by HaL Software) was to collect approximately 10k HTML documents for testing automatic abstract generation.
ArchitextSpider
ArchitextSpider
indexing, statistics
Its purpose is to generate a Resource Discovery database, and to generate statistics. The ArchitextSpider collects information for the Excite and WebCrawler search engines.
Architext Software
ASpider (Associative Spider)
ASpider/0.09
indexing
ASpider is a CGI script that searches the web for keywords given by the user through a form.
AURESYS
AURESYS/1.0
indexing,statistics
AURESYS is used to build a personal database for somebody searching for information. The database is structured for analysis. AURESYS can find new servers by incrementing IP addresses. It also generates statistics...
http://crrm.univ-mrs.fr
BackRub
BackRub/*.*
indexing, statistics
Big Brother
Big Brother
maintenance
Macintosh-hosted link validation tool.
Francois Pottier
BlackWidow
BlackWidow
indexing, statistics
Started as a research project and now is used to find links for a random link generator. Also is used to research the growth of specific sites.
bright.net caching robot
caching
BSpider
bspider
indexing
BSpider crawls inside Japanese domains for indexing.
not yet
CACTVS Chemistry Spider
CACTVS Chemistry Spider
indexing.
Locates chemical structures in Chemical MIME formats on WWW and FTP servers and downloads them into database searchable with structure queries (substructure, fullstructure, formula, properties etc.).
Checkbot
Checkbot/x.xx LWP/5.x
maintenance
Checkbot checks links in a given set of pages on one or more servers. It reports links which returned an error code.
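The technique is straightforward to sketch. Checkbot itself is a Perl/LWP tool (see its user-agent string); the following is an illustrative Python fragment of the same idea, not Checkbot's code, and the URLs are placeholders:

    # Illustrative link-checking sketch; not Checkbot's actual code.
    import urllib.error
    import urllib.request

    def check_links(urls):
        """Fetch each URL and report those that return an error code."""
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    status = resp.status
            except urllib.error.HTTPError as e:
                status = e.code        # e.g. 404 Not Found, 500 Server Error
            except urllib.error.URLError as e:
                print(f"{url}: unreachable ({e.reason})")
                continue
            if status >= 400:
                print(f"{url}: error {status}")

    check_links(["http://example.com/", "http://example.com/missing"])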
CMC/0.01
CMC/0.01
maintenance
This CMC/0.01 robot collects information from pages registered with a music specialty search service.
http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=robot
Combine System
combine
indexing
An open, distributed, and efficient harvester.
http://www.ub2.lu.se/~tsao/combine.ps
ComputingSite Robi/1.0
robi
indexing,maintenance
Intelligent agent used to build the ComputingSite Search Directory.
Tecor Communications S.L.
http://www.computingsite.com/robi/
Conceptbot
conceptbot
indexing
The Conceptbot spider is used to research concept-based search indexing techniques. It uses a breadth-first search to spread out the number of hits on a single site over time. The spider runs at irregular intervals and is still under construction.
http://www.aptltd.com/~sifry/conceptbot
CS-HKUST WISE: WWW Index and Search Engine
CS-HKUST-IndexServer/1.0
Its purpose is to generate a Resource Discovery database, and validate HTML. Part of an on-going research project on Internet Resource Discovery at Department of Computer Science, Hong Kong University of Science and Technology (CS-HKUST).
CyberSpyder Link Test
cyberspyder
link validation, some html validation
CyberSpyder Link Test is intended to be used as a site management tool to validate that HTTP links on a page are functional and to produce various analysis reports to assist in managing a site.
http://www.cyberspyder.com/cslnkts1.html
DeWeb(c) Katalog/Index
Deweb/1.01
indexing, mirroring, statistics
Its purpose is to generate a Resource Discovery database, perform mirroring, and generate statistics. Uses a combination of an Informix(tm) database and WN 1.11 server software for indexing/resource discovery, full-text search, and text excerpts.
Die Blinde Kuh
Die Blinde Kuh
indexing
The robot is use for indixing and proofing the registered urls in the german language search-engine for kids. Its a none-comercial one-woman-project of Birgit Bachmann living in Hamburg, Germany.
http://www.blinde-kuh.de/robot.html (German language)
DienstSpider
dienstspider/1.0
indexing
Indexing and searching the NCSTRL (Networked Computer Science Technical Report Library) and ERCIM Collection.
Antonis Sidiropoulos
Digimarc MarcSpider
Digimarc WebReader/1.2
maintenance
Examines image files for watermarks. In order to not waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific URLs of interest to us. If a URL is to an image, we may read the image, but we do not crawl to any other URLs. If a URL is to a page of interest (usually due to CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages.
Digimarc Corporation
http://www.digimarc.com/prod_fam.html
Digimarc Marcspider/CGI
Digimarc CGIReader/1.0
maintenance
Similar to Digimarc MarcSpider, MarcSpider/CGI examines image files for watermarks but is more focused on CGI URLs. In order to not waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific CGI URLs of interest to us. If a URL is to a page of interest (via CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages.
Digimarc Corporation
http://www.digimarc.com/prod_fam.html
DNAbot
DNAbot/1.0
indexing
A search robot written 100% in Java, with its own built-in database engine and web server. Currently in Japanese.
http://xx.dnainc.co.jp/dnabot/
DownLoad Express
downloadexpress
graphic download
Automatically downloads graphics from the web.
DownLoad Express Inc
http://www.jacksonville.net/~dlxpress
DragonBot
DragonBot
indexing
Collects web pages related to East Asia.
EIT Link Verifier Robot
EIT-Link-Verifier-Robot/0.2
maintenance
Combination of an HTML form and a CGI script that verifies links from a given starting point (with some controls to prevent it going off-site or limitless).
Emacs-w3 Search Engine
Emacs-w3/v[0-9\.]+
indexing
Its purpose is to generate a Resource Discovery database. This code has not been looked at in a while, but will be spruced up for the Emacs-w3 2.2.0 release sometime this month. It will honor the /robots.txt file at that time.
Esther
esther
indexing
This crawler is used to build the search database at http://search.falconsoft.com/.
http://search.falconsoft.com/
Felix IDE
FELIX IDE
indexing, statistics
Felix IDE is a retail personal search spider sold by The Pentone Group, Inc.
The Pentone Group, Inc.
http://www.pentone.com
FetchRover
ESI
maintenance, statistics
FetchRover fetches Web Pages. It is an automated page-fetching engine. FetchRover can be used stand-alone or as the front-end to a full-featured Spider. Its database can use any ODBC compliant database server, including Microsoft Access, Oracle, Sybase SQL Server, FoxPro, etc.
http://www.engsoftware.com/spiders/
fido
fido
indexing
Fido is used to gather documents for the search engine provided in the PlanetSearch service, which is operated by the Philips Multimedia Center. The robot runs on an ongoing basis.
http://www.planetsearch.com/info/fido.html
Fish search
Fish-Search-Robot
indexing
Its purpose is to discover resources on the fly. A version exists that is integrated into the Tübingen Mosaic 2.4.2 browser (also written in C).
Fouineur
fouineur
indexing, statistics
This robot automatically builds a database that is used by our own search engine. It auto-detects the language (French, English & Spanish) used in each HTML page. Each database record is generated by this robot.
http://fouineur.9bit.qc.ca/informations.html
Freecrawl
Freecrawl
indexing
The Freecrawl robot is used to build a database for the EuroSeek service.
Jesper Ekhall
FunnelWeb
FunnelWeb-1.0
indexing, statistics
Its purpose is to generate a Resource Discovery database, and generate statistics. Localised South Pacific Discovery and Search Engine, plus distributed operation under development.
GCreep
gcreep
indexing
Indexing robot to learn SQL.
Instrumentpolen AB
http://www.instrumentpolen.se/gcreep/index.html
GetBot
???
maintenance
GetBot's purpose is to index all the sites it can find that contain Shockwave movies. It is the first bot or spider written in Shockwave. The bot was originally written at Macromedia on a hungover Sunday as a proof of concept. - Alex Zavatone 3/29/96.
GetterroboPlus Puu
Getterrobo-Plus
indexing
The Puu robot is used to gather data from sites registered with the search engine "straight FLASH!!", for building announcement pages on the update status of registered sites in "straight FLASH!!". The robot runs every day.
marunaka
http://marunaka.homing.net/straight/getter/
GetURL
GetURL.rexx v1.05
maintenance, mirroring
Its purpose is to validate links, perform mirroring, and copy document trees. Designed as a tool for retrieving web pages in batch mode without the encumbrance of a browser. Can be used to describe a set of pages to fetch, and to maintain an archive or mirror. Is not run by a central site and accessed by clients - is run by the end user or archive maintainer.
Golem
golem
maintenance
Golem generates status reports on collections of URLs supplied by clients. Designed to assist with editorial updates of Web-related sites or products.
Geoff Duncan
http://www.quibble.com/golem/
Google.Com
Googlebot
indexing, statistics
GoogleBot is the web crawler and indexing agent for the new Google.Com search engine. Google.Com has some nice search features that give it much potential in the online search market. Google is one of the search engines that is accessed from Netscape's Netcenter portal.
Google, Inc.
http://google.com/
Gromit
Gromit
indexing
Gromit is a Targeted Web Spider that indexes legal sites contained in the AustLII legal links database.
http://www2.austlii.edu.au/~dan/gromit/
Hämähäkki
Hämähäkki
indexing
Its purpose is to generate a Resource Discovery database from the Finnish (top-level domain .fi) www servers. The resulting database is used by the search engine.
http://www.fi/www/spider.html
HamBot
hambot
indexing
Two HamBot robots are used (stand-alone & browser-based) to aid in building the database for HamRad Search - The Search Engine for Search Engines. The robots are run intermittently and perform nearly identical functions.
http://www.hamrad.com/
havIndex
havIndex
indexing
havIndex allows individuals to build a searchable word index of (user-specified) lists of URLs. havIndex does not crawl; rather, it requires one or more user-supplied lists of URLs to be indexed. havIndex does (optionally) save URLs parsed from indexed pages.
hav.Software and Horace A. (Kicker) Vallas
http://www.hav.com/
HI (HTML Index) Search
AITCSRobot/1.1
indexing
Its purpose is to generate a Resource Discovery database. This robot traverses the net and creates a searchable database of Web pages. It stores the title string of the HTML document and the absolute URL. A search engine provides boolean AND & OR query models, with or without filtering against a stop list of words. A feature lets Web page owners add their URL to the searchable database.
HKU WWW Octopus
HKU WWW Robot,
indexing
HKU Octopus is an ongoing project for resource discovery in the Hong Kong and China WWW domain. It is a research project conducted by three undergraduates at the University of Hong Kong.
ht://Dig
htdig
indexing
http://www.htdig.org/howitworks.html
HTMLgobble
HTMLgobble v2.2
mirror
A mirroring robot. Configured to stay within a directory, sleeps between requests, and the next version will use HEAD to check if the entire document needs to be retrieved.
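The HEAD technique mentioned above amounts to asking the server for headers only and refetching the body only when Last-Modified has changed (an illustrative Python sketch, not HTMLgobble's own code):

    # Illustrative HEAD-based freshness check; not HTMLgobble's code.
    import urllib.request

    def needs_refetch(url, last_seen_modified):
        """True if the server's Last-Modified differs from the stored value."""
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            current = resp.headers.get("Last-Modified")
        # Refetch when the header is missing or has changed.
        return current is None or current != last_seen_modified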
IBM_Planetwide
IBM_Planetwide,
indexing, maintenance, mirroring
Restricted to IBM owned or related domains.
Imagelock
Mozilla 3.01 PBWF (Win95)
maintenance
IncyWincy
IncyWincy/1.0b1
Various Research projects at the University of Sunderland.
Informant
Informant
indexing
The Informant robot continually checks the Web pages that are relevant to user queries. Users are notified of any new or updated pages. The robot runs daily, but the number of hits per site per day should be quite small, and these hits should be randomly distributed over several hours. Since the robot does not actually follow links (aside from those returned from the major search engines such as Lycos), it does not fall victim to the common looping problems. The robot will support the Robot Exclusion Standard by early December, 1996.
http://informant.dartmouth.edu/about.html
InfoSeek Robot 1.0
InfoSeek Robot 1.0
indexing
Its purpose is to generate a Resource Discovery database. Collects WWW pages for both InfoSeek's free WWW search and commercial search. Uses a unique proprietary algorithm to identify the most popular and interesting WWW pages. Very fast, but never has more than one request per site outstanding at any given time. Has been refined for more than a year.
Infoseek Sidewinder
Infoseek Sidewinder
indexing
Mike Agostino
InfoSpiders
InfoSpiders
search
Application of artificial life algorithm to adaptive distributed information retrieval.
Ingrid
INGRID/0.1
Indexing
Ilse c.v.
Inktomi Slurp
slurp
indexing, statistics
Indexing documents for the HotBot search engine (www.hotbot.com), collecting Web statistics.
Inktomi Corporation
http://www.inktomi.com/slurp.html
Inspector Web
inspectorwww
maintenance: link validation, html validation, image size
Provides inspection reports which give advice to WWW site owners on missing links, image size problems, syntax errors, etc.
http://www.greenpac.com/inspector/ourrobot.html
IntelliAgent
'IAGENT/1.0'
indexing
IntelliAgent is still in development. Indeed, it is very far from completion. I'm planning to limit the depth at which it will probe, so hopefully IAgent won't cause anyone much of a problem. At the end of its completion, I hope to publish both the raw data and original source code.
Iron33
Iron33
indexing, statistics
The robot "Iron33" is used to build the database for the WWW search engine "Verno".
Takashi Watanabe
http://verno.ueda.info.waseda.ac.jp/iron33/history.html
Israeli-search
IsraeliSearch/1.0
indexing.
JCrawler
jcrawler
indexing
JCrawler is currently used to build the Vietnam topic specific WWW index for VietGATE.
Jeeves
jeeves
indexing, maintenance, statistics
Jeeves is basically a web-mirroring robot built as a final-year degree project. It will have many nice features and is already web-friendly. Still in development.
Jobot
Jobot/0.1alpha libwww-perl/4.0
standalone
Its purpose is to generate a Resource Discovery database. Intended to seek out sites of potential "career interest". Hence - Job Robot.
JoeBot
JoeBot/x.x,
JumpStation
jumpstation
indexing
Jonathon Fletcher
Katipo
Katipo/1.0
maintenance
Watches all the pages you have previously visited and tells you when they have changed.
http://www.vuw.ac.nz/~newbery/Katipo/Katipo-doc.html
KDD-Explorer
KDD-Explorer
indexing
KDD-Explorer is used for indexing valuable documents which will be retrieved via an experimental cross-language search engine, CLINKS.
Kazunori Matsumoto
not available
Kilroy
*
indexing,statistics
Used to collect data for several projects. Runs constantly and visits sites no faster than once every 90 seconds.
OCLC
http://purl.org/kilroy
KIT-Fireball
KIT-Fireball
indexing
The Fireball robots gather web documents in German language for the database of the Fireball search service.
Gruner + Jahr Electronic Media Service GmbH
http://www.fireball.de/technik.html (in German)
KO_Yappo_Robot
ko_yappo_robot
indexing
The KO_Yappo_Robot robot is used to build the database for the Yappo search service by k,osawa (part of AOL). The robot runs on random days, and visits sites in a random order.
http://yappo.com/
LabelGrabber
label-grabber
Grabs PICS labels from web pages, submits them to a label bureau
The label grabber searches for PICS labels and submits them to a label bureau.
http://www.w3.org/PICS/refcode/LabelGrabber/index.htm
LinkWalker
linkwalker
maintenance, statistics
LinkWalker generates a database of links. We send reports of bad ones to webmasters.
http://www.seventwentyfour.com/tech.html
Lockon
Lockon
indexing
This robot gathers only HTML documents.
Seiji Sasazuka & Takahiro Ohmori
logo.gif Crawler
logo_gif_crawler
indexing
Meta-indexing engine for corporate logo graphics. The robot runs at irregular intervals and will only pull a start page and its associated /.*logo\.gif/i (if any). It will be terminated once a statistically significant number of samples has been collected.
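Read as a case-insensitive regular expression, /.*logo\.gif/i matches any URL containing "logo.gif" in any letter case. A small illustrative Python sketch (the link list is hypothetical):

    import re

    # The pattern quoted in the description: /.*logo\.gif/i
    LOGO_RE = re.compile(r".*logo\.gif", re.IGNORECASE)

    links = ["/images/Logo.GIF", "/img/banner.png", "/assets/sitelogo.gif"]
    print([u for u in links if LOGO_RE.match(u)])
    # -> ['/images/Logo.GIF', '/assets/sitelogo.gif']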
Lycos
Lycos/x.x
indexing
This is a research program in providing information retrieval and discovery in the WWW, using a finite memory model of the web to guide intelligent, directed searches for specific information needs.
Dr. Michael L. Mauldin
Magpie
Magpie/1.0
indexing, statistics
Used to obtain information from a specified list of web pages for local indexing. Runs every two hours, and visits only a small number of sites.
MediaFox
mediafox
indexing and maintenance
The robot is used to index meta information of a specified set of documents and update a database accordingly.
Lars Eilebrecht
none
MerzScope
MerzScope
WebMapping
The robot is part of a Web-mapping package called MerzScope, to be used mainly by consultants and webmasters to create and publish maps, on and of the World Wide Web.
(Client based robot)
http://www.merzcom.com
MOMspider
MOMspider/1.00 libwww-perl/0.40
maintenance, statistics
To validate links, and generate statistics. It's usually run from anywhere.
Monster
Monster/vX.X.X -$TYPE ($OSTYPE)
maintenance, mirroring
The Monster has two parts - a Web searcher and a Web analyzer. The searcher is intended to build the list of WWW sites in a desired domain (for example, it can build a list of all WWW sites in the mit.edu, com, org, etc. domains). In the User-agent field, $TYPE is set to 'Mapper' for the Web searcher and 'StAlone' for the Web analyzer.
Motor
Motor
indexing
The Motor robot is used to build the database for the www.webindex.de search service operated by CyberCon. The robot is under development - it runs at random intervals and visits sites in a priority-driven order (.de/.ch/.at first, root and robots.txt first).
Muscat Ferret
MuscatFerret
indexing
Used to build the database for the EuroFerret.
Olly Betts
Mwd.Search
MwdSearch
indexing
Robot for indexing Finnish (top-level domain .fi) webpages for a search engine called Fifi. Visits sites in random order.
(none)
NEC-MeshExplorer
NEC-MeshExplorer
indexing
The NEC-MeshExplorer robot is used to build the database for the NETPLAZA search service operated by NEC Corporation. The robot searches URLs around sites in Japan (JP domain). The robot runs every day, and visits sites in a random order.
web search service maintenance group
http://netplaza.biglobe.or.jp/keyword.html
Nederland.zoek
Nederland.zoek
indexing
This robot indexes all .nl sites for the search-engine of Nederland.net.
System Operator Nederland.net
NetCarta WebMap Engine
NetCarta CyberPilot Pro
indexing, maintenance, mirroring, statistics
The NetCarta WebMap Engine is a general purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider that works with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information. If the robot sees a published map, it will return the published map rather than continuing its crawl. Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request out to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID.
NetCarta WebMap Engine
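Blocking one such spider instance via /robots.txt, as the description suggests, would look like the following (the user-agent shown is the product's base name; a real record would use the coded instance ID):

    # /robots.txt - refuse a specific spider by its User-agent ID
    User-agent: NetCarta CyberPilot Pro
    Disallow: /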
NetMechanic
WebMechanic
Link and HTML validation
NetMechanic is a link validation and HTML validation robot run using a web page interface.
Tom Dahm
http://www.netmechanic.com/faq.html
NetScoop
NetScoop
indexing
The NetScoop robot is used to build the database for the NetScoop search engine.
http://www.netmechanic.com/faq.html
NHSE Web Forager
NHSEWalker/3.0
indexing
To generate a Resource Discovery database.
Nomad
Nomad-V2.x
indexing
Richard Sonnen
Northern Light
gulliver
indexing
Gulliver is a robot to be used to collect web pages for indexing and subsequent searching of the index.
Mike Mulligan
http://www.nlsearch.com/
nzexplorer
explorersearch
indexing, statistics
This crawler is used to build a search database.
Occam
Occam
indexing
The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache. Currently the only user is me.
Open Text Index Robot
Open Text Site Crawler
indexing
This robot is run by Open Text Corporation to produce the data for the Open Text Index.
http://index.opentext.net/OTI_Robot.html
Orb Search
Orbsearch/1.0
indexing
Orbsearch builds the database for Orb Search Engine. It runs when requested.
http://orbsearch.home.ml.org
Pack Rat
packrat or *
both maintenance and mirroring
Used for local maintenance and for gathering web pages so that local statistical info can be used in artificial intelligence programs. Funded by NEMOnline.
Patric
patric
statistics
(Details are contained at http://www.nwnet.net/technical/ITR/index.html.)
toney@nwnet.net
http://www.nwnet.net/technical/ITR/index.html
PerlCrawler 1.0
perlcrawler
indexing
The PerlCrawler robot is designed to index and build a database of pages relating to the Perl programming language.
Matt McKenzie
http://www.xav.com/scripts/xavatoria/index.html
PGP Key Agent
PGP-KA/1.2
indexing
This program searches for the PGP public key of a specified user.
Phantom
Duppies
indexing
Designed to allow webmasters to provide a searchable index of their own site as well as to other sites, perhaps with similar content.
Pioneer
Pioneer
indexing, statistics
Pioneer is part of an undergraduate research project.
PlumtreeWebAccessor
PlumtreeWebAccessor
indexing for the Plumtree Server
The Plumtree Web Accessor is a component that customers can add to the Plumtree Server to index documents on the World Wide Web.
http://www.plumtree.com/
Popular Iconoclast
gestaltIconoclast/1.0 libwww-FM/2.17
statistics
This guy likes statistics.
http://gestalt.sewanee.edu/ic/info.html
Resume Robot
Resume Robot
indexing.
James Stakelum
Road Runner: The ImageScape Robot
roadrunner
indexing
Create Image/Text index for WWW.
LIM Group
RoadHouse Crawling System
RHCS
indexing.
Robot used to build the database for the RoadHouse search service project operated by Perceval.
Robbie the Robot
Robbie
indexing
Used to define document collections for the DISCO system. Robbie is still under development and runs several times a day, but usually only for ten minutes or so. Sites are visited in the order in which references are found, but no host is visited more than once in any two-minute period.
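The politeness rule described here - no host visited more than once in any two-minute period - can be sketched as follows. This is an illustrative Python fragment, not Robbie's actual implementation:

    # Illustrative per-host politeness throttle; not Robbie's code.
    import time
    from urllib.parse import urlparse

    MIN_INTERVAL = 120.0        # seconds between visits to the same host
    last_visit = {}             # host -> time of the previous request

    def polite_order(urls):
        """Yield URLs in discovery order, honoring the per-host interval."""
        for url in urls:
            host = urlparse(url).netloc
            wait = last_visit.get(host, 0.0) + MIN_INTERVAL - time.time()
            if wait > 0:
                time.sleep(wait)
            last_visit[host] = time.time()
            yield url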
Robot Francoroute
Robot du CRIM 1.0a
indexing, mirroring, statistics
Part of the RISQ's Francoroute project for researching francophone resources. Uses the Accept-Language tag and reduces demand accordingly.
Marc-Antoine Parent
Roverbot
Roverbot
indexing
Targeted email gatherer utilizing user-defined seed points and interacting with both the webserver and MX servers of remote sites.
GlobalMedia Design (Andrew Cowan & Brian
SafetyNet Robot
SafetyNet Robot 0.1,
indexing.
Finds URLs for K-12 content management.
Scooter
Scooter
indexing
Scooter is AltaVista's prime index agent.
AltaVista
http://www.altavista.com/av/content/addurl.htm
Search.Aus-AU.COM
Search-AU
indexing: gather content for an indexing service
Search-AU is a development tool I have built to investigate the power of a search engine and web crawler to give me access to a database of web content (HTML / URLs) and addresses etc., from which I hope to build more accurate stats about the .au zone's web content.
http://Search.Aus-AU.COM/
Senrigan
Senrigan
indexing
This robot currently retrieves HTML pages from the .jp domain only.
SG-Scout
SG-Scout
indexing
Does a "server-oriented" breadth-first search in a round-robin fashion, with multiple processes.
Shai'Hulud
Shai'Hulud
mirroring
Used to build mirrors for internal use.
Dimitri Khaoustov
Simmany Robot Ver1.0
SimBot
indexing, maintenance, statistics
The Simmany Robot is used to build the Map (DB) for the simmany service operated by HNC (Hangul & Computer Co., Ltd.). The robot runs weekly, and visits sites that have useful Korean information, in a defined order.
http://simmany.hnc.net/irman1.html
SiteTech-Rover
SiteTech-Rover
indexing
Originated as part of a suite of Internet Products to organize, search & navigate Intranet sites and to validate links in HTML documents.
Smart Spider
ESI
indexing
Classifies sites using a Knowledge Base. The robot collects web pages, which are then parsed and fed to the Knowledge Base. The Knowledge Base classifies the sites into any of hundreds of categories.
http://www.engsoftware.com/robots.htm
Snooper
snooper
Solbot
solbot
indexing
Builds data for the Kvasir search service. Only searches.
Spanner
Spanner
indexing,maintenance
Used to index/check links on an intranet.
http://www.kluge.net/NES/spanner/
SpiderBot 1.0 - P.F.C. "Recuperador páginas Web" de Ignacio Cruzado Nuño (U.B.U.)
yes
indexing
Recovers Web Pages and saves them on your hard disk. Then it reindexes them.
Ignacio Cruzado Nuño: student of Computer Engineering at Burgos University (Spain)
http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/details.htm
Suke
suke
indexing
This robot mainly visits sites in Japan.
http://www.kuro.net/robot/index.ja.html
TACH Black Widow
tach_bw
maintenance: link validation
Exhaustively recurses a single site to check for broken links.
Michael Jennings
http://theautochannel.com/~mjenn/bw-syntax.html
Tarantula
yes
indexing
Tarantula gathers information for the German search engine Nathan. Robot history: started February 1997.
http://www.nathan.de/
tarspider
tarspider
mirroring
Olaf Schreck
Tcl W3 Robot
dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)
maintenance, statistics
Its purpose is to validate links, and generate statistics.
Laurent Demailly
TechBOT
TechBOT
statistics, maintenance
TechBOT is constantly upgraded. Currently it is used for link validation, load-time measurement, HTML validation and much more.
TechAID Internet Services
http://www.echaid.net/TechBOT/
Templeton
templeton
mirroring, mapping, automating web applications
Templeton is a very configurable robot for mirroring, mapping, and automating applications on retrieved documents.
http://www.bmtmicro.com/catalog/tton/
The Jubii Indexing Robot
JubiiRobot/version#
indexing, maintenance
Its purpose is to generate a Resource Discovery database and validate links. Used for indexing the .dk top-level domain as well as other Danish sites for a Danish web database, and for link validation.
The NorthStar Robot
NorthStar
indexing
Recent runs (26 April 94) will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data) as well as indexing.
The NWI Robot
VWbot_K
discovery,statistics
A resource discovery robot, used primarily for the indexing of the Scandinavian Web.
Sigfrid Lundberg, Lund university, Sweden
http://vancouver-webpages.com/VWbot/aboutK.shtml
The Peregrinator
Peregrinator-Mathematics/0.7
This robot is being used to generate an index of documents on Web sites connected with mathematics and statistics. It ignores off-site links, so does not stray from a list of servers specified initially.
Jim Richardson
The Web Moose
WebMoose
statistics, maintenance
This robot collects statistics and verifies links. It builds a graph of its visit path.
http://www.nwlink.com/~mikeblas/webmoose/
the World Wide Web Wanderer
WWWWanderer v3.0
statistics
Run initially in June 1993, its aim is to measure the growth in the web.
Matthew Gray
TITAN
TITAN/0.1
indexing
Its purpose is to generate a Resource Discovery database, and copy document trees. Our primary goal is to develop an advanced method for indexing the WWW documents. Uses libwww-perl.
Yoshihiko HAYASHI
http://isserv.tas.ntt.jp/chisho/titan-help/eng/titan-help-e.html
TitIn
titin
indexing, statistics
TitIn is used to index all titles of Web servers in the .hr domain.
http://www.foi.hr/~dpavlin/titin/tehnical.htm
UCSD Crawl
UCSD-Crawler
indexing, statistics
Should hit ONLY within UC San Diego - trying to count servers here.
URL Check
urlck
maintenance
The robot is used to manage, maintain, and modify web sites. It builds a database detailing the site, builds HTML reports describing the site, and can be used to up-load pages to the site or to modify existing pages and URLs within the site. It can also be used to mirror whole or partial sites. It supports HTTP, File, FTP, and Mailto schemes.
http://www.cutternet.com/products/urlck.html
URL Spider Pro
URL Spider Pro/1.5
indexing: gather content for an indexing service
URL Spider Pro builds Targeted Search Engines.
Infostreak Software
http://www.infostreak.com/us.htm
Valkyrie
Valkyrie libwww-perl
indexing
Used to collect resources from Japanese Web sites for ODIN search engine.
http://kichijiro.c.u-tokyo.ac.jp/odin/robot.html
Victoria
Victoria
maintenance
Victoria is part of a groupware produced.
Adrian Howard
vision-search
vision-search/3.0
indexing.
Intended to be an index of computer vision pages, containing all pages within n links (for some small n) of the Vision Home Page.
Voyager
Voyager
indexing, maintenance
This robot is used to build the database for the Lisa Search service. The robot is launched manually and visits sites in a random order.
Voyager Staff
VWbot
VWbot_K
indexing
Used to index BC sites for the searchBC database. Runs daily.
http://vancouver-webpages.com/VWbot/aboutK.shtml
W3M2
W3M2/x.xxx
indexing, maintenance, statistics
To generate a Resource Discovery database, validate links, validate HTML, and generate statistics.
w3mir
w3mir
mirroring.
W3mir uses the If-Modified-Since HTTP header and recurses only the directory and subdirectories of its start document. Known to work on U*ixes and Windows NT.
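The conditional request w3mir relies on looks like this (an illustrative Python sketch; w3mir itself is a Perl tool, and the URL and date are placeholders):

    # Illustrative conditional GET with If-Modified-Since; not w3mir's code.
    import urllib.error
    import urllib.request

    def fetch_if_modified(url, since):
        """Return the body if changed after `since` (an HTTP date), else None."""
        req = urllib.request.Request(url, headers={"If-Modified-Since": since})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read()        # 200 OK: content is newer
        except urllib.error.HTTPError as e:
            if e.code == 304:             # 304 Not Modified: keep local copy
                return None
            raise

    body = fetch_if_modified("http://example.com/",
                             "Sat, 29 Oct 1994 19:43:31 GMT")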
Web Core / Roots
root/0.1
indexing, maintenance
Parallel robot developed at Minho University in Portugal to catalog relations among URLs and to support a special navigation aid.
Jorge Portugal Andrade
WebBandit Web Spider
WebBandit/1.0
Resource Gathering / Server Benchmarking
Multithreaded, hyperlink-following, resource finding webspider.
http://pw2.netcom.com/~wooger/
WebCatcher
webcatcher
indexing
WebCatcher gathers web pages that Japanese college students want to visit.
WebCopy
WebCopy/(version)
mirroring
Its purpose is to perform mirroring. WebCopy can retrieve files recursively using the HTTP protocol. It can be used as a delayed browser or as a mirroring tool. It cannot jump from one site to another.
webfetcher
WebFetcher/0.8,
mirroring
Don't wait! OnTV's WebFetcher mirrors whole sites down to your hard disk on a TV-like schedule. Catch w3 documentation. Catch discovery.com without waiting! A fully operational web robot for NT/95 today, most UNIX soon, MAC tomorrow.
weblayers
weblayers/0.0
maintenance
Its purpose is to validate, cache and maintain links. It is designed to maintain the cache generated by the Emacs w3 mode (a N*tscape replacement) and to support annotated documents (keeping them in sync with the original document via diff/patch).
WebLinker
WebLinker/0.0 libwww-perl/0.1
maintenance
It traverses a section of web, doing URN->URL conversion. It will be used as a post-processing tool on documents created by automatic converters such as LaTeX2HTML or WebMaker. At the moment it works at full speed, but is restricted to local sites. External GETs will be added, but these will be running slowly. WebLinker is meant to be run locally, so if you see it elsewhere let the author know!
WebQuest
webquest
indexing
WebQuest will be used to build the databases for various web search service sites which will be in service by early 1998. Until the end of Jan. 1998, WebQuest will run from time to time. After that, it will run daily (for a few hours and very slowly).
WebReaper
webreaper
indexing/offline browsing
Freeware app which downloads and saves sites locally for offline browsing.
Mark Otway
webs
webs
statistics
The webs robot is used to gather the last-modified dates of WWW servers' top pages. The collected statistics determine the priority of WWW server data collection for the webdew indexing service. Indexing in webdew is done manually.
Recruit Co., Ltd.
http://webdew.rnet.or.jp/service/shank/NAVI/SEARCH/info2.html#robot
WebSpider
webspider
maintenance, link diagnostics
http://www.csi.uottawa.ca/~u610468
WebStolperer
WOLP
indexing
The robot gathers information about specified web projects and generates knowledge bases in JavaScript or in its own format.
http://www.suchfibel.de/maschinisten/text/werkzeuge.htm (in German)
WebVac
webvac/1.0
mirroring
webwalk
webwalk
indexing, maintenance, mirroring, statistics
Its purpose is to generate a Resource Discovery database, validate links, validate HTML, perform mirroring, copy document trees, and generate statistics. Webwalk is easily extensible to perform virtually any maintenance function which involves web traversal, in a way much like the '-exec' option of the find(1) command. Webwalk is usually used behind the HP firewall.
Rich Testardi
WebWalker
WebWalker
maintenance
WebWalker performs WWW traversal for individual sites and tests for the integrity of all hyperlinks to external sites.
Fah-Chun Cheong
WebWatch
WebWatch
maintenance, statistics
Its purpose is to validate HTML and generate statistics. It checks URLs modified since a given date.
Joseph Janos
Wget
wget
mirroring, maintenance
Wget is a utility for retrieving files using HTTP and FTP protocols. It works non-interactively, and can retrieve HTML pages and FTP trees recursively. It can be used for mirroring Web pages and FTP sites, or for traversing the Web gathering data. It is run by the end user or archive maintainer.
Hrvoje Niksic
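A typical mirroring invocation (a usage sketch; these are standard GNU Wget options, and the URL is a placeholder):

    # Mirror a tree politely: recursion with timestamping, a one-second
    # pause between requests, and no ascent above the start directory.
    wget --mirror --wait=1 --no-parent http://example.com/docs/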
WhoWhere Robot
whowhere
indexing
Gathers data for email directory from web pages.
Rupesh Kapoor
Wild Ferret Web Hopper #1, #2, #3
Hazel's Ferret Web hopper,
indexing, maintenance, statistics
The wild ferret web hoppers are designed as specific agents to retrieve data from all available sources on the internet. They work in an onion format, hopping from spot to spot one level at a time over the internet. The information is gathered into different relational databases, known as "Hazel's Horde". The information is publicly available and will be free for the browsing at www.greenearth.com. The effective date of the data posting is to be announced.
Wired Digital
hotwired
indexing
WWWC Ver 0.2.5
WWWC
maintenance
Tomoaki Nakashima.
XGET
XGET
mirroring
Its purpose is to retrieve updated files. It is run by the end user. Robot history: 1997.
http://www2.117.ne.jp/~moremore/x68000/soft/soft.html
KIT-Fireball/2.0
KIT-Fireball/2.0
Indexing
Web robot for the German search engine, FireBall.
FireBall
http://www.fireball.de/
Lycos.Com
Lycos_Spider_(T-Rex)
Indexing
This robot is used by Lycos to search the Internet for useful content and to crawl pages and sites that have been submitted to them via their Add URL page.
Lycos Corporation
http://www.lycos.com