SEARCHING FOR INFORMATION ON THE
WORLD WIDE WEB

or

Has an infinitude of monkeys created <www.king_lear.com> ?

Roy F. Sullivan, Ph.D.

This article is based on a presentation at the American Academy of Audiology convention, Los Angeles, April 3, 1998. It has been published in
HEARING JOURNAL; Volume 51, No. 6, June, 1998, pp. 34-39.

Author's note: Since the time of publication - Internet years being like dog years - active link URLs may have changed or disappeared entirely. Please notify me of dead or inaccurate links at r@rcsullivan.com.

INTRODUCTION

Amplifying a metaphor ascribed to 19th century philosopher/scientist Thomas Huxley in support of Darwin's conclusions on evolution, astronomer Sir Arthur Eddington proposed the often paraphrased "Infinite Monkey Theorem" in 1931. Viz: "If an army of monkeys were strumming on typewriters, they might write all the books in the British Museum". The British Museum Reading Room closed in 1997. Its contents were integrated into the British Library <http://portico.bl.uk/>, which now contains 150 million items. That titles-only catalog is now accessible, in growing part, via the Internet.

Estimates suggest the worldwide, direct content availability of more than 150 million documents or pages on more than 650,000 Internet sites. An information server mirroring the entire Internet would require a hard disk capacity on the order of 6 Terabytes (6,000,000,000,000 bytes)! Unlike the British Library, the Internet contains material which has not been subjected to any manner of pre-publication review or authentication. In fact, some of it might as well have been produced by Huxley's typing monkeys.
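As a rough check on the estimates above, dividing the quoted disk capacity by the quoted page count gives the implied average page size. This is only a back-of-envelope sketch: it assumes decimal terabytes, as the parenthetical figure suggests, and the variable names are illustrative.

```python
# Back-of-envelope check of the estimates quoted above.
TOTAL_BYTES = 6 * 10**12      # 6 Terabytes, taken as decimal (assumption)
TOTAL_PAGES = 150 * 10**6     # "more than 150 million documents or pages"
TOTAL_SITES = 650_000         # "more than 650,000 Internet sites"

bytes_per_page = TOTAL_BYTES / TOTAL_PAGES
pages_per_site = TOTAL_PAGES / TOTAL_SITES

print(f"Implied average page size: {bytes_per_page:,.0f} bytes")
print(f"Implied average pages per site: {pages_per_site:,.0f}")
```

On these figures, the average page works out to about 40,000 bytes and the average site to roughly 230 pages.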

Given the anarchic nature of the Internet, the acquisition, cataloging and authentication of materials is a task which defies description. For example, some 1995 research showed that the average time a page of text remained unchanged was only 75 days with a substantial proportion changing every ten days or less. Despite the abundance of chaff, there are informational kernels of wheat which can be identified and examined by any hearing health professional with Internet access, given the proper tools employed in an optimized manner. These tools (Table 1) are known generically as Internet Search Directories or Catalogs and Search Engines. The Internet references cited in this article serve as categorical examples of search tools rather than a comprehensive listing.

TABLE 1
INTERNET SEARCH TOOLS

  • SEARCH DIRECTORIES
      - ENCYCLOPEDIC
      - SPECIALIZED

  • SEARCH ENGINES
      - OPEN SET
      - META-SEARCH
      - LIMITED SET: Protocol, Medium, Database, Single site, Single PC






SEARCH DIRECTORIES

Internet Search directories differ from Search Engines in that they are topically arranged, hierarchical listings of URLs (Uniform Resource Locators) which have been subjected to some form of hands-on identification for relevance. This does not imply any authentication for accuracy. Entries are acquired by directory staff using search engines and by URL submissions. The user selects a general topic and then refines the quest by drilling down until a desired level of specificity is achieved. Often a directory site search engine is available to support open text requests for information from within the directory database.
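The drill-down process described above can be pictured as walking a tree of categories. The following is a minimal sketch, not a description of how any actual directory is built: the category names and the placement of URLs under them are hypothetical, chosen only for illustration.

```python
# Hypothetical miniature directory: nested dicts are categories,
# lists hold the URLs filed under the deepest category.
DIRECTORY = {
    "Health": {
        "Medicine": {
            "Otorhinolaryngology": ["www.medmark.org/orl/"],
        },
        "Consumer Health": ["www.healthatoz.com"],
    },
    "Reference": {
        "Libraries": ["http://portico.bl.uk/"],
    },
}


def drill_down(directory, path):
    """Follow a list of category names from the top level downward."""
    node = directory
    for category in path:
        if not isinstance(node, dict) or category not in node:
            raise KeyError(f"No category {category!r} at this level")
        node = node[category]
    return node


def subcategories(node):
    """List the choices offered at the current level (empty at a leaf)."""
    return sorted(node) if isinstance(node, dict) else []


print(subcategories(DIRECTORY))
print(subcategories(drill_down(DIRECTORY, ["Health"])))
print(drill_down(DIRECTORY, ["Health", "Medicine", "Otorhinolaryngology"]))
```

Each call to drill_down corresponds to one more click into a sub-category. The user stops when the listing at that level is specific enough.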

Encyclopedic Search Directories

Encyclopedic directories are eclectic with a wide variety of topics and sub-topics. The highest repeat usage rate (79%) among ALL Internet search services belongs to the search directory YAHOO <www.yahoo.com> (YAHOO is allegedly acronymic for "Yet Another Hierarchically Organized Oracle"), which has 14 major categories and five or more layers of sub-categories. If you use the on-site search form for a specific query, YAHOO will first scour its own site for cached references and then give you the option to expand the inquiry using the AltaVista (cited below) search engine with a custom facade. Depth and breadth of categorical information on YAHOO is generally rated as superior to other search directories. It contains more than 725,000 cataloged sites.

Less comprehensive but still serviceable search directories include Magellan <www.mckinley.com>, which reviews and rates sites. A2Z <www.a2z.lycos.com> is constructed using the Lycos search engine spider to seek out the most frequently linked-to sites on the 'Net. Point.Com's Best of the Web <http://point.lycos.com/categories/> presents a searchable database of the top 5% of websites, editorially reviewed, annotated and rated on content, design and overall quality.

While these directories also have search engine capabilities, the results are often inadequate. For example, a Magellan search of the words: video otoscopy allegedly produced 55,834 results. Unfortunately, the primary reference Audiology Forum: Video Otoscopy <www.li.net/~sullivan/ears.htm> did not appear in the first 100 listings, the highest relevance among which was rated at only 53%. The seventh ranked relevant listing was entitled "Live nude girls for video conferencing and teleconferencing"! A search of "video otoscopy" (quotation marks concatenating the two-word phrase) produced only 10 references of which the primary site was ranked seventh with a relevance-to-search rating of 47%.

Specialized Search Directories

Specialized directories are limited in scope and focus to selected topics, media or databases. For example, one of the most comprehensive directory resources for medical and hearing health professionals is MedMark <www.medmark.org>. Thirty-two medical specialty areas are listed as well as links to 24 other related medical directory resources. Selecting "Otorhinolaryngology" <www.medmark.org/orl/> unveils 14 subcategories with hundreds of listings. Headings include Associations/Societies, Centers/Institutes/Labs, Departments/Divisions, Education/Training, for Consumers, General, Guides/Guidelines, Hospitals/Clinics, Images/Atlases, Information Sources, Journals/News/Publications, Lists of Resources, Other Organizations and Programs/Projects. Health A to Z <www.healthatoz.com> is a comprehensive consumer-oriented directory of rated sites in the area of health and medicine.

Specific to hearing health professions, the most comprehensive topical directory is maintained by Internet pioneer Judy Kuster at Mankato State University, MN. Initiated in 1994, 'Net Connections for Communication Disorders and Sciences <www.mankato.msus.edu/dept/comdis/kuster2/welcome.html> has become an Internet benchmark for thoroughness and dedication in taxonomizing Internet references to communication disorders. It is exceptional in the areas of hearing and deafness.

 

SEARCH ENGINES

Search engines differ from directories in that they use mechanisms called "spiders" or "robots" to continuously scour the Internet for textual content with varying degrees of thoroughness. Some will access only page titles, some will locate keywords, others will catalog the entire textual contents of a website. One enters a keyword or words into the engine's search box. The engine then retrieves any and all references to those morphemic identifiers found in its cache and then lists the corresponding URLs. Search results may be displayed by degree of coincidence with the search criteria or relevance. Some sites provide relevancy ratings. Some provide a minuscule text sample from each site. Others provide summaries if available from the selected site.
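The retrieval step described above can be sketched as a keyword index that maps each word to the pages containing it. This is a toy illustration under stated assumptions, not the method of any particular engine: the pages and example.org URLs are hypothetical stand-ins for what a spider might have fetched, and "degree of coincidence" is reduced to a simple count of matching query words.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have fetched.
PAGES = {
    "http://example.org/a": "Cerumen management and video otoscopy",
    "http://example.org/b": "Video conferencing equipment",
    "http://example.org/c": "Otoscopy basics: cerumen and the ear canal",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic keywords."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Rank URLs by how many distinct query keywords they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for url in index.get(word, ()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(PAGES)
for url, score in search(index, "video otoscopy"):
    print(score, url)
```

Note that the page about video conferencing still appears in the results because it matches one of the two words. That is the same effect seen in the Magellan example cited earlier.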

In informational searches, often one encounters an embarrassment of riches combined with a largesse of fool's gold! Major search engine sites are now devoting significant developmental energies to strategies for locating and sampling items of relevance and demand rather than cataloging the universe of content. Consequently, arguments can be made both for and against sampling the same query on different major search engines. Boolean Operators (Appendix) can be used to hone the search.

Open Set Search Engines

In terms of sheer size and power, AltaVista <www.altavista.digital.com>, HotBot <www.hotbot.com> and Excite <www.excite.com> can be considered large search engines, cataloging 55-110 MILLION pages and "crawling" 3-10 million pages per day. Medium search sites, listing between 30 and 50 million entries, include Lycos <www.lycos.com>, InfoSeek <www.infoseek.com> and a relatively new engine, Northern Light <www.northernlight.com>. WebCrawler <www.webcrawler.com> with 2 million entries is considered a small search engine. Among the seven, only two (AltaVista and Northern Light) do NOT present some semblance of a subject directory. In reality, the formerly exclusive functions of search engines and search directories appear to be converging over time.

META-Search Engines

"Simultaneous multiple engine searches" describes the function of META-search engines. The same query is submitted concurrently to a number of search engines. Unfortunately, this typically produces a large number of duplicate hits and denies the user access to the advanced query and search-refining features offered by individual engines. Established sites in this category are MetaCrawler <www.metacrawler.com>, which simultaneously feeds inquiries to AltaVista, Excite, Lycos, WebCrawler and YAHOO, and SavvySearch <www.savvy.cs.colostate.edu:2000/>, which accesses nineteen search resources in twenty-four languages.
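The duplicate-hit problem mentioned above can be eased by merging the result lists and collapsing entries that point to the same page. The sketch below is an assumption-laden illustration: the engine names and result lists are hypothetical, no network requests are made, and the URL normalization is deliberately crude.

```python
def normalize(url):
    """Crude URL normalization so trivial variants count as duplicates."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")


def merge_results(results_by_engine):
    """Merge ranked URL lists, keeping one entry per distinct page.

    Each page records which engines returned it and its best (lowest) rank.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            entry = merged.setdefault(
                key, {"url": url, "engines": [], "best_rank": rank}
            )
            entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Pages found by more engines first, then by best rank, then by URL.
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["engines"]), e["best_rank"], normalize(e["url"])),
    )


# Hypothetical result lists; real engines would be queried over the network.
SAMPLE = {
    "EngineA": ["http://example.org/ears.htm", "http://example.org/wax"],
    "EngineB": ["example.org/wax/", "http://example.net/otoscope"],
}

for entry in merge_results(SAMPLE):
    print(entry["best_rank"], entry["engines"], entry["url"])
```

Ranking pages returned by several engines ahead of those returned by only one is a design choice made here for illustration. Other orderings, such as by date, are equally plausible.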

META-Searches produce a large amount of data which, unless saved as a file, is difficult to cross-correlate and update. Programs are now available for the PC to perform a META-Search and then store, update and display the results by rank, date or other variable for off-line viewing. A convenient freeware program for this application is Copernic 98 <www.copernic.com>, which sends concurrent requests to 33 leading search engines and directories.





Limited Set Search Engines

In this category, search engines are limited in scope to a specific Internet protocol, medium, topic, database, single site or single PC.





Internet Protocol-Specific

Search mechanisms described thus far retrieve material via HyperText Transfer Protocol (HTTP). Other Internet protocols include Network News Transfer Protocol (NNTP), i.e. newsgroup/usenet postings, and File Transfer Protocol (FTP) for accessing downloadable files and programs.

A notable search engine for newsgroup or usenet postings is DejaNews <www.dejanews.com>, which indexes more than 50,000 news/discussion groups and archives postings over more than 2.5 years. AltaVista and InfoSeek both provide NNTP/usenet search options.

A user-friendly search mechanism for file transfer protocol (FTP) downloads is Jumbo <www.jumbo.com> with 250,000 downloadable shareware and freeware programs classified in twelve categories.

Medium-Specific

[THIS SERVICE HAS BEEN DISCONTINUED] NewsWorks <www.newsworks.com> allows a topical search of 140 newspapers. MedLine <www.ncbi.nlm.nih.gov/PubMed> provides search access to references and abstracts of more than 9 million journal citations. E-mail addresses and telephone listings can be located using Four11 <www.four11.com>.

American Directory Assistance and American Yellow Pages <www.lookupusa.com> locate private and business telephone listings. An American Yellow Pages search on "audiologist" produced no finding in Garden City, NY. A search on "audiologistS" or "aud*" produced my office name, address, telephone number, a zoomable street map (locating me three blocks east of where I actually practice) and a "good" credit rating. White page listed telephone numbers can be located via American Directory Assistance.

Database-Specific

Thomas (Jefferson) <http://thomas.loc.gov> provides federal legislation information. Complete texts of bills and regulations in both houses of Congress can be searched with current status. Federal regulations through 1973 are indexed and available. The Congressional Record may be accessed as well as Congressional committee information.

A number of states provide similar legislative search capability including Official California Legislative Information <www.leginfo.ca.gov> and New York State Legislature <http://unix2.nysed.gov/ils/legislature/legis.html>. Large, open set search engines can be used to determine the availability of legislative status information in other states.

While clearly an on-line superstore, Barnes and Noble <http://barnesandnoble.com> provides subject, author and title search capability on millions of titles. An on-site search of "audiology" produced 250 available titles, some at discount. Fewer titles, often at lower prices, were found at Amazon.com <http://amazon.com> and Books.com <www.books.com>. The latter, a fee-for-membership site, provides price comparisons with other on-line bookstores.

Site-Specific

Search engines are also applicable to the contextually disparate contents of single websites. Examples of single site search engines can be found at the American Academy of Audiology <www.audiology.org/search-aaa/> and at the American Speech-Language Hearing Association <www.asha.org/search.htm>.

PC-Specific

Search engines are now available to search content, documents and e-mail on your own PC. Notable among available freeware is AltaVista Personal Search <www.altavista.digital.com/av/content/searchpx.htm>. [THIS VERSION HAS BEEN DISCONTINUED ON THE ALTAVISTA SITE. A NEW VERSION IS ANTICIPATED IN EARLY JULY, 1998. THE LINK WILL BE POSTED WHEN AVAILABLE]

 

HELPFUL TIPS FOR EFFECTIVE SEARCHING

  • Use search directories if you want to find information on a general topic. Search engines are best for gathering a universe of information on a highly circumscribed topic.
  • Specialized directories with smaller, subject-oriented databases, such as Judy Kuster's website (cited above), tend to produce more valid, up-to-date samples of information.
  • Make sure your spelling is correct for search engines. An AltaVista search on "cerumin" will produce 12 references, 10 in Spanish or Portuguese and two referring to "cerumin-" as the root for terms such as "ceruminectomy" and "ceruminosis". A search on "cerumen" produces 770 references. You can monitor on-line search term requests at the Magellan "Search Voyeur" <voyeur.mckinley.com/cgi-bin/voyeur.cgi>. You will encounter spelling errors within the first two or three minutes.
  • Be as specific as possible in your search request. Refining the above AltaVista search request to "cerumen management" produced 74 targeted references.
  • Use quotation marks for phrases and names. Use capital letters to specify proper names and locations.
  • Learn BOOLEAN OPERATORS (Appendix), to refine searches as used by the different engines. Consult the engine or directory's HELP resource for details.
  • Use multiple browser windows, one for the engine and one to explore sites. In Netscape Navigator <www.netscape.com/download/> or MS Internet Explorer (IE) <www.microsoft.com/ie/download/>, select "New Window" or "New Browser" from the "File" menu, respectively. IE version 4.0 presents a "Search" button which places various search engines on the left side of the screen and results on the right.
  • Don't look beyond the first page or two of results. Search again with different keywords.
  • Bookmark sites that look promising.
  • Print out pages on-line for reference or reading when you are not at a computer. Make sure the URL prints on each page for reacquisition.
  • If available as a feature on your search directory or engine, request notification of changes to sites of interest.
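The spelling tip above lends itself to a simple pre-check before a query is submitted. The following sketch uses Python's standard difflib module to suggest a likely intended spelling. The vocabulary list is hypothetical and far smaller than anything useful in practice, and the similarity cutoff is an arbitrary choice.

```python
import difflib

# Hypothetical list of correctly spelled terms; a real one would be larger.
VOCABULARY = ["cerumen", "otoscopy", "audiology", "tinnitus", "tympanometry"]


def suggest(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return likely intended spellings, or [] if the term is already known."""
    term = term.lower()
    if term in vocabulary:
        return []
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)


for query in ("cerumin", "cerumen", "otoscapy"):
    hints = suggest(query)
    print(query, "->", hints if hints else "no change suggested")
```

A check of this kind only catches misspellings of terms already in the list, so it supplements careful proofreading of the query rather than replacing it.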





RECOMMENDED CONTEMPORARY REFERENCES ON
SEARCHING THE WORLD WIDE WEB

Text:

Glossbrenner, A. and E., Search Engines for the World Wide Web, Peachpit Press, Berkeley, CA, 1998.

Internet:

Sullivan, D., Search Engine Watch; <www.searchenginewatch.com>; 1998.

 

 

 





APPENDIX
  BOOLEAN AND OTHER SEARCH OPERATORS

OPERATOR | EXAMPLE | APPLICATION
-------- | ------- | -----------
AND | ear AND infection | Requires both terms in the document, not necessarily adjacent. Boolean operators must be all in capital letters.
+ | +ear +infection | Same as above
"..." | "ear infection" | Requires adjacency of terms in phrase
NOT | +"hearing aid battery" +zinc NOT mercury | Will require zinc and exclude documents mentioning mercury hearing aid batteries
- | +"hearing aid battery" +zinc -mercury | Same as above
OR | sensorineural OR "sensory neural" | Acquires documents with EITHER or BOTH terms
[no operator] | recruitment "loudness growth" | Same as OR, above
NEAR | Recruitment NEAR compression | Acquires documents where (in AltaVista) one term is within 10 words of the other
[Nesting] | (analog OR digital) AND programmable AND "hearing aid" | Without parentheses, the search would be: analog OR (digital AND programmable AND "hearing aid")
[Wildcard] | audiolog* | Acquires documents with audiology, audiologist, audiological
[Stopwords] | Portland NEAR "OR" AND Portland NEAR "ME" | Without quotes, engines may ignore OR as a stopword or consider it an operator.
[Case sensitivity] | AuD OR Au.D. | Forces exact match, ignores auD, AUD, aud, AU.D., etc.
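To make the behavior of these operators concrete, the sketch below tests a single sample document against several of the example queries. It is an illustration only: the helper functions (has, phrase, near) and the sample text are hypothetical, matching is case-insensitive for simplicity, and real engines differ in how they parse and apply each operator.

```python
import fnmatch
import re


def words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has(doc, term):
    """Single keyword; a trailing * acts as a wildcard (audiolog*)."""
    return any(fnmatch.fnmatchcase(w, term.lower()) for w in words(doc))


def phrase(doc, text):
    """Quoted phrase: the terms must appear adjacent and in order."""
    tokens, target = words(doc), words(text)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def near(doc, a, b, distance=10):
    """NEAR: one term within `distance` words of the other."""
    tokens = words(doc)
    pos_a = [i for i, w in enumerate(tokens) if w == a.lower()]
    pos_b = [i for i, w in enumerate(tokens) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


DOC = "A programmable digital hearing aid with a zinc air hearing aid battery"

# (analog OR digital) AND programmable AND "hearing aid"
nested = ((has(DOC, "analog") or has(DOC, "digital"))
          and has(DOC, "programmable") and phrase(DOC, "hearing aid"))

# +"hearing aid battery" +zinc -mercury
excluded = (phrase(DOC, "hearing aid battery") and has(DOC, "zinc")
            and not has(DOC, "mercury"))

# audiolog*  and  programmable NEAR battery
wildcard = has("The audiologist retired", "audiolog*")
proximity = near(DOC, "programmable", "battery")

print(nested, excluded, wildcard, proximity)
```

Python's own `and` binds more tightly than `or`, which is the same precedence the nesting row describes. For that reason the parentheses around the OR terms are needed here as well.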





Roy F. Sullivan, Ph.D., FAAA is in private audiology practice at Sullivan and Sullivan, Inc., Garden City, NY. Dr. Sullivan is former website editor of the American Academy of Audiology <www.audiology.org> and author of the teaching website Audiology Forum: Video Otoscopy <www.rcsullivan.com>. Correspondence to Dr. Sullivan at Sullivan and Sullivan, Inc., 50 Willow Street, Garden City, NY 11530, or by e-mail to r@rcsullivan.com.

 

 

 



