Finding Information on the Web
1. PREFACE
Thus far in the course, we have examined two of the most widely used information sources: books and periodicals. We have used online catalogs (to find books) and Web databases (to find periodical articles). In this lesson, we expand our search for information sources to include information found on websites.
2. WEBSITES, URLs, AND DOMAIN NAMES
Every website (and every Web page) has a unique address known as a URL
(Uniform Resource Locator) which identifies where it is located on the
Web. For example, here is the URL for Skyline Library’s home page:
http://www.smccd.net/accounts/skylib/index.html
URLs have three
basic parts: the protocol, the server name and the resource ID. These parts
provide "clues" to where a Web page originates and who might be
responsible for the information at that page or site. Let's look at each part:
· PROTOCOL: appears at the start of the URL, before the colon and double slash (://), and identifies the method (set of rules) by which the resource is transmitted. Web pages are transmitted using HyperText Transfer Protocol (HTTP) or its secure version, HTTPS. Thus, Web URLs begin with http:// or https://.
· SERVER NAME: appears between the double slash (//) and the first single slash (/).
The server name for the Skyline Library URL is: www.smccd.net
The server name identifies the computer on which the resource is found. (Computers that store and "serve up" Web pages are called servers.) This part of the URL commonly identifies which organization or company is either directly responsible for the information or simply providing the computer space where the information is stored.
The server name always ends with a dot and an extension, usually of three or two letters, called the domain name (sometimes called the domain type or top-level domain). The domain is important because it usually identifies the type of organization that created or sponsored the resource. Sometimes it indicates the country where the server is located. The most common domain names are:
.com for company or commercial sites
.org for non-profit organization sites
.edu for educational sites (most commonly four-year universities)
.gov for U.S. government sites
.net for Internet service providers or other types of networks
.mil for U.S. military sites
If the domain name is two letters, it identifies a country, e.g. .us for the United States.
· RESOURCE ID: everything after the first single slash (/).
The resource ID for the Skyline Library URL is: accounts/skylib/index.html
The resource ID contains directories and subdirectories, thereby giving you the exact location of the document on the server. Following the last slash (/) is the file name for the specific page. The file name ends with a three- or four-letter extension that specifies the file type (e.g., .htm or .html for a standard Web page, .jpg or .gif for common graphic files).
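To see the three parts in action, the short Python sketch below splits the Skyline Library URL using the standard library's urllib.parse.urlsplit. It is only an illustration under simplifying assumptions: the helper name split_url and the DOMAIN_TYPES table (built from the domain list above) are our own, and the sketch ignores extras that some URLs carry, such as port numbers or query strings (text after a ?).

```python
from urllib.parse import urlsplit

# Domain types taken from the list above (illustrative, not exhaustive).
DOMAIN_TYPES = {
    "com": "company or commercial site",
    "org": "non-profit organization site",
    "edu": "educational site",
    "gov": "government site",
    "net": "Internet service provider or other network",
    "mil": "military site",
}


def split_url(url):
    """Split a URL into the parts described in this lesson (hypothetical helper)."""
    parts = urlsplit(url)
    server = parts.netloc                 # text between "//" and the first single "/"
    resource_id = parts.path.lstrip("/")  # everything after the first single "/"
    domain = server.rsplit(".", 1)[-1]    # extension after the last dot
    file_name = resource_id.rsplit("/", 1)[-1]
    file_type = file_name.rsplit(".", 1)[-1] if "." in file_name else ""
    if len(domain) == 2:
        domain_type = "country code"      # two-letter domains identify a country
    else:
        domain_type = DOMAIN_TYPES.get(domain, "other")
    return {
        "protocol": parts.scheme,
        "server_name": server,
        "domain": domain,
        "domain_type": domain_type,
        "resource_id": resource_id,
        "file_name": file_name,
        "file_type": file_type,
    }


parts = split_url("http://www.smccd.net/accounts/skylib/index.html")
for key, value in parts.items():
    print(f"{key}: {value}")
```

Notice that the sketch labels www.smccd.net as a network site even though Skyline Library belongs to a college, a reminder that the domain name only usually identifies the type of organization.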
3. GENERAL WEB SURFING
At some point in your research --
usually after searching the Deep Web using Web databases and
online catalogs -- you may want to look for information and opinion found on
free websites within the Visible Web. This is often
referred to as general Web surfing. Be cautious, however,
when searching the Visible Web because no quality control is in effect
here. You may find highly accurate and reliable information at one
website, and complete falsehoods at another.
Two types of Web search tools are available to help you find websites and/or Web pages: subject directories and search engines. Let's examine each separately.
4. SUBJECT DIRECTORIES AND SELECTIVE DIRECTORIES
Web subject directories (such as Yahoo!, LookSmart, LII,
and many others) provide lists of websites arranged by subject category.
The websites included in a subject directory are chosen by people known as indexers.
Each site in the directory is listed under one or more subject
categories, as determined by the directory's indexers. A brief
description of each site listed is usually included.
Directories are often a good place to start when
you’re looking for information on relatively general subjects or if you
want an overview of what’s available on the Web on a given subject.
Thus, to find websites on general subjects using subject
directories:
· browse through the directory’s list of subject categories, OR
· do a keyword search using search terms that describe your general subject
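The difference between browsing and keyword searching can be pictured with a toy directory in Python: a mapping from subject categories (chosen by indexers) to the sites filed under them, each with a brief description. Everything here is hypothetical, including the category names, site names, and the browse and keyword_search helpers; real directories differ in how their keyword search works.

```python
# A toy subject directory: category -> list of (site name, brief description).
# All categories and sites below are hypothetical placeholders.
DIRECTORY = {
    "Science > Astronomy": [
        ("Example Star Atlas", "Maps of constellations and stars"),
        ("Example Planet Guide", "Overview of the planets in our solar system"),
    ],
    "History > United States": [
        ("Example Founding Documents", "Texts of early U.S. government documents"),
    ],
}


def browse(category):
    """Browsing: list every site filed under one subject category."""
    return [name for name, _ in DIRECTORY.get(category, [])]


def keyword_search(term):
    """Keyword search: match the term against category names and site descriptions."""
    term = term.lower()
    hits = []
    for category, sites in DIRECTORY.items():
        for name, description in sites:
            if term in category.lower() or term in description.lower():
                hits.append(name)
    return hits


print(browse("Science > Astronomy"))
print(keyword_search("planets"))
```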
If, however, you already have a specific
research question in mind, a different approach can be used.
To find websites on a specific research question using subject
directories:
* do a keyword search using terms that describe the overall general subject under which your topic falls
* choose a general website on that subject from your results list
* use that general website and see if it has a site-specific search engine that allows you to search that website’s collection of information, AND/OR
* look for links provided by the general website to websites or Web pages that focus on specific topics
There is wide variation in the number and quality of sites included in
different Web subject directories. Many of the best-known directories,
such as Yahoo! or Excite, try to be as comprehensive as possible,
with very extensive listings. However, one disadvantage of these large
directories is that they usually do little, if any, evaluation of the quality
of the sites they list, thus making it difficult to find the best sites in a
particular subject area.
For that reason, it is wise to use a subject directory that lists only sites known to be high quality. These directories are known as selective directories. In addition to indexing only credible websites, selective directories often provide links to other specialized sites, which, in turn, provide links to even more specific high-quality documents on a particular subject or topic.
Recommended selective directories:
Librarians' Internet Index (http://www.lii.org) -- high-quality resources on a range of general subjects
InfoMine (http://infomine.ucr.edu/) -- academic resources
Scout Report Archives (http://scout.cs.wisc.edu/archives/) -- academic resources
AcademicInfo (http://www.academicinfo.net) -- scholarly sites on a wide range of subjects
5. GENERAL SEARCH ENGINES AND SITE-SPECIFIC SEARCH ENGINES
Web search engines (such as AskJeeves, Google, AltaVista,
and many others) allow you to search through millions of websites using your own
keyword(s). Websites gathered and indexed by search engines are not
selected, organized or previewed by humans. Instead, their collection of
websites is created entirely by computer programs called spiders
(also known as robots) that continuously scan the Internet
looking for sites to add to their index.
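As a rough illustration of how a spider works, the Python sketch below follows links outward from a starting page and records which words appear on which pages, a structure known as an inverted index. The pages, URLs, and the crawl function are hypothetical and held in memory rather than fetched from the Internet; real spiders also handle downloading pages, site rules such as robots.txt, and an enormously larger scale.

```python
import re
from collections import deque

# A tiny, made-up "Web": each page has some text and links to other pages.
# A real spider would fetch these over HTTP; here they are hypothetical.
PAGES = {
    "http://a.example/": ("Immigration and the economy", ["http://b.example/"]),
    "http://b.example/": ("Economic data on workers", ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("Astronomy for beginners", []),
}


def crawl(start_url):
    """Follow links breadth-first from start_url and build an inverted index.

    The index maps each lower-cased word to the set of URLs containing it.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:  # skip pages already visited or queued
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://a.example/")
print(sorted(index["economy"]))
print(sorted(index["astronomy"]))
```

Because no human reviews what the spider adds, a page lands in the index simply because some other indexed page links to it.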
Since the collection of websites indexed by search engines is huge (numbering in the millions) and has no subject organization at all, it is very important to think carefully about which search words to use and to be aware of the various search features available before performing a search. Always look for the "Search Help," "Search Tips," or other pages that explain the features of the search engine
you're using. Remember that Web search engines, unlike library online catalogs,
do not use a common set of subject headings. Therefore, to use search engines
effectively, it is usually best to use very precise search words or phrases, or
combine several search terms using Boolean logic (as discussed in
Lesson 3).
Search engines should be used when you have a focused research question in mind
or when you’re looking for a specific item of information, such as a
known document (e.g. the U.S. Declaration
of Independence), image, etc. They're not recommended for finding sites on
broad subjects, such as "astronomy" or "history." As
discussed earlier, Web subject directories should be used to find sites on
general subjects.
Finally, there is a special type of search engine you should be aware of. Sometimes, websites offer their own internal
search engine that allows you to search just that website’s collection of
information. These are known as site-specific search engines.
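A site-specific search engine can be thought of as an ordinary keyword search restricted to pages on one server. The Python sketch below makes that restriction explicit by comparing each page's server name, using the standard library's urlsplit; the URLs, page text, and the site_search helper are hypothetical, and actual site search tools may work quite differently (for example, by keeping a separate index just for that site).

```python
from urllib.parse import urlsplit

# Hypothetical pages gathered from several websites: (URL, page text).
DOCUMENTS = [
    ("http://www.example.edu/library/hours.html", "Library hours and locations"),
    ("http://www.example.edu/catalog/help.html", "How to search the library catalog"),
    ("http://www.example.com/news.html", "Library funding news"),
]


def site_search(term, server_name):
    """Return URLs on one server whose text contains term (case-insensitive).

    A general search engine would check every page; this site-specific
    version keeps only pages whose server name matches.
    """
    term = term.lower()
    return [
        url
        for url, text in DOCUMENTS
        if urlsplit(url).netloc == server_name and term in text.lower()
    ]


print(site_search("library", "www.example.edu"))
```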
6. COMMON FEATURES OF GENERAL SEARCH ENGINES
Listed below are features common to many
search engines. Keep in mind, however, that these features may not work the
same -- or even be available -- on every search engine.
AND: many search engines use the + sign (often called the "require" sign) in front of words that must be included in the search results. For example, +immigration +economy is often used instead of immigration AND economy. Some search engines that allow the use of AND and OR require that they be capitalized. (Thus, it's a good idea to always capitalize these connectors if you use them.) Finally, some search engines, such as Google, assume that a typed space equals AND. For example, immigration economy would automatically be understood as immigration AND economy.
OR: some search engines assume that a typed space between search terms equals OR. For example, economy business would automatically be understood as economy OR business.
Phrase searching: putting a phrase in quotation marks retrieves documents that contain that exact phrase. For example, "illegal immigration" will retrieve documents containing those two words next to each other as a phrase.
Truncation: a symbol (usually an asterisk) that allows you to search for all variations of a common root. For example, econom* finds: economy, economic, economics, economist, etc.
Parentheses: used to designate which operations are to be carried out first. For example, in this search statement:
("illegal immigra*" OR "undocumented workers") AND econom*
a search engine would first search for ("illegal immigra*" OR "undocumented workers"). That result would then be ANDed with econom*.
Relevance ranking: a programming method that attempts to rank search results based on various factors. Different search engines use different ranking systems. Documents returned from a search can be ranked on such factors as:
* frequency of search words in document
* words found in title or near beginning of document
* search words found close to one another
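The Python sketch below puts several of these features together: it evaluates the parenthesized example query against a few made-up documents, treating a trailing * as truncation and a quoted phrase as words that must appear side by side, and then orders the matches with a toy relevance score. The documents, helper names, and scoring weights (2 points for a match among the first five words, 1 point elsewhere) are assumptions for illustration only. The query is written as a nested tuple rather than typed syntax, so the sketch sidesteps the differences in how real engines treat +, spaces, and capitalization, and it ignores the "close to one another" factor.

```python
import re

# Query tree for: ("illegal immigra*" OR "undocumented workers") AND econom*
# Strings are terms or phrases; tuples are (operator, left, right).
QUERY = ("AND", ("OR", "illegal immigra*", "undocumented workers"), "econom*")

# Hypothetical documents to search.
DOCS = {
    "doc1": "Illegal immigration and the economy: economists disagree about economic effects",
    "doc2": "Undocumented workers in agriculture",
    "doc3": "The economy grew while undocumented workers filled jobs",
}


def words_of(text):
    return re.findall(r"[a-z]+", text.lower())


def word_matches(pattern, word):
    # Truncation: a trailing * matches any word sharing the root.
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def phrase_positions(phrase, words):
    """Start positions where every word of the phrase appears, in order, side by side."""
    parts = phrase.lower().split()
    return [
        i
        for i in range(len(words) - len(parts) + 1)
        if all(word_matches(p, words[i + k]) for k, p in enumerate(parts))
    ]


def matches(query, words):
    """Evaluate the query tree; a parenthesized group is a nested tuple, evaluated first."""
    if isinstance(query, str):
        return bool(phrase_positions(query, words))
    op, left, right = query
    if op == "AND":
        return matches(left, words) and matches(right, words)
    if op == "OR":
        return matches(left, words) or matches(right, words)
    raise ValueError(f"unknown operator: {op}")


def leaves(query):
    """All terms and phrases in the query tree."""
    if isinstance(query, str):
        return [query]
    return leaves(query[1]) + leaves(query[2])


def score(query, words, early=5):
    """Toy relevance score: 1 point per occurrence, 2 if it starts within the
    first `early` words (standing in for "title or near beginning")."""
    total = 0
    for leaf in leaves(query):
        for pos in phrase_positions(leaf, words):
            total += 2 if pos < early else 1
    return total


def search(query, docs):
    hits = [(name, score(query, words_of(text)))
            for name, text in docs.items() if matches(query, words_of(text))]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(search(QUERY, DOCS))
```

Running it prints [('doc1', 6), ('doc3', 4)]: doc2 mentions undocumented workers but contains no econom* word, so the AND excludes it, and doc1 ranks first because its search words occur more often and earlier.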
7. KEY POINTS TO REMEMBER
· A website is a coherent collection of Web pages linked together.
· URLs have three basic parts: the protocol, the server name, and the resource ID.
· The server name always ends with a dot and an extension, usually of three or two letters, called the domain name (or domain type). The domain name is important because it usually identifies the type of organization that created or sponsored the website.
· Looking for information and opinion found on free websites within the Visible Web (as opposed to the Deep Web) is known as general Web surfing. Be cautious, however, because surfing can uncover highly credible sites as well as sites containing very questionable or false information.
· Two types of Web search tools are available to help you find websites and/or Web pages: subject directories and search engines.
· Web subject directories provide lists of websites arranged by subject category. The websites included in a subject directory are selected, organized, and previewed by human beings. They’re often a good place to start when you’re looking for information on relatively general subjects or if you want an overview of what is available on the Web on a given subject.
· Selective directories, such as the Librarians' Internet Index, are a type of subject directory that list only sites recognized to be high in academic quality.
· Web search engines (such as AskJeeves, Google, AltaVista, and many others) allow you to search through millions of websites using your own keyword(s). Computer programs known as spiders collect and index the websites that a search engine searches. It is appropriate to use search engines when you have a focused research question in mind rather than a broad subject.
· Sometimes, websites offer their own internal search engine that allows you to search just that website’s collection of information. These are known as site-specific search engines.
last revised: 11-8-05 by Eric Brenner,
These materials may be used for educational purposes if you inform and credit the author and cite the source as: LSCI 106 Online Research. All commercial rights are reserved. Send comments or suggestions to: Eric Brenner at brenner@smccd.net