LESSON 4: FINDING WEBSITES

Finding Information on the Web

1. PREFACE

Thus far in the course, we have examined two of the most widely used information sources on the Internet: books and periodicals. We have used online catalogs (to find books) and web databases (to find periodical articles). In this lesson, we expand our search for information sources to include information found in websites.

2. WEBSITES, URLs, AND DOMAIN NAMES

DEFINITION: WEBSITE

A website is a coherent collection of Web pages that are linked together and reside on that part of the Internet known as the World Wide Web. Millions of websites exist, offering vast amounts of information of varying credibility and worth.

Every website (and every Web page) has a unique address known as a URL (Uniform Resource Locator) which identifies where it is located on the Web. For example, here is the URL for Skyline Library’s home page:

http://www.smccd.net/accounts/skylib/index.html

URLs have three basic parts: the protocol, the server name and the resource ID. These parts provide "clues" to where a Web page originates and who might be responsible for the information at that page or site. Let's look at each part:

· PROTOCOL: appears at the start of the URL before the double slash and identifies the method (set of rules) by which the resource is transmitted. Web pages are transmitted using HyperText Transfer Protocol (HTTP) or its secure variant, HTTPS. Thus, Web URLs begin with http:// or https://.

· SERVER NAME: appears between the double slash (//) and the first single slash (/)
The server name for the Skyline Library URL is: www.smccd.net

The server name identifies the computer on which the resource is found. (Computers that store and "serve up" Web pages are called servers.) This part of the URL commonly identifies which organization or company is either directly responsible for the information or simply providing the computer space where the information is stored.

The server name ends with a dot and an extension, most often three or two letters long, called the domain name (sometimes called the domain type). The domain is important because it usually identifies the type of organization that created or sponsored the resource. Sometimes it indicates the country where the server is located. The most common domain names are:

.com for company or commercial sites

.org for non-profit organization sites

.edu for educational sites (most commonly four-year universities)

.gov for government sites

.net for Internet service providers or other types of networks

.mil for a military body

If the domain name is two letters, it identifies a country, e.g. .us for the United States, .uk for the United Kingdom, .au for Australia, .mx for Mexico or .ca for Canada.

· RESOURCE ID: everything after the first single slash (/)
The resource ID for the Skyline Library URL is: accounts/skylib/index.html

The resource ID contains directories and subdirectories, thereby giving you the exact location of the document on the server. The text after the last slash (/) is the file name of the specific page. The file name ends with a three- or four-letter designation that specifies the file type (e.g., .htm or .html for a standard Web page, .jpg or .gif for common graphic files).
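The three URL parts described above can be pulled apart with a few lines of Python. This is only a minimal sketch: it uses the standard library's urlsplit function, and the describe_url helper and DOMAIN_TYPES table are names invented for this illustration, filled in from the domain list in this lesson.

```python
from urllib.parse import urlsplit

# Domain extensions listed in this lesson, mapped to the kind of
# site they usually indicate (illustrative table, not exhaustive).
DOMAIN_TYPES = {
    "com": "company or commercial site",
    "org": "non-profit organization site",
    "edu": "educational site",
    "gov": "government site",
    "net": "Internet service provider or other network",
    "mil": "military body",
}

def describe_url(url):
    """Split a URL into the protocol, server name, and resource ID."""
    parts = urlsplit(url)
    server_name = parts.hostname or ""
    # The domain is whatever follows the last dot in the server name.
    extension = server_name.rsplit(".", 1)[-1]
    if len(extension) == 2:
        domain_type = "country code"
    else:
        domain_type = DOMAIN_TYPES.get(extension, "unknown")
    return {
        "protocol": parts.scheme,
        "server_name": server_name,
        "resource_id": parts.path.lstrip("/"),
        "domain": "." + extension,
        "domain_type": domain_type,
    }

info = describe_url("http://www.smccd.net/accounts/skylib/index.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

For the Skyline Library URL, the sketch reports http as the protocol, www.smccd.net as the server name, and accounts/skylib/index.html as the resource ID. Keep in mind that the domain only hints at who is behind a site; it does not prove it.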

3. GENERAL WEB SURFING

At some point in your research -- usually after searching the Deep Web using Web databases and online catalogs -- you may want to look for information and opinion found on free websites within the Visible Web. This is often referred to as general Web surfing. Be cautious, however, when searching the Visible Web because no quality control is in effect here. You may find highly accurate and reliable information at one website, and complete falsehoods at another.

Two types of Web search tools are available to help you find websites and/or web pages: subject directories and search engines. Let's examine each separately.

4. SUBJECT DIRECTORIES AND SELECTIVE DIRECTORIES

Web subject directories (such as Yahoo!, LookSmart, LII, and many others) provide lists of websites arranged by subject category. The websites included at a subject directory are chosen by people known as indexers. Each site in the directory is listed under one or more subject categories, as determined by the directory's indexers. A brief description of each site listed is usually included.

Directories are often a good place to start when you’re looking for information on relatively general subjects or if you want an overview of what’s available on the Web on a given subject.

Thus, to find websites on general subjects using subject directories:

· browse through the directory’s list of subject categories, OR

· do a keyword search using search terms that describe your general subject
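To make the two approaches concrete, here is a rough Python sketch of how a subject directory might be organized. The categories, site names, and descriptions are all made up for illustration, and real directories are far larger and more sophisticated.

```python
# A tiny, made-up subject directory:
# subject category -> list of (site name, brief description).
DIRECTORY = {
    "Science > Astronomy": [
        ("Example Sky Guide", "Introductory star charts and observing tips"),
        ("Example Planet Facts", "Overview of planets in the solar system"),
    ],
    "Social Science > Immigration": [
        ("Example Immigration Overview", "Background on immigration law and history"),
    ],
}

def browse(category):
    """Return the sites an indexer filed under one subject category."""
    return DIRECTORY.get(category, [])

def keyword_search(term):
    """Return (category, site name) pairs that mention the term."""
    term = term.lower()
    hits = []
    for category, sites in DIRECTORY.items():
        for name, description in sites:
            # Search the category, site name, and description together.
            text = f"{category} {name} {description}".lower()
            if term in text:
                hits.append((category, name))
    return hits

print(browse("Science > Astronomy"))
print(keyword_search("immigration"))
```

Notice that the keyword search looks only at the short descriptions written by the indexers, not at the full text of each site. That is one reason broad search terms tend to work better in a directory than narrow ones.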

If, however, you already have a specific research question in mind, a different approach can be used.

To find websites on a specific research question using subject directories:

* do a keyword search using terms that describe the overall general subject under which your topic falls

* choose a general website on that subject from your results list

* use that general website and see if it has a site-specific search engine that allows you to search that website’s collection of information, AND/OR

* look for links provided by the general website to websites or webpages that focus on specific topics

There is wide variation in the number and quality of sites included in different Web subject directories. Many of the best-known directories, such as Yahoo! or Excite, try to be as comprehensive as possible, with very extensive listings. However, one disadvantage of these large directories is that they usually do little, if any, evaluation of the quality of the sites they list, thus making it difficult to find the best sites in a particular subject area.

For that reason, it is wise to use a subject directory that lists only sites known to be high quality. These directories are known as selective directories. In addition to indexing only credible websites, selective directories often provide links to other specialized sites, which, in turn, provide links to even more specific high-quality documents on a particular subject or topic.

Recommended selective directories:

Librarians' Internet Index (http://www.lii.org) -- high-quality resources on a range of general subjects

InfoMine (http://infomine.ucr.edu/) -- academic resources

Scout Report Archives (http://scout.cs.wisc.edu/archives/) -- academic resources

AcademicInfo (http://www.academicinfo.net) -- scholarly sites on a wide range of subjects

5. GENERAL SEARCH ENGINES AND SITE-SPECIFIC SEARCH ENGINES

Web search engines (such as AskJeeves, Google, AltaVista, and many others) allow you to search through millions of websites using your own keyword(s). Websites gathered and indexed by search engines are not selected, organized or previewed by humans. Instead, their collection of websites is created entirely by computer programs called spiders (also known as robots) that continuously scan the Internet looking for sites to add to their index.
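The idea of a spider can be sketched in a few lines of Python. This is a toy model under stated assumptions: instead of fetching real pages over the Internet, it walks a small made-up set of pages stored in a dictionary (PAGES is hypothetical), following links and recording which words appear on which page.

```python
from collections import deque

# Hypothetical mini-Web: page address -> (page text, links on that page).
PAGES = {
    "a.example/home": ("immigration and the economy", ["a.example/jobs", "b.example/news"]),
    "a.example/jobs": ("economy and jobs report", ["a.example/home"]),
    "b.example/news": ("immigration news today", []),
    "c.example/alone": ("no page links here", []),
}

def crawl(start):
    """Follow links breadth-first from start and build a word -> pages index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        text, links = PAGES[address]
        # Record every word on this page in the index.
        for word in text.split():
            index.setdefault(word, set()).add(address)
        # Queue any linked page that has not been visited yet.
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example/home")
print(sorted(index["immigration"]))
print(sorted(index["economy"]))
```

In this sketch the page c.example/alone never enters the index, because no other page links to it. No human looks at any page along the way, which is why the results of a search engine carry no guarantee of quality.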

Since the collections of websites indexed by search engines are huge (numbering in the millions) and have no subject organization at all, it is very important to think carefully about what search words to use and to be aware of the various search features available before performing a search. Always look for the "Search Help," "Search Tips," or other pages that explain the features of the search engine you're using. Remember that Web search engines, unlike library online catalogs, do not use a common set of subject headings. Therefore, to use search engines effectively, it is usually best to use very precise search words or phrases, or combine several search terms using Boolean logic (as discussed in Lesson 3).

Search engines should be used when you have a focused research question in mind or when you’re looking for a specific item of information, such as a known document (e.g. the U.S. Declaration of Independence), image, etc. They're not recommended for finding sites on broad subjects, such as "astronomy" or "history." As discussed earlier, Web subject directories should be used to find sites on general subjects.

Finally, there is a special type of search engine you should be aware of. Sometimes, websites offer their own internal search engine that allows you to search just that website’s collection of information. These are known as site-specific search engines.

6. COMMON FEATURES OF GENERAL SEARCH ENGINES

Listed below are features common to many search engines. Keep in mind, however, that these features may not work the same -- or even be available -- on every search engine.

AND: many search engines use the + sign (often called the "require" sign) in front of words that must be included in the search results. For example, +immigration +economy is often used instead of immigration AND economy. Some search engines that allow the use of AND and OR require that they be capitalized. (Thus, it's a good idea to always capitalize these connectors if you use them.) Finally, some search engines, such as Google, assume that a typed space equals AND. For example, immigration economy would automatically be understood as immigration AND economy.

OR: some search engines assume that a typed space between search terms equals OR. For example, economy business would automatically be understood as economy OR business.

Phrase searching: putting a phrase in quotation marks retrieves only documents that contain that exact phrase. For example: "illegal immigration" will retrieve documents containing those two words next to each other as a phrase.

Truncation: a symbol (usually an asterisk) that allows you to search for all variations of a common root. For example, econom* finds: economy, economic, economics, economist, etc.

Parentheses: to designate which operations are to be carried out first. For example, in this search statement:

("illegal immigra*" OR "undocumented workers") AND econom*

a search engine would first search for ("illegal immigra*" OR "undocumented workers"). That result would then be ANDed with econom*.
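The search statement above can be traced step by step in Python. The sketch below is only an illustration of the logic, not how any particular search engine is built: the four sample documents are invented, and the matches and search helpers are hypothetical names. Python sets stand in for result lists, with | playing the role of OR and & the role of AND.

```python
import re

# Four invented documents, numbered for easy reference.
DOCS = {
    1: "Illegal immigration and the economy of border states",
    2: "Undocumented workers contribute to local economies",
    3: "Illegal immigrants and the law",
    4: "The economics of farming",
}

def matches(term, text):
    """True if text contains the word or phrase.

    A trailing * (truncation) matches any ending of the last word.
    """
    if term.endswith("*"):
        pattern = re.escape(term[:-1].lower()) + r"\w*"
    else:
        pattern = re.escape(term.lower())
    return re.search(r"\b" + pattern + r"\b", text.lower()) is not None

def search(term):
    """Return the set of document numbers matching one term or phrase."""
    return {doc_id for doc_id, text in DOCS.items() if matches(term, text)}

# ("illegal immigra*" OR "undocumented workers") AND econom*
inner = search("illegal immigra*") | search("undocumented workers")
result = inner & search("econom*")
print(sorted(inner))   # [1, 2, 3]
print(sorted(result))  # [1, 2]
```

The part in parentheses is evaluated first and matches documents 1, 2, and 3. ANDing that result with econom* then drops document 3, which never mentions the economy, and also leaves out document 4, which mentions economics but neither phrase.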

Relevance ranking: a programming method that attempts to rank search results based on various factors. Different search engines use different ranking systems. Documents returned from a search can be ranked on such factors as:

* frequency of search words in document
* words found in title or near beginning of document
* search words found close to one another
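As a rough illustration of how these three factors might be combined, here is a toy scoring function in Python. The weights (1, 2, and 3), the "first ten words" and "within three words" cutoffs, and the sample documents are all arbitrary assumptions made for this sketch; real search engines use their own, usually unpublished, ranking formulas.

```python
from itertools import combinations

def score(title, body, terms):
    """Toy relevance score built from the three factors listed above."""
    words = body.lower().split()
    terms = [t.lower() for t in terms]
    # Factor 1: how often the search words occur in the document.
    frequency = sum(words.count(t) for t in terms)
    # Factor 2: search words in the title or among the first ten words.
    title_words = title.lower().split()
    early = sum(1 for t in terms if t in title_words or t in words[:10])
    # Factor 3: pairs of search words within three words of each other.
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    close = 0
    for a, b in combinations(positions, 2):
        if a and b and min(abs(i - j) for i in a for j in b) <= 3:
            close += 1
    return frequency + 2 * early + 3 * close

# Invented (title, body) pairs.
DOCS = [
    ("City news", "the economy grew this year while a separate story far below covered immigration"),
    ("Gardening tips", "plant tomatoes in spring"),
    ("Immigration and the economy", "immigration affects the economy in many ways"),
]

terms = ["immigration", "economy"]
ranked = sorted(DOCS, key=lambda d: score(d[0], d[1], terms), reverse=True)
for title, body in ranked:
    print(score(title, body, terms), title)
```

With these made-up weights, the document that has both search words in its title and close together in its text ranks first, the document that merely mentions both words ranks second, and the unrelated document scores zero.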

7. KEY POINTS TO REMEMBER

· A website is a coherent collection of Web pages linked together.

· URLs have 3 basic parts: the protocol, the server name, and the resource ID.

· The server name ends with a dot and an extension, most often 3 or 2 letters long, called the domain name (or domain type). The domain name is important because it usually identifies the type of organization that created or sponsored the website.

· Looking for information and opinion found on free websites within the Visible Web (as opposed to the Deep Web) is known as general Web surfing. Be cautious, however, because surfing can uncover highly credible sites as well as sites containing very questionable or false information.

· Two types of Web search tools are available to help you find websites and/or web pages: subject directories and search engines.

· Web subject directories provide lists of websites arranged by subject category. The websites included in a subject directory are selected, organized, and previewed by human beings. They’re often a good place to start when you’re looking for information on relatively general subjects or if you want an overview of what is available on the Web on a given subject.

· Selective directories, such as the Librarians' Internet Index, are a type of subject directory that lists only sites recognized to be high in academic quality.

· Web search engines (such as AskJeeves, Google, AltaVista, and many others) allow you to search through millions of websites using your own keyword(s). Computer programs known as spiders collect and index the websites found with a search engine. It is appropriate to use search engines when you have a focused research question in mind rather than a broad subject.

· Sometimes, websites offer their own internal search engine that allows you to search just that website’s collection of information. These are known as site-specific search engines.

last revised: 11-8-05 by Eric Brenner, Skyline College, San Bruno, CA

These materials may be used for educational purposes if you inform and credit the author and cite the source as: LSCI 106 Online Research. All commercial rights are reserved. Send comments or suggestions to: Eric Brenner at brenner@smccd.net