LSCI 106: ONLINE RESEARCH 1: INTRODUCTION TO ONLINE RESEARCH


ADVANCED SEARCH STRATEGY

The logical operators AND and OR, along with truncation symbols, are the key elements needed to develop a basic search strategy for most research topics. Sometimes, however, additional techniques are needed for more precise and focused searches. This added precision is especially important when searching full-text databases.

Field Searching

While some databases are preset to search every word of every article, the most popular periodical databases, such as the InfoTrac, Gale and Proquest databases, are preset to search only the most important "key" parts of each article. These parts are commonly called the "key word" field (or index), which usually includes the title, citation, abstract and subjects of each article. In the default (preset) setting, therefore, the database does NOT look for your search words among all of the words in every article. Although the key word field is the most effective choice for the majority of searches, in some cases it makes sense to broaden a search to the full text of every article (called the "text word" index in InfoTrac), or to narrow it to just the subject headings. In the Advanced search mode of most periodical databases, you can select a specific field or index in which to search. This search strategy is called "field searching."

Field searching is a common method of refining searches that can allow more precise searching than logical operators alone. It lets you focus your search on specific fields in every record and bypass irrelevant information. Different databases include different fields and use different methods for field searching. Common fields include subject, title, author, journal name, date and language. In database searching, the fields available for searching are often called "indexes," and a single index may combine several fields. For example, the "key word" index in the InfoTrac database includes all of the words in the title, citation, abstract and subjects of every article.
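To make the idea of an "index" as a bundle of fields concrete, here is a minimal sketch in Python. The two records, the field names and the index names (`subject`, `key_word`, `text_word`) are hypothetical, loosely modeled on the InfoTrac description above; real databases define their own fields and matching rules.

```python
import re

# Hypothetical sample records; field names are illustrative, not any
# particular database's schema.
RECORDS = [
    {
        "title": "State budget debate continues",
        "abstract": "Lawmakers argue over school funding.",
        "subjects": ["Education finance", "State budgets"],
        "full_text": "Lawmakers argue over school funding. The governor also "
                     "mentioned the cost of fighting a wildfire last summer.",
    },
    {
        "title": "Wildfire season begins early",
        "abstract": "Fire crews prepare for a long season.",
        "subjects": ["Wildfires", "Fire prevention"],
        "full_text": "Fire crews prepare for a long season. "
                     "Officials urged residents to clear brush.",
    },
]

# Each index is a named group of fields. "key_word" mirrors the key word
# index described above; "text_word" adds the full text.
INDEXES = {
    "subject": ["subjects"],
    "key_word": ["title", "abstract", "subjects"],
    "text_word": ["title", "abstract", "subjects", "full_text"],
}

def field_words(record, field):
    """Return the set of lowercase words in one field of a record."""
    value = record.get(field, "")
    if isinstance(value, list):  # subject headings are stored as a list
        value = " ".join(value)
    return set(re.findall(r"[a-z]+", value.lower()))

def field_search(records, term, index):
    """Return titles of records whose chosen index contains the word `term`."""
    term = term.lower()
    return [
        r["title"]
        for r in records
        if any(term in field_words(r, f) for f in INDEXES[index])
    ]

print(field_search(RECORDS, "wildfire", "key_word"))
# -> ['Wildfire season begins early']
print(field_search(RECORDS, "wildfire", "text_word"))
# -> ['State budget debate continues', 'Wildfire season begins early']
```

The budget article only mentions a wildfire in passing, deep in its full text, so the key word index skips it while the broader text word index retrieves it.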

When searching for a precise subject, the most common field to search is the subject or descriptor field. Remember, however, that in most databases the terms in the subject or descriptor field are limited to the controlled vocabulary for that database. To search the subject or descriptor field effectively, therefore, use terms from the controlled vocabulary of the specific database being searched. In some databases, however, proper names and other exceptional terms may appear in the subject field even though they are not listed in the controlled vocabulary.

When searching full-text databases, it is often effective to begin by limiting a search to the subject or descriptor field, if the database is adequately indexed. If a subject search retrieves too few results, the search can be extended to other fields that are broader than the subject terms but still narrower than the full text. The abstract field, when available, can be very effective to search, since it includes more words than just those in the controlled vocabulary but is limited to words that summarize the key ideas of the full article or other document. Since many full-text databases do not include abstracts, the lead paragraph field is often used instead, offering a degree of precision and breadth similar to that of abstracts. The lead paragraph field, which is included in most newspaper databases as well as many other full-text databases, generally contains the first paragraph (or a set number of words comparable to a long paragraph) of each document. In some database search programs, the headline (or article title) and the lead paragraph can be searched at the same time. When this option is available, including the headline can be somewhat more effective than searching the lead paragraph alone.
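The narrow-to-broad strategy described above can be sketched as a loop that tries each field in turn and stops once it has "enough" results. This is only an illustration: the field names, the three sample records and the `min_hits` threshold are hypothetical, and in practice you would make this judgment yourself rather than by a fixed count.

```python
import re

# Fields ordered from narrowest to broadest, following the strategy above.
# Field names are hypothetical.
FIELD_ORDER = ["subjects", "abstract", "lead_paragraph", "full_text"]

RECORDS = [
    {"id": 1, "subjects": "Recycling",
     "abstract": "City expands recycling.",
     "lead_paragraph": "The city will expand recycling pickup.",
     "full_text": "The city will expand recycling pickup. Landfill costs rose."},
    {"id": 2, "subjects": "Waste management",
     "abstract": "Landfill costs rise.",
     "lead_paragraph": "Landfill costs rose sharply, prompting talk of recycling.",
     "full_text": "Landfill costs rose sharply, prompting talk of recycling. "
                  "Officials met Monday."},
    {"id": 3, "subjects": "County budgets",
     "abstract": "County budget passes.",
     "lead_paragraph": "The county budget passed on Tuesday.",
     "full_text": "The county budget passed on Tuesday. "
                  "A small recycling grant was included."},
]

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search_field(records, term, field):
    """Return ids of records whose `field` contains the word `term`."""
    return [r["id"] for r in records if term.lower() in words(r.get(field, ""))]

def broadening_search(records, term, min_hits):
    """Try each field from narrowest to broadest; stop at the first field
    that retrieves at least `min_hits` records."""
    hits = []
    for field in FIELD_ORDER:
        hits = search_field(records, term, field)
        if len(hits) >= min_hits:
            return field, hits
    return FIELD_ORDER[-1], hits  # broadest field, even if still short

print(broadening_search(RECORDS, "recycling", 2))  # -> ('lead_paragraph', [1, 2])
```

Record 3 mentions recycling only in its last sentence, so it is reached only when the search is widened to the full text.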

Proximity Operators

The "proximity operator" is another type of operator that can be used along with logical operators to refine searches. Although different databases use slightly different versions, proximity operators basically specify how near one term must be to another and, sometimes, in what word order the terms must appear. One of the most common proximity operators is W/n (where "n" stands for any number). For example, in some databases (including the Proquest Newspapers database), the search abortion w/20 legislation would retrieve all records containing the words "abortion" and "legislation" within 20 words of each other. Other proximity operators, available in some database programs, limit search terms to the same sentence or the same paragraph. For example, in the advanced mode of the Alta Vista Web search engine, the NEAR operator (e.g., abortion NEAR legislation) retrieves documents containing the terms within 10 words of each other.
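A W/n-style operator can be approximated by recording the position of every word and checking the gap between the two terms. The sketch below is a hypothetical helper, not any database's actual implementation; databases differ in exactly how they count words and treat punctuation. The optional `ordered` flag illustrates operators that also fix word order.

```python
import re

def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def within(text, term1, term2, n, ordered=False):
    """Approximate "term1 w/n term2": True if the two words occur within
    n word positions of each other. With ordered=True, term1 must come first."""
    words = tokens(text)
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    for i in pos1:
        for j in pos2:
            gap = j - i
            if ordered and 0 < gap <= n:
                return True
            if not ordered and abs(gap) <= n:
                return True
    return False

doc = "The new abortion rules were debated for weeks before the legislation passed."
print(within(doc, "abortion", "legislation", 20))  # True: 8 positions apart
print(within(doc, "abortion", "legislation", 5))   # False
```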

When using proximity operators, keep in mind the general principle that the closer two words are to each other in a document, the more likely they are to be related in some way. This is why proximity operators provide more search precision than the AND operator. For example, an AND search would retrieve a record in which one search term appeared near the beginning of an article and the other appeared only at the very end. In such a case, the two words are probably not significantly related, and the record is probably not relevant. If, on the other hand, a proximity operator is used to retrieve articles in which the two search terms appear in the same sentence, the terms are quite likely to be related in those records.
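The contrast between AND and a same-sentence operator can be shown with a short sketch. Both functions are hypothetical illustrations, and the sentence splitting is deliberately naive (it splits on periods, exclamation points and question marks), unlike the more careful rules a real search program would use.

```python
import re

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def and_search(text, term1, term2):
    """Boolean AND: both words appear anywhere in the document."""
    w = words(text)
    return term1.lower() in w and term2.lower() in w

def same_sentence(text, term1, term2):
    """Proximity: both words appear in at least one common sentence.
    Sentences are split naively on ., ! and ?."""
    for sentence in re.split(r"[.!?]+", text):
        w = words(sentence)
        if term1.lower() in w and term2.lower() in w:
            return True
    return False

article = ("Abortion was a central issue in the campaign. Voters also cared "
           "about jobs. A closing note mentions unrelated tax legislation.")
print(and_search(article, "abortion", "legislation"))     # True
print(same_sentence(article, "abortion", "legislation"))  # False
```

The AND search retrieves the article even though the two words are unrelated in it; the same-sentence test filters it out.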

A proximity operator can also be effective in a search for which you first think of using a phrase (a multiple-word term). For example, the phrase search "abortion legislation" would not retrieve an article containing the sentence "This legislation has limited the availability of abortion..." That article would be retrieved, however, with a proximity operator as in the previous examples: abortion w/20 legislation or abortion NEAR legislation.
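Using the example sentence above, this sketch compares an exact phrase match with a W/20-style proximity test. Both helpers are hypothetical illustrations of the two matching rules, not the code of any particular database.

```python
import re

def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def phrase_match(text, phrase):
    """True only if the words of `phrase` appear consecutively and in order."""
    words, target = tokens(text), tokens(phrase)
    k = len(target)
    return any(words[i:i + k] == target for i in range(len(words) - k + 1))

def within(text, term1, term2, n):
    """True if term1 and term2 occur within n word positions, in either order."""
    words = tokens(text)
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)

sentence = "This legislation has limited the availability of abortion..."
print(phrase_match(sentence, "abortion legislation"))   # False
print(within(sentence, "abortion", "legislation", 20))  # True
```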

The "NOT" Operator

The NOT logical operator excludes records that contain a given term or terms. Placing NOT between two terms instructs the computer to search for all records that contain the first term but do NOT contain the second term. For example, if you were looking for articles on athletics (meaning general sports activities, not the Oakland baseball team), you could enter the search: athletics NOT Oakland. The search would retrieve articles dealing with athletics, but any article that mentions Oakland would be eliminated. The NOT operator should be used carefully because it can unexpectedly eliminate records that the searcher would actually want. For example, in the search athletics NOT Oakland, an article titled "Athletics Programs in Oakland Schools" would be eliminated, even though it is about general athletics.
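The pitfall described above can be reproduced in a few lines. This minimal sketch treats each record as just a title (a hypothetical simplification); the `and_not` helper keeps records with the first word and drops any record containing the second word anywhere.

```python
import re

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def and_not(records, include, exclude):
    """Boolean "include NOT exclude": keep records with the first word
    and drop any record that contains the second word anywhere."""
    include, exclude = include.lower(), exclude.lower()
    return [r for r in records
            if include in words(r) and exclude not in words(r)]

titles = [
    "Oakland Athletics win series",           # the team: meant to be excluded
    "Athletics Programs in Oakland Schools",  # relevant, but also excluded
    "Funding for college athletics grows",
]
print(and_not(titles, "athletics", "oakland"))
# -> ['Funding for college athletics grows']
```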

Relevance Ranking and Natural Language Searching

For many years computer scientists have been attempting to develop innovations in database search programs (called "search engines") that improve on traditional Boolean searching. Various new search features that can help less skilled searchers achieve more accurate results, especially in full-text databases, are now commonly available in World Wide Web search engines and in some online database services. These features are most commonly referred to as "relevance ranking" and "natural language searching." Essentially, relevance ranking identifies documents primarily according to how frequently key terms appear in each document compared with how frequently those terms appear in the entire database. (Some programs apply similar or additional relevance criteria.) Based on statistical analysis of these criteria, the program ranks the documents it considers most relevant and displays them in that order. Natural language searching allows you to enter search descriptions in plain English; the program then uses various linguistic and other techniques to identify the significant words and phrases to be searched.
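One well-known way to compare a term's frequency in a document with its frequency across the whole collection is TF-IDF weighting. The sketch below uses TF-IDF as a stand-in for the general idea described above; it is not the formula used by any particular search engine or database, and the three sample documents are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def rank(documents, query):
    """Return document indexes ordered by a simple TF-IDF score.

    tf  = how often a query term appears in one document (relative to its length)
    idf = log(total documents / documents containing the term), so terms that
          are common across the whole collection count for less.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(documents)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # count each document at most once per term

    scored = []
    for idx, toks in enumerate(tokenized):
        counts = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if toks and doc_freq[term]:
                tf = counts[term] / len(toks)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scored.append((score, idx))
    # Highest score first; ties keep original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]

docs = [
    "the city council discussed the budget",
    "water rationing water shortage and the drought",
    "the drought affected farms",
]
print(rank(docs, "drought water"))  # -> [1, 2, 0]
```

The second document ranks first because it uses "water" twice, and "water" is rare in the collection, so it carries more weight than "drought," which appears in two of the three documents.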

Search features such as relevance ranking can produce more accurate results than Boolean techniques alone, especially for certain types of searches. These features tend to be more useful for conceptual and complex issues than for highly specific topics, such as those involving names of people or organizations. The quality of search results is commonly measured by two standards: precision and recall. Precision is the proportion of the documents retrieved by a search that are relevant to the research question, while recall is the proportion of all the relevant documents in the database that are actually retrieved by the search. Relevance ranking and natural language searching may provide better recall in certain searches, but achieving greater precision still usually requires human judgment. Although some people believe that these new search techniques will eventually replace Boolean searching, they should generally be used as additional tools that supplement Boolean strategies rather than substitute for them.
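The two definitions above translate directly into arithmetic. In this sketch the document numbers are hypothetical: a search returns 10 documents, and the database contains 8 relevant documents, 4 of which the search found.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / all retrieved.
    Recall    = relevant retrieved / all relevant documents in the database."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical numbers: the search returns documents 1-10, and the database
# holds 8 relevant documents, 4 of which the search found.
retrieved = range(1, 11)
relevant = {2, 4, 6, 8, 20, 21, 22, 23}
print(precision_recall(retrieved, relevant))  # -> (0.4, 0.5)
```

Here 4 of the 10 retrieved documents are relevant (precision 0.4), and the search found 4 of the 8 relevant documents (recall 0.5).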



last revised: 10-2-04 by Eric Brenner, Skyline College, San Bruno, CA

These materials may be used for educational purposes if you inform and credit the author and cite the source as: LSCI 106 Online Research. All commercial rights are reserved. To contact the author, send comments or suggestions to: Eric Brenner at brenner@smccd.net