A Statistical Interpretation Of Term Specificity And Its Application In Retrieval


Reprinted from Journal of Documentation Volume 60 Number 5 2004 pp. 493-502 Copyright © MCB University Press ISSN 0022-0418 and previously from Journal of Documentation Volume 28 Number 1 1972 pp. 11-21

A statistical interpretation of term specificity and its application in retrieval
Karen Spärck Jones
Computer Laboratory, University of Cambridge, Cambridge, UK

Abstract: The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, ...

The idea of an optimum level of indexing exhaustivity for a given document collection then follows: the average number of descriptors per document should be adjusted so that, hopefully, the chances of requests matching relevant documents are maximized, while too many false drops are avoided. Exhaustivity obviously applies to requests too, and one function of a search strategy is to vary request exhaustivity. I will be mainly concerned here, however, with document descriptions.

Specificity as characterized above is a semantic property of index terms: a term is more or less specific as its meaning is more or less detailed and precise. This is a natural view for anyone concerned with the construction of an entire indexing vocabulary. Some decision has to be made about the discriminating power of individual terms in addition to their descriptive propriety. For example, the index term "beverage" may be as properly used for documents about tea, coffee, and cocoa as the terms "tea", "coffee", and "cocoa". Whether the more general term "beverage" only is incorporated in the vocabulary, or whether "tea", "coffee", and "cocoa" are adopted, depends on judgements about the retrieval utility of distinctions between documents made by the latter but not the former. It is also predicted that the more general term would be applied to more documents than the separate terms "tea", "coffee", and "cocoa", so the less specific term would have a larger collection distribution than the more specific ones.

It is of course assumed here that such choices when a vocabulary is constructed are exclusive: we may either have "beverage" or "tea", "coffee", and "cocoa". What happens if we have all four terms is a different matter. We may then either interpret "beverage" to mean "other beverages" or explicitly treat it as a related broader term. I will, however, disregard these alternatives here.

In setting up an index vocabulary the specificity of index terms is looked at from one point of view: we are concerned with the probable effects on document description, and hence retrieval, of choosing particular terms, or rather of adopting a certain set of terms. For our decisions will, in part, be influenced by relations between terms, and how the set of chosen terms will collectively characterize the set of documents. But throughout we assume some level of indexing exhaustivity. We are concerned with obtaining an effective vocabulary for a collection of documents of some broadly known subject matter and size, where a given level of indexing exhaustivity is believed to be sufficient to represent the content of individual documents adequately, and distinguish one document from another.

Index term specificity must, however, be looked at from another point of view. What happens when a given index vocabulary is actually used? We predict when we opt for "beverage", for example, that it will be used more than "cocoa". But we do not have much idea of how many documents there will be to...
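The two quantities discussed in this passage, exhaustivity (the average number of descriptors per document) and a term's collection distribution (how many documents it is applied to), can be illustrated with a minimal sketch. The four-document collection, its subjects, and the function names below are hypothetical and chosen only to mirror the "beverage" example; they are not data or procedures from the paper. The sketch indexes the same documents under the two exclusive vocabularies and compares the resulting counts.

```python
from collections import Counter

# Hypothetical toy collection: the subjects each document is about.
# These four documents are an assumption made for illustration only.
subjects = {
    "d1": {"tea"},
    "d2": {"coffee"},
    "d3": {"cocoa"},
    "d4": {"tea", "coffee"},
}


def index_general(topics):
    """Describe a document with the single general term 'beverage'."""
    return {"beverage"} if topics else set()


def index_specific(topics):
    """Describe a document with the specific terms 'tea', 'coffee', 'cocoa'."""
    return set(topics)


def exhaustivity(descriptions):
    """Average number of descriptors per document description."""
    return sum(len(terms) for terms in descriptions.values()) / len(descriptions)


def collection_distribution(descriptions):
    """Number of documents to which each term is applied."""
    counts = Counter()
    for terms in descriptions.values():
        counts.update(terms)
    return counts


# Index the same collection under each of the two exclusive vocabularies.
general = {doc: index_general(topics) for doc, topics in subjects.items()}
specific = {doc: index_specific(topics) for doc, topics in subjects.items()}

general_counts = collection_distribution(general)
specific_counts = collection_distribution(specific)

print("general vocabulary:", dict(general_counts), "exhaustivity:", exhaustivity(general))
print("specific vocabulary:", dict(specific_counts), "exhaustivity:", exhaustivity(specific))
```

In this toy collection the general term is applied to every document, while each specific term is applied to at most two, which matches the prediction that the less specific term has the larger collection distribution. How the counts turn out in a real collection is an empirical matter that the sketch does not address.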
