Bilingual Intelligence Testing

• by José A. Cárdenas, Ed.D. • IDRA Newsletter • January 1995

The following article was written by Dr. José A. Cárdenas around the year 1964, when he was serving as chairman of the Education Department at St. Mary’s University. It was first published in 1972 and is included in his new reference book, Multicultural Education: A Generation of Advocacy, published by Ginn Press.

Although written more than 30 years ago, the caveats raised about invalidity of intelligence testing for linguistically and culturally different children have never been addressed. There have been no further inquiries into the administration, performance and interpretation problems identified by the author in 1972. On the contrary, current literature about ethnic differences in mental abilities inferred from the results of IQ tests is being used for educational policy development, without regard to the problems identified in this article.

Dr. Cárdenas’ early experiences with IQ testing of language minority, limited-English-proficient and bilingual students directly contradict Richard J. Herrnstein and Charles Murray’s assertion in their recent book, The Bell Curve, that there are no cultural biases in intelligence tests.

The past few years have seen increased concern over the testing of intelligence of minority children and particularly of the assessment of mental abilities of non-English speaking or bilingual children. Various national, regional and local studies have ascertained that bilingual children are over-represented in classes for the mentally retarded, and, in some cases, the traditional underachievement characterizing minority children in the public schools has been rationalized on the basis of below normal mental abilities.

The unfair practice of administration of invalid intelligence tests to bilingual and bicultural populations has been noted and addressed by the courts and various civil rights agencies. In general, both the courts and regulatory agencies have understood at least some of the reasons for the lack of test validity and have consistently ruled against the use of language incompatible testing.

However, the remedy formulated by the courts, often at the insistence of plaintiffs, has resulted in equally discriminatory or, in some cases, even more discriminatory testing practices.

Courts have consistently ruled the use of English intelligence tests to be unfair to children of limited English speaking ability but have then ruled that intelligence testing must be conducted in the language spoken in the child’s home. Such a response has not proved to be an ideal solution to the problem, and in most cases, has resulted in worse testing practices than those being replaced.

Assumptions in Intelligence Testing

Understanding why such responses are dysfunctional requires an understanding of the rationale and methodology utilized in intelligence testing. In general, intelligence testing is based on the following four assumptions.

1. Intelligence, being an intangible, cannot be measured directly; therefore, it must be measured indirectly, through the way intelligence influences certain behaviors. An intelligence test item is a situation in which the behavior of the testee is dependent on his or her mental abilities.
2. The test itself is a series of situations which represent ways in which intelligence is utilized. The test items are samples of activities influenced by intelligence. For example, it is assumed that a person’s vocabulary is influenced by his or her intelligence. An individual’s mental abilities determine how many and which (quantity and quality) words he or she understands. Since it is difficult and time consuming to determine all the words an individual knows, a sampling of words is used, and the ability of an individual to understand words on this list is then generalized to estimate his or her entire vocabulary.
3. It is assumed that the individual has been exposed to certain common experiences and that his or her knowledge is dependent not on exposure to experiences, but rather on the amount gained or retained from these experiences. In the vocabulary example used above, it is assumed that each of the words presented in the sample is a word commonly encountered by the testee. If the testee fails to master the word it is because, in spite of having encountered the word, the testee is intellectually unable to conceptualize it or his or her ability to retain the concept is lacking.
4. It is assumed that the testee has all the skills and competencies necessary for responding to the test situation; the only variable is the level of mental functioning. If, in a test situation, a testee is required to write an answer, it is assumed that the testee knows how to write and that the ability to manipulate a pencil does not influence his or her behavior.

Problems with Testing Assumptions

The four assumptions listed above immediately establish the invalidity of intelligence tests for persons who have atypical language, cultural and socio-economic characteristics. In fact, the invalidity is so clear that one wonders about school personnel who persist in the utilization of these tests when it is clear even to lay judges, administrators and community groups that the tests are biased, unfair and invalid.

Testing the intelligence of a Spanish-speaking youngster through a sampling of English words penalizes the testee since he or she may not have had the opportunity to hear the word in English and the test does not measure any vocabulary that he or she has acquired in Spanish.

The assumption that the testee has been exposed to experiences basic to test activities similarly leads to invalidity. For the most part, experiences utilized in intelligence test items are taken from typical White, Anglo Saxon, English-speaking, middle-class situations. Test critics go further and claim that the test items are biased in favor of Northeast, urban populations.

For example, one test item requires that a child associate a hill of snow with the type of vehicle used for transportation on this snow. A child from Key West, Florida; Brownsville, Texas; or San Diego, California, may not have the experience of sliding down a snowy hill which the test assumes everybody has, so that his or her failure in the item may be attributed to this lack of experience rather than the low level of intelligence implied by the test.

Culturally different children experience the same failure due to not having the experiences assumed by the test items rather than to lack of intelligence.

When an intelligence tester asks a Mexican American or Puerto Rican child, “What would you do if your mother sends you to the store to buy a loaf of bread and the grocer does not have any?,” it is assumed that the child is acquainted with the concept of bread among other things. If the child is better acquainted with home-produced flour tortillas or tostones and does not know how to react to the problem situation, it is dangerous to assume a low intellect.

Intelligence tests often require special skills and abilities commonly acquired at the age or grade level at which the test is administered. A fifth grade intelligence test may require third grade reading skills. The tester assumes that a fifth grader can read at least at the third grade level. However, suppose the fifth grade student was academically retarded because he or she did not learn to read in the first grade, having first to develop fluency in the English language, and so did not possess third grade reading skills at the fifth grade level. The assumption that he or she possesses the necessary skills is then false, and the test item, the test, and the score(s) produced are invalid.

Problems with Spanish Language Intelligence Tests

As stated previously, courts, unlike educators, have not experienced difficulty in understanding the reasons for the lack of validity of tests developed for White, Anglo Saxon, English-speaking, middle-class populations when applied to non-White, non-Anglo Saxon, non-English speaking or non-middle class populations.

However, the remedy implemented by the courts has frequently been equally dysfunctional and invalid.

In one case involving Mexican American children, the court ordered the administration of English language intelligence tests to be replaced by the administration of Spanish language intelligence tests. Most likely, the results were disastrous.

In the first place, there are no Spanish language intelligence tests developed for or standardized for Mexican American children. In the second place, language is not the only invalid characteristic of intelligence tests used for minority populations.

In order to illustrate the ramifications and complexity of the problem, I will draw from my experience in the measurement of mental abilities of Mexican American children. Using a very simple test of mental abilities in order to avoid the complexities of analyzing tests such as the Wechsler or Binet, which require and assume much more sophisticated testee skills and experiences, I did extensive testing of Mexican American elementary school children using the Peabody Picture Vocabulary Test.

The Peabody utilizes a simple rationale and methodology. The testee is presented a test kit which has been divided into four compartments. Each compartment contains a picture depicting either a simple object at the lower levels of the test or an activity or some complex concept at higher levels.

The tester gives an oral stimulus word, and the testee is to indicate which of the four pictures depicts the stimulus word. For example, a plate may depict a butterfly, a bird, a baseball bat and an elephant. When the tester says, “Show me the butterfly,” the child is expected to point to the picture of the butterfly. Assuming that he or she has experienced the objects depicted, it is assumed that the response to the stimulus word is dependent on, and solely on, his or her mental abilities. It is assumed that the child has seen a butterfly and that he or she has previously heard and perhaps used the word butterfly.

The fallacy of the assumptions mentioned above holds true in this test situation. It is not only possible, but extremely common, that a child from a Spanish-speaking home has never heard this insect being referred to as a butterfly. The child may have heard it referred to by a Spanish word (mariposa), which incidentally in no way resembles the phonetic elements of butterfly, and may be able to identify the picture if the stimulus word were presented in Spanish. Even so, the child’s failure to respond correctly is taken to indicate a low level of mental ability. Incidentally, many children from Spanish-speaking environments who are highly fluent in English have never heard the test words in the English language.

The opposite of this situation is also true. Mexican American children who are fluent in Spanish frequently have never heard the Spanish equivalent of some English words either because there is no commonly utilized Spanish language equivalent or the concept is extraneous to the racial, ethnic or socio-economic culture of the child. For instance, I have never heard a commonly used Spanish language equivalent for the English language words marshmallow, cream puff, hot dog, or bush.

For bilingual children, the validity of Spanish-language testing depreciates tremendously. The bilingual child by definition is one who has fluency in two languages. Testing in English does not reach the vocabulary content the child may possess in Spanish; testing in Spanish does not sample the child’s English vocabulary. Similarly, English sampling does not identify words associated with a child’s Mexican (Spanish, Indian, Hispanic, Latin) culturally related concepts; Spanish sampling does not identify words associated with a child’s English (American) culturally related concepts.
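The scoring problem described above can be made concrete with a minimal sketch. The word lists and the eight-item test below are hypothetical, invented only for illustration; they are not drawn from the Peabody or any other real instrument.

```python
# Hypothetical data: concepts a bilingual child can name, by language.
# These word lists are illustrative only and are not taken from any real test.
known_in_english = {"bird", "baseball bat", "hot dog", "marshmallow"}
known_in_spanish = {"bird", "butterfly", "tree", "elephant"}

# Concepts sampled by an equally hypothetical eight-item test.
test_items = ["bird", "butterfly", "tree", "elephant",
              "baseball bat", "hot dog", "marshmallow", "tumble"]


def score(items, known):
    """Count the test items whose concept is in the child's known set."""
    return sum(1 for item in items if item in known)


english_only = score(test_items, known_in_english)
spanish_only = score(test_items, known_in_spanish)
# Credit a concept if the child knows it in either language.
either_language = score(test_items, known_in_english | known_in_spanish)

print(english_only, spanish_only, either_language)
```

Under these assumed lists, each single-language administration credits the child with four of the eight concepts, while the child actually commands seven of them across the two languages. Either test alone understates the child's vocabulary.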

Problems with Translation

Many attempts have been made to validate intelligence tests through translations. For the most part, such attempts have proved fruitless. I have seen, at some time or another, at least a dozen attempts to translate the Peabody test. The following example illustrates the reason for the failure of translations to validate intelligence measures.

1. Language Competency of Translators

The Spanish language competency of some translators has left a lot to be desired. In one do-it-yourself translation of the Peabody test called to my attention, the stimulus word hot dog has been translated to “un perro caliente” which at best means a dog which is warm and at worst means a dog in heat.

2. Dialectic Differences

Translators have a difficult time identifying dialectic characteristics of the second language, often peculiar to an area or region in which the translated test is to be utilized. In the administration of the Peabody, I had translated the stimulus word tree into the Spanish “árbol.” In one school, almost every Spanish-speaking student failed the test item. After the test administration, I asked a child, “What is that?,” while pointing to the tree. The child replied, “Es un palo.” I subsequently found that in that area the word “árbol” was never used; “palo” was the accepted terminology.

As we have seen previously, the assumption that the testee was acquainted with the word and the concept did not hold true; therefore, the item was invalid.

3. Maintaining Levels of Difficulty

A third problem encountered in the translation of tests is retention of the level of difficulty of a test item. In the development of a test, the items must possess a certain level of difficulty to distinguish between age or grade levels. A test item using a stimulus word must use a word that is commonly known by the members of one age group (such as eight year olds) but not commonly known by the members of a younger group (such as seven year olds). If an eight year old does not know the word, it is assumed that he or she has inferior intelligence. If an eight year old knows the word, he or she is assumed to be of average intelligence. If a seven year old knows the word, he or she is assumed to be of superior intelligence.

In general, words are ranked by difficulty. The level reached by a child in relation to his or her age constitutes his or her intelligence.

The translation of a word frequently changes the level of difficulty for that word. For example, the Peabody uses the stimulus word tumble, which must be associated with the picture of a child tumbling down a hill. But there is no commonly used Spanish language equivalent for this word.

Translators have used the words “tropezar” and “caer” as stimulus words in Spanish. Yet, “tropezar” means trip, which may or may not be at the same level of difficulty as tumble. It may be easier (at a lower level) or it may be harder (at a higher level). The alternative “caer” means fall and is generally at a much lower level of difficulty than is tumble.

Levels of difficulty for vocabulary words change from language to language, and, when one word has several translations, the level of difficulty changes from one translation to the other. Such a situation most definitely invalidates each such test item.
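A minimal sketch of how such a shift in difficulty might be checked, assuming item difficulty is expressed as the proportion of an age group answering correctly. The response data, the `separates_ages` helper and its 0.5 cutoff are all hypothetical and chosen only for illustration.

```python
# Hypothetical responses (1 = correct, 0 = incorrect) to one test item,
# grouped by age. These numbers are invented for illustration only.
responses = {
    "tumble": {7: [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
               8: [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]},
    "caer":   {7: [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
               8: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]},
}


def proportion_correct(answers):
    """Item difficulty expressed as the share of testees answering correctly."""
    return sum(answers) / len(answers)


def separates_ages(by_age, younger=7, older=8, cutoff=0.5):
    """True if most older children pass the item and most younger children fail it.

    The 0.5 cutoff is an assumption chosen for this sketch.
    """
    return (proportion_correct(by_age[older]) >= cutoff
            and proportion_correct(by_age[younger]) < cutoff)


for word, by_age in responses.items():
    p7 = proportion_correct(by_age[7])
    p8 = proportion_correct(by_age[8])
    print(f"{word}: age 7 = {p7:.1f}, age 8 = {p8:.1f}, "
          f"separates ages = {separates_ages(by_age)}")
```

With these invented numbers, the original word separates seven year olds from eight year olds, while the easier translation is passed by nearly everyone in both groups and so no longer marks the eight-year level.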

4. Index of Discrimination

A similar situation exists in the index of discrimination for test items. The value of a stimulus word in a test item is dependent upon its ability to discriminate; that is, high intelligence testees will consistently get it right, and low intelligence testees will consistently get it wrong.

In the development of a test, huge quantities of test items are discarded due to their inability to produce this discrimination. The ones that are retained have been found to make the discrimination consistently. The translation of a test item changes the index of discrimination. The translated word may favor the low intelligence child and penalize the high intelligence child. Therefore, translated test items must be re-analyzed for their ability to discriminate.
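One conventional way to re-analyze an item is the upper-lower discrimination index: the proportion correct among the highest-scoring testees minus the proportion correct among the lowest-scoring testees. The sketch below assumes total test score as the ranking criterion and a 27 percent group size; both are common conventions rather than anything specified in this article, and the records are hypothetical.

```python
def discrimination_index(records, fraction=0.27):
    """Upper-lower discrimination index for a single item.

    records: list of (total_score, item_correct) pairs, item_correct is 0 or 1.
    fraction: share of testees placed in each extreme group (assumed 27%).
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper = ranked[:n]    # highest total scores
    lower = ranked[-n:]   # lowest total scores
    p_upper = sum(correct for _, correct in upper) / n
    p_lower = sum(correct for _, correct in lower) / n
    return p_upper - p_lower


# Hypothetical records for the same item before and after translation.
original = [(95, 1), (90, 1), (85, 1), (80, 1), (70, 1),
            (60, 0), (55, 1), (50, 0), (45, 0), (40, 0)]
translated = [(95, 0), (90, 1), (85, 0), (80, 1), (70, 1),
              (60, 0), (55, 1), (50, 1), (45, 1), (40, 0)]

d_original = discrimination_index(original)
d_translated = discrimination_index(translated)
print(round(d_original, 2), round(d_translated, 2))
```

An index near 1 means the item is answered correctly mainly by high scorers. A negative index, as in the invented translated data, means the item favors low scorers, which is the reversal described above.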

The four problems listed above make it virtually impossible to utilize a translation of an intelligence test with any degree of validity. In order to utilize such a test, it is necessary to conduct the item analysis and re-norm the test for the intended population. The amount of work involved would be almost identical to the effort made in the original development of the test.

Test Administration

It should be noted that the lack of validity holds true regardless of language characteristics or ethnicity of the administrator. There is no basis for assuming that an invalid instrument becomes valid when used by experts, sensitive or skilled individuals, or by sympathetic or empathetic administrators. For school personnel to claim that translated tests are valid because the administrator has a “high degree of knowledge of Mexican American children” is asinine and is not substantiated by reason or research. Assuming that a test publishing company came up with a high quality translation of an intelligence test for bilingual children, the administration of such a test would still present some formidable problems.

In attempting to obtain more valid administrations of the Peabody, I tried administering two tests. The first administration was in English and the second in Spanish.

When the English version was given, several items were missed by the testees due to the language problems described previously. Surprisingly, when the test was administered in Spanish, the testees tended to make the same mistakes in the Spanish language, even when subsequent investigation indicated that the testees knew the right answers. They consistently made the same wrong choice when retested in a more familiar language.

It appears that the behavior of selecting a choice from the four pictures presented turned out to be self-reinforcing. Having made that choice, there was a strong tendency to stick to that choice even when testees subsequently discovered it to be a wrong one.

In order to prevent wrong choice reinforcement, the two tests were administered simultaneously. The stimulus word was given in English and in Spanish in order to prevent the first response from becoming self-reinforcing. The testees were not allowed to respond until the stimulus was presented in both languages.

Performance on such administrations improved dramatically, even though some new problems were discovered.

Administrations were made where the English stimulus word was presented before the Spanish word, and in some administrations the sequence was reversed with the Spanish being used before the English word. The results showed significant differences in the measurement of intelligence depending upon which language cue was used first. In general, through the first three grades, Spanish followed by English produced higher results. In grades four to six, English followed by Spanish produced significantly higher results.

Perhaps this finding can be attributed to language dominance, although no attempt was made at the time to explain the phenomenon.

Conclusion

The foregoing evidently supports the contention that it is difficult, if not impossible, to measure with a high degree of validity the intelligence or mental abilities of children who speak two languages by the use of a test in a second language. Past efforts to do so have proven to be detrimental and have led to the over-inclusion of minority children in classes for the mentally retarded. Low levels of expectancy for children assumed to be mentally retarded have led to a self-fulfilling prophecy, much to the life-long detriment of minority children.

Recent attempts to remedy the situation by the administration of “translations” are equally dysfunctional and will perpetuate an unfortunate situation.

If school personnel persist in their efforts to evaluate the mental abilities of minority children, or for that matter the economically or educationally handicapped, it is necessary that an extensive research and development effort be undertaken.

Alternative ways of assessing mental abilities must be identified that will probably have to differ considerably from past practices. Psychometricians have gone so far up the wrong creek in the assessment of minority mental abilities that it is wise to heed the National Education Association’s recommendation that all intelligence testing of minority children be suspended until alternative ways may be explored.

Dr. José A. Cárdenas is founder and director emeritus of IDRA. Comments and questions may be directed to him via e-mail at feedback@idra.org.

[©1995, IDRA. This article originally appeared in the January 1995 IDRA Newsletter by the Intercultural Development Research Association. Permission to reproduce this article is granted provided the article is reprinted in its entirety and proper credit is given to IDRA and the author.]
