You are doing field work in India. You go into rural villages and you interview a large number of

families who are farmers. For each family, you record the following information:

Y = annual family income (in rupees)

C = number of children

L = whether or not the male adult is literate

F = the amount of land the family farms (in hectares)

B = whether or not the family has a bank account

a. Say whether each variable is discrete, continuous, or approximately continuous (and briefly

explain why).

b. The distributions of Y and F are very skewed (positively). The distribution of C is slightly

positively skewed, but not very much. All three distributions are unimodal.

For C and F, say which measure of central tendency you think is most appropriate, and give

your reason. For Y, give the ranking of the mean, median, and mode.

c. Here are values of the sample mean and sample standard deviation for Y, C, and F, and also

the pairwise covariances between these variables:


     mean     standard deviation     covariance
Y    19000    3000                   s_YC = 1680
C    4.7      1.4                    s_CF = 0.196
F    1.2      0.2                    s_YF = 360

Originally, Y is measured in rupees. You decide to add a government subsidy for education,

in the amount of 500 rupees, that each family receives. Also, you decide to change the units

so that each family’s income is in US dollars. The exchange rate is 40 rupees per US dollar.

Call the new variable “net US dollar family income” (N). Give the values of the mean and

standard deviation of N.
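
One way to check part c is to note that N is a linear transformation of Y: N = (Y + 500) / 40. The short Python sketch below applies the usual rules for linear transformations to the tabled mean and standard deviation of Y. The variable names are illustrative, not part of the problem.

```python
# Summary statistics for Y as given in the table (rupees).
mean_y = 19000.0
sd_y = 3000.0

subsidy = 500.0         # rupees added to every family's income
rupees_per_usd = 40.0   # exchange rate stated in the problem

# N = (Y + subsidy) / rupees_per_usd.
# Adding a constant shifts the mean but leaves the spread unchanged;
# dividing by a constant rescales both the mean and the spread.
mean_n = (mean_y + subsidy) / rupees_per_usd
sd_n = sd_y / rupees_per_usd

print("mean of N:", mean_n)
print("standard deviation of N:", sd_n)
```

The subsidy appears only in the mean, which is the point the question is testing.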

d. Originally, F is measured in “hectares” (one hectare is about 2.5 acres). You decide to rescale

farm size so it is measured in acres. Call the new variable A (for area). Give the value of the

covariance between A and N. Give the value of the correlation between A and N.
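
For part d, a sketch along the same lines, assuming A = 2.5 F and N = (Y + 500) / 40 as defined in the problem. Covariance picks up each variable's scale factor, while correlation should come out the same as for the original Y and F. Variable names are again illustrative.

```python
# Values for Y and F as given in the table.
sd_y = 3000.0
sd_f = 0.2
cov_yf = 360.0

acres_per_hectare = 2.5   # conversion stated in the problem (approximate)
rupees_per_usd = 40.0     # exchange rate stated in the problem

# A = 2.5 * F and N = (Y + 500) / 40.
# The additive 500 drops out of the covariance; the two scale factors multiply.
cov_an = acres_per_hectare * cov_yf / rupees_per_usd

# Standard deviations of the rescaled variables.
sd_a = acres_per_hectare * sd_f
sd_n = sd_y / rupees_per_usd

# Correlation is the covariance divided by the product of the standard deviations.
corr_an = cov_an / (sd_a * sd_n)
corr_yf = cov_yf / (sd_y * sd_f)

print("cov(A, N):", cov_an)
print("corr(A, N):", corr_an)
print("corr(Y, F):", corr_yf)
```

Comparing the last two printed values shows that positive rescaling changes the covariance but not the correlation.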

e. Suppose you decide to drop the 10% of observations that have the lowest family income, and

also the 10% of observations that have the highest family income. What direction of effect

(increase, decrease, approximately no effect) will this have on the mean? What direction of

effect will this have on the standard deviation? The median? The interquartile range?
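
The direction of each effect in part e can be explored with a small simulation. The sketch below does not use the survey data: it draws a hypothetical, positively skewed sample from a lognormal distribution (the parameters are arbitrary assumptions) and compares the four statistics before and after dropping the lowest and highest 10% of observations.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical incomes: a lognormal sample stands in for a positively skewed Y.
incomes = sorted(random.lognormvariate(9.5, 0.8) for _ in range(10_000))

k = len(incomes) // 10      # 10% of the observations
trimmed = incomes[k:-k]     # drop the k lowest and the k highest values


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def summary(values):
    """Collect the four statistics asked about in part e."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": iqr(values),
    }


before = summary(incomes)
after = summary(trimmed)
for name in before:
    print(f"{name:>6}: {before[name]:10.1f} -> {after[name]:10.1f}")
```

Because the same number of observations is removed from each tail, the middle of the sorted sample is untouched. The long right tail, however, contributes more to the mean and the spread than the short left tail does.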
