2. The file P02_02.xlsx contains information on over 200 movies that were released in 2006 and 2007.
Create two column charts of counts, one of the different genres and one of the different distributors.
Recode the Genre column so that all genres with a count of 10 or fewer are lumped into a category called Other. Then create a column chart of counts for this recoded variable. Repeat similarly for the Distributor variable.
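For readers who want to check their spreadsheet recoding outside Excel, the lumping step can be sketched in Python. This is only an illustration under stated assumptions: the helper name lump_rare and the sample genre list are made up and are not taken from P02_02.xlsx.

```python
from collections import Counter

def lump_rare(values, threshold=10, other_label="Other"):
    """Replace every category whose count is <= threshold with other_label."""
    counts = Counter(values)
    return [v if counts[v] > threshold else other_label for v in values]

# Made-up genre column for illustration; NOT the contents of P02_02.xlsx.
genres = ["Comedy"] * 12 + ["Drama"] * 11 + ["Horror"] * 3 + ["Musical"] * 2

recoded = lump_rare(genres)
recoded_counts = Counter(recoded)

# Counts of the recoded variable, which would feed the column chart.
for category, count in recoded_counts.most_common():
    print(category, count)
```

The same function could be applied to the Distributor column by passing that column's values instead.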
3. The file P02_03.xlsx contains data from a survey of 399 people regarding a government environmental policy.
Which of the variables in this data set are categorical? Which of these are nominal; which are ordinal?
For each categorical variable, create a column chart of counts.
Recode the data into a new data set, making four transformations:
(1) change Gender to list “Male” or “Female”;
(2) change Children to list “No children” or “At least one child”;
(3) change Salary to be categorical with categories “Less than $40K,” “Between $40K and $70K,” “Between $70K and $100K,” and “Greater than $100K” (where you can treat the breakpoints however you like); and
(4) change Opinion to be a numerical code from 1 to 5 for Strongly Disagree to Strongly Agree.
Then create a column chart of counts for the new Salary variable.
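Three of the four transformations above can be sketched in Python as follows. Several things here are assumptions rather than facts about P02_03.xlsx: that Children is stored as a count, that Salary is a dollar amount, and that Opinion is stored as text with the three middle labels shown. Gender is left out because its recoding depends on how the file codes it. Each breakpoint is assigned to the higher bin, which is one of the choices the problem allows.

```python
from collections import Counter

# ASSUMPTION: Opinion is stored as these five text labels; the middle three are guesses.
OPINION_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def children_label(n_children):
    # ASSUMPTION: Children is a nonnegative count.
    return "No children" if n_children == 0 else "At least one child"

def salary_category(salary):
    # Each breakpoint falls into the higher bin; the problem leaves this choice open.
    if salary < 40_000:
        return "Less than $40K"
    if salary < 70_000:
        return "Between $40K and $70K"
    if salary < 100_000:
        return "Between $70K and $100K"
    return "Greater than $100K"

# Made-up respondents (children, salary, opinion); NOT rows from P02_03.xlsx.
respondents = [
    (0, 35_000, "Agree"),
    (2, 40_000, "Strongly Disagree"),
    (1, 85_000, "Neutral"),
    (0, 120_000, "Strongly Agree"),
]

recoded = [
    (children_label(c), salary_category(s), OPINION_CODES[o])
    for c, s, o in respondents
]

# Counts for the new Salary variable, which would feed the column chart.
salary_counts = Counter(row[1] for row in recoded)
```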
4. The file P02_04.xlsx contains salary data on all Major League Baseball players for each year from 2002 to 2011. (It is an older version of the data used for examples later in this chapter.) For any three selected years, create a table of counts of the various positions, expressed as percentages of all players for the year. Then create a column chart of these percentages for these years. Do they remain fairly constant from year to year?
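The table of percentages asked for here can be sketched in Python as a check on the spreadsheet work. The function name position_percentages, the (year, position) row layout, and the sample rows are all hypothetical; they do not reflect the actual layout or contents of P02_04.xlsx.

```python
from collections import Counter

def position_percentages(rows, year):
    """Return {position: percent of all players} for one year.

    rows is an iterable of (year, position) pairs.
    """
    positions = [pos for yr, pos in rows if yr == year]
    total = len(positions)
    if total == 0:
        return {}
    counts = Counter(positions)
    return {pos: 100 * n / total for pos, n in counts.items()}

# Made-up (year, position) rows; NOT data from P02_04.xlsx.
rows = [
    (2005, "Pitcher"), (2005, "Pitcher"), (2005, "Catcher"), (2005, "Outfielder"),
    (2008, "Pitcher"), (2008, "Catcher"), (2008, "Outfielder"), (2008, "Outfielder"),
]

# One table of percentages per selected year.
percentages = {year: position_percentages(rows, year) for year in (2005, 2008)}
```

Placing the per-year dictionaries side by side makes it straightforward to judge whether each position's share stays roughly constant across the selected years.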