Fall 2015 Final Exam
Due: Dec 03, 2015
1. Explain data mining and the data mining process.
Use the following dataset information to answer questions 2-5.
Data Set Information:
The attributes of the dataset are listed below:
age: continuous.
workclass: Private, Self-emp, Gov, Never-worked.
education: Bachelors, Some-college, HS-grad, Assoc, Masters, Doctorate.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Protective-serv, Armed-Forces.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, India, Japan, Greece, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
Income category: Low Income (<$35K), Medium Income ($35K-$75K), High Income (>$75K).
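
As an illustration of how the three income categories partition annual income, here is a minimal sketch. The function name `income_category` and the treatment of the boundary values (exactly $35K and exactly $75K are assigned to Medium Income, following the ranges as written) are assumptions made for this example, not part of the dataset description.

```python
def income_category(income_usd: float) -> str:
    """Map an annual income in US dollars to one of the three categories.

    Assumption: the boundaries $35K and $75K themselves fall in
    "Medium Income", since the list gives <$35K, $35K-$75K, and >$75K.
    """
    if income_usd < 35_000:
        return "Low Income"
    if income_usd <= 75_000:
        return "Medium Income"
    return "High Income"


if __name__ == "__main__":
    # Hypothetical incomes, chosen only to exercise each branch.
    for income in (20_000, 35_000, 75_000, 90_000):
        print(income, "->", income_category(income))
```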

2. Data understanding and preparation: Which variables are discrete and which are continuous? What steps will you take to prepare the dataset before applying any models?

3. Suppose we had data on all of the above attributes for Mr. Smith except his income category. If the prediction task is to determine whether Mr. Smith makes over $75K a year, explain the steps you will take to make the prediction. (Hint: specify and explain whether the task is classification, regression, clustering, association, etc.; suggest some algorithms (models) you may want to use; and describe how to validate your analysis.)

4. Alternatively, suppose Mr. Smith’s data was collected on all of the above attributes except his age. If the prediction task is to determine Mr. Smith’s age, explain the steps you will take to make the prediction. (Hint: specify and explain whether the task is classification, regression, clustering, association, etc.; suggest some algorithms (models) you may want to use; and describe how to validate your analysis.)

5. Suppose that we cannot establish any of the attributes as a class attribute. What type of analysis would be appropriate to bring some meaning to the dataset? (Hint: specify and explain whether the task is classification, regression, clustering, association, etc., and suggest some algorithms (models) you may want to use.)