
Presentation: Analyzing Missing Data

Contents

Analyzing Missing Data: Introduction, Problems, Using Scripts
Presentation slides

Slide 2: Missing data and data analysis

Missing data is a problem in multivariate data because a case will be excluded from the analysis if it is missing data for any variable included in the analysis.

If our sample is large, we may be able to allow cases to be excluded.

If our sample is small, we will try to use a substitution method so that we can retain enough cases to have sufficient power to detect effects.

In either case, we need to make certain that we understand the potential impact that missing data may have on our analysis.
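The exclusion rule described on this slide can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax; the toy data, the use of None for a missing value, and the function name listwise_complete are all assumptions made for the example.

```python
# Hypothetical toy data: None marks a missing value.
cases = [
    {"wrkstat": 1, "hrs1": 40, "prestg80": 51},
    {"wrkstat": 2, "hrs1": None, "prestg80": 36},
    {"wrkstat": None, "hrs1": None, "prestg80": None},
    {"wrkstat": 1, "hrs1": 35, "prestg80": 44},
]


def listwise_complete(rows, variables):
    """Keep only rows that have a valid value for every listed variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


complete = listwise_complete(cases, ["wrkstat", "hrs1", "prestg80"])
print(len(cases), "cases,", len(complete), "retained for the analysis")
```

A case is dropped as soon as one of the analysis variables is missing, which is why a small sample can lose power quickly.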

Slide 3: Tools for evaluating missing data

SPSS has a specific package for evaluating missing data, but it is not included under the UT license.

In place of this package, we will first examine missing data using SPSS statistics and procedures.

After studying the standard SPSS procedures that we can use to examine missing data, we will use an SPSS script that will produce the output needed for missing data analysis without requiring us to issue all of the SPSS commands individually.


Slide 4: Key issues in missing data analysis

We will focus on three key issues for evaluating missing data:
The number of cases missing per variable
The number of variables missing per case
The pattern of correlations among variables created to represent missing and valid data.

Further analysis may be required depending on the problems identified in these analyses.

Slide 5: Problem 1

1. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?

The variables "number of hours worked in the past week" and "employment status" are missing data for more than half of the cases in the data set and should be examined carefully before deciding how to handle missing data.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Slide 6: Identifying the number of cases in the data set

This problem wants to know if a variable is missing data for more than half the cases.

Our first task is to identify the number of cases that meets that criterion.

If we scroll to the bottom of the data set, we see that there are 270 cases in the data set.

270 ÷ 2 = 135.

If any variable included in the analysis has more than 135 missing cases, the answer to the problem will be true.


Slide 7: Request frequency distributions

We will use the output for frequency distributions to find the number of missing cases for each variable.

Select the Descriptive Statistics | Frequencies… command from the Analyze menu.


Slide 8: Completing the specification for frequencies

First, move the five variables included in the problem statement to the list box for variables.

Second, click on the OK button to complete the request for statistical output.

Slide 9: Number of missing cases for each variable

In the table of statistics at the top of the Frequencies output, there is a table detailing the number of missing cases for each variable in the analysis.

None of the variables has more than 135 missing cases, although number of hours worked in the past week comes close.

The answer to the question is false.
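The check made in Problem 1 (is any variable missing for more than half of the cases?) can be sketched in a few lines of Python. This is an illustration under assumptions, not the SPSS Frequencies procedure: the four-row data set is invented, None stands for a missing value, and the function names are hypothetical.

```python
def missing_per_variable(rows, variables):
    """Count, for each variable, the cases whose value is missing (None)."""
    return {v: sum(1 for row in rows if row[v] is None) for v in variables}


def over_half_missing(rows, variables):
    """Return the variables that are missing for more than half of the cases."""
    threshold = len(rows) / 2  # with 270 cases this would be 135
    counts = missing_per_variable(rows, variables)
    return [v for v in variables if counts[v] > threshold]


# Hypothetical toy data, not the GSS2000.sav values.
rows = [
    {"wrkstat": 1, "hrs1": None},
    {"wrkstat": 1, "hrs1": None},
    {"wrkstat": None, "hrs1": None},
    {"wrkstat": 2, "hrs1": 40},
]

counts = missing_per_variable(rows, ["wrkstat", "hrs1"])
flagged = over_half_missing(rows, ["wrkstat", "hrs1"])
print(counts, flagged)
```

The comparison is strict (more than half), matching the wording of the problem statement.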


Slide 10: Problem 2

2. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?

14 cases are missing data for more than half of the variables in the analysis and should be examined carefully before deciding how to handle missing data.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Slide 11: Create a variable that counts missing data

We want to know how many of the five variables in the analysis had missing data for each case in the data set.

We will create a variable containing this information that uses an SPSS function to count the number of variables with missing data.

To compute a new variable, select the Compute… command from the Transform menu.


Slide 12: Enter specifications for new variable

First, type in the name for the new variable nmiss in the Target variable text box.

Second, scroll down the list of functions and highlight the NMISS function.

Third, click on the up arrow button to move the NMISS function into the Numeric Expression text box.


Slide 13: Enter specifications for new variable

The NMISS function is moved into the Numeric Expression text box.

First, to add the list of variables to count missing data for, we highlight the first variable to include in the function, wrkstat.

Second, click on the right arrow button to move the variable name into the function arguments.


Slide 14: Enter specifications for new variable

First, before we add another variable to the function, we type a comma to separate the names of the variables.

Second, to add the next variable we highlight the second variable to include in the function, hrs1.

Third, click on the right arrow button to move the variable name into the function arguments.


Slide 15: Complete specifications for new variable

Continue adding variables to the function until all of the variables specified in the problem have been added.

Be sure to type a comma between the variable names.

When all of the variables have been added to the function, click on the OK button to complete the specifications.
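What the finished NMISS expression computes for each case can be sketched in Python. This is only an analogue of the SPSS function, assuming that a missing value is represented by None; the helper name nmiss and the example case are hypothetical, while the five variable names are the ones used in this analysis.

```python
# The five variables in the analysis.
VARIABLES = ["wrkstat", "hrs1", "wrkslf", "wrkgovt", "prestg80"]


def nmiss(row, variables=VARIABLES):
    """Count how many of the listed variables are missing (None) for one case."""
    return sum(1 for v in variables if row.get(v) is None)


# Hypothetical case: valid employment status, the other four variables missing.
case = {"wrkstat": 5, "hrs1": None, "wrkslf": None, "wrkgovt": None, "prestg80": None}
case_nmiss = nmiss(case)
print(case_nmiss)
```

Applying the function to every case gives the nmiss column that appears at the right of the data editor.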


Slide 16: The nmiss variable in the data editor

If we scroll the worksheet to the right, we see the new variable that SPSS has just computed for us.

Slide 17: A frequency distribution for nmiss

To answer the question of how many cases had each of the possible numbers of missing values, we create a frequency distribution.

Select the Descriptive Statistics | Frequencies… command from the Analyze menu.


Slide 18: Completing the specification for frequencies

First, move the nmiss variable to the list of variables.

Second, click on the OK button to complete the request for statistical output.

Slide 19: The frequency distribution

SPSS produces a frequency distribution for the nmiss variable.

170 cases had valid, non-missing values for all 5 variables. 85 cases had one missing value; 1 case had 2 missing values; and 14 cases had missing values for 4 variables.

Slide 20: Answering the problem

The problem asked whether or not 14 cases had missing data for more than half the variables. For a set of five variables, cases that had 3, 4, or 5 missing values would meet this requirement.

The number of cases with 3, 4, or 5 missing values is 14.

The answer to the problem is true.
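The arithmetic behind this answer can be checked with a short Python sketch. It rebuilds the nmiss values from the frequency distribution reported on the previous slide (170, 85, 1, and 14 cases); the variable names are assumptions for the illustration.

```python
from collections import Counter

# nmiss values reconstructed from the reported frequency distribution.
nmiss_values = [0] * 170 + [1] * 85 + [2] * 1 + [4] * 14
n_variables = 5

freq = Counter(nmiss_values)
total_cases = sum(freq.values())

# "More than half" of five variables means 3, 4, or 5 missing values.
over_half = sum(count for k, count in freq.items() if k > n_variables / 2)

print(total_cases, over_half)
```

The total comes to 270 cases, which agrees with the size of the data set found in Problem 1.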

Slide 21: Problem 3

3. Based on a missing data analysis for the variables "employment status," "number of hours worked in the past week," "self employment," "governmental employment," and "occupational prestige score" in the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.

After excluding cases with missing data for more than half of the variables from the analysis if necessary, the presence of statistically significant correlations in the matrix of dichotomous missing/valid variables suggests that the missing data pattern may not be random.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Slide 22: Compute valid/missing dichotomous variables

To evaluate the pattern of missing data, we need to compute dichotomous valid/missing variables for each of the five variables included in the analysis.

We will compute the new variable using the Recode command.

To create the new variable, select the Recode | Into Different Variables… command from the Transform menu.


Slide 23: Enter specifications for new variable

First, move the first variable in the analysis, wrkstat, into the Numeric Variable -> Output Variable text box.

Second, type the name for the new variable into the Name text box. My convention is to add an underscore character to the end of the variable name.

If this would make the variable more than 8 characters long, delete characters from the end of the original variable name.


Slide 24: Enter specifications for new variable

Next, type the label for the new variable into the Label text box.

My convention is to add the phrase (Valid/Missing) to the end of the variable label for the original variable.

Finally, click on the Change button to add the name of the dichotomous variable to the Numeric Variable -> Output Variable text box.


Slide 25: Enter specifications for new variable

To specify the values for the new variable, click on the Old and New Values… button.

Slide 26: Change the value for missing data

The dichotomous variable should be coded 1 if the variable has a valid value, 0 if the variable has a missing value.

First, mark the System- or user-missing option button.

Second, type 0 in the Value text box.

Third, click on the Add button to include this change in the Old->New list box.


Slide 27: Change the value for valid data

First, mark the All other values option button.

Second, type 1 in the Value text box.

Third, click on the Add button to include this change in the Old->New list box.


Slide 28: Complete the value specifications

Having entered the values for recoding the variable into dichotomous values, we click on the Continue button to complete this dialog box.

Slide 29: Complete the recode specifications

Having entered specifications for the new variable and the values for recoding the variable into dichotomous values, we click on the OK button to produce the new variable.

Slide 30: The dichotomous variable

The procedure for creating a dichotomous valid/missing variable is repeated for the four other variables in the analysis: hrs1, wrkslf, wrkgovt, and prestg80.
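The recode applied to each variable (1 for a valid value, 0 for a missing value) can be sketched in Python as follows. This is an illustration rather than the SPSS Recode command: None stands for a missing value, the two cases are invented, and the function name is hypothetical. The sketch follows the trailing-underscore naming convention but ignores the 8-character limit on variable names.

```python
def valid_missing_flags(rows, variables):
    """Build a dichotomous variable per input variable: 1 = valid, 0 = missing."""
    return [
        {v + "_": 0 if row[v] is None else 1 for v in variables}
        for row in rows
    ]


# Hypothetical cases, not the GSS2000.sav values.
rows = [
    {"wrkstat": 1, "hrs1": 40},
    {"wrkstat": 5, "hrs1": None},
]

flags = valid_missing_flags(rows, ["wrkstat", "hrs1"])
print(flags)
```

In this toy data wrkstat_ equals 1 for every case, so it is a constant.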

Slide 31: Filtering cases with excessive missing variables

To filter cases included in further analysis, we choose the Select Cases… command from the Data menu.

The problem calls for us to exclude cases that have missing data for more than half of the variables.

We do this by selecting in, or filtering, cases that have missing data for fewer than half of the variables, i.e. fewer than 3 missing variables.


Slide 32: Enter specifications for selecting cases

First, click on the If condition is satisfied option button on the Select panel.

Second, click on the If… button to enter the criteria for including cases.

Slide 33: Enter specifications for selecting cases

First, enter the criteria for including cases:

nmiss < 3

Second, click on the Continue button to complete the If specification.

Slide 34: Complete the specifications for selecting cases

To complete the specifications, click on the OK button.


Slide 35: Cases excluded from further analyses

SPSS marks the cases that will not be included in further analyses by drawing a slash mark through the case number.

We can verify that the selection is working correctly by noting that the case which is omitted had 4 missing variables.
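The selection rule nmiss < 3 can be sketched in Python to show which cases remain. This is an analogue of the Select Cases filter, not SPSS syntax: None marks a missing value, the three cases and their id field are invented, and the function names are hypothetical.

```python
VARIABLES = ["wrkstat", "hrs1", "wrkslf", "wrkgovt", "prestg80"]


def nmiss(row):
    """Number of analysis variables that are missing (None) for this case."""
    return sum(1 for v in VARIABLES if row[v] is None)


def select_cases(rows, limit=3):
    """Keep the cases that satisfy the condition nmiss < limit."""
    return [row for row in rows if nmiss(row) < limit]


# Hypothetical cases with 0, 1, and 4 missing values respectively.
rows = [
    {"id": 1, "wrkstat": 1, "hrs1": 40, "wrkslf": 2, "wrkgovt": 2, "prestg80": 51},
    {"id": 2, "wrkstat": 5, "hrs1": None, "wrkslf": 2, "wrkgovt": 2, "prestg80": 36},
    {"id": 3, "wrkstat": 7, "hrs1": None, "wrkslf": None, "wrkgovt": None, "prestg80": None},
]

kept_ids = [row["id"] for row in select_cases(rows)]
print(kept_ids)
```

The case with 4 missing values is the one filtered out, which mirrors the verification described on this slide.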

Slide 36: Correlating the dichotomous variables

To compute a correlation matrix for the dichotomous variables, select the Correlate command from the Analyze menu.

Slide 37: Specifications for correlations

First, move the dichotomous variables to the variables list box.

Second, click on the OK button to complete the request.

Slide 38: The correlation matrix

The correlation matrix is symmetric along the diagonal (shown by the blue line). The correlation for any pair of variables is included twice in the table. So we only count the correlations below the diagonal (the cells with the yellow background).

Slide 39: The correlation matrix

The correlations marked with footnote a could not be computed because one of the variables was a constant, i.e. the dichotomous variable has the same value for all cases.

This happens when one of the valid/missing variables has no missing cases, so that all of the cases have a value of 1 and none have a value of 0.

Slide 40: The correlation matrix

In the cells for which the correlation could be computed, the probabilities indicating significance are 0.437, 0.501, and 0.877.

None of the correlations are statistically significant. The answer to the question is false. We do not need to be concerned about a missing data problem for this set of variables.
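Why a correlation cannot be computed when one dichotomous variable is a constant can be seen in a small Python sketch of the Pearson correlation. This is an illustration under assumptions: the function name pearson_r and the four-case vectors are hypothetical, and the sketch computes only the coefficient, not the significance probabilities that SPSS reports.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant variable has zero variance, so r is undefined.
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical valid/missing indicators for four cases (1 = valid, 0 = missing).
hrs1_ = [1, 1, 0, 0]
wrkslf_ = [1, 0, 1, 0]
wrkstat_ = [1, 1, 1, 1]  # no missing cases, so the indicator is a constant

r_computable = pearson_r(hrs1_, wrkslf_)
r_constant = pearson_r(hrs1_, wrkstat_)
print(r_computable, r_constant)
```

The division by the square root of the two sums of squares is what fails for a constant indicator, which corresponds to the cells marked with footnote a.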

Slide 41: Using scripts

The process of evaluating missing data requires numerous SPSS procedures and outputs that are time consuming to produce.

These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.

Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating missing data.

Slide 42: Using a script for missing data

The script “MissingDataCheck.sbs” will produce all of the output we have used for evaluating missing data, as well as other outputs described in the textbook.

Navigate to the link “SPSS Scripts and Syntax” on the course web page.

Download the script file “MissingDataCheck.exe” to your computer and install it, following the directions on the web page.

Slide 43: Open the data set in SPSS

Before using a script, a data set should be open in the SPSS data editor.

Slide 44: Invoke the script

To invoke the script, select the Run Script… command in the Utilities menu.


Slide 45: Select the missing data script

First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder.

If you only see a file with an “.EXE” extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.

Second, click on the script name to highlight it.

Third, click on the Run button to start the script.


Slide 46: The script dialog

The script dialog box acts similarly to SPSS dialog boxes. You select the variables to include in the analysis and choose options for the output.

Slide 47: Complete the specifications

Select the variables for the analysis. This analysis uses the variables for the example on page 56 in the textbook.

The checkboxes are marked to produce the output we need for our problems. The only additional option is to compute the t-tests and chi-square tests for all of the variables.

Click on the OK button to produce the output.


Slide 48: The script finishes

If your SPSS output viewer is open, you will see the output produced in that window.

Since it may take a while to produce the output, and since there are times when it appears that nothing is happening, there is an alert to tell you when the script is finished.

Unless you are absolutely sure something has gone wrong, let the script run until you see this alert.

When you see this alert, click on the OK button.


  • File name: analyzing-missing-data.pptx