What statistical tool is used to find significant differences between two or more variables?

It is well worth spending a little time considering how you will analyse your data before you design your survey instrument or start to collect any data. This will ensure that data are collected – and, more importantly, coded – in an appropriate way for the analysis you hope to do.

By Claire Creaser

Fundamentals

Start to think about the techniques you will use for your analysis before you collect any data.

What do you want to know?

The analysis must relate to the research questions, and this may dictate the techniques you should use.

What type of data do you have?

The type of data you have is also fundamental – the techniques and tools appropriate to interval and ratio variables are not suitable for categorical or ordinal measures. [See How to collect data for notes on types of data]

What assumptions can – and can’t – you make?

Many techniques rely on the sampling distribution of the test statistic being a Normal distribution [see below]. This is always the case when the underlying distribution of the data is Normal, but in practice, the data may not be Normally distributed. For example, there could be a long tail of responses to one side or the other [skewed data]. Non-parametric techniques are available to use in such situations, but these are inevitably less powerful and less flexible. However, if the sample size is sufficiently large, the Central Limit Theorem allows use of the standard analyses and tools.

Techniques for a non-Normal distribution

Parametric or non-parametric statistics?

Parametric methods and statistics rely on a set of assumptions about the underlying distribution to give valid results. In general, they require the variables to have a Normal distribution.

Non-parametric techniques must be used for categorical and ordinal data, but for interval and ratio data they are generally less powerful and less flexible, and should only be used where the standard parametric test is not appropriate – e.g. when the sample size is small [below 30 observations].

Central limit theorem

As the sample size increases, the shape of the sampling distribution of the test statistic tends to become Normal, even if the distribution of the variable which is being tested is not Normal.

In practice, this can be applied to test statistics calculated from more than 30 observations.
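As a rough illustration of this decision, the sketch below [in Python, using invented score data] checks Normality with the Shapiro-Wilk test before choosing between the standard t-test and a non-parametric alternative. The group names, sample sizes and 0.05 threshold are assumptions made for the example, not part of the original text.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Invented samples from two groups of 25 respondents each
    group_a = rng.normal(loc=50, scale=10, size=25)
    group_b = rng.exponential(scale=10, size=25) + 40   # deliberately skewed

    # Shapiro-Wilk test: the null hypothesis is that the data are Normally distributed
    looks_normal = (stats.shapiro(group_a).pvalue > 0.05
                    and stats.shapiro(group_b).pvalue > 0.05)

    if looks_normal:
        # Both samples look Normal enough for the parametric test
        print("t-test:", stats.ttest_ind(group_a, group_b))
    else:
        # Small, non-Normal samples: use the non-parametric equivalent
        print("Mann-Whitney U:", stats.mannwhitneyu(group_a, group_b))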

How much can you expect to get out of your data?

The smaller the sample size, the less you can get out of your data. Standard error is inversely related to sample size, so the larger your sample, the smaller the standard error, and the greater chance you will have of identifying statistically significant results in your analysis.
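A minimal sketch of this relationship, assuming an invented standard deviation of 15, shows the standard error of the mean [the standard deviation divided by the square root of the sample size] shrinking as the sample grows:

    import math

    standard_deviation = 15.0   # invented figure, for illustration only

    for n in (10, 30, 100, 1000):
        # Standard error of the mean = standard deviation / sqrt(sample size)
        standard_error = standard_deviation / math.sqrt(n)
        print(f"n = {n:4d}  standard error of the mean = {standard_error:.2f}")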

Basic techniques

In general, any technique which can be used on categorical data may also be used on ordinal data. Any technique which can be used on ordinal data may also be used on ratio or interval data. The reverse is not the case.

Describing your data

The first stage in any analysis should be to describe your data, and hence the population from which they are drawn. The statistics appropriate for this activity fall into three broad groups, and depend on the type of data you have.

What do you want to do?          With what type of data?   Appropriate techniques
Look at the distribution         Categorical / Ordinal     Plot the percentage in each category [column or bar chart]
                                 Ratio / Interval          Histogram; cumulative frequency diagram
Describe the central tendency    Categorical               n/a
                                 Ordinal                   Median; mode
                                 Ratio / Interval          Mean; median
Describe the spread              Categorical               n/a
                                 Ordinal                   Range; inter-quartile range
                                 Ratio / Interval          Range; inter-quartile range; variance; standard deviation

See Graphical presentation for descriptions of the main graphical techniques.

Mean – the arithmetic average, calculated by summing all the values and dividing by the number of values in the sum.

Median – the mid point of the distribution, where half the values are higher and half lower.

Mode – the most frequently occurring value.

Range – the difference between the highest and lowest value.

Inter-quartile range – the difference between the upper quartile [the value where 25 per cent of the observations are higher and 75 per cent lower] and the lower quartile [the value where 75 per cent of the observations are higher and 25 per cent lower]. This is particularly useful where there are a small number of extreme observations much higher, or lower, than the majority.

Variance – a measure of spread, calculated as the mean of the squared differences of the observations from their mean.

Standard deviation – the square root of the variance.
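The sketch below, using an invented set of ratio-scale observations, computes each of the statistics defined above in Python; the values and variable names are illustrative only.

    from collections import Counter
    import numpy as np

    # Invented ratio-scale observations, e.g. loan periods in days
    values = np.array([12, 15, 15, 18, 21, 21, 21, 25, 30, 90])

    mean = values.mean()                                  # arithmetic average
    median = np.median(values)                            # mid point of the distribution
    mode = Counter(values.tolist()).most_common(1)[0][0]  # most frequently occurring value
    value_range = values.max() - values.min()             # highest minus lowest
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1                                         # inter-quartile range
    variance = values.var(ddof=1)                         # sample variance
    std_dev = values.std(ddof=1)                          # standard deviation

    print(mean, median, mode, value_range, iqr, variance, std_dev)

Note how the single extreme value [90] pulls the mean and range upwards, while the median and inter-quartile range are largely unaffected.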

Differences between groups and variables

Chi-squared test – used to compare the distributions of two or more sets of categorical or ordinal data.

t-tests – used to compare the means of two sets of data.

Mann-Whitney U test [also known as the Wilcoxon rank-sum test] – non-parametric equivalent of the t-test for independent samples. Based on the rank order of the data, it may also be used to compare medians.

ANOVA – analysis of variance, to compare the means of more than two groups of data.

What do you want to do?                         With what type of data?   Appropriate techniques
Compare two groups                              Categorical               Chi-squared test
                                                Ordinal                   Chi-squared test; Mann-Whitney U test
                                                Ratio / Interval          t-test for independent samples
Compare more than two groups                    Categorical / Ordinal     Chi-squared test
                                                Ratio / Interval          ANOVA
Compare two variables over the same subjects    Categorical / Ordinal     Chi-squared test
                                                Ratio / Interval          t-test for dependent samples
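As a sketch of how the tests in the table above might be run in Python with scipy [the counts and scores below are invented]:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Categorical data: a 2 x 3 contingency table of counts
    observed = np.array([[30, 45, 25],
                         [40, 35, 25]])
    chi2, p, dof, expected = stats.chi2_contingency(observed)
    print("Chi-squared:", chi2, "p =", p)

    # Ratio/interval data: compare the means of two independent groups
    group_a = rng.normal(50, 10, size=40)
    group_b = rng.normal(55, 10, size=40)
    print("Independent-samples t-test:", stats.ttest_ind(group_a, group_b))
    print("Mann-Whitney U:", stats.mannwhitneyu(group_a, group_b))

    # More than two groups: one-way analysis of variance
    group_c = rng.normal(52, 10, size=40)
    print("ANOVA:", stats.f_oneway(group_a, group_b, group_c))

    # Two variables measured on the same subjects
    before = rng.normal(60, 8, size=30)
    after = before + rng.normal(1, 4, size=30)
    print("Dependent-samples t-test:", stats.ttest_rel(before, after))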

Relationships between variables

The correlation coefficient measures the degree of linear association between two variables, with a value in the range +1 to -1. Positive values indicate that the two variables increase and decrease together; negative values that one increases as the other decreases. A correlation coefficient of zero indicates no linear relationship between the two variables. The Spearman rank correlation is the non-parametric equivalent of the Pearson correlation.

What type of data?   Appropriate techniques
Categorical          Chi-squared test
Ordinal              Chi-squared test; Spearman rank correlation [rho]
Ratio / Interval     Pearson correlation [r]

Note that correlation analyses will only detect linear relationships between two variables. The figure below illustrates two small data sets where there are clearly relationships between the two variables. However, the correlation for the second data set, where the relationship is not linear, is 0.0. A simple correlation analysis of these data would suggest no relationship between the measures, when that is clearly not the case. This illustrates the importance of undertaking a series of basic descriptive analyses before embarking on analyses of the differences and relationships between variables.
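A small sketch of this point, using invented data: the quadratic relationship below is perfectly regular, yet its Pearson correlation is zero.

    import numpy as np
    from scipy import stats

    x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

    # A linear relationship: Pearson correlation is +1
    print("Linear:", stats.pearsonr(x, 2 * x + 1))

    # A clear but non-linear [quadratic] relationship: Pearson correlation is 0
    print("Quadratic:", stats.pearsonr(x, x ** 2))

    # Spearman rank correlation, based on rank order, for ordinal or skewed data
    print("Spearman:", stats.spearmanr(x, 2 * x + 1))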

Testing validity

Significance levels

The statistical significance of a test is a measure of probability: the probability of obtaining a result at least as extreme as the one observed in your sample if the null hypothesis [that there is no effect due to the parameters being tested] were true. The example below tests whether scores in an exam change after candidates have received training. The hypothesis suggests that they should, so the null hypothesis is that they won't.

In general, any level of probability above 5 per cent [p>0.05] is not considered to be statistically significant, and for large surveys 1 per cent [p>0.01] is often taken as a more appropriate level.

Note that statistical significance does not mean that the results you have obtained actually have value in the context of your research. If you have a large enough sample, a very small difference between groups can be identified as statistically significant, but such a small difference may be irrelevant in practice. On the other hand, an apparently large difference may not be statistically significant in a small sample, due to the variation within the groups being compared.
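A minimal sketch of this point, assuming an invented true difference of 0.5 between group means [standard deviation 10]: with a large enough sample even this trivial difference becomes statistically significant.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    for n in (30, 300, 30_000):
        group_a = rng.normal(100.0, 10.0, size=n)
        group_b = rng.normal(100.5, 10.0, size=n)   # true difference of only 0.5
        t, p = stats.ttest_ind(group_a, group_b)
        print(f"n = {n:6d}  p = {p:.4f}")

    # Small samples rarely show significance; at n = 30,000 the tiny difference is
    # almost always flagged as significant, whether or not it matters in practice.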

Degrees of freedom

Some test statistics [e.g. chi-squared] require the number of degrees of freedom to be known, in order to test for statistical significance against the correct probability table. In brief, the degrees of freedom is the number of values which can be assigned arbitrarily within the sample.

For example:

In a sample of size n divided into k classes, there are k-1 degrees of freedom [the first k-1 groups could be of any size up to n, while the last is fixed by the total of the first k-1 and the value of n]. In numerical terms, if a sample of 500 individuals is taken from the UK, and it is observed that 300 are from England, 100 from Scotland and 50 from Wales, then there must be 50 from Northern Ireland. Given the numbers from the first three groups, there is no flexibility in the size of the final group. Dividing the sample into four groups gives three degrees of freedom.

In a two-way contingency table with p rows and q columns, there are [p-1]*[q-1] degrees of freedom [given the values in the first p-1 rows and q-1 columns, the final row and column are constrained by the totals in the table].
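A short sketch checking both rules [the counts are invented]:

    import numpy as np
    from scipy import stats

    # A sample of 500 split into four classes: only three counts are free to vary
    counts = [300, 100, 50, 50]
    print("Degrees of freedom for", len(counts), "classes:", len(counts) - 1)   # 3

    # A 3 x 4 contingency table: [3 - 1] * [4 - 1] = 6 degrees of freedom
    table = np.array([[10, 20, 30, 40],
                      [15, 25, 35, 45],
                      [20, 30, 40, 50]])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print("Degrees of freedom reported by the chi-squared test:", dof)          # 6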

One-tail or two-tail tests

If, as is generally the case, what matters is simply that the statistics for the populations are different, then it is appropriate to use the critical values for a two-tailed test.

If, however, you are only interested in finding out whether the statistic for population A has a larger value than that for population B, then a one-tailed test would be appropriate. The critical value for a one-tailed test is generally lower than for a two-tailed test, and it should only be used if your research hypothesis is that population A has a greater value than population B, so that it does not matter by how much population A falls short of population B.

For example

Scenario 1

Null hypothesis – there is no difference in mean exam scores before and after training [i.e. training has no effect on the exam score]
Alternative – there is a difference in the mean scores before and after training [i.e. training has an unspecified effect]
Use a two-tail test

Scenario 2

Null hypothesis – Training does not increase the mean score
Alternative – Mean score increases after training
Use a one-tail test, if there is an observed increase in mean score.
[If there is an observed fall in scores, there is no need to test, as you cannot reject the null hypothesis.]

Scenario 3

Null hypothesis – Training does not cause mean scores to fall
Alternative – Mean score falls after training
Use a one-tail test, if there is an observed fall in mean score.
[If there is an observed increase in scores, there is no need to test, as you cannot reject the null hypothesis.]

t-Test: Paired Two Sample for Means

                            Before     After
Mean                         360.4     361.1
Variance                    46,547    46,830
Observations                    62        62
Degrees of freedom [df]         61
t Stat                        1.79
P[T<=t]
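A sketch of how such a paired test might be run in Python is below; the before and after scores are invented, so the output will not reproduce the figures in the table above.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    # Invented exam scores for 62 candidates before and after training
    before = rng.normal(360, 215, size=62)
    after = before + rng.normal(1, 5, size=62)

    # Two-tailed test: is there any difference in mean score?
    two_tail = stats.ttest_rel(before, after)
    print("Two-tail p =", two_tail.pvalue)

    # One-tailed test: does training increase the mean score?
    one_tail = stats.ttest_rel(after, before, alternative="greater")
    print("One-tail p =", one_tail.pvalue)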
