What statistical tool is used to find significant differences between two or more variables?

It is well worth spending a little time considering how you will analyse your data before you design your survey instrument or start to collect any data. This will ensure that data are collected – and, more importantly, coded – in an appropriate way for the analysis you hope to do.

By Claire Creaser

Fundamentals

Start to think about the techniques you will use for your analysis before you collect any data.

What do you want to know?

The analysis must relate to the research questions, and this may dictate the techniques you should use.

What type of data do you have?

The type of data you have is also fundamental – the techniques and tools appropriate to interval and ratio variables are not suitable for categorical or ordinal measures. [See How to collect data for notes on types of data]

What assumptions can – and can’t – you make?

Many techniques rely on the sampling distribution of the test statistic being a Normal distribution [see below]. This is always the case when the underlying distribution of the data is Normal, but in practice, the data may not be Normally distributed. For example, there could be a long tail of responses to one side or the other [skewed data]. Non-parametric techniques are available to use in such situations, but these are inevitably less powerful and less flexible. However, if the sample size is sufficiently large, the Central Limit Theorem allows use of the standard analyses and tools.

Techniques for a non-Normal distribution

Parametric or non-parametric statistics?

Parametric methods and statistics rely on a set of assumptions about the underlying distribution to give valid results. In general, they require the variables to have a Normal distribution.

Non-parametric techniques must be used for categorical and ordinal data, but for interval and ratio data they are generally less powerful and less flexible, and should only be used where the standard parametric test is not appropriate – e.g. when the sample size is small [below 30 observations].

Central limit theorem

As the sample size increases, the shape of the sampling distribution of the test statistic tends to become Normal, even if the distribution of the variable which is being tested is not Normal.

In practice, this can be applied to test statistics calculated from more than 30 observations.
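As a rough illustration of this decision, the sketch below [in Python, using invented score data] checks Normality with the Shapiro-Wilk test before choosing between the standard t-test and a non-parametric alternative. The group names, sample sizes and 0.05 threshold are assumptions made for the example, not part of the original text.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Invented samples from two groups of 25 respondents each
    group_a = rng.normal(loc=50, scale=10, size=25)
    group_b = rng.exponential(scale=10, size=25) + 40   # deliberately skewed

    # Shapiro-Wilk test: the null hypothesis is that the data are Normally distributed
    looks_normal = (stats.shapiro(group_a).pvalue > 0.05
                    and stats.shapiro(group_b).pvalue > 0.05)

    if looks_normal:
        # Both samples look Normal enough for the parametric test
        print("t-test:", stats.ttest_ind(group_a, group_b))
    else:
        # Small, non-Normal samples: use the non-parametric equivalent
        print("Mann-Whitney U:", stats.mannwhitneyu(group_a, group_b))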

How much can you expect to get out of your data?

The smaller the sample size, the less you can get out of your data. Standard error is inversely related to sample size, so the larger your sample, the smaller the standard error, and the greater chance you will have of identifying statistically significant results in your analysis.
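A minimal sketch of this relationship, assuming an invented standard deviation of 15, shows the standard error of the mean [the standard deviation divided by the square root of the sample size] shrinking as the sample grows:

    import math

    standard_deviation = 15.0   # invented figure, for illustration only

    for n in (10, 30, 100, 1000):
        # Standard error of the mean = standard deviation / sqrt(sample size)
        standard_error = standard_deviation / math.sqrt(n)
        print(f"n = {n:4d}  standard error of the mean = {standard_error:.2f}")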

Basic techniques

In general, any technique which can be used on categorical data may also be used on ordinal data. Any technique which can be used on ordinal data may also be used on ratio or interval data. The reverse is not the case.

Describing your data

The first stage in any analysis should be to describe your data, and hence the population from which they are drawn. The statistics appropriate for this activity fall into three broad groups, and depend on the type of data you have.

What do you want to do?          With what type of data?   Appropriate techniques
Look at the distribution         Categorical / Ordinal     Plot the percentage in each category [column or bar chart]
                                 Ratio / Interval          Histogram; cumulative frequency diagram
Describe the central tendency    Categorical               n/a
                                 Ordinal                   Median; mode
                                 Ratio / Interval          Mean; median
Describe the spread              Categorical               n/a
                                 Ordinal                   Range; inter-quartile range
                                 Ratio / Interval          Range; inter-quartile range; variance; standard deviation

See Graphical presentation for descriptions of the main graphical techniques.

Mean – the arithmetic average, calculated by summing all the values and dividing by the number of values in the sum.

Median – the mid point of the distribution, where half the values are higher and half lower.

Mode – the most frequently occurring value.

Range – the difference between the highest and lowest value.

Inter-quartile range – the difference between the upper quartile [the value where 25 per cent of the observations are higher and 75 per cent lower] and the lower quartile [the value where 75 per cent of the observations are higher and 25 per cent lower]. This is particularly useful where there are a small number of extreme observations much higher, or lower, than the majority.

Variance – a measure of spread, calculated as the mean of the squared differences of the observations from their mean.

Standard deviation – the square root of the variance.
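The sketch below, using an invented set of ratio-scale observations, computes each of the statistics defined above in Python; the values and variable names are illustrative only.

    from collections import Counter
    import numpy as np

    # Invented ratio-scale observations, e.g. loan periods in days
    values = np.array([12, 15, 15, 18, 21, 21, 21, 25, 30, 90])

    mean = values.mean()                                  # arithmetic average
    median = np.median(values)                            # mid point of the distribution
    mode = Counter(values.tolist()).most_common(1)[0][0]  # most frequently occurring value
    value_range = values.max() - values.min()             # highest minus lowest
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1                                         # inter-quartile range
    variance = values.var(ddof=1)                         # sample variance
    std_dev = values.std(ddof=1)                          # standard deviation

    print(mean, median, mode, value_range, iqr, variance, std_dev)

Note how the single extreme value [90] pulls the mean and range upwards, while the median and inter-quartile range are largely unaffected.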

Differences between groups and variables

Chi-squared test – used to compare the distributions of two or more sets of categorical or ordinal data.

t-tests – used to compare the means of two sets of data.

Mann-Whitney U test [also known as the Wilcoxon rank-sum test] – non-parametric equivalent of the t-test for independent samples. Based on the rank order of the data, it may also be used to compare medians.

ANOVA – analysis of variance, to compare the means of more than two groups of data.

What do you want to do?                         With what type of data?   Appropriate techniques
Compare two groups                              Categorical               Chi-squared test
                                                Ordinal                   Chi-squared test; Mann-Whitney U test
                                                Ratio / Interval          t-test for independent samples
Compare more than two groups                    Categorical / Ordinal     Chi-squared test
                                                Ratio / Interval          ANOVA
Compare two variables over the same subjects    Categorical / Ordinal     Chi-squared test
                                                Ratio / Interval          t-test for dependent samples
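As a sketch of how the tests in the table above might be run in Python with scipy [the counts and scores below are invented]:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Categorical data: a 2 x 3 contingency table of counts
    observed = np.array([[30, 45, 25],
                         [40, 35, 25]])
    chi2, p, dof, expected = stats.chi2_contingency(observed)
    print("Chi-squared:", chi2, "p =", p)

    # Ratio/interval data: compare the means of two independent groups
    group_a = rng.normal(50, 10, size=40)
    group_b = rng.normal(55, 10, size=40)
    print("Independent-samples t-test:", stats.ttest_ind(group_a, group_b))
    print("Mann-Whitney U:", stats.mannwhitneyu(group_a, group_b))

    # More than two groups: one-way analysis of variance
    group_c = rng.normal(52, 10, size=40)
    print("ANOVA:", stats.f_oneway(group_a, group_b, group_c))

    # Two variables measured on the same subjects
    before = rng.normal(60, 8, size=30)
    after = before + rng.normal(1, 4, size=30)
    print("Dependent-samples t-test:", stats.ttest_rel(before, after))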

Relationships between variables

The correlation coefficient measures the degree of linear association between two variables, with a value in the range +1 to -1. Positive values indicate that the two variables increase and decrease together; negative values that one increases as the other decreases. A correlation coefficient of zero indicates no linear relationship between the two variables. The Spearman rank correlation is the non-parametric equivalent of the Pearson correlation.

What type of data?   Appropriate techniques
Categorical          Chi-squared test
Ordinal              Chi-squared test; Spearman rank correlation [rho]
Ratio / Interval     Pearson correlation [r]

Note that correlation analyses will only detect linear relationships between two variables. The figure below illustrates two small data sets where there are clearly relationships between the two variables. However, the correlation for the second data set, where the relationship is not linear, is 0.0. A simple correlation analysis of these data would suggest no relationship between the measures, when that is clearly not the case. This illustrates the importance of undertaking a series of basic descriptive analyses before embarking on analyses of the differences and relationships between variables.
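A small sketch of this point, using invented data: the quadratic relationship below is perfectly regular, yet its Pearson correlation is zero.

    import numpy as np
    from scipy import stats

    x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

    # A linear relationship: Pearson correlation is +1
    print("Linear:", stats.pearsonr(x, 2 * x + 1))

    # A clear but non-linear [quadratic] relationship: Pearson correlation is 0
    print("Quadratic:", stats.pearsonr(x, x ** 2))

    # Spearman rank correlation, based on rank order, for ordinal or skewed data
    print("Spearman:", stats.spearmanr(x, 2 * x + 1))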

Testing validity

Significance levels

The statistical significance of a test is a measure of probability: the probability of obtaining a result at least as extreme as the one observed in your sample if the null hypothesis [that there is no effect due to the parameters being tested] were true. The example below tests whether scores in an exam change after candidates have received training. The hypothesis suggests that they should, so the null hypothesis is that they won't.

In general, any level of probability above 5 per cent [p>0.05] is not considered to be statistically significant, and for large surveys 1 per cent [p>0.01] is often taken as a more appropriate level.

Note that statistical significance does not mean that the results you have obtained actually have value in the context of your research. If you have a large enough sample, a very small difference between groups can be identified as statistically significant, but such a small difference may be irrelevant in practice. On the other hand, an apparently large difference may not be statistically significant in a small sample, due to the variation within the groups being compared.
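A minimal sketch of this point, assuming an invented true difference of 0.5 between group means [standard deviation 10]: with a large enough sample even this trivial difference becomes statistically significant.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    for n in (30, 300, 30_000):
        group_a = rng.normal(100.0, 10.0, size=n)
        group_b = rng.normal(100.5, 10.0, size=n)   # true difference of only 0.5
        t, p = stats.ttest_ind(group_a, group_b)
        print(f"n = {n:6d}  p = {p:.4f}")

    # Small samples rarely show significance; at n = 30,000 the tiny difference is
    # almost always flagged as significant, whether or not it matters in practice.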

Degrees of freedom

Some test statistics [e.g. chi-squared] require the number of degrees of freedom to be known, in order to test for statistical significance against the correct probability table. In brief, the degrees of freedom is the number of values which can be assigned arbitrarily within the sample.

For example:

In a sample of size n divided into k classes, there are k-1 degrees of freedom [the first k-1 groups could be of any size up to n, while the last is fixed by the total of the first k-1 and the value of n]. In numerical terms, if a sample of 500 individuals is taken from the UK, and it is observed that 300 are from England, 100 from Scotland and 50 from Wales, then there must be 50 from Northern Ireland. Given the numbers from the first three groups, there is no flexibility in the size of the final group. Dividing the sample into four groups gives three degrees of freedom.

In a two-way contingency table with p rows and q columns, there are [p-1]*[q-1] degrees of freedom [given the values in the first p-1 rows and q-1 columns, the final row and column are constrained by the totals in the table].
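A short sketch checking both rules [the counts are invented]:

    import numpy as np
    from scipy import stats

    # A sample of 500 split into four classes: only three counts are free to vary
    counts = [300, 100, 50, 50]
    print("Degrees of freedom for", len(counts), "classes:", len(counts) - 1)   # 3

    # A 3 x 4 contingency table: [3 - 1] * [4 - 1] = 6 degrees of freedom
    table = np.array([[10, 20, 30, 40],
                      [15, 25, 35, 45],
                      [20, 30, 40, 50]])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print("Degrees of freedom reported by the chi-squared test:", dof)          # 6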

One-tail or two-tail tests

If, as is generally the case, what matters is simply that the statistics for the populations are different, then it is appropriate to use the critical values for a two-tailed test.

If, however, you are only interested in finding out whether the statistic for population A has a larger value than that for population B, then a one-tailed test would be appropriate. The critical value for a one-tailed test is generally lower than for a two-tailed test, and it should only be used if your research hypothesis is that population A has a greater value than population B, so that it does not matter by how much population A falls short of population B.

For example

Scenario 1

Null hypothesis – there is no difference in mean exam scores before and after training [i.e. training has no effect on the exam score]
Alternative – there is a difference in the mean scores before and after training [i.e. training has an unspecified effect]
Use a two-tail test

Scenario 2

Null hypothesis – Training does not increase the mean score
Alternative – Mean score increases after training
Use a one-tail test, if there is an observed increase in mean score.
[If there is an observed fall in scores, there is no need to test, as you cannot reject the null hypothesis.]

Scenario 3

Null hypothesis – Training does not cause mean scores to fall
Alternative – Mean score falls after training
Use a one-tail test, if there is an observed fall in mean score.
[If there is an observed increase in scores, there is no need to test, as you cannot reject the null hypothesis.]

t-Test: Paired Two Sample for Means

                            Before     After
Mean                         360.4     361.1
Variance                    46,547    46,830
Observations                    62        62
Degrees of freedom [df]         61
t Stat                        1.79
P[T<=t]
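A sketch of how such a paired test might be run in Python is below; the before and after scores are invented, so the output will not reproduce the figures in the table above.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    # Invented exam scores for 62 candidates before and after training
    before = rng.normal(360, 215, size=62)
    after = before + rng.normal(1, 5, size=62)

    # Two-tailed test: is there any difference in mean score?
    two_tail = stats.ttest_rel(before, after)
    print("Two-tail p =", two_tail.pvalue)

    # One-tailed test: does training increase the mean score?
    one_tail = stats.ttest_rel(after, before, alternative="greater")
    print("One-tail p =", one_tail.pvalue)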
