Which explanation below best describes how to perform the two-sample t-test

The two-sample t-test allows one to test the null hypothesis that the means of two groups are equal. The resulting design matrix consists of three columns: the first two encode the group membership of each scan and the third models a common constant across scans of both groups. This model is overparameterized by one degree of freedom, i.e. the sum of the first two regressors equals the third regressor. Notice the difference in parameterization compared to the earlier two-sample t-test example.

Nevertheless, the resulting t-value is the same for a differential contrast. Let the number of scans in the first and second groups be $J_1$ and $J_2$, where $J = J_1 + J_2$. The three regressors consist of ones and zeros, where the first regressor consists of $J_1$ ones followed by $J_2$ zeros. The second regressor consists of $J_1$ zeros followed by $J_2$ ones. The third regressor contains ones only.

Let the contrast vector be $c = [-1, 1, 0]^T$, i.e. the alternative hypothesis is $\bar{\mathcal{H}}: \beta_1 < \beta_2$. Then:

$$X^T X = \begin{pmatrix} J_1 & 0 & J_1 \\ 0 & J_2 & J_2 \\ J_1 & J_2 & J \end{pmatrix}$$

This matrix is rank deficient, so we use the pseudo-inverse $(X^T X)^-$ to compute the t-statistic. We sandwich $(X^T X)^-$ with the contrast and get $c^T (X^T X)^- c = 1/J_1 + 1/J_2$. The t-statistic is then given by:

$$T = \frac{\hat{\beta}_2 - \hat{\beta}_1}{\sqrt{\hat{\sigma}^2 \, (1/J_1 + 1/J_2)}} \sim t_{J-2} \qquad (8.26)$$

and $\hat{\sigma}^2 = Y^T R Y/(J-2)$, where $R$ is the residual-forming matrix. We have assumed here that we have equal variance in both groups. This assumption may not be tenable (e.g. when comparing normal subjects with patients) and we may have to take this non-sphericity into account (see Chapter 10).
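
The claim that the three-column design gives the usual pooled t-value can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `glm_two_sample_t` and the toy data are hypothetical and not part of the source.

```python
import numpy as np

def glm_two_sample_t(y1, y2):
    """t-statistic for beta2 - beta1 in the three-column (overparameterized) design."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    J1, J2 = len(y1), len(y2)
    J = J1 + J2
    y = np.concatenate([y1, y2])

    # Columns: group-1 indicator, group-2 indicator, constant.
    X = np.column_stack([
        np.r_[np.ones(J1), np.zeros(J2)],
        np.r_[np.zeros(J1), np.ones(J2)],
        np.ones(J),
    ])
    c = np.array([-1.0, 1.0, 0.0])

    XtX_pinv = np.linalg.pinv(X.T @ X)   # X^T X is rank deficient, so use the pseudo-inverse
    beta = XtX_pinv @ X.T @ y            # minimum-norm least-squares estimate
    resid = y - X @ beta
    sigma2 = resid @ resid / (J - 2)     # X has rank 2, leaving J - 2 residual degrees of freedom
    var_factor = c @ XtX_pinv @ c        # expected to equal 1/J1 + 1/J2
    t = (c @ beta) / np.sqrt(sigma2 * var_factor)
    return t, var_factor

# Hypothetical toy data: 3 scans in group 1, 4 scans in group 2.
t, var_factor = glm_two_sample_t([1, 2, 3], [2, 4, 6, 8])
```

Because the contrast is estimable, the value of `t` does not depend on which generalized inverse is used, even though the individual parameter estimates do.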


URL: https://www.sciencedirect.com/science/article/pii/B9780123725608500085

Methods You Might Meet, But Not Every Day

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Hotelling’s T2

A two-sample t test compares means on a single continuous measure between two groups. The T2 test of Harold Hotelling compares means of two or more continuous measures simultaneously for the two groups. For example, it has been suggested that because increased levels of vitamin D have been shown to reduce tooth loss, it may increase the rate of correction of pediatric scoliosis. An investigator measures the difference between left and right thoracic height, Cobb angle, kyphosis, and lordosis on two groups of children with scoliosis. One group was given large vitamin D doses and the other group was not. The investigator uses T2 to test the difference between the two groups on the basis of the four spinal measurements simultaneously. T2 will also test a single or paired sample against a hypothesis of zero, as a generalization of the paired t test.
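
To make the relation to the two-sample t test concrete, here is a minimal sketch of the two-sample Hotelling T2 statistic, assuming NumPy and a pooled covariance matrix. The function name `hotelling_t2` is hypothetical; the sketch returns the statistic and its F transformation but does not compute a P-value.

```python
import numpy as np

def hotelling_t2(a, b):
    """Two-sample Hotelling T^2 for arrays of shape (n_subjects, n_measures)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a.reshape(len(a), -1)            # treat 1-D input as a single measure
    b = b.reshape(len(b), -1)
    n1, p = a.shape
    n2 = b.shape[0]

    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled covariance matrix, the multivariate analog of the pooled variance.
    s_a = np.atleast_2d(np.cov(a, rowvar=False, ddof=1))
    s_b = np.atleast_2d(np.cov(b, rowvar=False, ddof=1))
    s_pooled = ((n1 - 1) * s_a + (n2 - 1) * s_b) / (n1 + n2 - 2)

    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    # Under the null hypothesis this follows an F distribution with (p, n1 + n2 - p - 1) df.
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)
```

With a single measure (p = 1), T2 reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.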


URL: https://www.sciencedirect.com/science/article/pii/B9780123848642000287

Foundations

J.M. Curran, in Encyclopedia of Forensic Sciences (Second Edition), 2013

The Two-Sample t-Test

The two-sample t-test is often used to test the hypothesis that the control sample and the recovered sample come from distributions with the same mean and variance. The inference in this situation is that if the fragments do come from distributions with the same mean and variance, then they are indistinguishable and therefore may have a common source. This is often incorrectly interpreted as “the recovered fragments come from the crime scene.”

The two-sample t-test compares the difference in the sample means to the difference that one would expect by random variation, or chance, alone. The idea is to make a probability statement about the difference in the true, but unknown, means of the sources that the samples come from. If the means are the same, then one can say that “the recovered sample cannot be distinguished from the control scene.”

Formally, let the $n_c$ measurements on the control sample be denoted $x_i$, $i = 1, \ldots, n_c$, and the $n_r$ measurements on the recovered sample be denoted $y_j$, $j = 1, \ldots, n_r$. The control sample is assumed to have come from a normal distribution with mean $\mu_c$ and standard deviation $\sigma_c$. Similarly, the recovered sample is assumed to have come from a normal distribution with mean $\mu_r$ and standard deviation $\sigma_r$. This is expressed as $x_i \sim N(\mu_c, \sigma_c)$ and $y_j \sim N(\mu_r, \sigma_r)$. The traditional (pooled) two-sample t-test formally tests the null hypothesis that the distribution means are the same under the explicit assumption that $\sigma_c = \sigma_r = \sigma$ (and therefore that the sample standard deviations are each an estimate of the common standard deviation $\sigma$):

$$H_0: \mu_c = \mu_r \quad \text{or equivalently} \quad H_0: \mu_c - \mu_r = 0$$

The alternative hypothesis is that the distribution means are different:

$$H_1: \mu_c \neq \mu_r \quad \text{or equivalently} \quad H_1: \mu_c - \mu_r \neq 0$$

To test the null hypothesis, a test statistic is compared to the distribution of values one would expect to observe if the null hypothesis is true. For the two-sample t-test, the test statistic is given by

$$T_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{1}{n_c} + \dfrac{1}{n_r}} \; \sqrt{\dfrac{(n_c - 1)s_c^2 + (n_r - 1)s_r^2}{n_c + n_r - 2}}}$$

where $\bar{x}$, $\bar{y}$, $s_c$, and $s_r$ are the sample means and sample standard deviations of the control and recovered samples, respectively. The significance of the test is evaluated by comparing the observed value of $T_0$ to the distribution of values one would observe if the null hypothesis is true, or the null distribution. For the two-sample t-test, this is Student's t-distribution and is parameterized by its degrees of freedom. The degrees of freedom, $df = n_c + n_r - 2$, reflect the sample size, and in some sense, the amount of information that is available. The comparison of the observed test statistic to the null distribution is summarized by the P-value.

For the two-sample t-test this becomes

$$P = \Pr(|T| \geq |T_0| \mid H_0 \text{ true})$$

The absolute value of the test statistic is used here because it makes no difference whether the recovered mean is smaller or larger than the control mean, merely the fact that it is different. It is important to note that the equal variance assumption can be relaxed. There are occasional circumstances where this is a sensible option. This version of the t-test is known as Welch's t-test. The formula for the test statistic has a different denominator, and the formula for the degrees of freedom is much more complicated, but bounded by $\min(n_c, n_r) - 1$ and $n_c + n_r - 2$.
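
The source does not print the Welch formulas. As an illustration, the sketch below uses the standard Welch statistic with the Welch-Satterthwaite approximation for the degrees of freedom, using only the Python standard library. The function name `welch_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    n_c, n_r = len(x), len(y)
    v_c = variance(x) / n_c              # squared standard error of the control mean
    v_r = variance(y) / n_r              # squared standard error of the recovered mean
    t0 = (mean(x) - mean(y)) / sqrt(v_c + v_r)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (v_c + v_r) ** 2 / (v_c ** 2 / (n_c - 1) + v_r ** 2 / (n_r - 1))
    return t0, df

# Hypothetical toy data.
t0, df = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
```

For these data the degrees of freedom fall between min(3, 4) - 1 = 2 and 3 + 4 - 2 = 5, in line with the bounds stated above.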

The pooled two-sample t-test can be illustrated using the glass example. The observed test statistic is

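$$T_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{1}{n_c} + \dfrac{1}{n_r}} \; \sqrt{\dfrac{(n_c - 1)s_c^2 + (n_r - 1)s_r^2}{n_c + n_r - 2}}} = \frac{1.529123 - 1.529119}{\sqrt{\dfrac{1}{6} + \dfrac{1}{6}} \; \sqrt{\dfrac{(6 - 1)(4.04 \times 10^{-5})^2 + (6 - 1)(3.84 \times 10^{-5})^2}{6 + 6 - 2}}} = \frac{4 \times 10^{-6}}{2.278 \times 10^{-5}} = 0.1756$$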

Under $H_0$, the P-value is calculated using a t-distribution with $n_c + n_r - 2 = 6 + 6 - 2 = 10$ degrees of freedom. This is easily done in Microsoft Excel using the TDIST function, or in R using the pt function. The resulting P-value is 0.86. This is a large P-value, and "on average one would expect a result like this approximately 86 times in 100 by random chance alone." That is, this result is extremely likely to have occurred by random chance alone, hence $H_0$ cannot be rejected. Note that, unlike the range test, this procedure does not omit the smallest recovered RI value. This information is included in both the recovered mean and, more importantly, in the recovered standard deviation. The inclusion of this fragment will increase the recovered variability and make it (slightly) harder to reject the null hypothesis. Some practitioners are bothered by this and use range-like tests to exclude observations from the evidence evaluation. Such practice can lead to dangerously misleading conclusions if no account is taken of the omitted information.
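
The glass calculation can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library; in place of Excel's TDIST or R's pt, it obtains the two-sided P-value by numerically integrating the t density with Simpson's rule, which is a swapped-in technique chosen to keep the example dependency-free. All function names are hypothetical.

```python
from math import gamma, pi, sqrt

def pooled_t_from_stats(mean_c, sd_c, n_c, mean_r, sd_r, n_r):
    """Pooled two-sample t statistic and degrees of freedom from summary statistics."""
    df = n_c + n_r - 2
    pooled_var = ((n_c - 1) * sd_c ** 2 + (n_r - 1) * sd_r ** 2) / df
    t0 = (mean_c - mean_r) / sqrt(pooled_var * (1 / n_c + 1 / n_r))
    return t0, df

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    norm = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t0, df, steps=2000):
    """P(|T| >= |t0|) via Simpson's rule on the density over [0, |t0|]; steps must be even."""
    a = abs(t0)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

# Summary statistics quoted in the glass example above.
t0, df = pooled_t_from_stats(1.529123, 4.04e-5, 6, 1.529119, 3.84e-5, 6)
p = two_sided_p(t0, df)
```

The results should be close to the reported 0.1756 and 0.86; small discrepancies in the third decimal place of the statistic come from the rounding of the quoted summary values.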

It is not entirely necessary to calculate a P-value in this example because this test statistic can be interpreted as “the observed difference is approximately 0.18 standard deviations away from the mean when the null hypothesis is true. If the observed difference was more than 2 standard deviations away from the mean, then we would start to suspect that it was unlikely to have occurred by random chance alone. Given that 0.18 is much smaller than 2, we would intuit that the observed difference can be attributed to random variation.”

The two-sample t-test has a multivariate analog known as Hotelling's T2. This test has been used in forensic science, but it is relatively uncommon. It is more common to perform tests on each variable. This approach is subject to the multiple testing problems discussed earlier. Hotelling's T2 avoids such issues, and also takes into account the potential correlations between measurements. It does, however, have large sample size requirements, which traditionally have been problematic.


URL: https://www.sciencedirect.com/science/article/pii/B978012382165200194X

Contrasts and Classical Inference

J. Poline, ... W. Penny, in Statistical Parametric Mapping, 2007

Three design matrices for a two-sample t-test

The (unpaired) two-sample t-test, comparing the mean of two groups, can be implemented in the linear model framework as follows. Consider an experiment with two groups of 2 (group 1) and 3 (group 2) subjects. In imaging experiments, these numbers will be larger (at least 10 or so). We have:

$$X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$$

then

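$$P_X Y = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 1/3 & 1/3 \\ 0 & 0 & 1/3 & 1/3 & 1/3 \\ 0 & 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} Y = X\hat{\beta} = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_2 \\ \bar{y}_2 \end{bmatrix}$$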

where $\bar{y}_i$ is the mean observation in group $i$. We will now describe two other parameterizations of the same model (such that the matrix $P_X$ is identical in all cases) and show how to specify meaningful contrasts.

The only intuitive case is the first parameterization. In the two other cases, the interpretation of the parameter estimates is not obvious and the contrasts are not intuitive. In case 3, parameters are not estimable and not all contrasts are meaningful. Estimable contrasts are orthogonal to $[1 \;\, 1 \;\, {-1}]$, because column 1 plus column 2 equals column 3.
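
The statement that the projection matrix is the same under all three parameterizations can be verified numerically. The sketch below assumes NumPy. The excerpt does not show the second and third design matrices, so their column layout here (group-1 indicator plus constant; both indicators plus constant) is an assumption, and the helper names are hypothetical.

```python
import numpy as np

g1 = np.array([1, 1, 0, 0, 0], dtype=float)   # membership of group 1 (2 subjects)
g2 = np.array([0, 0, 1, 1, 1], dtype=float)   # membership of group 2 (3 subjects)
const = np.ones(5)

designs = {
    "case1": np.column_stack([g1, g2]),          # two group indicators
    "case2": np.column_stack([g1, const]),       # assumed layout: indicator + constant
    "case3": np.column_stack([g1, g2, const]),   # assumed layout: overparameterized
}

def projector(X):
    """Projection onto the column space of X, valid for rank-deficient X too."""
    return X @ np.linalg.pinv(X)

def is_estimable(c, X):
    """A contrast is estimable if it lies in the row space of X."""
    c = np.asarray(c, dtype=float)
    row_space_projector = np.linalg.pinv(X) @ X
    return bool(np.allclose(row_space_projector @ c, c))

P = {name: projector(X) for name, X in designs.items()}
same_fit = all(np.allclose(P["case1"], Pm) for Pm in P.values())

group_difference_ok = is_estimable([-1, 1, 0], designs["case3"])   # orthogonal to [1, 1, -1]
single_parameter_ok = is_estimable([1, 0, 0], designs["case3"])    # not orthogonal to [1, 1, -1]
```

In this sketch the group-difference contrast passes the estimability check in case 3, while a contrast picking out a single parameter does not, which matches the orthogonality condition stated above.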


URL: https://www.sciencedirect.com/science/article/pii/B9780123725608500097

Cerebral Gray Matter Volumes in Cocaine Dependence

Chiang-Shan R. Li, in Neuropathology of Drug Addictions and Substance Misuse, 2016

Results of VBM: Differences in GMV Between CD and HC

The results of a two-sample t test with age and years of alcohol use as covariates showed decreased GMV in a number of cortical regions and the cerebellum in CD, compared with HC, at voxel p < 0.05 FWE (Figure 1(A)).


Figure 1. Voxel-based morphometry: analysis of covariance comparing cocaine-dependent (CD) and healthy control (HC) participants, with age and years of alcohol use as covariates. (A) The results were examined at voxel p < 0.05, corrected for familywise error (FWE) of multiple comparisons. Compared with HC, CD showed GMV loss in multiple cortical regions, including the temporal cortex, middle and posterior cingulate cortex, superior frontal cortices, and the cerebellum. Conversely, no brain regions showed greater GMV in CD compared with HC at this threshold. (B) The same contrast examined at voxel p < 0.0001, and cluster p < 0.05, FWE-corrected. In addition to decreased GMV in the cortical regions and cerebellum, CD also showed increased GMV in the right ventral putamen, compared with HC (Table 1). Colored bars represent the voxel T value. Neurological orientation: R, right.

From Ide et al. (2014), with permission from Elsevier.

We derived the GMV for each of these regions of interest (ROIs) for individual subjects for gender comparison and correlation with years of cocaine use. Because there was a total of 12 ROIs, we used an alpha value of 0.05/12 ≈ 0.004 to guard against type I error. The results of a group by gender analysis of variance (ANOVA) showed that there was not a group by gender interaction effect (p > 0.031) except for the precuneus, which showed a trend toward higher GM volume loss in men (p = 0.010). The results of linear regression showed that the GM volume of the left superior frontal gyrus (SFG, r = −0.307, p < 0.004), middle/posterior cingulate cortex (r = −0.460, p < 0.00002), and right SFG (r = −0.472, p < 0.000001) was inversely correlated with years of cocaine use. Furthermore, the slopes of regression were significantly different between men and women for the right SFG, with women showing a steeper decline in GMV with years of cocaine use (p < 0.016; Figure 2), but not for the left SFG or middle/posterior cingulate cortex (p > 0.05).


Figure 2. Gender difference in the correlation of GMV with years of cocaine use in the cocaine-dependent (CD) group, for the three regions of interest. Compared with men, women showed a steeper decline in GMV of the right SFG with the duration of use (p < 0.016, test of difference in slope; Zar, 1999).

From Ide et al. (2014), with permission from Elsevier.

At cluster p < 0.05, FWE, the results demonstrated most of the same brain regions with diminished GM volume in CD, compared with HC. In addition, the right ventral putamen (x = 24, y = 15, z = −11, Z = 4.47, 447 voxels, Figure 1(B)) showed increased GM volume in CD compared with HC. Table 1 summarizes the coordinates and voxel Z values of these brain regions.

Table 1. Differences in GMV Between Cocaine-Dependent (CD) and Healthy Control (HC) Participants (Peak Voxel p < 0.0001 and Cluster p < 0.05, FWE Corrected)

Cluster size (no. of voxels) | Corrected p-value (cluster level) | Z-value | MNI coordinate x, y, z (mm) | Identified brain region
CD > HC
447 | 0.0057 | 4.47 | 24, 16, −11 | R putamen
HC > CD
9540 | 0.0000 | 6.30∗ | 37, 0, −37 | R inferior temporal C
2561 | 0.0000 | 6.14∗ | −2, −24, 44 | L middle/post cingulate C
1010 | 0.0001 | 6.11∗ | 21, −60, −34 | R cerebellum
1807 | 0.0000 | 6.07∗ | −27, 9, 62 | L superior frontal G
 | | 4.57 | −35, −23, 61 | L precentral G
1020 | 0.0001 | 5.35∗ | −36, 7, −43 | L temporal pole
701 | 0.0009 | 5.32∗ | 28, 3, 62 | R superior frontal G
1937 | 0.0000 | 5.12∗ | 9, −65, 29 | R precuneus
 | | 4.65 | 7, −63, 7 | R lingual G
 | | 4.29 | −5, −68, 26 | L cuneus
585 | 0.0020 | 5.12∗ | 12, −6, 71 | R superior frontal G
459 | 0.0051 | 5.05∗ | 49−210 | R insula/mid temporal C
414 | 0.0074 | 4.79 | −33, 25, −26 | L posterior orbital G
187 | 0.0202 | 4.79 | −29, −50, 53 | L postcentral S
209 | 0.0461 | 4.75 | −54, −11, 41 | L postcentral G
188 | 0.0272 | 4.72 | −14, 21, 64 | L superior frontal G
365 | 0.0111 | 4.68 | 46, −77, 10 | R middle temporal G
283 | 0.0227 | 4.62 | −39, −39, −35 | L cerebellum
215 | 0.0434 | 4.40 | 25, −44, 58 | R postcentral G

Note: All voxel peaks that are 8 mm apart are identified in the same cluster. ∗, indicates clusters that also contain voxels that are significant at voxel threshold p < 0.05, FWE corrected. R, right; L, left; C, cortex; G, gyrus; S, sulcus; mid, middle; post, posterior.

From Ide et al. (2014), with permission from Elsevier.

A group by gender ANOVA showed that the GMV increase in the ventral putamen did not differ between men and women (p = 0.974, interaction effect). The GMV of the ventral putamen did not show a significant correlation with years of cocaine use (r = −0.171, p = 0.121).


URL: https://www.sciencedirect.com/science/article/pii/B9780128002124000248

Acquisition Methods, Methods and Modeling

S.J. Kiebel, K. Mueller, in Brain Mapping, 2015

Two-Sample t-Test

As another example, we use the two-sample t-test again (see above) but in an overparameterized version. The resulting design matrix consists of three columns: the first two encode, as above, the group membership of each observation and the third models a common constant across all observations of both groups. Let the number of observations in the first and second groups be $J_1$ and $J_2$, where $J = J_1 + J_2$. The three regressors consist of ones and zeros, where the first regressor consists of $J_1$ ones followed by $J_2$ zeros. The second regressor consists of $J_1$ zeros followed by $J_2$ ones. The third regressor contains ones only. Let the contrast vector be $c = [-1, 1, 0]^T$, that is, the alternative hypothesis is $\bar{\mathcal{H}}: \beta_1 < \beta_2$. Then,

$$X^T X = \begin{pmatrix} J_1 & 0 & J_1 \\ 0 & J_2 & J_2 \\ J_1 & J_2 & J \end{pmatrix}$$

This model is overparameterized, so we use the pseudoinverse $(X^T X)^+$ to compute the t-statistic. We sandwich $(X^T X)^+$ with the contrast and get $c^T (X^T X)^+ c = \frac{1}{J_1} + \frac{1}{J_2}$. The t-statistic is then given by

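$$T = \frac{\hat{\beta}_2 - \hat{\beta}_1}{\sqrt{\hat{\sigma}^2 \left( \dfrac{1}{J_1} + \dfrac{1}{J_2} \right)}} \sim t_{J-2}$$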

and $\hat{\sigma}^2 = y^T R y/(J-2)$. We implicitly made the assumption that we have equal variance in both groups. This assumption may not be tenable, for example, when comparing normal subjects with patients, and an unequal variance model should be used (see Glaser & Friston, 2007).
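
The value of the sandwiched term can be checked by hand without computing the pseudoinverse explicitly. The following is a sketch of one way to see it; the matrix $G$ is introduced here for illustration only and does not appear in the source.

```latex
% G is a convenient generalized inverse of X^T X (illustrative choice).
G = \begin{pmatrix} 1/J_1 & 0 & 0 \\ 0 & 1/J_2 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
(X^T X)\, G = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}.

% Multiplying once more by X^T X and using J = J_1 + J_2 recovers X^T X:
(X^T X)\, G\, (X^T X)
  = \begin{pmatrix} J_1 & 0 & J_1 \\ 0 & J_2 & J_2 \\ J_1 & J_2 & J_1 + J_2 \end{pmatrix}
  = X^T X .

% The contrast c = [-1, 1, 0]^T is the difference of the two distinct rows of X,
% [0, 1, 1] - [1, 0, 1], so it lies in the row space of X. For such contrasts the
% quadratic form is the same for every generalized inverse, including (X^T X)^+:
c^T (X^T X)^{+} c = c^T G\, c = \frac{1}{J_1} + \frac{1}{J_2}.
```

The same argument explains why the t-value agrees with the one from the two-column parameterization: only estimable contrasts enter the statistic.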


URL: https://www.sciencedirect.com/science/article/pii/B9780123970251003171

Mathematical Modelling in Motor Neuroscience: State of the Art and Translation to the Clinic. Gaze Orienting Mechanisms and Disease

Thomas Eggert, Andreas Straube, in Progress in Brain Research, 2019

2.4 Statistics

Comparisons of saccade or model parameters between groups were performed with two-sample t-tests. The normality assumption of this test was confirmed by using the Lilliefors test. Since the inter-trial variances of saccade duration and amplitude were not normally distributed within the population, they were compared with the Wilcoxon rank-sum test. Normal distributions were characterized by mean ± standard deviation, and non-normal distribution by median [interquartile range (iqr)]. The 95% confidence interval of the median of the amplitude-variance, shown by the whiskers in Fig. 3A, was computed using the function wilcox.test of the “stats” package in the R environment (R Core Team, 2012).
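
For readers unfamiliar with the rank-sum alternative mentioned above, here is a minimal sketch using only the Python standard library. It uses the large-sample normal approximation without a tie correction, which is not necessarily what R's wilcox.test does for small samples, and it does not include the Lilliefors normality check. The function name `rank_sum_test` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    mu = n1 * (n1 + n2 + 1) / 2                   # expected rank sum under the null
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)       # its standard deviation (no tie correction)
    z = (w - mu) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical toy data.
z, p = rank_sum_test([1, 2, 3], [4, 5, 6, 7])
```

Because the test uses only ranks, it does not require the data to be normally distributed, which is why it was preferred for the variance measures described above.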


Fig. 3. Inter-trial variability of saccade parameters: (A) The variance of saccade amplitudes, pooled across saccades with similar initial motor error, was larger in patients than in controls. (B) Saccade duration and amplitudes of a typical subject. Filled circles: Selected trials with similar initial motor error. Crosses: trials with other initial motor errors. The increase of amplitude with duration was smaller for the selected trials (solid, slope α10) than the one averaged across all trials (dashed, slope αall). (C) The ratio α10/αall was smaller than one indicating that saccade amplitude was stabilized against variation in duration. This stabilization did not differ between patients and controls. Each symbol in A/C shows the data of one subject. Bars: median (A) or the mean (C) across the population; Whiskers: 95% confidence interval of the bars.


URL: https://www.sciencedirect.com/science/article/pii/S0079612319300512

Statistical Inference

Petter Laake, Morten Wang Fagerland, in Research in Medical and Biological Sciences (Second Edition), 2015

Assume that the two samples comprise $n_1$ and $n_2$ independent observations. The means of the samples are $\bar{X}_1$ and $\bar{X}_2$. The two-sample t-test statistic is the ratio between the effect estimate and its standard error,

$$T = \frac{\bar{X}_1 - \bar{X}_2}{s \sqrt{(1/n_1) + (1/n_2)}}, \qquad (11.26)$$

where $s$ is the estimate of the common SD of the two samples, expressed by:

$$s = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}. \qquad (11.27)$$
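
Equations (11.26) and (11.27) translate directly into code. The sketch below uses only the Python standard library; the function name `pooled_two_sample_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pooled_two_sample_t(x1, x2):
    """Pooled two-sample t statistic (11.26) with the common SD estimate (11.27)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)        # sample SDs (n - 1 in the denominator)
    s = sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))  # (11.27)
    t = (mean(x1) - mean(x2)) / (s * sqrt(1 / n1 + 1 / n2))              # (11.26)
    return t, n1 + n2 - 2                # statistic and its degrees of freedom

# Hypothetical toy data.
t, df = pooled_two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
```

The returned statistic is compared with a t-distribution on n1 + n2 - 2 degrees of freedom to obtain a P-value.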


URL: https://www.sciencedirect.com/science/article/pii/B9780127999432000112

Hypothesis Testing: Methodology and Limitations

T.A.B. Snijders, in International Encyclopedia of the Social & Behavioral Sciences, 2001

2.3 The Role of Assumptions

The probability statements that are required for statistical tests do not come for free, but are based on certain assumptions about the observations used for the test. In the two-sample t-test, the assumptions are that the observations of different individuals are outcomes of statistically independent, normally distributed, random variables, with the same expected value for all individuals within the same group, and the same variance for all individuals in both groups. Such assumptions are not automatically satisfied, and for some assumptions it may be doubted whether they are ever satisfied exactly. The null hypothesis H0 and alternative hypothesis H1 are statements which, strictly speaking, imply these assumptions, and which therefore are not each other's complement. There is a third possibility: the assumptions are invalid, and neither H0 nor H1 is true. The sensitivity of the probabilistic properties of a test to these assumptions is referred to as the lack of robustness of the test. The focus of robustness studies has been on the assumptions of the null hypothesis and the sensitivity of the probability of an error of the first kind to these assumptions, but studies on robustness for deviations from assumptions of the alternative hypothesis have also been done, cf. Wilcox (1998).

One general conclusion from robustness studies is that tests are extremely sensitive to the independence assumptions made. Fortunately, those assumptions are often under control of the researcher through the choice of the experimental or observational design. Traditional departures from independent observations are multivariate observations and within-subject repeated measures designs, and the statistical literature abounds with methods for such kinds of dependent observations. More recently, methods have been developed for clustered observations (e.g., individual respondents clustered within groups) under the names of multilevel analysis and hierarchical linear modeling.

Another general conclusion is that properties of tests derived under the assumption of normal distributions, such as the t-test, can be quite sensitive to outliers, i.e., single, or a few, observations that deviate strongly from the bulk of the observations. Since the occurrence of outliers has a very low probability under normal distributions, they are ‘assumed away’ by the normality assumption. The lack of robustness and sensitivity to outliers have led to three main developments.

How does a 2 sample t

The two-sample t-test works in the same way: two sample means are compared to discover whether they come from the same population (meaning there is no difference between the two population means).

What are the 2 types of two sample t tests?

Independent two-sample t-test. Paired sample t-test.
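
The difference between the two types shows up in how the statistic is computed: the paired test reduces to a one-sample t-test on the within-pair differences. A minimal sketch using only the Python standard library follows; the function name `paired_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the within-pair differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1                      # statistic and its degrees of freedom

# Hypothetical toy data: the same four subjects measured twice.
t, df = paired_t([10.0, 12.0, 9.0, 11.0], [11.0, 14.0, 10.0, 13.0])
```

The independent two-sample test instead treats the two sets of values as unrelated groups and uses n1 + n2 - 2 degrees of freedom, so applying it to paired data ignores the within-subject correlation.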