Sample Size for One or Two Means in PASS

PASS contains over 60 tools for sample size estimation and power analysis of the comparison of one, two, or paired means, including t-tests, equivalence, non-inferiority, cross-over, nonparametric, and simulation, among many others. Each procedure is easy to use and is carefully validated for accuracy. For each procedure, only a brief summary is given. For more details about a particular procedure, we recommend you download and install the free trial of the software.

Introduction

For most of the sample size procedures in PASS for comparing means, the user may choose to solve for sample size, power, or the population effect size in some manner. In the case of confidence intervals, one could solve for sample size or the distance to the confidence limit.

In a typical means test procedure where the goal is to estimate the sample size, the user enters power, alpha, desired population mean difference, and a value for the variation. The procedure is run and the output shows a summary of the entries as well as the sample size estimate. A summary statement is given, as well as references to the articles from which the formulas for the result were obtained.
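As a rough illustration of how these four inputs combine, the sketch below applies the textbook normal-approximation formula for a one-sample z-test with a known standard deviation. It is not the PASS algorithm, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_sample_size(delta, sd, alpha=0.05, power=0.80, two_sided=True):
    """Approximate n for a one-sample z-test (assumes a known standard deviation)."""
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    norm = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = norm.inv_cdf(1 - tail)   # critical value of the test
    z_power = norm.inv_cdf(power)      # quantile matching the target power
    n = ((z_alpha + z_power) * sd / abs(delta)) ** 2
    return math.ceil(n)                # round up to a whole subject


# Example entries: detect a mean difference of 1.0 when the SD is 2.0.
n_required = z_test_sample_size(delta=1.0, sd=2.0)
print(n_required)
```

A t-test calculation will generally ask for a slightly larger sample than this approximation, because the standard deviation must be estimated from the data.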

For many of the parameters (e.g., power, alpha, sample size, means, standard deviations, etc.), multiple values may be entered in a single run. When this is done, estimates are made for every combination of entered values. A numeric summary of these results is produced, as well as easy-to-read sample size or power curve graphs.
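The idea of evaluating every combination of entered values can be sketched with a Cartesian product. The power formula here is a normal approximation for a two-sided one-sample z-test (the negligible far tail is ignored), and all names and input values are illustrative assumptions rather than PASS output.

```python
import itertools
import math
from statistics import NormalDist


def z_test_power(n, delta, sd, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (far tail ignored)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(delta) * math.sqrt(n) / sd - z_crit)


# Hypothetical lists of values, as if several were entered for each parameter.
sample_sizes = [20, 32, 50]
differences = [0.5, 1.0]
sds = [2.0]

# One power estimate for every combination of the entered values.
grid = [
    (n, d, s, z_test_power(n, d, s))
    for n, d, s in itertools.product(sample_sizes, differences, sds)
]

for n, d, s, p in grid:
    print(f"n={n:3d}  difference={d:.1f}  sd={s:.1f}  power={p:.3f}")
```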

Several simulation procedures are available for the comparison of means. These procedures can be used to examine the effects of non-normal underlying distributions, or to analyze sample size or power in scenarios where closed form solutions are not available.

Technical Details

This page provides a brief description of the tools that are available in PASS for power and sample size analysis of the comparison of means. If you would like to examine the formulas and technical details relating to a specific PASS procedure, we recommend you download and install the free trial of the software, open the desired means procedure, and click on the help button in the top right corner to view the complete documentation of the procedure. There you will find summaries, formulas, references, discussions, technical details, examples, and validation against published articles for the procedure.

An Example Setup and Output

When the PASS software is first opened, the user is presented with the PASS Home window. From this window the desired procedure is selected from the menus, the category tree on the left, or with a procedure search. The procedure opens and the desired entries are made. When you click the Calculate button the results are produced. You can easily navigate to any part of the output with the navigation pane on the left.

PASS Home Window

Procedure Window for Two-Sample T-Tests

PASS Output Window

Sample Size for One Mean Tests and Confidence Intervals

Sample Size for Tests for One Mean

The one-sample t-test is used to test whether the mean of a population is greater than, less than, or not equal to a specific value. The test takes its name from the t distribution, which is used to calculate its critical values. If the standard deviation is known, the normal distribution is used instead of the t distribution and the test is known as the z-test.

This module also calculates the power of the nonparametric analog of the t-test, the Wilcoxon test.

Sample Size for Tests for One Mean (Simulation)

This procedure allows you to study the power and sample size of several statistical tests of the hypothesis that the population mean is equal to a specific value versus the alternative that it is greater than, less than, or not equal to that value. The one-sample t-test is commonly used in this situation, but other tests have been developed for situations where the data are not normally distributed. These additional tests include the Wilcoxon signed-rank test, the sign test, and the computer-intensive bootstrap test. When the population follows the exponential distribution, a test based on this distribution should be used.

The t-test assumes that the data are normally distributed. When this assumption does not hold, the t-test is still often used, hoping that its robustness will produce accurate results. This procedure allows you to study the accuracy of various tests using simulation techniques. A wide variety of distributions can be simulated to allow you to assess the impact of various forms of non-normality on each test’s accuracy.
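A minimal sketch of this kind of simulation study is shown below. It estimates how often a two-sided one-sample t-test rejects a true null hypothesis when the data are normal versus exponential (skewed). The sample size of 20, the number of simulations, and the helper names are assumptions for illustration; the critical value 2.093 is the standard table value for 19 degrees of freedom at the two-sided 5% level.

```python
import math
import random
import statistics

N = 20                 # sample size per simulated study (assumed)
T_CRIT_DF19 = 2.093    # two-sided 5% critical value of t with N - 1 = 19 df


def t_statistic(sample, mu0):
    """One-sample t statistic for H0: mean = mu0."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return (mean - mu0) / (sd / math.sqrt(len(sample)))


def rejection_rate(draw, mu0, n_sims=5000, seed=1):
    """Fraction of simulated samples in which the t-test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [draw(rng) for _ in range(N)]
        if abs(t_statistic(sample, mu0)) > T_CRIT_DF19:
            rejections += 1
    return rejections / n_sims


# H0 is true in both cases, so each rate estimates the actual type I error.
normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0), mu0=0.0)
skewed_rate = rejection_rate(lambda rng: rng.expovariate(1.0), mu0=1.0)
print(f"normal data: {normal_rate:.3f}   exponential data: {skewed_rate:.3f}")
```

Passing a value of `mu0` that differs from the true mean of the simulated distribution turns the same function into a power estimate.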

Sample Size for Tests for One Mean using Effect Size

This procedure provides sample size and power calculations for a one- or two-sided one-sample t-test when the effect size is specified rather than the means and variance, as described by Cohen (1988). In this design, a single population of independent, normally distributed data is sampled and its mean is compared to a specified quantity by forming the difference scaled by the standard deviation.

Sample Size for Tests for One Exponential Mean

This procedure analyzes studies for testing hypotheses about the mean of the exponential distribution. Such tests are often used in reliability acceptance testing, also called reliability demonstration testing.

Results are calculated for plans that are time censored or failure censored, as well as for plans that use with replacement or without replacement sampling. We adopt the basic methodology outlined in Epstein (1960), Juran (1979), Bain and Engelhardt (1991), and Schilling (1982).

Sample Size for Tests for One Poisson Mean

The Poisson probability law gives the probability distribution of the number of events occurring in a specified interval of time or space. The Poisson distribution is often used to fit count data, such as the number of defects on an item, the number of accidents at an intersection during a year, the number of calls to a call center during an hour, or the number of meteors seen in the evening sky during an hour.

The Poisson distribution is characterized by a single parameter, λ, which is the mean number of occurrences during the interval. This procedure calculates the power or sample size for testing whether λ is less than or greater than a specified value. This test is usually called the test of the Poisson mean.

The test is described in Ostle (1988) and the power calculation is given in Guenther (1977).
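For intuition, an exact power calculation for the upper-tailed version of this test can be sketched from the fact that the total count over n intervals is Poisson with mean nλ. This is a generic textbook construction and is not claimed to be the exact formulation in the cited references; the function names and example values are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def upper_critical_value(n, lam0, alpha):
    """Smallest total count c with P(X >= c | n * lam0) <= alpha."""
    c = 0
    while 1.0 - poisson_cdf(c - 1, n * lam0) > alpha:
        c += 1
    return c


def upper_tail_power(n, lam0, lam1, alpha=0.05):
    """Exact power for H0: lambda = lam0 versus H1: lambda = lam1 > lam0."""
    c = upper_critical_value(n, lam0, alpha)
    return c, 1.0 - poisson_cdf(c - 1, n * lam1)


# Example: 10 intervals, null mean of 1 event per interval, alternative of 2.
critical_count, power = upper_tail_power(n=10, lam0=1.0, lam1=2.0)
print(critical_count, round(power, 4))
```

Because counts are discrete, the attained significance level is usually somewhat below the nominal alpha.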

Sample Size for Non-Inferiority Tests for One Mean

This module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random variable. Sample size formulas for non-inferiority tests of a single mean are presented in Chow et al. (2003) page 50. This module also calculates the power of the nonparametric analog of the t-test, the Wilcoxon test.

Sample Size for Superiority by a Margin Tests for One Mean

This module computes power and sample size for tests in one-sample designs with a superiority margin in which the outcome is distributed as a normal random variable. Sample size formulas for non-inferiority and superiority hypothesis tests of a single mean are presented in Chow et al. (2003) page 50. This module also calculates the power of the nonparametric analog of the t-test, the Wilcoxon test.

Sample Size for Multiple One-Sample T-Tests

This procedure analyzes power and sample size (number of arrays) for paired and one-sample high-throughput studies. False discovery rate and experiment-wise error rate control methods are available in this procedure. Values that can be varied in this procedure are power, false discovery rate and experiment-wise error rate, sample size (number of arrays), the minimum mean difference detected, the standard deviation, and in the case of false discovery rate control, the number of genes with minimum mean difference.

Sample Size for Conditional Power of One-Sample T-Tests

In sequential designs, one or more intermediate analyses of the emerging data are conducted to evaluate whether the experiment should be continued. This may be done to conserve resources or to allow a data monitoring board to evaluate safety and efficacy when subjects are entered in a staggered fashion over a long period of time. Conditional power (a frequentist concept) is the probability that the final result will be significant, given the data obtained up to the time of the interim look. Predictive power (a Bayesian concept) is the result of averaging the conditional power over the posterior distribution of effect size. Both of these methods fall under the heading of stochastic curtailment techniques. Further reading about the theory of these methods can be found in Chow and Chang (2007), Chang (2008, 2008), Proschan et al. (2006), and Dmitrienko et al. (2005).

This program module computes conditional and predictive power for the case when a one-sample t-test is used to test whether the mean of a population is greater than, less than, or not equal to a specific value.
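The definition of conditional power given above can be illustrated with the standard B-value (Brownian motion) formulation for a one-sided upper-tailed test. This sketch uses a normal approximation rather than the t distribution and is not necessarily the formula used by the procedure; the function name, the drift parameterization, and the example values are assumptions.

```python
import math
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, drift, alpha=0.025):
    """P(final z exceeds the critical value | interim z), normal approximation.

    info_fraction is the share of the planned information observed so far;
    drift is the assumed expected value of the final z statistic.
    """
    if not 0 < info_fraction < 1:
        raise ValueError("info_fraction must be strictly between 0 and 1")
    norm = NormalDist()
    b_value = z_interim * math.sqrt(info_fraction)
    z_crit = norm.inv_cdf(1 - alpha)
    remaining = 1 - info_fraction
    z = (z_crit - b_value - drift * remaining) / math.sqrt(remaining)
    return 1 - norm.cdf(z)


# Interim look at half the planned information with an interim z of 1.5.
z_interim, info_fraction = 1.5, 0.5

# "Current trend": assume the effect estimated so far continues to the end.
trend_drift = z_interim / math.sqrt(info_fraction)
cp_trend = conditional_power(z_interim, info_fraction, trend_drift)

# Null assumption: no true effect for the remainder of the study.
cp_null = conditional_power(z_interim, info_fraction, 0.0)
print(round(cp_trend, 3), round(cp_null, 3))
```

Comparing the two values shows how strongly conditional power depends on the effect assumed for the data not yet collected.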

Sample Size for Confidence Intervals for One Mean

This procedure calculates the sample size necessary to achieve a specified distance from the mean to the confidence limit(s) at a stated confidence level for a confidence interval about the mean when the underlying data distribution is normal.

This procedure assumes that the standard deviation of the future sample will be the same as the standard deviation that is specified. If the standard deviation to be used in the procedure is estimated from a previous sample or represents the population standard deviation, the Confidence Intervals for One Mean with Tolerance Probability procedure should be considered. That procedure controls the probability that the distance from the mean to the confidence limits will be less than or equal to the value specified.
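When the standard deviation is treated as known, the relationship between sample size and the distance to the confidence limit has a simple closed form, sketched below for a two-sided interval. This normal-based version is an assumption made for illustration; an interval based on the t distribution would require a slightly larger sample. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def ci_sample_size(sd, distance, confidence=0.95):
    """Smallest n with z * sd / sqrt(n) <= distance (two-sided, known sd)."""
    if sd <= 0 or distance <= 0:
        raise ValueError("sd and distance must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / distance) ** 2)


# Example: SD of 10, limits within 2 units of the mean, 95% confidence.
n_ci = ci_sample_size(sd=10.0, distance=2.0)
print(n_ci)
```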

Sample Size for Confidence Intervals for One Mean with Tolerance Probability

This procedure calculates the sample size necessary to achieve a specified distance from the mean to the confidence limit(s) with a given tolerance probability at a stated confidence level for a confidence interval about a single mean when the underlying data distribution is normal.

Power Curve from the Tests for One Mean Procedure

Sample Size for Paired Means Tests and Confidence Intervals

Sample Size for Tests for Paired Means

The paired t-test may be used to test whether the mean difference of two populations is greater than, less than, or not equal to a specific value. This procedure calculates sample size or power of a study based on the specified mean and standard deviation of paired differences.

This module also calculates the power of the nonparametric analog of the t-test, the Wilcoxon test.
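If the standard deviation of the paired differences is not available directly, it can be derived from the two individual standard deviations and the correlation between the paired measurements. The sketch below shows that standard identity; the function name and example values are hypothetical.

```python
import math


def sd_of_differences(sd1, sd2, correlation):
    """SD of (x1 - x2) given each SD and the within-pair correlation."""
    if not -1 <= correlation <= 1:
        raise ValueError("correlation must be between -1 and 1")
    variance = sd1 ** 2 + sd2 ** 2 - 2 * correlation * sd1 * sd2
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


# Stronger within-pair correlation gives a smaller SD of the differences.
sd_moderate = sd_of_differences(10.0, 10.0, correlation=0.5)
sd_strong = sd_of_differences(10.0, 10.0, correlation=0.8)
print(round(sd_moderate, 3), round(sd_strong, 3))
```

A smaller standard deviation of the differences translates into a smaller required number of pairs for the same power.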

Sample Size for Tests for Paired Means (Simulation)

This procedure allows you to study the power and sample size of several statistical tests of the null hypothesis that the difference between two correlated means is equal to a specific value versus the alternative that it is greater than, less than, or not equal to that value. The paired t-test is commonly used in this situation. Other tests have been developed for the case when the data are not normally distributed. These additional tests include the Wilcoxon signed-rank test, the sign test, and the computer-intensive bootstrap test.

Paired data may occur because two measurements are made on the same subject or because measurements are made on two subjects that have been matched according to other, often demographic, variables. Hypothesis tests on paired data can be analyzed by considering the differences between the paired items. The distribution of differences is usually symmetric. Thus, the paired t-test and the Wilcoxon signed-rank test are often appropriate for paired data even when the distributions of the individual items are not normal.

Sample Size for Tests for Paired Means using Effect Size

This procedure provides sample size and power calculations for a one- or two-sided paired t-test when the effect size is specified rather than the means and variance, as described in Cohen (1988). In this design, a single population of paired, normally distributed data is sampled and the mean difference is compared to zero by forming the difference scaled by the standard deviation of the differences.

Sample Size for Tests for the Matched-Pair Difference of Two Means in a Cluster-Randomized Design

Cluster-randomized designs are those in which whole clusters of subjects (classes, hospitals, communities, etc.) are sampled, rather than individual subjects. This sample size and power procedure is used for the case where the subject responses are continuous (mean outcome). To reduce the variation (and thus increase power), clusters are matched, with one cluster of each pair assigned to the control group, and the other assigned to the treatment group. This procedure gives the number of pairs needed for the desired power requirement.

Sample Size for Non-Inferiority Tests for Paired Means

This module computes power and sample size for non-inferiority tests in paired designs in which the outcome difference is distributed as a normal random variable. Sample size formulas for non-inferiority tests of a single mean (and hence paired differences) are presented in Chow et al. (2003) page 50. This module also calculates the power of the nonparametric analog of the t-test, the Wilcoxon test.

Sample Size for Superiority by a Margin Tests for Paired Means

This module computes power and sample size for tests in paired-sample designs with a superiority margin in which the outcome difference is distributed as a normal random variable. Sample size formulas for non-inferiority and superiority hypothesis tests of a single mean are presented in Chow et al. (2003) page 50. This module also calculates the power of the nonparametric analog of the t-test, the Wilcoxon test.

Sample Size for Equivalence Tests for Paired Means (Simulation)

This procedure allows you to study the power and sample size of tests of equivalence of means of two correlated variables. Schuirmann’s (1987) two one-sided tests (TOST) approach is used to test equivalence. The paired t-test is commonly used in this situation. Other tests have been developed for the case when the data are not normally distributed. These additional tests include the Wilcoxon signed-rank test, the sign test, and the computer-intensive bootstrap test.
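The procedure described above works by simulation. As a quick cross-check on the TOST logic, the power of the two one-sided tests can also be approximated analytically under normality. The sketch below uses the common normal approximation with a known standard deviation of the paired differences; it is a swapped-in shortcut rather than the simulation method, and all names and values are hypothetical.

```python
import math
from statistics import NormalDist


def tost_power(true_diff, lower, upper, sd_diff, n, alpha=0.05):
    """Approximate power of TOST for paired means (known SD of differences)."""
    if not lower < upper:
        raise ValueError("lower equivalence limit must be below the upper limit")
    norm = NormalDist()
    se = sd_diff / math.sqrt(n)
    z_crit = norm.inv_cdf(1 - alpha)
    # Both one-sided tests must reject for equivalence to be concluded.
    power = (
        norm.cdf((upper - true_diff) / se - z_crit)
        + norm.cdf((true_diff - lower) / se - z_crit)
        - 1
    )
    return max(0.0, power)


# Equivalence limits of -1 and 1, SD of differences 2, and 50 pairs.
power_equal = tost_power(0.0, -1.0, 1.0, sd_diff=2.0, n=50)
# At the equivalence boundary the rejection rate falls to about alpha.
power_boundary = tost_power(1.0, -1.0, 1.0, sd_diff=2.0, n=50)
print(round(power_equal, 3), round(power_boundary, 3))
```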

Sample Size for Conditional Power of Paired T-Tests

This procedure computes conditional and predictive power for the case when a paired t-test is used to test whether the paired mean difference is greater than, less than, or not equal to a specific value.

Sample Size for Multiple Paired T-Tests

This procedure estimates power and sample size (number of arrays) for paired and one-sample high-throughput studies. False discovery rate and experiment-wise error rate control methods are available in this procedure. Values that can be varied in this procedure are power, false discovery rate and experiment-wise error rate, sample size (number of arrays), the minimum mean difference detected, the standard deviation, and in the case of false discovery rate control, the number of genes with minimum mean difference.

Sample Size for Confidence Intervals for Paired Means

This procedure calculates the sample size necessary to achieve a specified distance from the paired sample mean difference to the confidence limit(s) at a stated confidence level for a confidence interval about the mean difference when the underlying data distribution is normal.

Sample Size for Confidence Intervals for Paired Means with Tolerance Probability

This procedure calculates the sample size necessary to achieve a specified distance from the paired sample mean difference to the confidence limit(s) with a given tolerance probability at a stated confidence level for a confidence interval about a single mean difference when the underlying data distribution is normal.

Sample Size Curve from the Tests for Paired Means Procedure

Sample Size for Two Independent Means Tests and Confidence Intervals

Sample Size for T-Test (Inequality)

There are five procedures in PASS for the comparison of two means using t-tests. Two of the procedures (enter means and enter mean difference) assume equal variance and use the standard t-test formulas. Two more of the procedures (enter means and enter mean difference) assume the variances are unequal and use formulas based on Welch’s unequal variance t-test. The fifth procedure uses simulation to determine the effect of various underlying distributions on the sample size or power result.

For each of the procedures, one-sided or two-sided test options are available, as well as equal or unequal sample sizes in the two groups.
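One practical difference between the equal-variance and Welch approaches is the degrees of freedom used for the t distribution. The sketch below computes the Welch-Satterthwaite approximation that underlies Welch's unequal-variance t-test; the function name and example inputs are hypothetical.

```python
def welch_df(sd1, sd2, n1, n2):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1 = sd1 ** 2 / n1   # squared standard error contributed by group 1
    v2 = sd2 ** 2 / n2   # squared standard error contributed by group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Equal SDs and equal group sizes recover the pooled value n1 + n2 - 2.
df_equal = welch_df(1.0, 1.0, 10, 10)
# Unequal SDs reduce the effective degrees of freedom.
df_unequal = welch_df(1.0, 3.0, 10, 10)
print(round(df_equal, 2), round(df_unequal, 2))
```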

Sample Size for Two-Sample T-Tests using Effect Size

This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the effect size is specified rather than the means and variance(s), as described in Cohen (1988). The design corresponding to this test procedure is sometimes referred to as a parallel-groups design. In this design, two groups from independent, normally distributed populations are compared by considering the difference in their means scaled by their common standard deviation.

This procedure is specific to the two-sample t-test assuming equal variance. If the variances are known to be significantly different, this procedure can still be used if the group sample sizes are equal and the average of the variances is used.

Sample Size for Z-Test (Inequality)

There are four procedures in PASS for the comparison of two means using z-tests. Two of the procedures (enter means and enter mean difference) assume equal variance and use the corresponding z-test formulas. Two more of the procedures (enter means and enter mean difference) assume the variances are unequal. For each of the procedures, one-sided or two-sided test options are available, as well as equal or unequal sample sizes in the two groups.

Sample Size for Nonparametric Tests of Two Independent Means

This procedure uses simulation to determine the power or sample size for the Mann-Whitney-Wilcoxon test. Some of the available distributions include Beta, Normal, Multinomial, Weibull, Tukey’s Lambda, and Gamma.

Sample Size for Tests of Two Independent Means Using Ratios

This procedure calculates power and sample size for t-tests from a parallel-groups design in which the logarithm of the outcome is a continuous normal random variable. This routine deals with the case in which the statistical hypotheses are expressed in terms of mean ratios instead of mean differences.

Sample Size for Tests of Two Independent Means with Non-Normal Data

PASS has specific tools for the calculation of sample sizes or power for the case where the compared means are based on samples from a Poisson distribution or an exponential distribution. There are also two simulation procedures for t-tests and Mann-Whitney-Wilcoxon tests where the underlying distribution can be specified directly.

Sample Size for Confidence Intervals for Comparing Two Independent Means

Two confidence interval procedures can be used in PASS to calculate sample size for a given distance from the mean difference to the confidence interval limits. One procedure has the additional option of accounting for the variability in a future estimate of the pooled standard deviation through a tolerance probability.

Sample Size for Non-Inferiority Tests of Two Independent Means

One procedure examines power and sample size for non-inferiority of two independent means based on the difference, while the other procedure is based on the mean ratio. There is also a group-sequential non-inferiority test procedure.

Sample Size for Equivalence Tests of Two Independent Means

There are three equivalence test procedures in PASS. One procedure is based on mean differences, another on the ratios, and the third is a simulation based procedure with options for t-tests, Mann-Whitney-Wilcoxon tests, and trimmed mean tests.

Sample Size for Superiority by a Margin Tests of Two Independent Means

These two procedures in PASS are used when one wishes to calculate sample size or power for the case of showing one group mean is higher (or lower) than another by a specified amount.

Sample Size for Cluster-Randomized Tests of Two Independent Means

In this procedure, cluster randomization refers to the situation in which the means of two groups, made up of M clusters of N individuals each, are to be tested using a modified t-test. This procedure permits the user to solve for the power, the number of clusters, or the number of individuals within a cluster.
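A common way to reason about the trade-off between the number of clusters and the cluster size is the design effect, which inflates an individually randomized sample size to account for within-cluster correlation. The sketch below assumes an intracluster correlation coefficient (ICC), a quantity not discussed above, and is a generic approximation rather than the procedure's own calculation; all names and values are hypothetical.

```python
import math


def clusters_per_group(n_individual, cluster_size, icc):
    """Design effect and clusters per group for a cluster-randomized trial.

    n_individual is the per-group sample size that an individually
    randomized design would need; icc is the assumed intracluster
    correlation coefficient.
    """
    if cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("cluster_size must be >= 1 and icc within [0, 1]")
    design_effect = 1 + (cluster_size - 1) * icc
    inflated_n = n_individual * design_effect
    return design_effect, math.ceil(inflated_n / cluster_size)


# Example: 64 subjects per group if individually randomized,
# clusters of 20 individuals, and an assumed ICC of 0.05.
design_effect, n_clusters = clusters_per_group(64, 20, 0.05)
print(round(design_effect, 2), n_clusters)
```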

Sample Size for Tests for Two Means in a Stepped-Wedge Cluster-Randomized Design

A stepped-wedge cluster-randomized design is similar to a cross-over design in that each cluster receives both the treatment and control over time. In a stepped-wedge design, however, the clusters switch or cross-over in one direction only (usually from the control group to the treatment group). Once a cluster is randomized to the treatment group, it continues to receive the treatment for the duration of the study. In a typical stepped-wedge design, all clusters are assigned to the control group at the first time point and then individual clusters are progressively randomized to the treatment group over time. The stepped-wedge design is particularly useful for cases where it is logistically impractical to apply a particular treatment to half of the clusters at the same time.

This procedure computes power and sample size for tests for the difference between two means in cross-sectional stepped-wedge cluster-randomized designs. In cross-sectional designs, different subjects are measured within each cluster at each point in time. No one subject is measured more than once. (This is not to be confused with cohort studies (i.e. repeated measures) where individuals are measured at each point in time. The methods in this procedure should not be used for cohort or repeated measures designs.)

Sample Size for Multiple Tests of Two Independent Means

This procedure estimates power and sample size (number of arrays) for two-group (two-sample) high-throughput studies. False discovery rate and experiment-wise error rate control methods are available in this procedure. Values that can be varied in this procedure are power, false discovery rate and experiment-wise error rate, sample sizes (numbers of arrays) in each group, the minimum mean difference detected, the standard deviations in each group, and in the case of false discovery rate control, the number of tests/genes with minimum mean difference.

Sample Size for Repeated Measures Tests of Two Independent Means

There are two repeated measures procedures for the comparison of two independent means. One of the procedures calculates the power for testing the time-averaged difference (TAD) between two means in a repeated measures design. The other procedure is used for the scenario of comparing two groups of pre-post scores.

Time-averaged difference analysis is often used when the outcome to be measured varies with time. The precision of the experiment is increased by taking multiple measurements from each individual and comparing the time-averaged difference between the two groups. Care must be taken in the analysis because of the correlation that is introduced when several measurements are taken from the same individual. The covariance structure may take on several forms depending on the nature of the experiment and the subjects involved. This procedure allows you to calculate sample sizes using four different covariance patterns: Compound Symmetry, AR(1), Banded(1), and Simple. This procedure can be used to calculate sample size and power for tests of pairwise contrasts in a mixed models analysis of repeated measures data.
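To see why the covariance pattern matters, the sketch below builds the within-subject correlation matrix for each of the four named patterns and computes the variance of a subject's time-averaged response. The pattern definitions used here are assumptions: Simple is taken to mean uncorrelated measurements, and Banded(1) to mean correlation only between adjacent time points. All names are hypothetical.

```python
def correlation_matrix(pattern, m, rho):
    """m-by-m within-subject correlation matrix for a named pattern."""
    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        if pattern == "compound_symmetry":
            return rho                          # same correlation at every lag
        if pattern == "ar1":
            return rho ** lag                   # correlation decays with lag
        if pattern == "banded1":
            return rho if lag == 1 else 0.0     # adjacent time points only
        if pattern == "simple":
            return 0.0                          # assumed independent
        raise ValueError(f"unknown pattern: {pattern}")
    return [[entry(i, j) for j in range(m)] for i in range(m)]


def variance_of_subject_mean(pattern, m, rho, sd=1.0):
    """Variance of the average of m correlated measurements on one subject."""
    matrix = correlation_matrix(pattern, m, rho)
    return sd ** 2 * sum(sum(row) for row in matrix) / m ** 2


# Three measurements per subject with a correlation parameter of 0.5.
for name in ("compound_symmetry", "ar1", "banded1", "simple"):
    print(f"{name:18s} {variance_of_subject_mean(name, 3, 0.5):.4f}")
```

The larger the variance of the time-averaged response, the more subjects are needed to detect the same time-averaged difference.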

The Tests for Two Groups of Pre-Post Scores procedure calculates the power for testing the interaction in a 2-by-2 repeated measures design. This particular repeated measures design is one in which subjects are observed twice over time, as is the case in a pre-post design. Measurements are taken at two predetermined time points. It is important that the time interval between them remains constant from subject to subject. The test of the interaction compares the average change in measurement for group 1 with that of group 2.

Sample Size for Group-Sequential Tests of Two Independent Means

There are four procedures in PASS for the comparison of two means with group-sequential tests. One of the procedures uses an analytic solution, but is much less flexible than the others. Two simulation procedures allow for various underlying distributions. The fourth uses simulation for non-inferiority group-sequential testing.

A variety of spending function options are available in these procedures, including Hwang-Shi-DeCani, O’Brien-Fleming, and Pocock types. Either the t-test or the Mann-Whitney-Wilcoxon test may be examined. For one-sided tests, significance and futility boundaries can be produced. The spacing of the looks can be equal or custom specified. The distributions of each of the groups under the null and alternative hypotheses can be specified directly using over ten distributions including Normal, Exponential, Gamma, Uniform, Beta, and Cauchy. Boundaries can also be input directly to verify alpha- and/or beta-spending properties. Futility boundaries can be binding or non-binding.

Group Sequential Boundary Plot

Sample Size for Conditional Power Tests of Two Independent Means

The Conditional Power of Two-Sample T-Tests procedure computes conditional and predictive power for the case when a two-sample t-test is used to test whether the means of two populations are different.

In sequential designs, one or more intermediate analyses of the emerging data are conducted to evaluate whether the experiment should be continued. This may be done to conserve resources or to allow a data monitoring board to evaluate safety and efficacy when subjects are entered in a staggered fashion over a long period of time. Conditional power is the probability that the final result will be significant, given the data obtained up to the time of the interim look. Predictive power is the result of averaging the conditional power over the posterior distribution of effect size. Both of these methods fall under the heading of stochastic curtailment techniques.

Sample Size Curve from the Two-Sample T-Tests Assuming Equal Variance (Enter Means) Procedure

Sample Size for Comparing Two Means in a Cross-Over Design

Sample Size for Tests for Two Means in a Cross-Over Design

PASS has two procedures for comparing two means in a 2-by-2 cross-over design. One procedure is based on the difference of the two means, while the other procedure is based on the ratio of the two means. In the difference procedure, the standard deviation may be specified as the square root of the within mean square error from a repeated measures ANOVA, as the standard deviation of the period differences for each subject within each sequence, or as the standard deviation of the paired differences.

Sample Size for Non-Inferiority Tests for Two Means in a Cross-Over Design

Cross-over non-inferiority test procedures for two means are available in PASS for differences and ratios, as well as for both 2-by-2 and higher-order cross-over designs.

Sample Size for Equivalence Tests for Two Means in a Cross-Over Design

PASS has cross-over equivalence test procedures for two means available for differences and ratios, as well as for both 2-by-2 and higher-order cross-over designs.

Sample Size for Superiority by a Margin Tests for Two Means in a Cross-Over Design

Superiority by a margin cross-over designs are used to show that one group mean is higher than another by a specified amount. In PASS there are procedures for 2-by-2 and higher-order designs, as well as for both differences and ratios.

Sample Size for Conditional Power Tests for Two Means in a Cross-Over Design

This procedure computes conditional and predictive power for the case when a t-test, computed from data obtained from a 2×2 cross-over design, is used to test whether two population means are different.

Power Curve for a Two Means Cross-Over Design
