Distribution Fitting in NCSS
NCSS provides a number of tools for distribution fitting. These include graphical tools, such as probability plots and survival plots, and numeric analysis tools, such as beta, gamma, and Weibull distribution fitting, normality tests, and Grubbs’ Outlier Test. All of the procedures in NCSS are validated for accuracy and are easy to learn and use.
Use the links below to jump to the distribution fitting topic you would like to examine. To see how these tools can benefit you, we recommend you download and install the free trial of NCSS.
- Technical Details
- Distribution (Weibull) Fitting
- Beta Distribution Fitting
- Gamma Distribution Fitting
- Grubbs’ Outlier Test
- Normality Tests
- Probability Plots
Distribution fitting is often used in survival and reliability applications to model lifetimes or failure rates. By fitting a distribution to the data, you can make predictions about future events and draw conclusions based on the fitted distribution. The beta, gamma, and Weibull distributions are often used to fit survival or reliability data.
Another application of distribution fitting is in checking assumptions in data modeling and analysis. A common assumption in many analyses is that the data or the residuals are normally distributed. Normality tests and probability plots are useful for checking these important distributional assumptions.
This page is designed to give a general overview of the capabilities of NCSS for distribution fitting. If you would like to examine the formulas and technical details relating to a specific NCSS procedure, click on the corresponding ‘[Documentation PDF]’ link under each heading to load the complete procedure documentation. There you will find formulas, references, discussions, and examples or tutorials describing the procedure in detail.
The Distribution (Weibull) Fitting procedure in NCSS estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions by maximum likelihood. It can fit complete, right censored, left censored, interval censored (readout), and grouped data values. It also computes the nonparametric Kaplan-Meier and Nelson-Aalen estimates of survival and associated hazard rates. It outputs various statistics and graphs that are useful in reliability and survival analysis. When the choice of the probability distribution is in doubt, the procedure helps select an appropriate probability distribution from those available.
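To illustrate what maximum likelihood fitting with right-censored data involves, here is a minimal Python sketch for the Weibull case only. It is not NCSS code; the function name `fit_weibull_mle`, the bisection bracket, and the lifetimes are all assumptions made for the example. The shape is found as the root of the profile-likelihood score, and the scale then follows in closed form.

```python
import math


def fit_weibull_mle(times, observed, tol=1e-10):
    """Maximum likelihood (shape, scale) for complete or right-censored data.

    times[i] is a positive failure or censoring time;
    observed[i] is True for a failure and False for a right-censored unit.
    """
    failures = sum(1 for d in observed if d)
    if failures == 0:
        raise ValueError("at least one observed failure is required")
    mean_log_failure = (
        sum(math.log(t) for t, d in zip(times, observed) if d) / failures
    )

    def score(shape):
        # Profile-likelihood score for the shape; it is increasing in the
        # shape, so its single root is the maximum likelihood estimate.
        s0 = sum(t ** shape for t in times)
        s1 = sum(t ** shape * math.log(t) for t in times)
        return s1 / s0 - 1.0 / shape - mean_log_failure

    lo, hi = 1e-3, 50.0  # assumed search bracket for the shape parameter
    if score(lo) > 0 or score(hi) < 0:
        raise ValueError("shape estimate lies outside the assumed bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    # Given the shape, the scale has a closed form: censored units add to
    # the numerator (exposure) but not to the failure count.
    scale = (sum(t ** shape for t in times) / failures) ** (1.0 / shape)
    return shape, scale


# Hypothetical lifetimes in hours; False marks units still running at 500 h.
times = [105, 172, 210, 248, 301, 355, 410, 468, 500, 500]
observed = [True] * 8 + [False] * 2
shape, scale = fit_weibull_mle(times, observed)
print(f"shape = {shape:.3f}, scale = {scale:.1f}")
```

Left censored, interval censored, and grouped data need additional likelihood terms that this sketch does not cover.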
Features of this procedure include:
1. Probability plotting, hazard plotting, and reliability plotting for the common life distributions. The data may be any combination of complete, right censored, left censored, and interval censored data.
2. Maximum likelihood and probability plot estimates of distribution parameters, percentiles, reliability (survival) functions, hazard rates, and hazard functions.
3. Confidence intervals for distribution parameters and percentiles.
4. Nonparametric estimates of survival using the Kaplan-Meier procedure.
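The nonparametric estimates mentioned above can be sketched in a few lines of Python. This is an illustration of the Kaplan-Meier product-limit estimate and the Nelson-Aalen cumulative hazard for right-censored data, not the NCSS implementation; the function name `km_nelson_aalen` and the sample data are hypothetical.

```python
from collections import Counter


def km_nelson_aalen(times, observed):
    """Return [(time, survival, cumulative_hazard)] at each failure time.

    observed[i] is truthy for a failure and falsy for a right-censored time.
    Units censored at a failure time are counted as still at risk there.
    """
    deaths = Counter(t for t, d in zip(times, observed) if d)
    exits = Counter(times)  # failures and censorings leaving the risk set
    at_risk = len(times)
    survival, cum_hazard = 1.0, 0.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            survival *= 1.0 - deaths[t] / at_risk  # Kaplan-Meier step
            cum_hazard += deaths[t] / at_risk      # Nelson-Aalen increment
            curve.append((t, survival, cum_hazard))
        at_risk -= exits[t]
    return curve


# Hypothetical data: 1 = failure, 0 = right censored.
for t, s, h in km_nelson_aalen([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
    print(f"t = {t}: S(t) = {s:.4f}, H(t) = {h:.4f}")
```

Censored observations never cause a step in the curve; they only shrink the risk set used at later failure times.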
Sample Distribution Fitting Plots Made with NCSS
The Beta Distribution Fitting procedure fits the beta probability distribution to a complete set of individual or grouped data values. It outputs various statistics and graphs that are useful in reliability and survival analysis. The beta distribution is useful for fitting data that are bounded by an absolute minimum and maximum. It finds some application as a lifetime distribution.
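As a rough illustration of fitting a beta distribution to bounded data, the Python sketch below rescales the data to the unit interval and matches the first two moments. The method of moments is used here only because it is short; this page does not state which estimator NCSS uses, and the function name `fit_beta_moments` and the sample values are hypothetical.

```python
from statistics import fmean, variance


def fit_beta_moments(data, lower=0.0, upper=1.0):
    """Method-of-moments estimates (a, b) for a beta on [lower, upper]."""
    scaled = [(x - lower) / (upper - lower) for x in data]
    if min(scaled) < 0.0 or max(scaled) > 1.0:
        raise ValueError("data fall outside the stated bounds")
    m = fmean(scaled)
    v = variance(scaled)
    # For a beta(a, b): mean = a / (a + b), var = mean * (1 - mean) / (a + b + 1)
    total = m * (1.0 - m) / v - 1.0  # estimate of a + b
    if total <= 0:
        raise ValueError("sample variance is too large for a beta distribution")
    return m * total, (1.0 - m) * total


# Hypothetical measurements bounded between 0 and 100 (for example, percentages).
sample = [12.0, 18.5, 22.0, 25.5, 31.0, 35.5, 41.0, 47.5, 58.0, 66.0]
a, b = fit_beta_moments(sample, lower=0.0, upper=100.0)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The bounds must be known in advance; estimating them from the data is a harder problem that this sketch does not attempt.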
The Gamma Distribution Fitting procedure fits the gamma probability distribution to a complete or censored set of individual or grouped data values. It outputs various statistics and graphs that are useful in reliability and survival analysis.
The gamma distribution competes with the Weibull distribution as a model for lifetime. Because it is more complicated to deal with mathematically, it has been used less often. While the Weibull is a purely heuristic model (approximating the data well), the gamma distribution does arise as a physical model, since the sum of independent exponential random variables with a common rate is a gamma random variable. At times, you may find that the distribution of log lifetime follows the gamma distribution.
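The link between exponential and gamma random variables can be checked with a short simulation. The Python sketch below adds up independent exponential stage times and compares moment estimates with the gamma parameters that theory predicts; the number of stages, the rate, and the seed are hypothetical values chosen for illustration.

```python
import random
from statistics import fmean, variance

random.seed(42)
stages, rate = 3, 0.5  # hypothetical: 3 stages, each exponential with rate 0.5
n = 20000

# Each lifetime is the sum of `stages` independent exponential stage times.
totals = [sum(random.expovariate(rate) for _ in range(stages)) for _ in range(n)]

m, v = fmean(totals), variance(totals)
shape_hat = m * m / v  # moment estimate of the gamma shape
scale_hat = v / m      # moment estimate of the gamma scale

# Theory: the sum is gamma with shape = stages and scale = 1 / rate.
print(f"shape estimate {shape_hat:.2f} (theory {stages})")
print(f"scale estimate {scale_hat:.2f} (theory {1 / rate})")
```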
The Grubbs’ Outlier Test procedure in NCSS computes Grubbs’ test (1950) for detecting outliers in normal populations. It is well known that outliers (or extreme points) often distort the results of an analysis. Because of this, every analysis should begin with either a graphical or a statistical check for possible outliers. This procedure also computes Rosner’s (2011) test for many outliers. We also recommend Barnett and Lewis (1994) for many more outlier tests.
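For readers who want to see the arithmetic behind a single-outlier Grubbs' test, here is a minimal Python sketch of the usual two-sided form. It assumes SciPy is available for the Student's t quantile; the function name `grubbs_test` and the measurements are hypothetical, and the sketch does not reproduce NCSS output or Rosner's multiple-outlier test.

```python
import math
from statistics import fmean, stdev

from scipy.stats import t as student_t


def grubbs_test(data, alpha=0.05):
    """Two-sided Grubbs' test for one outlier in a normal sample.

    Returns (suspect, G, critical value, is_outlier).
    """
    n = len(data)
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    mean, sd = fmean(data), stdev(data)
    suspect = max(data, key=lambda x: abs(x - mean))
    g = abs(suspect - mean) / sd  # largest absolute studentized deviation
    # Two-sided critical value based on the t distribution with n - 2 df.
    t_crit = student_t.ppf(1.0 - alpha / (2.0 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return suspect, g, g_crit, bool(g > g_crit)


# Hypothetical measurements with one suspicious value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 15.0]
suspect, g, g_crit, flagged = grubbs_test(readings)
print(f"suspect = {suspect}, G = {g:.3f}, critical = {g_crit:.3f}, outlier = {flagged}")
```

The test assumes the remaining data are normally distributed, so it is worth pairing with a normality check on the data without the suspect point.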
The Normality Tests procedure in NCSS provides seven different tests of data normality:
1. Shapiro-Wilk W Test
2. Anderson-Darling Test
3. Martinez-Iglewicz Test
4. Kolmogorov-Smirnov Test
5. D’Agostino Skewness Test
6. D’Agostino Kurtosis Test
7. D’Agostino Omnibus Test
If a variable is normally distributed, then you can use parametric statistics that are based on this assumption. If a variable fails a normality test, it is critical to look at the histogram and the normal probability plot to see if an outlier or a small subset of outliers has caused the non-normality. If there are no outliers, you might try a transformation (such as the log or square root) to make the data normal. If a transformation is not a viable alternative, nonparametric methods that do not require normality may be used.
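The test-then-transform workflow described above can be tried outside NCSS with SciPy, which offers four of the listed tests under the names used below. This is a sketch on simulated, hypothetical data; the helper `normality_report` is not part of any library, and SciPy's p-values may differ slightly from those NCSS reports.

```python
import math
import random

from scipy import stats


def normality_report(x):
    """Return p-values from four normality tests available in SciPy."""
    return {
        "Shapiro-Wilk": stats.shapiro(x).pvalue,
        "D'Agostino skewness": stats.skewtest(x).pvalue,
        "D'Agostino kurtosis": stats.kurtosistest(x).pvalue,
        "D'Agostino omnibus": stats.normaltest(x).pvalue,
    }


random.seed(7)
# Hypothetical right-skewed data: lognormal, so the log transform is a natural try.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
logged = [math.log(value) for value in raw]

for label, sample in (("raw", raw), ("log-transformed", logged)):
    print(label)
    for test_name, p_value in normality_report(sample).items():
        print(f"  {test_name}: p = {p_value:.4f}")
```

A small p-value on the raw data followed by a much larger one after the transformation suggests the transformation is a reasonable choice, but the plots should still be examined.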
Always remember that a reasonably large sample size is required to detect departures from normality. Only extreme types of non-normality can be detected with samples of fewer than fifty observations. Normality tests generally have low statistical power (probability of detecting non-normal data) unless the sample size is over 100.
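The effect of sample size on power can be explored with a small simulation. The Python sketch below estimates how often the Shapiro-Wilk test rejects normality for data drawn from a moderately heavy-tailed distribution (Student's t with 5 degrees of freedom, chosen here as an assumed example of mild non-normality). The function names, repetition count, and seed are hypothetical, and NumPy and SciPy are assumed to be installed.

```python
import numpy as np
from scipy import stats


def shapiro_power(sample_size, draw, reps=300, alpha=0.05, seed=0):
    """Estimate the rejection rate of the Shapiro-Wilk test by simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        if stats.shapiro(draw(rng, sample_size)).pvalue < alpha:
            rejections += 1
    return rejections / reps


def heavy_tailed(rng, n):
    # Student's t with 5 df: symmetric, with heavier tails than the normal.
    return rng.standard_t(5, size=n)


for n in (20, 50, 100, 200):
    print(f"n = {n:3d}: estimated power = {shapiro_power(n, heavy_tailed):.2f}")
```

Swapping in a different `draw` function shows how power depends on the kind of departure from normality as well as on the sample size.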
There is a common misconception that a histogram is always a valid graphical tool for assessing normality. Since there are many subjective choices that must be made in constructing a histogram, and since histograms generally need large sample sizes to display an accurate picture of normality, preference should be given to other graphical displays such as the box plot, the density trace, and the normal probability plot.
Probability plots are used to determine visually how closely the data follow the probability distribution of interest. If the points in the probability plot all fall along a relatively straight line, you can assume that the data follow that probability distribution, or, at least, that the actual distribution is well approximated by the distribution in the plot.
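The construction behind a normal probability plot can be sketched with the Python standard library: the ordered data are paired with the quantiles expected from a standard normal distribution, and the correlation of the resulting points measures how straight the plot is. The plotting positions (i - 0.375) / (n + 0.25) are one common convention assumed for this example and may differ from what NCSS uses; the function names and simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def normal_plot_points(data):
    """Pair each expected standard normal quantile with an ordered observation."""
    n = len(data)
    std_normal = NormalDist()
    # Assumed plotting positions: (i - 0.375) / (n + 0.25) for i = 1..n.
    expected = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(expected, sorted(data)))


def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 means a straight line."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(3)
normal_data = [random.gauss(50.0, 10.0) for _ in range(200)]
skewed_data = [random.expovariate(1.0) for _ in range(200)]

r_normal = plot_correlation(normal_plot_points(normal_data))
r_skewed = plot_correlation(normal_plot_points(skewed_data))
print(f"normal data: r = {r_normal:.4f}")
print(f"skewed data: r = {r_skewed:.4f}")
```

The correlation is only a summary; curvature in the tails is easier to judge by plotting the points themselves.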
NCSS constructs probability plots for 8 different probability distributions:
1. Normal
2. Weibull
3. Chi-Square
4. Exponential
5. Gamma
6. Half-Normal
7. Log-Normal
8. Uniform
Approximate confidence limits are drawn to help determine if a set of data follows a given distribution. If a grouping variable is specified, a separate line is drawn and displayed for each unique value of the grouping variable. NCSS also includes a procedure that allows you to generate probability plots from any of the 8 distributions together for comparison.
Probability plots emphasize problems that may occur in the tails of the distribution, not in the middle (since there are so many points clumped together there). Remember that the confidence limits displayed on the plot are only approximate. They depend heavily on a reasonable sample size. For samples of under twenty points, the confidence limits may not be very accurate.