STATISTICA







VALIDATION BENCHMARKS


Tests of numerical precision and accuracy of statistical algorithms of the main computational engines of STATISTICA (by StatSoft, Inc.)

The following selection of 52 datasets and analysis designs included in these validation benchmarks represent:

The accuracy criteria for all benchmarks presented below are either the respective published sources or, where applicable, the internal consistency of the results.

To the best of our knowledge, STATISTICA is the only statistics package available on the market which has successfully passed every test included in this set of benchmarks (and some tests reported here cannot be passed by any program other than STATISTICA).


* We are grateful to Dr. Lynn Brecht (UCLA), Dr. John Castellan (Indiana University), Dr. Elazar Pedhazur (New York University), Dr. Dallas Johnson (Kansas State University), Dr. Geoffrey Keppel (University of California, Berkeley), Dr. Michael Kutner (Emory University), Dr. George Milliken (Kansas State University), Dr. Paul Switzer (Stanford University), Dr. William Wasserman (Syracuse University), Dr. Thomas Wickens (UCLA), and Dr. Arthur Woodward (UCLA) for their advice, and for recommending to us some of the datasets used in these validation benchmarks, and to Drs. A. Woodward and L. Brecht for allowing us to use datasets from the technical documentation for Ganova. We are also grateful to all those researchers and practitioners who generously provided us with their raw datasets and allowed us to use them in the validation benchmarks. We would appreciate readers' suggestions concerning any additional benchmarks which could be included in this set.


Example 1: The "small relative variance test" of numerical precision

In the following sample dataset, variable var2 (the second column), which features a small relative variance, is a linear function of var3 (the third column); thus, the correlation coefficient between any variable (e.g., variable var1) and var2 should be identical to the correlation between that variable and var3.

var1    var2                var3
1.0     100000.00000001     1.0
2.0     100000.00000002     2.0
3.0     100000.00000001     1.0
4.0     100000.00000002     2.0
5.0     100000.00000001     1.0
6.0     100000.00000002     2.0
7.0     100000.00000005     5.0

Here are the two correlation coefficients (var1*var2 and var1*var3) calculated by STATISTICA (using its extended precision optimization algorithm), and displayed with the highest precision available.

variables      Pearson r           p-level
var1 * var2    0.65465367070798    0.111
var1 * var3    0.65465367070798    0.111

To our knowledge, STATISTICA is the only program available on the market that will correctly compute these correlations (or correlations from other datasets featuring very small relative variances).
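The identity claimed above can be checked independently with arbitrary-precision decimal arithmetic. The following is a minimal sketch, not STATISTICA's algorithm: it assumes a plain two-pass (center first, then accumulate) Pearson formula and uses Python's standard decimal module so that the 1E-8 differences in var2 are represented exactly. The function name pearson_r is an illustrative choice.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # far more digits than the data require

var1 = [Decimal(v) for v in ("1.0", "2.0", "3.0", "4.0", "5.0", "6.0", "7.0")]
var2 = [Decimal(v) for v in (
    "100000.00000001", "100000.00000002", "100000.00000001",
    "100000.00000002", "100000.00000001", "100000.00000002",
    "100000.00000005",
)]
var3 = [Decimal(v) for v in ("1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "5.0")]


def pearson_r(x, y):
    """Two-pass Pearson correlation: center the data, then sum products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy).sqrt()


r12 = pearson_r(var1, var2)
r13 = pearson_r(var1, var3)
print(f"{r12:.14f}")  # var1 * var2
print(f"{r13:.14f}")  # var1 * var3
```

Both correlations reduce to 12/sqrt(336), so the two printed values agree with each other and with the 0.65465367070798 reported in the table. A one-pass formula in ordinary double precision would lose most of the digits of var2's variance, which is what this benchmark is designed to expose.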

Example 2: A medium size multi-factor unbalanced ANOVA design

The following design is a 5 x 5 x 5 x 3 (between-group) x 3 x 3 x 3 (repeated measures) design (with unequal N). Thus there are 375 groups and 27 dependent variables (data file ANOVA4 is available from StatSoft). The between-group design matrix for the highest order interaction has 128 degrees of freedom. Shown below are the univariate and multivariate results for the highest order (7-way) interaction.

general manova
INTERACTION: 1 x 2 x 3 x 4 x 5 x 6 x 7
1-IV1, 2-IV2, 3-IV3, 4-IV4, 5-RFACT1, 6-RFACT2, 7-RFACT3

Univar. Test    Sum of Squares    df      Mean Square    F          p-level
Effect          8664.99           1024    8.461903       1.02411    .31744
Error           24854.14          3008    8.262680

Test                     Value       p-level
Wilks' Lambda            .088651
Rao R (1024,2966)        1.027036    .29812
Pillai-Bartlett Trace    2.071145
V (1024,3008)            1.026166    .30355
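The group counts and degrees of freedom quoted for this design follow from simple products over the factor levels. The sketch below reproduces that arithmetic; the helper name interaction_df is an illustrative assumption, and the level counts are taken from the design description above.

```python
from math import prod


def interaction_df(levels):
    """Degrees of freedom for the interaction of factors with these level counts."""
    return prod(k - 1 for k in levels)


between = [5, 5, 5, 3]   # between-group factors IV1..IV4
within = [3, 3, 3]       # repeated measures factors RFACT1..RFACT3

n_groups = prod(between)                        # number of between-group cells
n_dependent = prod(within)                      # number of dependent variables
df_between = interaction_df(between)            # highest-order between-group interaction
df_effect = interaction_df(between + within)    # 7-way interaction

print(n_groups, n_dependent, df_between, df_effect)
```

The four values printed are 375, 27, 128, and 1024, matching the text and the effect degrees of freedom in the univariate table. The same function applied to the 20 x 10 x 2 x 2 x 3 design of Example 4 gives 171 for the between-group interaction and 342 for the 5-way interaction.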

Example 3: A medium size multi-factor unbalanced ANOVA design (very large and very small values)

Example 3.1. In the first part of this test, the dataset used in the previous example (Example 2, original range of values 0.1 to 10.0) was transformed by multiplying each dependent variable in the original dataset by 100,000; then, the analysis of variance reported in the previous example was performed on the transformed data. Shown below are the univariate and multivariate results for the highest order (7-way) interaction (cf. Example 2).

Univar. Test    Sum of Squares    df      Mean Square    F          p-level
Effect          8664.99           1024    8.461903       1.02411    .31744
Error           24854.14          3008    8.262680

Test                     Value       p-level
Wilks' Lambda            .088651
Rao R (1024,2966)        1.027036    .29812
Pillai-Bartlett Trace    2.071145
V (1024,3008)            1.026166    .30355

Example 3.2. In the second part of this test, the dataset used in Example 2 (original range of values 0.1 to 10.0) was transformed by dividing each dependent variable in the original set by 100,000; the analysis of variance reported in Example 2 was then performed on the transformed data. Shown below are the univariate and multivariate results for the highest order (7-way) interaction (cf. the first part of this example and Example 2).

Univar. Test    Sum of Squares    df      Mean Square    F          p-level
Effect          8664.99           1024    8.461903       1.02411    .31744
Error           24854.14          3008    8.262680

Test                     Value       p-level
Wilks' Lambda            .088651
Rao R (1024,2966)        1.027036    .29812
Pillai-Bartlett Trace    2.071145
V (1024,3008)            1.026166    .30355
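The point of Example 3 is that F ratios should not change when every dependent variable is multiplied or divided by a constant. The sketch below illustrates that invariance on a small one-way layout. The data are hypothetical (they are not the ANOVA4 file), and one_way_f is an illustrative two-pass implementation rather than the routine used by STATISTICA.

```python
import math


def one_way_f(groups):
    """F ratio for a one-way between-group ANOVA (two-pass computation)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_effect = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_effect = len(groups) - 1
    df_error = len(values) - len(groups)
    return (ss_effect / df_effect) / (ss_error / df_error)


# Hypothetical unbalanced data in the range 0.1 to 10.0
data = [[0.1, 2.5, 4.0], [3.2, 5.1, 6.6, 7.0], [8.4, 9.9, 10.0]]

f_original = one_way_f(data)
f_multiplied = one_way_f([[v * 100_000 for v in g] for g in data])
f_divided = one_way_f([[v / 100_000 for v in g] for g in data])

print(math.isclose(f_original, f_multiplied, rel_tol=1e-9))
print(math.isclose(f_original, f_divided, rel_tol=1e-9))
```

Rescaling multiplies the effect and error sums of squares by the same factor, so the ratio is unchanged up to rounding. An implementation that accumulates raw sums of squares before centering is much more sensitive to such rescaling.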

Example 4: A large multi-factor unbalanced ANOVA design

The following design is a 20 x 10 x 2 x 2 (between-group) x 3 (repeated measures) design with unequal N. Thus, there are 800 groups and 3 dependent variables (data file ANOVA44 is available from StatSoft). The between-group design matrix for the highest order interaction has 171 degrees of freedom. Shown below are the univariate and multivariate results for the highest order (5-way) interaction.

general manova
INTERACTION: 1 x 2 x 3 x 4 x 5
1-COUNTRY, 2-RAINFALL, 3-REGION, 4-STATUS, 5-RFACTOR

Univar. Test    Sum of Squares    df      Mean Square    F         p-level
Effect          17.9462           342     .052474        .92406    .82876
Error           181.8289          3202    .056786

general manova
INTERACTION: 1 x 2 x 3 x 4 x 5
1-COUNTRY, 2-RAINFALL, 3-REGION, 4-STATUS, 5-RFACTOR

Test                     Value      p-level
Wilks' Lambda            .826507
Rao R (342,inf)          .935296    .78876
Pillai-Bartlett Trace    .181690
V (342,3202)             .935531    .78788

To our knowledge, STATISTICA is the only program available on the market that can process ANOVA designs of this size.

Example 5: Precision of ANOVA routines (small within-cell variances relative to the between-group variance)

Here is a test of the precision of computations in ANOVA: A data file was created with 10 cases and 5 groups (2 cases per group), and 12 dependent variables. The groups in the grouping variable IV were coded 1 through 5. The dependent variables DVi (i =1 to 12) were then computed as DVi = IV + casenumber/10**i (each successive dependent variable was computed as a constant plus the case number divided by 10 to the power of i). This results in small within-cell variances relative to the between-group variance.

general manova
MAIN EFFECT: IV
1-IV

depend.     Mean Sqr    Mean Sqr     F(df1,2)
variable    Effect      Error        4,5
DV1         5.202000    .00005       104040.
DV2         5.020020    .0000005     1004E4
DV3         5.002000    .5E-8        10004E5
DV4         5.000200    .5E-10       100004E6
DV5         5.000020    .5E-12       1E13
DV6         5.000002    .5E-14       1E15
DV7         5.000000    .5E-16       1E17
DV8         5.000000    .5E-18       1E19
DV9         5.000000    .5E-20       1E21
DV10        5.000000    .500E-22     99996E18
DV11        5.000000    .500E-24     99996E20
DV12        5.000000    .502E-26     99584E22

To our knowledge, STATISTICA is the only program available on the market that will correctly compute the within ms error component for all dependent variables in this design.
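The exact mean squares for this kind of design can be derived with rational arithmetic, which gives a reference against which any floating-point routine can be compared. The sketch below is an assumption-laden illustration: it assumes cases 1-2 fall in group 1, cases 3-4 in group 2, and so on, and it takes the divisor exponent k as a parameter. The function name anova_components is hypothetical.

```python
from fractions import Fraction


def anova_components(k):
    """Exact (MS effect, MS error, F) for DV = IV + case/10**k, 5 groups x 2 cases."""
    groups = {}
    for case in range(1, 11):
        iv = (case + 1) // 2  # assumed layout: cases 1-2 -> group 1, ..., 9-10 -> group 5
        groups.setdefault(iv, []).append(Fraction(iv) + Fraction(case, 10 ** k))
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    ss_effect = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                    for g in groups.values())
    ss_error = sum((v - sum(g) / len(g)) ** 2
                   for g in groups.values() for v in g)
    df_effect = len(groups) - 1            # 4
    df_error = len(values) - len(groups)   # 5
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error


ms_effect, ms_error, f_ratio = anova_components(2)
print(float(ms_effect), float(ms_error), float(f_ratio))
```

Under this reading of the layout, a divisor of 10**2 gives exactly 5.202, .00005, and 104040, which are the values tabulated for DV1, and a divisor of 10**3 gives the DV2 row. Each further power of ten shrinks the error mean square by a factor of 100, so the within-cell variance quickly falls below what a naive double-precision computation can resolve.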

Examples 6 and 7: Logistic regression, maximum likelihood

Example 6. Cox (1970, p. 86) reports data describing the failure (variable Failure) of objects as a function of time (Time). Cox fitted the data by the logistic model. Shown below are the maximum likelihood estimates and their standard errors produced via STATISTICA: Nonlinear Estimation (see also Brown et al., 1983, p. 317).

nonlin. estimat.
Parameter Estimates
Std. Errs were computed after scaling ms-err. to 1.

Param.       Const.     TIME
Estimate     -5.415     .0807
Std. Err.    .728       .0224
t(5)         -7.438     3.6099
p-level      .00069     .0154

Example 7. A dataset reported in Neter, Wasserman, and Kutner (1985, p. 365) describes the results of a study of coupon redemption. The coupons differed in their value, that is, with regard to the price reduction offered. The dependent variable of interest is how many coupons of each type were redeemed. Shown below are the maximum likelihood parameter estimates for the logistic regression model computed by STATISTICA: Nonlinear Estimation (weighted least squares estimates are reported in Neter et al., p. 365).

nonlin. estimat.
Parameter Estimates
Std. Errs were computed after scaling ms-err. to 1.

Param.       Const.      REDUCTN
Estimate     -2.185      .1087
Std. Err.    .165        .0089
t(8)         -13.267     12.2894
p-level      .000        .0000

Example 8: Exponential regression, ordinary least squares

This example is based on a dataset reported in Neter, Wasserman, and Kutner (1985, p. 469). The data contain information on the number of days that each of 15 severely injured patients were hospitalized (variable Days) and an index of the prognosis for long-term recovery for each patient (variable Prognos). Shown below are the parameter estimates produced by STATISTICA: Nonlinear Estimation for the exponential regression model: Prognos=g0 * exp(g1*Days) [g0 and g1 are parameters], the loss function is least squares (see also Neter et al., p. 478, Table 14.3).

nonlin. estimat.
Parameter Estimates

Param.      g0          g1
Estimate    58.60662    -.0396
Std.Err.    1.54984     .0019
t(13)       37.81474    -20.8667
p-level     .00000      .0000

Example 9: User-defined (exponential) regression, ordinary least squares

The dataset for this example is again based on Neter, Wasserman, & Kutner (1985, p. 484). To study the efficiency of two new manufacturing plants, a ratio was computed of the per-unit-production cost expected in a modern facility after learning has occurred, over the actual per-unit-production cost for selected weeks over a 90-week span. Neter et al. fit the following model to these data: y = b0 + b1 * xg + b3 * exp(b2*x), where xg is an indicator variable to denote the two plants, x denotes the number of weeks, y is the efficiency index, and b0, b1, b2, and b3 are parameters. This formula can be typed "as is" into the user-defined model specification editor. Shown below are the results computed by STATISTICA: Nonlinear Estimation (using the Rosenbrock pattern search method to find start values, followed by quasi-Newton iterations; Neter et al. report the results on pp. 484-485).

nonlin. estimat.
Parameter Estimates
y = b0 + b1*xg + b3*exp(b2*x)

Param.       B0          B1          B3          B2
Estimate     1.0156      -.0473      -.5524      -.1348
Std. Err.    .0037       .0041       .0083       .0046
t(26)        274.5491    -11.5026    -66.6689    -29.5186
p-level      0.0000      .0000       .0000       .0000

Example 10: Discontinuity (breakpoint) in regression function

This example is also based on a dataset reported in Neter, Wasserman, & Kutner (1985, p. 348). Specifically, the dataset pertains to a production process in which the per-unit cost is related to the lot size. Supposedly, for lots greater than 500, the relationship between the variables changes; Neter et al. (1985) fit a linear model that allowed for different slopes for lots of sizes less than or equal to 500, and lots greater than 500. Specifically, Neter et al. fit the following model: y = b0 + b1*x + b2*(x-500)*(x>500) (b0, b1, and b2 are parameters). In this model, the logical expression (x>500) serves as a multiplier: if the expression is true, it evaluates to 1; if it is false, it evaluates to 0. Therefore, this equation actually represents two models: y = b0 + b1*x for x<=500, and y = b0 + b1*x + b2*(x-500) for x>500. The model can again be typed into the user-model specification editor "as is"; shown below are the parameter estimates computed by STATISTICA: Nonlinear Estimation (see Neter et al., p. 348).

Parameter Estimates
y = b0 + b1*x + b2*(x-500)*(x>500)

Param.       B0          B1          B2
Estimate     5.895447    -.00395     -.00389
Std. Err.    .604213     .00149      .00231
t(5)         9.757232    -2.64990    -1.68515
p-level      .000192     .04543      .15277
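The role of the logical multiplier in the breakpoint model can be shown in a few lines of code. This is a minimal sketch, not the STATISTICA model editor: the function name piecewise_cost and the breakpoint keyword are illustrative assumptions, and the coefficients are the rounded estimates from the table for Example 10.

```python
import math


def piecewise_cost(x, b0, b1, b2, breakpoint=500):
    """y = b0 + b1*x + b2*(x - breakpoint)*(x > breakpoint).

    The comparison (x > breakpoint) is a bool, which Python treats as 1 or 0
    in arithmetic, so the b2 term only contributes above the breakpoint.
    """
    return b0 + b1 * x + b2 * (x - breakpoint) * (x > breakpoint)


# Rounded estimates as reported in the table for Example 10
B0, B1, B2 = 5.895447, -0.00395, -0.00389

below = piecewise_cost(400, B0, B1, B2)   # only b0 + b1*x applies
at_break = piecewise_cost(500, B0, B1, B2)
above = piecewise_cost(600, B0, B1, B2)   # slope is b1 + b2 past 500

print(below, at_break, above)
print(math.isclose(above - at_break, 100 * (B1 + B2)))
```

Because the extra term is zero at x = 500, the fitted function is continuous at the breakpoint and only its slope changes, from b1 to b1 + b2.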

Example 11: Weighted Least Squares

Weighted least squares or any other (user-specified) loss function can be specified in STATISTICA: Nonlinear Estimation. An example of weighted least squares is presented in Neter et al. (1985, p. 169). The example dataset contains information concerning the cost for preparing a bid and the size of the bid. Neter et al. fit a linear regression model (Bid cost = b0 + b1 * Bid size), using the residuals weighted by the inverse of the squared Bid size values in the loss function [Loss = ((Predicted-Observed) **2)*(1/Bid size**2)]. Here are the results computed by STATISTICA: Nonlinear Estimation (see Neter et al., pp. 169-170).

Parameter Estimates
y=b0+b1*x

Param.      B0          B1
Estimate    5.656852    4.19055
Std.Err.    .965238     .40366
t(10)       5.860577    10.38127
p-level     .000159     .00000
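The weighted loss function described for Example 11 can be written out directly, and for a straight-line model its minimizer has a closed form. The sketch below is an illustration under stated assumptions: the data are hypothetical (they are not the bid data from Neter et al.), the function names are invented for this example, and the closed-form solution stands in for the iterative minimization that a general nonlinear estimation routine would perform.

```python
def weighted_loss(b0, b1, x, y):
    """Loss = sum of ((Predicted - Observed)**2) * (1 / x**2)."""
    return sum(((b0 + b1 * xi) - yi) ** 2 * (1 / xi ** 2) for xi, yi in zip(x, y))


def fit_weighted_line(x, y):
    """Closed-form weighted least squares for y = b0 + b1*x with weights 1/x**2."""
    w = [1 / xi ** 2 for xi in x]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical (size, cost) pairs, for illustration only
size = [2.0, 5.0, 10.0, 20.0, 40.0]
cost = [10.1, 21.9, 41.5, 84.0, 160.2]

b0, b1 = fit_weighted_line(size, cost)
best = weighted_loss(b0, b1, size, cost)
print(b0, b1, best)
```

Weighting by 1/x**2 downweights the large bids, whose residuals would otherwise dominate the fit. Perturbing either coefficient away from the returned values can only increase the loss, which is a convenient check on any iterative minimizer applied to the same loss.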

Example 12: Robustness against collinearity problems (a linear model test of accuracy of nonlinear estimation)

The so-called Longley data (Longley, 1967) is a well-known dataset for testing linear-least-squares regression programs for their ability to handle regression problems with redundant predictor variables (this dataset is also referenced below for STATISTICA: Multiple Regression, Example 27). In this example, it will be used to test the accuracy of the general nonlinear estimation module of STATISTICA. In the user-model specification editor of STATISTICA: Nonlinear Estimation, we can specify the linear regression model, and request least squares parameter estimates. The parameter estimates computed by STATISTICA: Nonlinear Estimation (via quasi-Newton iterations) and their (asymptotic) standard errors (computed via finite difference approximation) are shown below (for comparison, see also Elliott, Reisch, & Campbell, 1989, p. 296). Note that STATISTICA: Multiple Regression will reproduce the parameter estimates with all 12 digits of precision.

multiple regress.
Parameter Estimates

Param.      A           B1          B2         B3
Estimate    -34822E2    15.06195    -.03582    -2.02023
Std.Err.    890420.     84.91493    .03349     .48840

multiple regress.
Parameter Estimates

Param.      B4          B5          B6
Estimate    -1.03323    -.051103    1829.155
Std.Err.    .21427      .226073     455.479

Example 13: Unbalanced ANOVA designs (Type I and III Sums of Squares)

Milliken and Johnson (1984, p. 129) discuss in some detail the analysis of a 2 x 3 between-group design that is unbalanced due to unequal N. Shown below are the summary ANOVA tables for that design, first for Type I (sequential) sums of squares (see Milliken & Johnson, p. 142) and then for Type III sums of squares (see Milliken & Johnson, p. 132).

general manova
Summary of all Effects (Type I SS); design:
1-T, 2-B

Effect    df Effect    ms Effect    df Error    ms Error    F         p
*1        1            76.563       10          2.0000      38.281    .0001
*2        2            45.372       10          2.0000      22.686    .0002
*12       2            35.815       10          2.000       17.908    .0005

general manova
Summary of all Effects (Type III SS); design:
1-T, 2-B

Effect    df Effect    ms Effect    df Error    ms Error    F         p
*1        1            61.714       10          2.0000      30.857    .0002
*2        2            38.585       10          2.0000      19.292    .0004
*12       2            35.815       10          2.000       17.908    .0005

Example 14: A 2-way nested design

Lindman (1974, p. 167) discusses a two-way nested design where factor A has three levels, and factor B has six levels, with two levels each nested in each level of factor A. Here is the results summary computed by STATISTICA: ANOVA/MANOVA (see Lindman, p. 172).

general manova
Summary of all Effects; design:
1-A, 2-B

Effect    df Effect    ms Effect    df Error    ms Error    F         p
*1        2            114.67       12          4.8889      23.455    .0001
*2        3            46.83        12          4.8889      9.580     .0017

Example 15: A 3-way nested design with customized error term

Milliken & Johnson (1984, p. 418) present an example of a 3-way nested design. In this experiment, male and female subjects were randomly assigned to one of 9 environmental chambers; the 9 environmental chambers, in turn, were assigned to 3 levels of a temperature factor. Thus, in this design Chamber is nested in Temperature, and subjects are nested in chambers. To produce the table of sums of squares as presented in Milliken & Johnson (1984, p. 419), the Gender by Chambers interaction was pooled into the error term before computing the table of all effects.

general manova
Summary of all Effects; design:
1-TEMPERAT, 2-GENDER, 3-CHAMBER
Customized Error Term

Effect    df Effect    ms Effect
1         2            79.194
2         1            3.361
3         6            11.083
12        2            7.861
Error     24           1.653

Example 16: A nested design with a random effect

STATISTICA: ANOVA/MANOVA will automatically handle random effects. Lindman (1974, p. 173) shows an example of a nested design, where the nested factor is random. Factor A has four levels, factor B has 3 levels, and factor C (subjects) is a random effect with 9 levels. Shown below is the summary table for this design (see Lindman, 1974, p. 178).

general manova
Summary of all Effects; design:
1-A, 2-B, 3-C

Effect    df Effect    ms Effect    df Error    ms Error    F        p
*1        3            86.44        18          8.53        10.14    .0004
*2        2            365.08       6           44.31       8.24     .0190
3         6            44.31
12        6            11.19        18          8.53        1.31     .3016
13        18           8.53

Example 17: Weighted means analysis of a nested design with unequal N (and missing cells)

The next example was taken from Searle (1987, p. 62). The data presented there describe a two-way nested classification of student opinions concerning computers. There were two classes -- English and Geology (factor Course) -- with different numbers of sections (taught by different teachers): English had two sections, Geology had 3 sections. To test the main effect for Course, Searle constructs a weighted means comparison. Shown below is the result of that comparison as computed by STATISTICA: ANOVA/MANOVA (see also Searle, 1987, p. 71).

general manova
Planned Comparison
1-COURSE, 2-SECTION

Univar. Test    Sum of Squares    df    Mean Square    F        p
Effect          24.0000           1     24.0000        6.462    .0386
Error           26.0000           7     3.7143

Example 18: A split-plot design with customized error term

Milliken and Johnson (1984, p. 297) present an example of a split-plot design. The design pertains to the effectiveness of 4 different fertility regimes on two varieties of wheat. Each of the four fertilizer levels was randomly assigned to one whole plot within each of two blocks. Shown below are the results for the Fertility and the Variety factors with the appropriate error terms (see Milliken & Johnson, 1984, p. 299).


general manova
MAIN EFFECT: FERTILTY
ERROR: FERTILTY x BLOCK

Univar. Test    Sum of Squares    df    Mean Squares    F        p
Effect          40.1900           3     13.3967         5.802    .0914
Error           6.9275            3     2.3092

general manova
MAIN EFFECT: VARIETY
ERROR: VARIETY x BLOCK
FERTILTY x VARIETY x BLOCK

Univar. Test    Sum of Squares    df    Mean Squares    F        p
Effect          2.25000           1     2.2500          1.068    .3599
Error           8.43000           4     2.1075

Example 19: Strip-plot designs

Milliken and Johnson (1984, p. 320) discuss an experiment on the relationship between two irrigation methods and three levels of nitrogen on the yield of wheat. Again, the analysis requires the specification of custom error terms. All sums of squares are automatically computed by STATISTICA: ANOVA/MANOVA for the Table of all Effects. Note that there is a typographical error in the table presented in Milliken and Johnson (p. 320); specifically the sum of squares for factor Irrigation is 570.4 (and not 507.4).

manova
Summary of all Effects; design:
1-REPLICAT, 2-IRRIGAT, 3-NITROGEN

Effect    df Effect    ms Effect
1         3            41.154
2         1            570.375
3         2            169.542
12        3            10.931
13        6            2.819
23        2            47.375
123       6            1.431

Example 20: Split-plot designs with unequal numbers of subplots

Milliken and Johnson (1984, p. 385) discuss an example of such a design. Five patients suffering from depression were randomly assigned to one of two treatment conditions (Treatment: Placebo vs. Drug). They were then examined after one week and after five weeks; the dependent variable was the patients' depression score during those examinations. Two patients did not return for the second examination, creating an unequal number of subplots in the design. In STATISTICA: ANOVA/MANOVA the results were produced via analysis of covariance, with covariates coding the effect for subjects within-treatment conditions. Here are the Type III sums of squares for the effects of interest (for a discussion of the choice of error terms see Milliken & Johnson, p. 394).

manova
Summary of all Effects; design:
1-TREATMNT, 2-WEEK

Effect    df Effect    ms Effect
1         1            15.5648
2         1            24.0833
12        1            4.0833

Example 21: Youden square designs

An example of a 4 x 4 Youden square with three factors A, B, and C is presented in Lindman (1974, p. 209). Factor A is "rotated" in its position with respect to factor B. Here is the table of all effects computed by STATISTICA: ANOVA/MANOVA, with the B x C interaction as the error term (see also the results table in Lindman, page 209).

general manova
Summary of all Effects; design:
1-B, 2-C
ERROR: B x C

Effect    df Effect    ms Effect    df Error    ms Error    F         p
1         3            47.000       3           7.000       6.7143    .076
2         2            39.000       3           7.000       5.5714    .098

Example 22: A 4 x 11 nested design with unequal numbers of levels (missing cells)

Milliken and Johnson (1984, p. 415) present an example dataset, comparing 11 insecticides produced by four different companies. One company makes three insecticides, another makes four, and the remainder make two each. The effect for Insecticide (nested within Company) was tested via planned comparisons. Shown below are the results computed by STATISTICA: ANOVA/MANOVA (note that these results are slightly different than those reported in Milliken and Johnson on page 422; the analysis reported there is not consistent, and a typographical error must have found its way into the presentation; e.g., compare the mean reported on page 417 for the last group with the data from page 415).


general manova
Planned Comparison
1-COMPANY, 2-PRODUCT

Univar. Test    Sum of Squares    df    Mean Square    F        p
Effect          1500.58           7     214.369        3.743    .0081
Error           1260.00           22    57.273

Example 23: A complex design with many missing cells (testing Type IV hypotheses)

Milliken and Johnson (1984, p. 202) discuss a complex example of a design with many missing cells. The design contains 3 factors: Group (2 levels; whether or not subject received food stamps), Age (classified into three groups), and Race (black, hispanic, white). For brevity, only the results for the main effect for Race, and for the Race by Group interaction are shown below (see also results reported for the so-called Type IV analysis in Milliken and Johnson, Table 17.2, p. 203).

manova
Planned Comparison (RACE)
1-AGE, 2-GROUP, 3-RACE

Univar. Test    Sum of Squares    df    Mean Square    F        p
Effect          11.68             2     5.8385         1.991    .1424
Error           2627.47           92    28.5595

manova
Planned Comparison (RACE x GROUP)
1-AGE, 2-GROUP, 3-RACE

Univar. Test    Sum of Squares    df    Mean Square    F        p
Effect          113.70            2     56.8517        1.991    .1424
Error           2627.47           92    28.5595

Example 24: A 2 (between) x 3 x 3 (repeated measures) design with missing cells

This example is based on a (fictitious) dataset reported in Winer (1962, p. 324). The design has two repeated measures factors, each with 3 levels. Shown below is the summary univariate ANOVA table as computed by STATISTICA: ANOVA/MANOVA (see also Winer, p. 328); the multivariate tests for the Noise x Time interaction are also shown.

manova
Summary of all Effects; design:
1-NOISE, 2-TIME, 3-DIALS

Effect    df Effect    ms Effect    df Error    ms Error    F        p
1         1            468.17       4           622.78      .75      .435
*2        2            1861.17      8           29.36       63.39    .000
*3        2            1185.17      8           13.19       89.82    .000
*12       2            166.50       8           29.36       5.67     .029
13        2            25.17        8           13.19       1.91     .210
23        4            2.67         16          7.94        .34      .850
123       4            2.83         16          7.94        .36      .836

manova
INTERACTION: 1 x 2
1-NOISE, 2-TIME, 3-DIALS

Test                     Value      p-level
Wilks' Lambda            .15607
Rao R Form 2 (2,3)       8.11102    .06166
Pillai-Bartlett Trace    .84393
V (2,3)                  8.11102    .06166
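The multivariate results for the Noise x Time interaction can be checked for internal consistency. When an effect has a single nonzero root, Wilks' Lambda and the Pillai-Bartlett trace sum to 1, and both F approximations reduce to (1 - Lambda)/Lambda * df2/df1. The sketch below assumes that single-root case applies here (the two tabulated values do sum to 1) and uses the rounded values from the table; the variable names are illustrative.

```python
wilks_lambda = 0.15607   # as tabulated for the Noise x Time interaction
df1, df2 = 2, 3          # degrees of freedom shown for Rao R and V

# Single-root case: Pillai's trace is the complement of Wilks' Lambda
pillai_trace = 1 - wilks_lambda

# F approximation shared by Rao's R and Pillai's V in the single-root case
rao_r = (1 - wilks_lambda) / wilks_lambda * df2 / df1

print(round(pillai_trace, 5))
print(round(rao_r, 3))
```

With the rounded Lambda, the computed trace agrees with the tabulated .84393 and the computed R agrees with the tabulated 8.11102 to about four significant digits, which is as close as the five-digit input allows.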

Example 25: A multivariate repeated measures split-plot design

This example is based on data reported in the documentation for Ganova (Brecht & Woodward, 1985). The design is a multivariate repeated measures split-plot design with two between-group factors (2x3), two repeated measures factors (2x3), and two dependent variables. Shown below are the summary results computed by STATISTICA: ANOVA/MANOVA.

general manova
Summary of all Effects; design:
1-A, 2-B, 3-FACTOR3, 4-FACTOR4

Effect    Wilks' Lambda    Rao's R    df 1    df 2    p
1         .75263           .8217      2       5       .4914
2         .71467           .4573      4       10      .7656
3         .62400           1.5064     2       5       .3076
4         .66764           .3734      4       3       .8175
12        .28476           2.1849     4       10      .1442
13        .52986           2.2183     2       5       .2044
23        .40968           1.4059     4       10      .3008
14        .54561           .6246      4       3       .6777
24        .20559           .9041      8       6       .5654
34        .81136           .1744      4       3       .9376
123       .23354           2.6732     4       10      .0945
124       .24201           .7746      8       6       .6410
*134      .06450           10.8778    4       3       .0394
234       .13046           1.3265     8       6       .3756
1234      .02388           4.1035     8       6       .0512

Example 26: Multivariate analysis of covariance, multivariate tests of parallelism

In this example we will specify a multivariate analysis of variance design with multiple covariates and test the parallelism hypothesis. The example is based on a dataset reported by Finn (1974); the design has 4 groups, 2 dependent variables, and 3 covariates. Shown below are the results for the between-group factor (see Finn, 1974, p. C-54; see also Enslein, Ralston, & Wilf, 1977, p. 262), the summary for the covariates (see Finn, 1974, p. C49/50; Enslein et al., p. 258), and tests of the parallelism hypothesis (see Finn, 1974, p. C-45; Enslein et al., p. 255).

general manova
MAIN EFFECT: GROUP
1-GROUP

Test                     Value      p-level
Wilks' Lambda            .69357
Rao R Form 1 (6,80)      2.67672    .02031
Pillai-Bartlett Trace    .31790
V (6,82)                 2.58289    .02422

general manova
MULTIVARIATE TESTS
Within Cells Regression
3 Covariates

Test                     Value     p-level
Wilks' Lambda            .90843
Rao R Form 1 (6,80)      .65590    .68527
Pillai-Bartlett Trace    .09279
V (6,82)                 .66493    .67810

general manova
MULTIVARIATE TESTS
OF PARALLELISM

Test                     Value     p-level
Wilks' Lambda            .74076
Rao R Form 1 (18,62)     .55759    .91589
Pillai-Bartlett Trace    .27393
V (18,64)                .56428    .91194

Example 27: Longley dataset (linear regression)

The so-called Longley data (Longley, 1967) is a well known dataset for testing multiple regression programs for their ability to handle regression problems with redundant predictor variables. Shown below are the (partial) results computed by STATISTICA: Multiple Regression (see Longley, 1967; Elliott, Reisch, & Campbell, 1989, p. 296).

Dependent Variable: TOTAL
Multiple R: .997736942
Multiple R-Square: .995479005
Adjusted R-Square: .992465008
Number of cases: 16
F ( 6, 9 ) = 330.2853 p < .000000
Standard Error of Estimate: 304.85407356
Intercept: -3482258.635 Std.Error: 890420.4

multiple regress.
Parameters

variable    B                   St. Err. of B
DEFLATOR    15.06187227143      84.914925774771
GNP         -.03581917929       .033491007772
UNEMPLOY    -2.02022980382      .488399681652
ARMFORCE    -1.03322686717      .214274163162
POPULATN    -.05110410565       .226073200069
TIME        1829.15146461400    455.478499142310

Note that there is a typographical error in the table presented in Elliott et al., 1989 (Table 4.3.1, p. 296); specifically, the B coefficient for TIME is 1829.151464614 (as reported in STATISTICA) and not 1829.15146416 (6 and 1 are reversed).

To our knowledge, STATISTICA is the only statistics program available on the market that will correctly compute and report regression coefficients for the Longley dataset with this level of precision (Excel will correctly report the first 8 significant digits, Lotus will correctly report all 12 digits).
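The summary statistics reported for the Longley fit can be checked against one another, in the spirit of the internal-consistency criterion stated at the start of this document. The sketch below recomputes Multiple R, Adjusted R-Square, and the overall F from the reported R-Square, the number of cases (16), and the number of predictors (6). The formulas are the standard textbook ones and the variable names are illustrative.

```python
from math import sqrt

r_squared = 0.995479005   # Multiple R-Square as reported
n_cases = 16
n_predictors = 6

df_error = n_cases - n_predictors - 1   # 9, matching F(6, 9)

multiple_r = sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_error
f_statistic = (r_squared / n_predictors) / ((1 - r_squared) / df_error)

print(round(multiple_r, 9))
print(round(adjusted_r_squared, 9))
print(round(f_statistic, 3))
```

The recomputed values agree with the reported .997736942, .992465008, and 330.2853 to within the rounding of the nine-digit R-Square used as input.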

Example 28: Polynomial regression

Elliott, Reisch, and Campbell (1989, p. 295) present a data file to test polynomial regression. Shown below are the (partial) results computed by STATISTICA: Multiple Regression for the sixth degree polynomial fit (see Elliott, Reisch, and Campbell, 1989, p. 297). Note that this test is even more "demanding" than the previous one and an extremely low setting of the minimum tolerance parameter is required to obtain the parameter estimates.

Dependent Variable: Y_HR
Multiple R: .996793635
Multiple R-Square: .993597550
Adjusted R-Square: .990396325
Number of cases: 19
F ( 6, 12) = 310.3804 p < .000000
Standard Error of Estimate: .308965061
Intercept: 157.88215543 Std.Error: 73.68338

multiple regress.
Regression Weights

variable    B                   St. Err. of B
X_KG        -330.97580114610    192.284963071360
P2          364.04271758509     201.286163909620
P3          -199.36108558038    108.400947552150
P4          58.11303781881      31.758798390784
P5          -8.60698967739      4.813032615799
P6          .50963834084        .295596359040

Example 29: Kaplan-Meier product limit estimates

Lee (1992, p. 25) discusses a dataset first presented by King et al. (1979). Shown below is part of the product-limit analysis for the low-fat group of rats as computed by STATISTICA: Survival Analysis (see also Lee, 1992, pp. 74-75).

survival
Kaplan-Meier (Product-limit) analysis
Note: Censored cases are marked with +

Case      Time       Cumulatv    Standard
Number               Survival    Error
3         50.0000    .966667     .032773
12        56.0000    .933333     .045542
4         65.0000    .900000     .054772
13        66.0000    .866667     .062063
14        73.0000    .833333     .068041
9         77.0000    .800000     .073030
10        84.0000    .766667     .077220
5         86.0000    .733333     .080737
11        87.0000    .700000     .083666
. . .     . . .      . . .       . . .
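The product-limit estimates and their standard errors in the rows shown can be reproduced with a short routine. The sketch below is an illustration under stated assumptions: it uses Greenwood's formula for the standard error, it assumes 30 animals were at risk at the first event (inferred from the first survival value, .966667 = 29/30), and it fills in the 21 observations not shown in the table with placeholder cases censored at an arbitrary later time. Those placeholders are hypothetical and only serve to set the size of the risk set.

```python
from math import sqrt


def kaplan_meier(times, events):
    """Return (time, survival, Greenwood standard error) at each event time.

    events[i] is 1 for an observed failure and 0 for a censored case.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    results = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:   # handle ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            results.append((t, survival, survival * sqrt(greenwood_sum)))
        n_at_risk -= removed
    return results


event_times = [50.0, 56.0, 65.0, 66.0, 73.0, 77.0, 84.0, 86.0, 87.0]  # from the table
placeholder_times = [999.0] * 21   # hypothetical: stands in for the rows not shown

times = event_times + placeholder_times
events = [1] * len(event_times) + [0] * len(placeholder_times)

estimates = kaplan_meier(times, events)
for t, s, se in estimates:
    print(f"{t:8.4f}  {s:.6f}  {se:.6f}")
```

Under these assumptions the nine printed rows match the tabulated survival and standard error columns to six decimals. With no censoring before an event time, Greenwood's formula reduces to the binomial standard error sqrt(S(1 - S)/n).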

Example 30: Comparing multiple samples of censored survival times

Lee (1992, p. 127) presents a dataset of initial remission times for leukemia patients as a function of three treatments. Shown below is the summary of the comparison computed by STATISTICA: Survival Analysis (see also Lee, p. 127).

Variable: TIME
Variable with censoring indicator: CENSORED
Grouping variable: GROUP (3 Groups)
Total number of valid observations: 66
uncensored: 52 ( 78.79%)
censored: 14 ( 21.21%)
Chi-square = 3.61183 df = 2 p = .16434

Example 31: Proportional hazard regression for censored data

Crowley and Hu (1977) present an analysis of the well-known Stanford heart transplant data. Shown below are the (partial) results of the (Cox) proportional hazard regression analysis computed by STATISTICA: Survival Analysis (see also Brown, Engelman, & Jennrich, 1990, p. 773).

Regression Results: Proportional hazard (Cox) regression
Total number of valid observations: 65
uncensored: 29 ( 44.62%)
censored: 36 ( 55.38%)

survival
Parameter Estimates
Log-Likelihood of final solution: -87.867

Variable    Beta       Standard    t-level    exponent
                       Error                  Beta
AGE         .10909     .03329      3.27658    1.11526
ANTIGEN     -.04878    .47165      -.10342    .95239
MISMATCH    1.06372    .39460      2.69570    2.89713

Example 32: Exponential regression model for censored data

Lawless (1982, p. 287) discusses an example censored dataset pertaining to lung cancer survival and fits to it an exponential regression model with six covariates (plus a constant). Shown below are the parameter estimates and their asymptotic standard errors computed by STATISTICA: Survival Analysis (see also Lawless, p. 288).

survival
Parameter Estimates

Variable    Beta       Std. Err    t-level
X1          .05442     .01082      5.0302
X2          .00887     .01977      .4484
X3          .00336     .01166      .2882
X4          .33865     .44556      .7601
X5          -.12069    .48623      -.2482
X6          -.86560    .58663      -1.4756
X7          -.28398    .38902      -.7300
Constant    4.74008    .40562      11.6861

Example 33: Stepwise discriminant function analysis and canonical analysis

The "classic" Iris dataset (Fisher, 1936) is widely referenced to discuss discriminant function analysis. Shown below is the summary of the stepwise discriminant function analysis for those data, and the summary of the canonical analysis with all variables in the model (see also Jennrich, 1977, pp. 92-94; Brown et al., 1990, pp. 341-342).

Number of variables in the model: 4
Wilks' Lambda: .023439
Approx. F (8,288) = 199.145 p <0.00000

discrim.
Summary of Stepwise Analysis

Variable    No. of      Lambda    F-level    df 1    df 2
Entered     vars. in
PETALLEN    1           .05863    1180.16    2       147
SEPALWID    2           .03688    307.11     4       292
PETALWID    3           .02498    257.50     6       290
SEPALLEN    4           .02344    199.15     8       288

discrim.
Standardized Coefficients for Canonical Variables

Variable     Root 1      Root 2
PETALLEN     -.94726     -.401038
SEPALWID     .52124      .735261
PETALWID     -.57516     .581040
SEPALLEN     .42695      .012408
Eigenval.    32.19193    .285391

Example 34: Log-linear model (a 5-way frequency table)

Bishop, Fienberg, & Holland (1978, p. 103) present a complex 5-way frequency table describing the three-year survival of cancer patients in different locations. Shown below are the tests of all models of full order (see also Brown et al., 1983, p. 180; note that delta=0.5 was added to each cell in the frequency table).

log-lin.
Results of Fitting all K-Factor Interactions

K-Factor    df    Max.Lik.    p        Pearson    p
                  Chi-squ.             Chi-sq.
1           8     632.156     .0000    881.251    .0000
2           23    134.425     .0000    141.228    .0000
3           28    30.909      .3212    31.233     .3069
4           12    9.012       .7019    8.928      .7091

Example 35: Experimental Design: A 2**(7-4) fractional factorial design

Box, Hunter, and Hunter (1978, p. 391) present an example dataset for a 2-level fractional factorial design, specifically a 2**(7-4) fractional factorial. Shown below are the effect estimates as computed by STATISTICA: Experimental Design (see also Box, Hunter, & Hunter, p. 392).

experim. design
2**(7-4) design of resolution R = III
TIME; m = 66.50000 s = 13.84609

               Effect     Sums of
Effect         Estimate   Squares
1:SEAT          3.50000     24.50
2:DYNAMO       12.00000    288.00
3:HANDLBRS      1.00000      2.00
4:GEAR         22.50000   1012.50
5:RAINCOAT       .50000       .50
6:BREAKFST      1.00000      2.00
7:TIRES         2.50000     12.50
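The effect estimates and sums of squares above are tied together by a simple identity for two-level designs: with N runs, the sum of squares of an effect is N times the squared effect divided by 4. The Python sketch below illustrates this under the assumption that the design has N = 2**(7-4) = 8 runs and that the seven effects account for all variation around the mean (a saturated design), so that the overall standard deviation s can be recovered from the total sum of squares. The names used are hypothetical.

```python
import math

N_RUNS = 2 ** (7 - 4)  # 8 runs in a 2**(7-4) design

# (effect estimate, reported sum of squares) copied from the table above.
reported = {
    "SEAT": (3.5, 24.5),
    "DYNAMO": (12.0, 288.0),
    "HANDLBRS": (1.0, 2.0),
    "GEAR": (22.5, 1012.5),
    "RAINCOAT": (0.5, 0.5),
    "BREAKFST": (1.0, 2.0),
    "TIRES": (2.5, 12.5),
}


def effect_ss(effect, n_runs):
    """Sum of squares for one effect in a two-level design with n_runs runs."""
    return n_runs * effect ** 2 / 4.0


computed = {name: effect_ss(e, N_RUNS) for name, (e, _) in reported.items()}

# In a saturated design the effect sums of squares add up to the total
# sum of squares around the mean, which has N - 1 degrees of freedom.
total_ss = sum(computed.values())
s_overall = math.sqrt(total_ss / (N_RUNS - 1))

for name, (_, ss) in reported.items():
    print(f"{name:9s} reported SS {ss:8.2f}  recomputed SS {computed[name]:8.2f}")
print(f"s = {s_overall:.5f}")
```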

Example 36: Experimental Design: A second-order central composite (response surface) design

Box, Hunter, and Hunter (1978, p. 519) present an example data set for a 2-factor second-order central composite (response surface) design with two blocks. Shown below are the parameter estimates computed by STATISTICA: Experimental Design (see Box et al., p. 520).

experim. design
Parameter Estimates; Variable: YIELD
2**(2-0) 2nd order central composite design
m=83.88333 s=4.39293 Intercept=87.3750

                        Std.Err.
Effect      Paramet.    of Par.
C vs. S      -.85003    .506913
TIME        -1.38374    .620843
DEGREES       .36199    .620843
1**2        -2.14377    .694129
2**2        -3.09379    .694129
1 by 2      -4.87500    .878000

Example 37: Experimental Design: A Taguchi robust design experiment (L18, S/N: Smaller-the-Better)

Phadke (1989, pp. 82-83) discusses in detail the analysis of a robust design experiment pertaining to the manufacture of silicon wafers. Shown below is the summary ANOVA table computed by STATISTICA: Experimental Design for the Surface Defect data (a smaller-the-better problem; see also Phadke, p. 88, Table 4.6); note that, as described in Phadke (p. 88), the factor Cleaning was pooled into the error term.

experim. design
Analysis of Variance
m = -45.362 s = 24.4841
* - effect pooled into error

Effect             SS     df        ms        F
{1}TEMPERAT    4427.24     2   2213.62    27.26
{2}PRESSURE    3415.55     2   1707.77    21.03
{3}NITROGEN    1029.52     2    514.76     6.34
{4}SILANE       371.93     2    185.97     2.29
{5}SETT_TIM     378.28     2    189.14     2.33
*CLEANING       163.52     2
Residual        568.46     7     81.21
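The pooling described above can be verified from the table itself: each F ratio should be the effect mean square divided by the residual mean square, where the residual (7 df) already contains the pooled Cleaning effect. The Python sketch below is an illustrative check under that reading. It also assumes that the L18 array has 18 runs (17 total degrees of freedom), which the heading implies but the text does not state. The names used are hypothetical.

```python
import math

# (SS, df, reported F) for the effects that were tested, from the table above.
tested = {
    "TEMPERAT": (4427.24, 2, 27.26),
    "PRESSURE": (3415.55, 2, 21.03),
    "NITROGEN": (1029.52, 2, 6.34),
    "SILANE": (371.93, 2, 2.29),
    "SETT_TIM": (378.28, 2, 2.33),
}

# Residual line of the table; it already includes the pooled CLEANING effect.
residual_ss, residual_df = 568.46, 7

ms_residual = residual_ss / residual_df
f_ratios = {name: (ss / df) / ms_residual for name, (ss, df, _) in tested.items()}

# Assumption: 18 runs in the L18 array, hence 17 total degrees of freedom.
total_ss = sum(ss for ss, _, _ in tested.values()) + residual_ss
total_df = sum(df for _, df, _ in tested.values()) + residual_df
s_overall = math.sqrt(total_ss / (18 - 1))

for name, (_, _, reported_f) in tested.items():
    print(f"{name:9s} reported F {reported_f:6.2f}  recomputed F {f_ratios[name]:6.2f}")
print(f"total df = {total_df}, s = {s_overall:.4f}")
```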

Examples 38-52: Analysis of Benchmark datasets

A standard set of benchmark datasets for the most common analyses was originally proposed by Elliott, Reisch, & Campbell (1989) and has since then been used in published reviews of statistical packages. Shown below are the results for all proposed benchmark analyses (and extensions of some of those tests designed to make them more demanding) as computed by STATISTICA.

Example 38: Descriptive statistics with small relative variances

Here are the results computed for the example dataset proposed by Elliott et al. (p. 290). To demonstrate the precision of STATISTICA we have extended the test to extremely small relative variances (100000000001 to 100000000009).

basic stats
Descriptive Statistics
N. of Cases = 9
(MD pairwise deleted)

      Mean                St. Err.           St. Dev.
V1    1005.0000           .91287092917528    2.73861266370720
V2    10005.0000          .91287092917528    2.73861266370720
V3    100005.0000         .91287092917528    2.73861266370720
V4    1000005.0000        .91287092917528    2.73861266370720
V5    10000005.0000       .91287092917528    2.73861266370720
V6    100000005.0000      .91287092917528    2.73861266370720
V7    1000000005.0000     .91287092917528    2.73861266370720
V8    10000000005.0000    .91287092917528    2.73861266370720
V9    100000000005.000    .91287092917528    2.73861266370720
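To show why small relative variances are a demanding test, the Python sketch below compares a two-pass computation (mean first, then squared deviations) with the one-pass "sum of squares minus squared sum" formula on the most extreme variable, V9. This is an illustration only: it is not the algorithm used by STATISTICA, the function names are hypothetical, and the data are assumed to be the nine consecutive values 100000000001 to 100000000009 mentioned in the text. With those values the two-pass standard error agrees with the tabulated .91287092917528; the two-pass standard deviation is the square root of 7.5 (about 2.7386127875), which agrees with the tabulated St. Dev. only through the seventh significant digit.

```python
import math


def two_pass_stats(values):
    """Mean, standard error, and standard deviation via the two-pass method."""
    n = len(values)
    mean = math.fsum(values) / n
    ss = math.fsum((x - mean) ** 2 for x in values)
    sd = math.sqrt(ss / (n - 1))
    return mean, sd / math.sqrt(n), sd


def naive_variance(values):
    """One-pass variance; prone to catastrophic cancellation in double precision."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)


# Assumed data for V9: 100000000001, ..., 100000000009.
v9 = [100000000000.0 + k for k in range(1, 10)]

mean, std_err, std_dev = two_pass_stats(v9)
print(f"two-pass: mean={mean:.4f} st.err={std_err:.14f} st.dev={std_dev:.14f}")
print(f"one-pass variance: {naive_variance(v9)!r} (exact value is 7.5)")
```

The one-pass result is shown only for contrast; in double precision the squared values are near 1e22, far beyond the range in which a difference of 60 can be represented.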

Example 39: Independent group t-test

Here are the results computed for the t-test benchmark dataset proposed by Elliott et al. (p. 290).

basic stats
T-test; indep.var: FERTLZR
[1 gr.= PRESENT] [2 gr.= NEWER]
N. of Cases = 18

             t         2-tailed p
Height    -2.988440     .008686

Example 40: Paired t-test

Here are the results computed for the paired t-test benchmark dataset proposed by Elliott et al. (p. 290).

basic stats
Single t-Tests

Comparison              t         p      N    E(X-Y)   D(X-Y)
HINDLEG-FORELEG    3.41379    .00770    10    3.3000   3.0569
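The t value above can be recovered from the other entries in the row. The Python sketch below assumes that E(X-Y) is the mean of the paired differences and D(X-Y) is their standard deviation (the column labels are not defined in the text), in which case t is the mean difference divided by its standard error. It is an illustrative check only.

```python
import math

# Values copied from the table above; the interpretation of E and D is assumed.
n_pairs = 10
mean_diff = 3.3000   # E(X-Y): assumed mean of the paired differences
sd_diff = 3.0569     # D(X-Y): assumed standard deviation of the differences

std_err_diff = sd_diff / math.sqrt(n_pairs)
t_value = mean_diff / std_err_diff
degrees_of_freedom = n_pairs - 1

print(f"t({degrees_of_freedom}) = {t_value:.4f}")
```

Because the inputs are rounded to four decimals, the recomputed t matches the reported 3.41379 only to about four significant digits.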

Example 41: One-way ANOVA (test 1)

Here are the results of the one-way ANOVA benchmark (Example 1) proposed by Elliott et al. (p. 291).

manova
MAIN EFFECT: FEED
1-FEED

Univar.    Sum of            Mean
Test       Squares     df    Square        F        p
Effect     4226.348     3    1408.783    164.64   .0000
Error       128.350    15       8.557

Example 42: One-way ANOVA (test 2)

Here are the results of the one-way ANOVA benchmark (Example 2) proposed by Elliott et al. (p. 291).

manova
MAIN EFFECT: CONDITN
1-CONDITN

Univar.    Sum of           Mean
Test       Squares    df    Square       F        p
Effect     10.6622     4    2.66556    3.3476   .0611
Error       7.1663     9     .79626

Example 43: One-way repeated measures ANOVA

Here are the results for the one-way repeated measures ANOVA benchmark proposed by Elliott et al. (p. 292).

manova
MAIN EFFECT: Drug
1-Drug

Univar.    Sum of           Mean
Test       Squares    df    Square       F        p
Effect     698.200     3    232.733    24.759   .0000
Error      112.800    12      9.400

Example 44: Two-way ANOVA (balanced)

Here are the results for the two-way balanced ANOVA benchmark proposed by Elliott et al. (p. 292).

manova
Summary of all Effects; design:
1-GENDER, 2-HORMONE

          df        ms         df       ms
Effect    Effect    Effect     Error    Error        F        p
1           1         70.31     16      22.898     3.071   .0989
*2          1       1386.11     16      22.898    60.534   .0000
12          1          4.90     16      22.898      .214   .6449

Example 45: Two-way ANOVA (unbalanced)

Here are the results for the unbalanced ANOVA benchmark data proposed by Elliott et al. (p. 293). We show here only the results for the Type III analysis (as "recommended" by Elliott et al., Table 3.7.2); note that Type I and II analyses can also be performed with STATISTICA: ANOVA/MANOVA.

manova
Summary of all Effects (Type III SS)
Design: 1-DRUG, 2-DISEASE

          df        ms         df       ms
Effect    Effect    Effect     Error    Error        F        p
*1          3       999.157     46      110.453    9.046   .0001
2           2       207.937     46      110.453    1.883   .1637
12          6       117.878     46      110.453    1.067   .3958

Example 46: Simple linear regression

Here are the results (computed via STATISTICA: Multiple Regression) for the data proposed by Elliott et al. (p. 294; note that the result reported in Elliott et al. as r-square is in fact the simple Pearson correlation coefficient r).

R: .726305400
R-Square: .527519535
Intercept: 4.910512449 St.Er: 6.627462 t(11)=.7409 p<.47

Variable     B                 t(11)              p
HANDGUNS     .0376114423094    3.5044808186082    .00493
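In simple regression the reported quantities are linked by two identities: R-Square is the square of R, and the t statistic for the slope equals r times the square root of the error degrees of freedom, divided by the square root of (1 - r squared). The Python sketch below checks the values above against those identities, with the degrees of freedom (11) taken from the t(11) label. It also recomputes the intercept t as the estimate divided by its standard error. This is an illustrative consistency check, not the STATISTICA computation.

```python
import math

# Values copied from the results above.
r = 0.726305400
reported_r_square = 0.527519535
reported_slope_t = 3.5044808186082
df_error = 11  # from the t(11) label

r_square = r * r
slope_t = r * math.sqrt(df_error) / math.sqrt(1.0 - r_square)

# Intercept t as estimate / standard error.
intercept, intercept_se = 4.910512449, 6.627462
intercept_t = intercept / intercept_se

print(f"R-Square = {r_square:.9f}")
print(f"slope t({df_error}) = {slope_t:.7f}")
print(f"intercept t({df_error}) = {intercept_t:.4f}")
```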

Example 47: Multiple linear regression (Example 1)

Here are the results for the data proposed by Elliott et al. (p. 295; note that the result reported in Elliott et al. as r-square is in fact the multiple correlation coefficient r).

Multiple R: .922119692
Multiple R-Square: .850304726
Intercept: 2.085724401

multiple regress.
Regression Weights

Variable    B                   St. Err. of B
X1           .0569873379910     2.6131042380235
X2          1.0500229564602      .3262103147516

Example 48: Multiple linear regression (Example 2)

The next multiple regression benchmark proposed by Elliott et al., 1989 (p. 295, Example 2) is based on the well-known Longley dataset (with redundant predictor variables; Longley, 1967). The results of this test are reported in Example 27, above. As mentioned before (see our Example 27), there is a typographical error in the table presented in Elliott et al., 1989 (Table 4.3.1, p. 296). Specifically, the B coefficient for TIME is 1829.151464614 (as reported in STATISTICA) and not 1829.15146416 (6 and 1 are reversed).

To our knowledge, STATISTICA is the only statistics program available on the market that will correctly compute and report regression coefficients for the Longley dataset with this level of precision (Excel will correctly report the first 8 significant digits, Lotus will correctly report all 12 digits).

Example 49: Multiple linear regression (Example 3)

Here again are the (partial) results for the polynomial regression problem reported in Elliott et al. (1989, p. 297). Note that this test is even more "demanding" than the previous one and an extremely low setting of the minimum tolerance parameter is required to obtain the parameter estimates for the sixth order polynomial.

Dependent Variable: Y_HR
Multiple R: .996793635
Intercept: 157.88215543 Std.Error: 73.68338

multiple regress.
Regression Weights

Variable    B                    St. Err. of B
X_KG        -330.97580114610     192.284963071360
P2           364.04271758509     201.286163909620
P3          -199.36108558038     108.400947552150
P4            58.11303781881      31.758798390784
P5            -8.60698967739       4.813032615799
P6              .50963834084        .295596359040

Example 50: A 2 x 2 contingency table and Fisher exact test

Here are the results for the 2x2 contingency table presented in Elliott et al. (p. 295).

Chi-square (N = 29) = 4.89 p < .0271
Phi-Square = .168521
Fisher Exact Probability (one-tailed): .032884
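The first two lines of this output are linked: for a 2 x 2 table, Phi-Square is the Pearson Chi-square divided by N, and with one degree of freedom the upper-tail probability of Chi-square can be written with the complementary error function as erfc(sqrt(chi-square / 2)). The Python sketch below uses those two standard relations to recover the Chi-square and its p level from the reported Phi-Square and N. It is an illustrative check only and does not reproduce the Fisher exact probability, which requires the cell counts.

```python
import math

# Values copied from the results above.
n_cases = 29
phi_square = 0.168521

# Pearson Chi-square recovered from Phi-Square = Chi-square / N.
chi_square = phi_square * n_cases

# Upper-tail probability of a Chi-square variate with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2.0))

print(f"Chi-square = {chi_square:.4f}, p = {p_value:.4f}")
```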

Example 51: An R x C contingency table (Example 1)

Here are the (partial) results for the 2x4 contingency table presented in Elliott et al. (p. 298).

                                            df    p
Maximum Likelihood Chi-square:   9.51215     3    .023216
Pearson Chi-square:              8.98718     3    .029477

log-lin. analysis
Expected Freq.: GENDER by HAIR_COL

           HAIR_COL    HAIR_COL    HAIR_COL    HAIR_COL
GENDER     BLACK       BROWN       BLONDE      RED         TOTAL
MALE       29.00000     36.0000    26.66667     8.33333    100.0000
FEMALE     58.00000     72.0000    53.33333    16.66667    200.0000
Total      87.00000    108.0000    80.00000    25.00000    300.0000

Example 52: An R x C contingency table (Example 2)

Here are the (partial) results for the 2x4 contingency table presented in Elliott et al. (p. 298). Note that the expected frequency for group Negative/Days_0 is incorrectly reported in Elliott et al. as 14628 (and thus the expected frequencies in the first column do not add up to the marginal frequency); the correct expected frequency for this cell is 14628.5.

                                            df    p
Maximum Likelihood Chi-square:   62.4336     3    .000000
Pearson Chi-square:              283.047     3    .000000

log-lin. analysis
Expected Freq.: DAYS by GROUP
crossprd

            GROUP       GROUP       GROUP       GROUP
DAYS        DAYS_0      DAYS 1_2    DAYS 3_5    DAYS 6_9    Total
NEGATIVE    14628.50    312.0932    147.5712    55.83777    15144.00
POSITIVE       42.50       .9068       .4288      .16223       44.00
Total       14671.00    313.0000    148.0000    56.00000    15188.00
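The corrected expected frequency can be confirmed from the marginal totals alone, because under independence each expected cell count is the row total times the column total divided by the grand total. The Python sketch below applies that standard formula to the marginals in the table above; the function name `expected_frequencies` is hypothetical, and the sketch is an illustration rather than the STATISTICA implementation.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / N."""
    n = sum(row_totals.values())
    if n != sum(col_totals.values()):
        raise ValueError("row and column totals must share the same grand total")
    return {
        r: {c: rt * ct / n for c, ct in col_totals.items()}
        for r, rt in row_totals.items()
    }


# Marginal totals copied from the table above.
row_totals = {"NEGATIVE": 15144, "POSITIVE": 44}
col_totals = {"DAYS_0": 14671, "DAYS 1_2": 313, "DAYS 3_5": 148, "DAYS 6_9": 56}

expected = expected_frequencies(row_totals, col_totals)

for row, cells in expected.items():
    print(row, " ".join(f"{value:12.4f}" for value in cells.values()))

# The first column must add up to its marginal frequency.
first_column_sum = sum(expected[r]["DAYS_0"] for r in expected)
print(f"sum of DAYS_0 column = {first_column_sum:.2f}")
```

With a first-column entry of 14628 instead of about 14628.5, the column would sum to roughly 14670.5 rather than the marginal frequency of 14671.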


References

Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis. Cambridge, MA: MIT Press.

Box, G. E. P., Hunter, W. G., & Hunter, J. S. (1978). Statistics for experimenters: An introduction to design, data analysis, and model building. New York: Wiley.

Brecht, L., & Woodward, A. (1985). Ganova (Technical Documentation). Unpublished Manuscript.

Brown, M. B., Engelman, L., & Jennrich, R. I. (1990). BMDP, Statistical software manual. Los Angeles: University of California Press.

Brown, M. B., Engelman, L., Frane, J. W., Hill, M. A., Jennrich, R. I., & Toporek, J. D. (1983). BMDP, Statistical software manual. Los Angeles: University of California Press.

Cox, D. R. (1970). The analysis of binary data. New York: Halsted Press.

Crowley, J., & Hu, M. (1977). Covariance analysis of heart transplant survival data. Journal of the American Statistical Association, 72, 27-36.

Elliott, A. C., Reisch, J. S., & Campbell, N. P. (1989). Benchmark datasets for evaluating microcomputer statistical programs. Collegiate Microcomputer, 11, 289-299.

Enslein, K., Ralston, A., & Wilf, H. S. (1977). Statistical methods for digital computers. New York: Wiley.

Fienberg, S. E. (1977). The analysis of cross-classified categorical data. Cambridge, MA: MIT Press.

Finn, J. D. (1974). A general model for multivariate analysis. New York: Holt, Rinehart & Winston.

Finn, J. D. (1977). Multivariate analysis of variance and covariance. In K. Enslein, A. Ralston, and H. S. Wilf (Eds.). Statistical methods for digital computers. Vol. III, New York: Wiley.

King, M., Bailey, D. M., Gibson, D. G., Pitha, J. V., & McCay, P. B. (1979). Incidence and growth of mammary tumors induced by 7,12-dimethylbenz(alpha)anthracene as related to the dietary content of fat and antioxidant. Journal of the National Cancer Institute, 63, 656-664.

Lawless, J. F. (1982). Statistical models and methods for lifetime data. New York: Wiley.

Lee, E. T. (1992). Statistical methods for survival data analysis (2nd edition). New York: Wiley.

Lindman, H. R. (1974). Analysis of variance in complex experimental designs. San Francisco: W. H. Freeman & Co.

Longley, J. W. (1967). An appraisal of least squares programs for the electronic computer from the point of view of the user. Journal of the American Statistical Association, 62, 819-831.

Milliken, G. A., & Johnson, D. E. (1984). Analysis of messy data. Vol. I: Designed experiments. New York: Van Nostrand Reinhold Co.

Neter, J., Wasserman, W., & Kutner, M. H. (1985). Applied linear statistical models: Regression, analysis of variance, and experimental designs. Homewood, Ill.: Irwin.

Phadke, M. S. (1989). Quality engineering using robust design. Englewood Cliffs, NJ: Prentice Hall.

Searle, S. R. (1987). Linear models for unbalanced data. New York: Wiley.

Winer, B. J. (1962). Statistical principles in experimental design. New York: McGraw-Hill. (2nd edition, McGraw-Hill, 1971).

Woodward, J. A., Bonett, D. G., & Brecht, M. L. (1990). Introduction to linear models and experimental design. New York: Harcourt, Brace, Jovanovich.


This material was developed by StatSoft, Inc. StatSoft, Inc. does not copyright the selection of the benchmark materials used in this text and explicitly encourages the use of those tests by others to the benefit of all statistics software. The proper citation for this selection of validation benchmarks is: Validation Benchmarks for Statistical Algorithms (version 1B). (1992). Tulsa, OK: StatSoft. We would appreciate your comments or questions.




StatSoft Pacific
Suite 1, 46-48 Howard Street
North Melbourne VIC 3051
Australia
Phone: +61 3 9348 9422
Fax: +61 3 9348 9420

E-mail: info@statsoft.com.au

©Copyright StatSoft, Inc., 1984-2006.
StatSoft, StatSoft logo, STATISTICA, Enterprise/QC, Enterprise, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.