
Features of STATISTICA Multivariate Exploratory Techniques
STATISTICA Multivariate Exploratory Techniques offers a broad selection of exploratory techniques, from
cluster analysis to advanced classification tree methods, with a wide array of interactive visualization tools
for exploring relationships and patterns, and complete built-in Visual Basic scripting.
STATISTICA Multivariate Exploratory Techniques is compatible with Windows 2000 and Windows XP. It features the following modules:
Cluster Analysis Techniques
Factor Analysis
Principal Components & Classification Analysis
Canonical Correlation Analysis
Reliability/Item Analysis
Classification Trees
Correspondence Analysis
Multidimensional Scaling
Discriminant Analysis
General Discriminant Analysis Models (GDA)
CLUSTER ANALYSIS.
This module includes a comprehensive implementation of clustering methods (k-means, hierarchical clustering, two-way joining).
The program can process data from either raw data files or matrices of distance measures.
The user can cluster cases, variables, or both based on a wide variety of distance measures (including Euclidean, squared
Euclidean, City-block (Manhattan), Chebychev, Power distances, Percent disagreement, and 1-r) and amalgamation/linkage rules
(including single, complete, weighted and unweighted group average or centroid, Ward's method, and others). Matrices of distances
can be saved for further analysis with other modules of the STATISTICA system. In k-means clustering, the user has full control
over the initial cluster centers.
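As a rough illustration of the distance measures listed above, the following Python sketch computes each of them for one pair of cases. It is not STATISTICA code: the function names, the default exponents `p` and `r` of the power distance, and the choice to return percent disagreement as a proportion are assumptions made for this example.

```python
import math

def euclidean(x, y):
    # Straight-line distance between two cases.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    # Same as Euclidean, without the square root (weights large gaps more).
    return sum((a - b) ** 2 for a, b in zip(x, y))

def city_block(x, y):
    # City-block (Manhattan): sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    # Chebychev: the largest absolute difference on any one dimension.
    return max(abs(a - b) for a, b in zip(x, y))

def power_distance(x, y, p=2.0, r=2.0):
    # Power distance: (sum |a - b|^p)^(1/r); p = r = 2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)

def percent_disagreement(x, y):
    # Share of dimensions on which the two cases differ (as a proportion).
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)

def one_minus_r(x, y):
    # 1 - r, where r is the Pearson correlation between the two profiles.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(euclidean(x, y), city_block(x, y), chebychev(x, y))
```

Any of these functions could fill a matrix of distances between all pairs of cases, which is the kind of input the module accepts in place of raw data.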
Extremely large analysis designs can be processed; for example, hierarchical (tree) joining can analyze matrices with over 1,000
variables, or with over 1 million distances.
In addition to the standard cluster analysis output, a comprehensive set of descriptive statistics and extended diagnostics
(e.g., the complete amalgamation schedule with cohesion levels in hierarchical clustering, the ANOVA table in k-means clustering)
is available. Cluster membership data can be appended to the current data file for further processing.
Graphics options in the Cluster Analysis module include customizable tree diagrams, discrete contour-style two-way joining matrix
plots, plots of amalgamation schedules, plots of means in k-means clustering, and many others.
FACTOR ANALYSIS.
The Factor Analysis module contains a wide range of statistics and options, and provides a comprehensive implementation of
factor (and hierarchical factor) analytic techniques with extended diagnostics and a wide variety of analytic and exploratory
graphs. It will perform principal components, common, and hierarchical (oblique) factor analysis, and can handle extremely large
analysis problems (e.g., with thousands of variables). Confirmatory factor analysis (as well as path analysis) can also be
performed via the Structural Equation Modeling and Path Analysis (SEPATH) module found in the add-on
STATISTICA Advanced Linear/Non-Linear Models.
PRINCIPAL COMPONENTS & CLASSIFICATION ANALYSIS.
STATISTICA also includes a designated program for principal components and classification analysis. The output includes
eigenvalues (regular, cumulative, relative), factor loadings, factor scores (which can be appended to the input data file,
reviewed graphically as icons, and interactively recoded), and a number of more technical statistics and diagnostics.
Available rotations include Varimax, Equimax, Quartimax, Biquartimax (either normalized or raw), and Oblique rotations.
The factorial space can be plotted and reviewed "slice by slice" in either 2D or 3D scatterplots with labeled variable-points;
other integrated graphs include Scree plots, various scatterplots, bar and line graphs, and others.
After a factor solution is determined, the user can recalculate (i.e., reconstruct) the correlation matrix from the respective
number of factors to evaluate the fit of the factor model.
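The idea of reconstructing the correlation matrix from the retained factors can be sketched in a few lines of Python. This is a minimal illustration under the usual orthogonal-factor assumption (reproduced correlations are the products of the loadings); the loadings and observed correlations below are made-up values, and the function names are hypothetical.

```python
def reproduced_correlations(loadings):
    # loadings: one row per variable, one column per retained factor.
    # Entry (i, j) is the sum over factors of loading_i * loading_j;
    # the diagonal holds the communalities.
    n = len(loadings)
    return [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(n)]
        for i in range(n)
    ]

def residual_correlations(observed, loadings):
    # Observed minus reproduced correlations; small off-diagonal
    # residuals suggest that the factor model fits well.
    rep = reproduced_correlations(loadings)
    n = len(observed)
    return [[observed[i][j] - rep[i][j] for j in range(n)] for i in range(n)]

# Hypothetical one-factor solution for three variables.
loadings = [[0.8], [0.6], [0.5]]
observed = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
reproduced = reproduced_correlations(loadings)
residuals = residual_correlations(observed, loadings)
print(residuals[0][1])  # residual for the first pair of variables
```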
Both raw data files and matrices of correlations can be used as input. Confirmatory factor analysis and other related analyses
can be performed with the Structural Equation Modeling and Path Analysis (SEPATH) module available in STATISTICA Advanced Linear/Non-Linear Models, where a designated Confirmatory Factor Analysis Wizard will guide you step by step through the process of specifying the model.
CANONICAL CORRELATION ANALYSIS.
This module offers a comprehensive implementation of canonical analysis procedures; it can process raw data files or correlation
matrices and it computes all of the standard canonical correlation statistics (including eigenvectors, eigenvalues, redundancy
coefficients, canonical weights, loadings, extracted variances, significance tests for each root, etc.) and a number of extended
diagnostics. The scores of canonical variates can be computed for each case, appended to the data file, and visualized via integrated icon plots. The Canonical Analysis module also includes a variety of integrated graphs (including plots
of eigenvalues, canonical correlations, scatterplots of canonical variates, and many others).
Note that confirmatory analyses of structural relationships between latent variables can also be performed via the SEPATH (Structural Equation Modeling and Path Analysis)
module in STATISTICA Advanced Linear/Non-Linear Models; advanced stepwise and best-subset selection of predictor variables
for MANOVA/MANCOVA designs (with multiple dependent variables) is available in the General Regression Models (GRM) module in STATISTICA Advanced Linear/Non-Linear Models.
RELIABILITY/ITEM ANALYSIS.
This module includes a comprehensive selection of procedures for the development and evaluation of surveys and questionnaires.
As in all other modules of STATISTICA, extremely large designs can be analyzed. The user can calculate reliability statistics
for all items in a scale, interactively select subsets, or obtain comparisons between subsets of items via the "split-half"
(or split-part) method. In a single run, the user can evaluate the reliability of a sum-scale as well as subscales. When
interactively deleting items, the new reliability is computed instantly without processing the data file again.
The output includes correlation matrices and descriptive statistics for items, Cronbach alpha, the standardized alpha, the
average inter-item correlation, the complete ANOVA table for the scale, the complete set of item-total statistics (including
multiple item-total R's), the split-half reliability, and the correlation between the two halves corrected for attenuation.
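For readers unfamiliar with these statistics, the following Python sketch shows the textbook formulas for Cronbach's alpha, the average inter-item correlation, and the standardized alpha. It illustrates the definitions only and is not how STATISTICA computes them; the item scores are made-up data and the function names are assumptions.

```python
import math
from statistics import mean, variance

def pearson_r(x, y):
    # Pearson correlation between two items.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(items):
    # items: one list of scores per item, respondents in the same order.
    # alpha = k/(k-1) * (1 - sum of item variances / variance of sum scale)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def average_inter_item_r(items):
    # Mean of the correlations over all distinct pairs of items.
    k = len(items)
    return mean(
        pearson_r(items[i], items[j]) for i in range(k) for j in range(i + 1, k)
    )

def standardized_alpha(items):
    # Alpha computed from the average inter-item correlation.
    k = len(items)
    r_bar = average_inter_item_r(items)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Made-up scores: three items answered by four respondents.
items = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [1.0, 3.0, 2.0, 4.0],
]
print(round(cronbach_alpha(items), 4), round(standardized_alpha(items), 4))
```

In this example all three items have the same variance, so the raw and standardized alpha coincide; with unequal item variances they generally differ.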
A selection of graphs (including various integrated scatterplots, histograms, line plots and other plots) and a set of
interactive what-if procedures are provided to aid in the development of scales. For example, the user can calculate the
expected reliability after adding a particular number of items to the scale, and can estimate the number of items that
would have to be added to the scale in order to achieve a particular reliability. Also, the user can estimate the
correlation corrected for attenuation between the current scale and another measure (given the reliability of the current scale).
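What-if questions of this kind are commonly answered with the Spearman-Brown formula and the classical correction for attenuation. The sketch below assumes those formulas; the source does not state which formulas STATISTICA uses, the function names are hypothetical, and added items are assumed to be comparable to the existing ones. Treating the other measure as perfectly reliable by default (`reliability_y=1.0`) is also an assumption of this example.

```python
import math

def reliability_after_adding(reliability, n_items, n_added):
    # Spearman-Brown: expected reliability when the scale is lengthened
    # by a factor m = (new number of items) / (current number of items).
    m = (n_items + n_added) / n_items
    return m * reliability / (1 + (m - 1) * reliability)

def items_to_add(reliability, n_items, target):
    # Spearman-Brown solved for m, then converted to a number of items
    # to add (rounded up to a whole item).
    m = target * (1 - reliability) / (reliability * (1 - target))
    return max(0, math.ceil(m * n_items - n_items))

def corrected_for_attenuation(r_xy, reliability_x, reliability_y=1.0):
    # Correlation between the scale and another measure, corrected for
    # the unreliability of the scale (and optionally of the other measure).
    return r_xy / math.sqrt(reliability_x * reliability_y)

# A 10-item scale with reliability 0.70:
print(reliability_after_adding(0.70, 10, 10))   # doubling the scale
print(items_to_add(0.70, 10, 0.90))             # items needed to reach 0.90
print(corrected_for_attenuation(0.40, 0.64))
```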
CLASSIFICATION TREES.
STATISTICA's Classification Trees module provides a comprehensive implementation of the most recently
developed algorithms for efficiently producing and testing the robustness of classification trees (a classification tree is
a rule for predicting the class of an object from the values of its predictor variables). STATISTICA Data Miner offers additional advanced tree-based classification methods, such as Boosted Trees, Random Forests, General Classification and Regression Tree Models (GTrees), and General CHAID (Chi-square Automatic Interaction Detection) models.
Classification trees can be produced using categorical predictor variables, ordered predictor variables, or both, and using
univariate splits or linear combination splits. Analysis options include performing exhaustive splits or discriminant-based splits; unbiased variable selection (as in QUEST); direct stopping rules (as in FACT) or bottom-up pruning
(as in C&RT); pruning based on misclassification rates or on the deviance function; generalized Chi-square, G-square, or Gini-index
goodness of fit measures. Priors and misclassification costs can be specified as equal, estimated from the data, or user-specified.
The user can also specify the v value for v-fold cross-validation during tree building, the v value for v-fold cross-validation for
error estimation, the size of the SE rule, the minimum node size before pruning, the seeds for random number generation, and the alpha value for
variable selection. Integrated graphics options are provided to explore the input and output data.
See also: General Classification and Regression Trees (GTrees); General CHAID (Chi-square Automatic Interaction Detection) Models.
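To make the Gini-index goodness-of-fit measure mentioned above more concrete, here is a small Python sketch of the standard Gini impurity and of the impurity decrease achieved by one candidate split. It uses plain class counts and ignores priors and misclassification costs, so it should be read as an illustration of the idea rather than as the module's exact computation; the function names are assumptions.

```python
def gini_impurity(class_counts):
    # 1 minus the sum of squared class proportions in a node;
    # 0 means the node is pure.
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def gini_decrease(parent, left, right):
    # Impurity of the parent node minus the size-weighted impurity
    # of the two child nodes produced by a split.
    n = sum(parent)
    weighted = (sum(left) / n) * gini_impurity(left) \
        + (sum(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

# A node with 5 cases in each of two classes, split into [4, 1] and [1, 4].
print(gini_impurity([5, 5]))
print(gini_decrease([5, 5], [4, 1], [1, 4]))
```

An exhaustive search for univariate splits would evaluate a measure like this for every candidate split and keep the one with the largest decrease.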
CORRESPONDENCE ANALYSIS.
This module features a full implementation of simple and multiple correspondence analysis techniques, and can analyze even
extremely large tables. The program will accept input data files with grouping (coding) variables that are to be used to
compute the crosstabulation table, data files that contain frequencies (or some other measure of correspondence, association,
similarity, confusion, etc.) and coding variables that identify (enumerate) the cells in the input table, or data files with
frequencies (or other measure of correspondence) only (e.g., the user can directly type in and analyze a frequency table).
For multiple correspondence analysis the user can also directly specify a Burt table as input for the analysis.
The program will compute various tables, including the table of row percentages, column percentages, total percentages,
expected values, observed minus expected values, standardized deviates, and contributions to the Chi-square values.
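The tables listed above follow directly from the observed frequencies. The Python sketch below derives several of them for a small made-up two-way table, together with the total inertia (the Chi-square value divided by the grand total). The definition of standardized deviates used here, (observed - expected) / sqrt(expected), is an assumption, as are the function and key names.

```python
import math

def correspondence_tables(observed):
    # observed: two-way frequency table as a list of rows.
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    # Expected frequencies under independence of rows and columns.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    deviations = [
        [o - e for o, e in zip(orow, erow)]
        for orow, erow in zip(observed, expected)
    ]
    std_deviates = [
        [d / math.sqrt(e) for d, e in zip(drow, erow)]
        for drow, erow in zip(deviations, expected)
    ]
    # Each cell's contribution to the overall Chi-square value.
    contributions = [[s * s for s in srow] for srow in std_deviates]
    chi_square = sum(sum(row) for row in contributions)
    return {
        "row_percent": [[100.0 * o / r for o in orow]
                        for orow, r in zip(observed, row_totals)],
        "expected": expected,
        "observed_minus_expected": deviations,
        "standardized_deviates": std_deviates,
        "chi_square_contributions": contributions,
        "chi_square": chi_square,
        "total_inertia": chi_square / n,
    }

tables = correspondence_tables([[10, 20], [30, 40]])
print(tables["expected"])
print(tables["total_inertia"])
```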
The Correspondence Analysis module will compute the generalized eigenvalues and eigenvectors, and report all standard
diagnostics including the singular values, eigenvalues, and proportions of inertia for each dimension. The user can either
manually choose the number of dimensions, or specify a cutoff value for the maximum cumulative percent of inertia. The program
will compute the standard coordinate values for column and row points. The user has the choice of row-profile standardization,
column-profile standardization, row and column profile standardization, or canonical standardization. For each dimension and
row or column point, the program will compute the inertia, quality, and cosine-square values. In addition, the user can display
(in spreadsheets) the matrices of the generalized singular vectors; like the values in all spreadsheets, these matrices can be
accessed via STATISTICA Visual Basic, for example, in order to implement non-standard methods of computing the coordinates.
The user can compute coordinate values and related statistics (quality and cosine-square values) for supplementary points
(row or column), and compare the results with the regular row and column points. Supplementary points can also be specified
for multiple correspondence analysis.
In addition to the 3D histograms that can be computed for all tables, the user can produce a line plot for the eigenvalues,
and 1D, 2D, and 3D plots for the row or column points. Row and column points can also be combined in a single graph, along
with any supplementary points (each type of point will use a different color and point marker, so the different types of
points can easily be identified in the plots). All points are labeled, and an option is available to truncate the names
for the points to a user-specified number of characters.
MULTIDIMENSIONAL SCALING.
The Multidimensional Scaling module includes a full implementation of (nonmetric) multidimensional scaling. Matrices of
similarities, dissimilarities, or correlations between variables (i.e., "objects" or cases) can be analyzed. The starting
configuration can be computed by the program (via principal components analysis) or specified by the user. The program
employs an iterative procedure to minimize the stress value and the coefficient of alienation. The user can monitor the
iterations and inspect the changes in these values.
The final configurations can be reviewed via spreadsheets, and via 2D and 3D scatterplots of the dimensional space with
labeled item-points. The output includes the values for the raw stress (raw F), Kruskal stress coefficient S, and the
coefficient of alienation. The goodness of fit can be evaluated via Shepard diagrams (with d-hats and d-stars). Like all
other results in STATISTICA, the final configuration can be saved to a data file.
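The stress values reported for a configuration can be illustrated with a short Python sketch. It computes the raw stress as the sum of squared differences between the reproduced distances and the disparities (d-hats), and Kruskal's stress as the square root of that sum divided by the sum of squared distances (the common "stress formula 1"). Whether the module normalizes in exactly this way is not stated in the text, so treat the formula choice, the function names, and the made-up disparities as assumptions.

```python
import math
from itertools import combinations

def pairwise_distances(config):
    # Euclidean distances between all pairs of points in the configuration.
    return [math.dist(a, b) for a, b in combinations(config, 2)]

def raw_stress(distances, disparities):
    # Sum of squared differences between distances and disparities.
    return sum((d - dh) ** 2 for d, dh in zip(distances, disparities))

def kruskal_stress(distances, disparities):
    # Stress formula 1: raw stress scaled by the sum of squared distances.
    return math.sqrt(
        raw_stress(distances, disparities) / sum(d * d for d in distances)
    )

# Three points in two dimensions; pairs are ordered (0,1), (0,2), (1,2).
config = [(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
distances = pairwise_distances(config)
disparities = [5.0, 4.0, 4.0]  # hypothetical monotone-regression estimates
print(raw_stress(distances, disparities))
print(kruskal_stress(distances, disparities))
```

An iterative procedure of the kind described above would move the points of `config` step by step so that this value decreases.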
DISCRIMINANT ANALYSIS.
The Discriminant Analysis module is a full implementation of multiple stepwise discriminant function analysis.
STATISTICA also includes the General Discriminant Analysis Models module (below) for fitting ANOVA/ANCOVA-like designs
to categorical dependent variables, and for performing various advanced types of analyses (e.g., best subset selection
of predictors, profiling of posterior probabilities, etc.).
The Discriminant Analysis program will perform forward or backward stepwise analyses, or enter user-specified
blocks of variables into the model. In addition to the numerous graphics and diagnostics describing the discriminant
functions, the program also provides a wide range of options and statistics for the classification of old or new cases
(for validation of the model).
The output includes the respective Wilks' lambdas, partial lambdas, F to enter (or remove), the p levels, the tolerance
values, and the R-square. The program will perform a full canonical analysis and report the raw and cumulative eigenvalues
for all roots, and their p levels, the raw and standardized discriminant (canonical) function coefficients, the structure
coefficient matrix (of factor loadings), the means for the discriminant functions, and the discriminant scores for each case
(which can also be automatically appended to the data file).
Integrated graphs include histograms of the canonical scores within each group (and all groups combined), special scatterplots
for pairs of canonical variables (where group membership of individual cases is visibly marked), a comprehensive selection of
categorized (multiple) graphs allowing the user to explore the distribution and relations between dependent variables across
the groups (including multiple box-and-whisker plots, histograms, scatterplots, and probability plots), and many others.
The Discriminant Analysis module will also compute the standard classification functions for each group. The classification
of cases can be reviewed in terms of Mahalanobis distances, posterior probabilities, or actual classifications, and the scores
for individual cases can be visualized via exploratory icon plots and other multidimensional graphs integrated directly with the
results spreadsheets. All of these values can be automatically appended to the current data file for further analyses. The summary
classification matrix of the number and percent of correctly classified cases can also be displayed. The user has several options
to specify the a priori classification probabilities and can specify selection conditions to include or exclude selected cases
from the classification (e.g., to validate the classification functions in a new sample).
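The relationship between Mahalanobis distances, a priori probabilities, and posterior probabilities can be sketched as follows. The example assumes two predictor variables, a pooled within-group covariance matrix shared by all groups, and multivariate normality, so that each posterior is proportional to prior * exp(-d^2 / 2). These are the usual textbook assumptions rather than a description of the module's internals, and all names and numbers are hypothetical.

```python
import math

def mahalanobis_sq(x, centroid, cov):
    # Squared Mahalanobis distance of a 2-variable case from a group
    # centroid, using the inverse of a 2x2 covariance matrix.
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - centroid[0], x[1] - centroid[1])
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))

def posterior_probabilities(x, centroids, cov, priors):
    # Posterior probability of each group: prior * exp(-d^2 / 2),
    # rescaled so that the probabilities sum to 1.
    weights = [
        p * math.exp(-0.5 * mahalanobis_sq(x, m, cov))
        for m, p in zip(centroids, priors)
    ]
    total = sum(weights)
    return [w / total for w in weights]

cov = ((1.0, 0.0), (0.0, 1.0))          # hypothetical pooled covariance
centroids = [(0.0, 0.0), (2.0, 0.0)]    # hypothetical group centroids
post = posterior_probabilities((0.0, 0.0), centroids, cov, [0.5, 0.5])
print(post)  # the case is assigned to the group with the larger posterior
```

Changing the `priors` argument shows how the a priori classification probabilities shift the posteriors for cases that lie between the groups.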
GENERAL DISCRIMINANT ANALYSIS MODELS (GDA).
The STATISTICA General Discriminant Analysis Models (GDA) module is an application and extension of the General Linear Model to
classification problems. Like the Discriminant Analysis module, GDA allows you to perform standard and stepwise
discriminant analyses. GDA implements the discriminant analysis problem as a special case of the general linear model, and thereby
makes the analytic techniques of general linear models available for classification problems. As in traditional discriminant
analysis, GDA allows you to specify a categorical dependent variable. For the analysis, the group membership (with regard to the
dependent variable) is then coded into indicator variables, and all methods of GRM can be applied. In the results dialogs, the
extensive selection of residual statistics of GRM and GLM is available in GDA as well. GDA provides powerful and efficient tools
for data mining as well as applied research. GDA will compute all standard results for discriminant analysis, including
discriminant function coefficients, canonical analysis results (standardized and raw coefficients, step-down tests of canonical
roots, etc.), classification statistics (including Mahalanobis distances, posterior probabilities, actual classification of cases
in the analysis sample and validation sample, misclassification matrix, etc.), and so on.
Pacific
Suite 1, 46-48 Howard Street
North Melbourne VIC 3051
Australia
Phone: +61 3 9348 9422
Fax: +61 3 9348 9420
e-mail: info@statsoft.com.au
©Copyright StatSoft, Inc., 1984-2006.
StatSoft, StatSoft logo, STATISTICA, Enterprise/QC, Enterprise, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.