STATISTICA











Quality Application: SEWSS Successfully Integrates with
Steelcase Shop Floor Coordinate Measuring Machine (CMM)

StatSoft's STATISTICA Enterprise-wide SPC System (SEWSS) was recently featured as a "Quality Application" in the September 2002 issue of Quality Digest. The article covers the successful SEWSS implementation in the basic division at Steelcase Inc., a company that produces interior architectural products worldwide.

The case study describes how Steelcase used the Interactive QC Charts module of SEWSS to improve process control for automatic die presses and manual press brakes, and to quickly identify measurement error as a key source of variation. After switching from caliper measurement to a coordinate measuring machine (CMM), a team at Steelcase worked with StatSoft technical support so that the data collected by the CMM is output directly into the SEWSS database, keeping the latest data available for quick analysis with any of the SEWSS modules. Now, machine operators using the shop-floor CMM for inspection receive immediate feedback on the quality of key part features, and engineers have real-time access to the CMM inspection data for solving problems, making continuous improvements, and verifying ongoing capability.

According to Mike Linde, quality engineer at Steelcase, "The enterprise-wide nature of the software makes it easy to share data for collaborative problem solving. Current data is available for review almost anywhere. Real-time charts and reports can be reviewed on the shop floor, at an engineer's desk or at a team meeting." To read the entire application story, click here.



Case Study: Oklahoma State University Students
Uncover, Explain and Predict with STATISTICA Data Miner


STATISTICA Data Miner's impressive array of data mining algorithms solved a variety of classification and prediction problems for Oklahoma State University graduate IT students.

STATISTICA Data Miner was the "premier choice" for OSU's Department of Management Science and Information Systems, according to Dr. Dursun Delen, Assistant Professor. Delen states, "Based on my years of experience in industry and in academia, I can confidently say that STATISTICA Data Miner has one of the most comprehensive data mining algorithms on the market." He also calls the graphical tools and their outputs "phenomenal."

Delen lists many other advantages over other data mining tools, including a graphical, interactive, user-friendly interface, rich set of visualization tools, processing speed, and its ability to be launched as a Web application. In spite of these many strengths, Delen also found STATISTICA Data Miner to be "less expensive than other comprehensive toolkits in the market."

OSU students used STATISTICA Data Miner for, among other projects, the prediction of diabetic illnesses based on demographic, social and recreational parameters; target marketing models for better promotional mailing; and the prediction of financial indicators such as the S&P 500 and foreign exchange rates.

Click here to read the entire case study.



STATISTICA Data Warehouse - the ultimate, high-performance, scalable system for intelligent management of unlimited amounts of data, distributed across locations worldwide.

STATISTICA Document Management System - a scalable solution for flexible, productivity-enhancing management of local or Web-based document repositories (FDA/ISO compliant).

STATISTICA OLAP - a powerful exploratory, analytic, and reporting add-on application that integrates OLAP services with the analytic and data mining power of STATISTICA tools.

WebSTATISTICA Knowledge Portal - the ultimate knowledge-sharing tool, incorporating the latest Internet technology and including a powerful, flexible report generation tool and a secure system for information delivery.

Click here for the complete list of STATISTICA 6 products.



Featured Textbook Topic - Association Rules

A common task in many data mining projects, as well as in the data mining subcategory of text mining, is to detect relationships or associations between specific values of categorical variables in large data sets. These powerful exploratory techniques have a wide range of applications in many areas of business practice and research, from the analysis of consumer preferences or human resource management to the history of language. They enable analysts and researchers to uncover hidden patterns in large data sets, such as "customers who order product A often also order product B or C" or "employees who said positive things about initiative X also frequently complain about issue Y but are happy with issue Z." The implementation of the so-called a-priori algorithm (see Agrawal and Swami, 1993; Agrawal and Srikant, 1994; Han and Lakshmanan, 2001; see also Witten and Frank, 2000) allows you to rapidly process huge data sets for such associations, based on predefined "threshold" values for detection.

How association rules work. The usefulness of this technique to address unique data mining problems is best illustrated in a simple example. Suppose you are collecting data at the check-out cash registers at a large book store. Each customer transaction is logged in a database, and consists of the titles of the books purchased by the respective customer, perhaps additional magazine titles and other gift items that were purchased, etc. Hence, each record in the database will represent one customer (transaction), and may consist of a single book purchased by that customer, or it may consist of many (perhaps hundreds of) different items that were purchased, arranged in an arbitrary order depending on the order in which the different items (books, magazines, etc.) came down the conveyor belt at the cash register. The purpose of the analysis is to find associations between the items that were purchased, i.e., to derive association rules that identify the items and co-occurrences of different items that appear with the greatest (co-)frequencies. For example, you want to learn which books are likely to be purchased by a customer who you know already purchased (or is about to purchase) a particular book. This type of information could then quickly be used to suggest to the customer those additional titles. You may already be "familiar" with the results of these types of analyses, if you are a customer of various on-line (Web-based) retail businesses; many times when making a purchase on-line, the vendor will suggest similar items (to the ones purchased by you) at the time of "check-out", based on some rules such as "customers who buy book title A are also likely to purchase book title B," etc.
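The bookstore example can be sketched in a few lines of Python. This is only an illustration of the idea, not the STATISTICA implementation: the transactions, the function name pair_rules, and the two thresholds are all invented for the example. Support is the share of all transactions that contain both items; confidence is the share of transactions containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical check-out transactions; the titles are invented for illustration.
transactions = [
    {"Book A", "Book B", "Magazine M"},
    {"Book A", "Book B"},
    {"Book A", "Gift G"},
    {"Book B", "Magazine M"},
    {"Book A", "Book B", "Gift G"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (if_item, then_item, support, confidence) for two-item rules."""
    n = len(transactions)
    # How often each single item and each unordered pair of items occurs.
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        # A pair can give a rule in either direction: a -> b and b -> a.
        for if_item, then_item in ((a, b), (b, a)):
            confidence = count / item_counts[if_item]
            if confidence >= min_confidence:
                rules.append((if_item, then_item, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for if_item, then_item, support, confidence in pair_rules(transactions):
        print(f"If {if_item} then (likely) {then_item}: "
              f"support={support:.2f}, confidence={confidence:.2f}")
```

Raising either threshold shortens the list of rules; lowering them admits weaker associations.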

Unique data analysis requirements. Crosstabulation tables, and in particular Multiple Response tables, can be used to analyze data of this kind. However, when the number of different items (categories) in the data is very large (and not known ahead of time), and when the "factorial degree" of important association rules is not known ahead of time, these tabulation facilities may be too cumbersome to use, or simply not applicable. Consider once more the simple "bookstore example" discussed earlier. First, the number of book titles is practically unlimited. In other words, if we made a table in which each book title represented one dimension, and the purchase of that book (yes/no) formed the classes or categories for each dimension, then the complete crosstabulation table would be huge and sparse (consisting mostly of empty cells). Alternatively, we could construct all possible two-way tables from all items available in the store; this would allow us to detect two-way associations (association rules) between items. However, the number of tables that would have to be constructed would again be huge, most of the two-way tables would be sparse, and worse, if there were any three-way association rules "hiding" in the data, we would miss them completely. The a-priori algorithm implemented in Association Rules will not only automatically detect the relationships ("cross-tabulation tables") that are important (i.e., cross-tabulation tables that are not sparse, not containing mostly zeros), but also determine the factorial degree of the tables that contain the important association rules.
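A minimal sketch of the level-wise search behind this kind of algorithm is shown below. It is an illustration of the general idea under stated assumptions, not StatSoft's code: the function name frequent_itemsets, the sample baskets, and the 0.5 threshold are invented. The search starts with single items, keeps only those that occur often enough, and builds larger item sets only from smaller ones that survived, so the degree of the associations never has to be specified in advance.

```python
from itertools import combinations

# Hypothetical purchase baskets; item labels are invented for illustration.
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "D"},
    {"B", "D"},
    {"C"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {frozenset_of_items: support}."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)

    def support(itemset):
        # Share of transactions that contain every item in the set.
        return sum(itemset <= row for row in rows) / n

    # Level 1: single items that meet the threshold.
    current = {}
    for item in {item for row in rows for item in row}:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s

    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune: every (k-1)-item subset must itself be frequent.
            if not all(frozenset(sub) in current
                       for sub in combinations(cand, k - 1)):
                continue
            s = support(cand)
            if s >= min_support:
                next_level[cand] = s
        current = next_level
    return result


if __name__ == "__main__":
    found = frequent_itemsets(baskets, min_support=0.5)
    for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), f"support={s:.2f}")
```

In this sample the three-way set of A, B and C is found without ever building a table for the rare item D, which is dropped at the first level.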

To summarize, Association Rules will allow you to find rules of the kind If X then (likely) Y where X and Y can be single values, items, words, etc., or conjunctions of values, items, words, etc. (e.g., if (Car=Porsche and Gender=Male and Age<20) then (Risk=High and Insurance=High)). The program can be used to analyze simple categorical variables, dichotomous variables, and/or multiple response variables. The algorithm will determine association rules without requiring the user to specify the number of distinct categories present in the data, or any prior knowledge regarding the maximum factorial degree or complexity of the important associations. In a sense, the algorithm will construct cross-tabulation tables without the need to specify the number of dimensions for the tables, or the number of categories for each dimension. Hence, this technique is particularly well suited for data and text mining of huge databases.
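To make the "If X then (likely) Y" form concrete, the sketch below scores a rule whose left-hand side is a conjunction of values, in the spirit of the Porsche example. The records are invented, and the helper names to_items and confidence are hypothetical; the point is only that each "Variable=Value" pair can be treated as one item, so a rule is a statement about sets of items.

```python
# Hypothetical insurance records, invented for illustration.
records = [
    {"Car": "Porsche", "Gender": "Male", "Age": "<20", "Risk": "High"},
    {"Car": "Porsche", "Gender": "Male", "Age": "<20", "Risk": "High"},
    {"Car": "Porsche", "Gender": "Female", "Age": "20+", "Risk": "Low"},
    {"Car": "Sedan", "Gender": "Male", "Age": "<20", "Risk": "Low"},
    {"Car": "Sedan", "Gender": "Female", "Age": "20+", "Risk": "Low"},
]


def to_items(record):
    """Turn one record into a set of 'Variable=Value' items."""
    return frozenset(f"{name}={value}" for name, value in record.items())


def confidence(rows, antecedent, consequent):
    """Share of rows matching the antecedent that also match the consequent."""
    matching = [row for row in rows if antecedent <= row]
    if not matching:
        return 0.0
    return sum(consequent <= row for row in matching) / len(matching)


rows = [to_items(r) for r in records]
high_risk = frozenset({"Risk=High"})
young_male_porsche = frozenset({"Car=Porsche", "Gender=Male", "Age=<20"})

if __name__ == "__main__":
    # The three-way conjunction versus each single condition on its own.
    print("conjunction:", confidence(rows, young_male_porsche, high_risk))
    for item in sorted(young_male_porsche):
        print(item, confidence(rows, frozenset({item}), high_risk))
```

In this made-up sample the conjunction predicts high risk more reliably than any single condition does, which is the kind of higher-degree rule that two-way tables alone would miss.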

Click here for more information on Association Rules.

Go to the Electronic Statistics Homepage for the complete textbook.



What is PNG and what are its advantages?

PNG (Portable Network Graphics) is a graphics file format that provides a patent-free replacement for GIF and can also replace many common uses of TIFF. It supports indexed-color, grayscale, and truecolor images, plus an optional alpha channel. Sample depths range from 1 to 16 bits.

One reviewer of the .png format writes, “PNG is designed to work well in online viewing applications, such as the World Wide Web, so it is fully streamable with a progressive display option. PNG is robust, providing both full file integrity checking and simple detection of common transmission errors. Also, PNG can store gamma and chromaticity data for improved color matching on heterogeneous platforms.”

If you are a graphic designer or develop web pages, you will encounter PNG files in HTML documents; STATISTICA 6.1 embeds its graphs into HTML documents as PNG files. Graphic design packages such as Adobe Photoshop and web-design programs such as Macromedia’s Flash support the design features that the PNG format offers.

Andrew Zolli, senior technologist at Siegel & Gale (an international strategic communications, design, and interactive media development company) and one of the developers of the software PNG Live, has outlined these important advantages:

  • More colors. While GIF images are limited to 8-bit color, PNG images can be of any bit depth up to 48-bit color, allowing for images that contain literally trillions of colors - more colors than the human eye can see. This greatly expanded palette is essential for professional Web site developers, who must faithfully reproduce color logos and marketing materials for their clients.
  • Better compression. Because PNG uses a better compression method than the LZW (Lempel-Ziv-Welch) algorithm used in GIF, 8-bit PNG images are 10-30% smaller than identical GIF images. Obviously, smaller files mean faster pages: the W3C has estimated that replacing GIF with PNG, using Cascading Style Sheets (CSS), and implementing HTTP 1.1 will make the Web run two to eight times faster, with no other infrastructure improvements. Such speed improvement will allow developers to be more visually expressive without damaging the user's experience.
  • Alpha channels. The PNG format supports a completely different kind of transparency than GIF. In a GIF image, each pixel is either transparent or opaque: this is called binary transparency. In a PNG image with an 8-bit alpha channel, each pixel can have one of 256 levels of relative transparency, from completely opaque, to semitransparent, to completely transparent. This improved transparency has several benefits. First, a wide range of creative possibilities emerges, such as allowing Web page designers to layer semitransparent images on top of one another. Second, with PNG, images become much more portable. Today, each transparent GIF image that appears in a Web page must be antialiased to the specific background against which it appears. If the background of the page is changed in any way, a "halo" of pixels appears around the GIF image content. With the PNG format's built-in alpha channels, however, this halo disappears, and images can be made to blend seamlessly with any background. This is a subtle but critical improvement, since the reproduction of GIF images for different backgrounds is an enormous drain on many developers' schedules and inflates the cost of site development.
  • Gamma and chromaticity correction. PNG supports true gamma correction and color correction, so that images created on one operating system look "correct" when viewed on any other operating system. This is particularly important for many Web designers who create images on the Mac OS, which typically has a much "brighter" display than the Windows operating system. In principle, this means that Web page designers will be able to reduce the amount of cross-platform quality assurance testing that they do, lowering the cost and development time for Web sites.
  • Searchable Meta-Data. A PNG image can contain meta-data, non-displaying information that identifies the image's contents, revision history, or authorship. Users will be able to use search engines to look for individual images rather than having to search for pages that contain those images. Also, developers will be able to track image assets in a much more straightforward and powerful way.
  • Improved Interlacing. PNG employs a new interlacing method that "blurs" an image onto the screen horizontally and vertically. This method is far superior to the GIF "window shade" approach, and delivers a usable image eight times faster than GIF. Also, according to several usability analysis studies, small, bitmapped text in an interlaced PNG image can generally be read two times faster than in a GIF. This improved usability comes in addition to the speed improvements measured by the World Wide Web Consortium's study.
  • Royalty-free license. Work on PNG began in response to the announcement by CompuServe and Unisys that they would charge users and developers for using the GIF format. Unlike GIF, PNG was developed from the start as a royalty-free format that could be used and developed without cost to developers and users. What's more, there are already source code libraries for parsing and creating PNG images available freely online, including some early Java classes for displaying PNG images. This royalty-free source code is a tremendous aid to developers who wish to embed PNG support in their existing or planned applications.
  • Easy extensibility. Unlike GIF, PNG uses an easily extensible format, similar in concept to HTML: data in a PNG image is encapsulated in "chunks" (like HTML tags) which are read by the PNG parser. If a PNG parser encounters an unrecognized chunk type, it will ignore that chunk. This allows developers to easily add new, application-specific data to PNG images.
  • "Smart" signing. A PNG file contains an internal signature that can detect the most common types of file corruption and report on the nature of the problem. Unlike most file signatures, the PNG file signature contains network-sensitive characters - such as the carriage return, line feed, and Control-Z characters - which are often altered if the PNG file is transmitted improperly, for example when the PNG MIME type is not set correctly in the server's MIME.TYPES file. This allows developers to much more easily diagnose what went wrong when an image was transmitted improperly. [i]
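The chunk structure and the signature check described in the list above can be sketched with the Python standard library alone. This is a minimal illustration, not a full PNG reader: the helper names make_chunk, make_tiny_png, and read_chunks are invented, and the 1x1 test image and its comment text are made up. Each chunk is a 4-byte length, a 4-byte type, the data, and a CRC-32 over the type and data.

```python
import struct
import zlib

# The 8-byte PNG signature: a non-ASCII byte, "PNG", CR LF, Ctrl-Z, LF.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_tiny_png():
    """A 1x1, 8-bit RGBA image holding one half-transparent red pixel."""
    # IHDR: width, height, bit depth, color type 6 (RGBA), then three zeros
    # for compression, filter and interlace method.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    scanline = b"\x00" + bytes([255, 0, 0, 128])  # filter byte + R, G, B, alpha
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", b"Comment\x00example image")  # textual meta-data
        + make_chunk(b"IDAT", zlib.compress(scanline))
        + make_chunk(b"IEND", b"")
    )


def read_chunks(data):
    """Yield (type, payload) pairs, checking the signature and every CRC."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("bad PNG signature (file altered in transfer?)")
    pos = 8
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + payload) & 0xFFFFFFFF != crc:
            raise ValueError(f"CRC mismatch in {chunk_type!r} chunk")
        yield chunk_type.decode("ascii"), payload
        pos += 12 + length


if __name__ == "__main__":
    png = make_tiny_png()
    for chunk_type, payload in read_chunks(png):
        print(chunk_type, len(payload), "bytes")
```

A reader that only understands IHDR, IDAT and IEND can simply skip the tEXt chunk, which is the extensibility the list describes. Simulating a text-mode transfer with png.replace(b"\r\n", b"\n") changes the signature, so read_chunks rejects the file before looking at any chunk.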

If your graphic design work depends on file size, importing, masks, or transparent gradients, you should be “PNG-ing” to capture these advantages.[ii]

[i] http://developer.netscape.com/viewsource/png.html
[ii] http://www.eyewire.com/magazine/columns/scott/png/



STATISTICA 6 Products "Take Traditional Quality Further"
According to Quality Digest Software Review (September 2002, p. 61)

Felix Grant, a well-known lecturer and reviewer in the UK, recently commented on the latest versions (version 6) of STATISTICA QC Charts, Data Miner, and Neural Networks in a software review published in the September issue of Quality Digest magazine.

In the review, Grant remarks on STATISTICA QC Charts' ability to handle large data sets, multiple databases, and real-time auto-updating, saying, "There must be limits to the density and variability of data flow, but I’ve not yet discovered them -- despite some very demanding work that would make most software packages cry."

In the same review, Grant calls STATISTICA Data Miner "the best tool I’ve seen yet for actually applying what you learn at the sharp end of industry" and "the easiest to use data mining control I’ve encountered." He goes on to praise its friendly user interface and strong set of exploratory analytic routines.

Lastly, the reviewer addresses STATISTICA Neural Networks. Although admittedly tentative when it comes to using neural network software, Grant finds that STATISTICA Neural Networks has "particular strengths for unfamiliar users and those with reporting or team working priorities." Grant summarizes,

    "These new additions to the STATISTICA 6 product range take traditional quality further, make it more accessible, and introduce a new level of integration that yield considerable synergy payoffs. If you have unsolved analytical problems, try STATISTICA 6."

To read the complete review go to the Quality Digest Web site.

Click here to view other STATISTICA awards, comments from users, and a complete summary of STATISTICA's unmatched record of reviews.





International Biometric Society - ENAR Spring Meeting
Tampa, Florida, USA - Marriott, Waterside
March 30 - April 2, 2003

Quality Expo International
Rosemont, Illinois, USA - Donald E. Stephens Convention Center
April 15 - 17, 2003
Booth #11090

17th Annual Control - International Trade Fair for Quality Assurance
Germany - Exhibition Centre Sinsheim
May 6 - 9, 2003
Hall 4, Booth #4316

57th Annual Quality Congress
Kansas City, Missouri, USA - Kansas City Convention Center
May 19 - 21, 2003
Booth #513

Click here to view the complete list of exhibits StatSoft will attend in 2003.




StatSoft, Inc. is pleased to announce the 2003 STATISTICA training schedule for the United States.

Featuring a variety of introductory and advanced training courses in major U.S. cities, StatSoft training classes offer:
  • Practical hands-on experience with the program
  • An introduction to real-world example applications
  • Energetic, helpful, knowledgeable instructors
  • Comprehensive take-home course manual
  • Personal attention, small class size
  • Interactive, class-paced learning

    In addition to a two-day course on the Introduction to STATISTICA, StatSoft offers one-day training for SPC, DOE, Multivariate Analysis, ANOVA/Regression, Neural Networks, Graphical Data Analysis, Visual Basic applications, and Six Sigma Statistics.

    Sign up today to enhance your knowledge and understanding of STATISTICA tools!

    Click here to view the dates and locations of courses in 2003.
    Click here to register, or email training@statsoft.com.au.

    April 2003

    April 2, 2003      Introduction to Visual Basic and STATISTICA Visual Basic  Tulsa, OK
    April 3, 2003      Visual Basic Applications in STATISTICA                   Tulsa, OK
    April 21-22, 2003  Introduction                                              Tulsa, OK
    April 23, 2003     DOE                                                       Tulsa, OK
    April 24, 2003     SPC                                                       Tulsa, OK
    April 25, 2003     Graphical Data Analysis                                   Tulsa, OK

    May 2003

    May 6-7, 2003      Introduction                                              Ft. Lauderdale, FL
    May 8, 2003        ANOVA/Regression                                          Ft. Lauderdale, FL
    May 9, 2003        Introduction to Visual Basic and STATISTICA Visual Basic  Ft. Lauderdale, FL
    May 19-20, 2003    Introduction                                              Tulsa, OK
    May 21, 2003       ANOVA/Regression                                          Tulsa, OK
    May 22, 2003       Multivariate Analysis                                     Tulsa, OK
    May 23, 2003       Introduction to Visual Basic and STATISTICA Visual Basic  Tulsa, OK

    June 2003

    June 3-4, 2003     Introduction                                              Philadelphia, PA
    June 5, 2003       DOE                                                       Philadelphia, PA
    June 6, 2003       SPC                                                       Philadelphia, PA
    June 16-17, 2003   Introduction                                              Tulsa, OK
    June 18, 2003      Introduction to Visual Basic and STATISTICA Visual Basic  Tulsa, OK
    June 19, 2003      ANOVA/Regression                                          Tulsa, OK
    June 20, 2003      Multivariate Analysis                                     Tulsa, OK
    June 23-24, 2003   Introduction                                              Dallas, TX
    June 25, 2003      SPC                                                       Dallas, TX
    June 26, 2003      DOE                                                       Dallas, TX

    Register Now!


    Please note: StatSoft, Inc. will never share the email addresses of its subscribers with any company or organization and will never make them public. The list of our subscribers is treated as privileged information and is well protected. Also, you may unsubscribe from the StatSoft Newsletter at any time by sending an e-mail with "Unsubscribe" in the subject line to subscribe@statsoft.com.




    StatSoft Pacific
    Suite 1, 46-48 Howard Street
    North Melbourne VIC 3051
    Australia
    Phone: +61 3 9348 9422
    Fax: +61 3 9348 9420

    E-mail: info@statsoft.com.au

    ©Copyright StatSoft, Inc., 1984-2006.
    StatSoft, StatSoft logo, STATISTICA, Enterprise/QC, Enterprise, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.