Data Mining On A Budget: Choose Wisely

Salford Systems Purchasing Guide

SAN DIEGO -- The SPM software suite is known for its comprehensive set of modern decision tree products, built for speed and accuracy. However, purchasing the entire suite may not be feasible for analysts working with smaller budgets. Salford Systems' team of experts has put together a guide to help determine which version of the software is likely to be most suitable for an analyst's needs:

Ultra: The best of the best. For the modeler who must have access to the leading-edge technology available and the fastest run times, including major advances in ensemble modeling, interaction detection and automation. Ultra also provides advance access to new features as they become available in frequent upgrades.

ProEx: For the modeler who needs cutting-edge data mining technology, including extensive automation of workflows typical for experienced data analysts and dozens of extensions to the Salford data mining engines.

Pro: A true predictive modeling workbench designed for the professional data miner. Includes a variety of supporting conventional statistical modeling tools, a programming language, reporting services, and a modest selection of workflow automation options.

Basic: Literally the basics. Salford Systems' award-winning data mining engines without extensions, automation, surrounding statistical services, a programming language, or sophisticated reporting. Designed for small budgets while still delivering our world-famous engines (usually recommended for students).

Even if you're able to build a power-house business case to support funding for Ultra SPM v7.0, many budgets are limited and can only support one or two predictive modeling software products. So how do you choose data mining software that will produce the results you need, with enough bang for the buck? There are several options within the Salford Predictive Modeler® software suite.

Once you've identified the right version for your needs, it's time to choose the right product(s) to accomplish your goals.

If you're looking for a segmentation tool that handles missing values well and consistently produces reliable models, CART is likely the product for you. If you can afford a little bit more, have complex data, and require the utmost accuracy and predictive power available, then TreeNet is recommended.

According to Salford Systems' Senior Scientist Mikhail Golovnya, TreeNet stochastic gradient boosting is the recommended choice if you can purchase only one product.

So what is the loss in functionality, besides the algorithms, if an analyst can only afford the TreeNet version? What are the advantages of CART, MARS, and Random Forests that will be missed out on?

CART's addition to TreeNet functionality

  • CART offers a level of interpretability and simplicity that TreeNet, as an ensemble method, does not.
  • CART is fast. CART provides quick initial analysis to spot obvious inconsistencies and problematic areas.
  • CART has a phenomenal ability to handle missing values and can also be used to impute missing values in preparation for a TreeNet analysis.
  • CART's ability to be run in battery priors mode allows you to quickly identify and zero in on hotspots described by simple rules.
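The "simple rules" idea behind a classification tree can be illustrated with a minimal sketch. This is plain Python, not Salford Systems' CART engine: it searches one numeric variable for the single threshold that minimizes weighted Gini impurity. The function names and the toy data (customer age versus response) are hypothetical, and the sketch does not cover missing-value handling or priors.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best 'x <= threshold' rule.

    Returns (None, parent_impurity) if no split improves on the unsplit node.
    """
    best = (None, gini(ys))
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: customer age and whether they responded to an offer.
ages = [22, 25, 31, 38, 45, 52, 60, 67]
responded = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(ages, responded)
print(f"rule: age <= {threshold} (weighted Gini impurity {impurity:.3f})")
```

A full tree repeats this search recursively on each resulting segment, which is why the final model reads as a short list of if/then rules.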

MARS' addition to TreeNet functionality

  • The beauty of MARS is that it is closest in form to conventional regression and logistic regression, so it gives you simple piecewise linear equations that are understandable by classically trained statisticians.
  • MARS gives you the ability to construct non-linear regression solutions described by simple collections of equations that are easily defensible to upper-level management, regulators, etc.
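To make "simple piecewise linear equations" concrete, here is a minimal sketch of the kind of equation a MARS-style model produces, built from hinge functions. It is an illustration only, not output from the MARS product: the knot location (5.0) and the coefficients are hypothetical values chosen for the example.

```python
def hinge(x, knot):
    """Right hinge basis function: max(0, x - knot)."""
    return max(0.0, x - knot)


def mirrored_hinge(x, knot):
    """Left hinge basis function: max(0, knot - x)."""
    return max(0.0, knot - x)


def predict(x):
    """Piecewise linear equation with one knot at x = 5 (hypothetical values).

    Below the knot the slope is +1.5; above the knot the slope is +2.0.
    """
    return 10.0 + 2.0 * hinge(x, 5.0) - 1.5 * mirrored_hinge(x, 5.0)


for x in (2.0, 5.0, 8.0):
    print(f"x = {x}: prediction = {predict(x)}")
```

Because the whole model is an intercept plus a handful of weighted hinge terms, it can be written out and reviewed much like an ordinary regression equation.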

Random Forests' addition to TreeNet functionality

  • Random Forests is a known rival of TreeNet in terms of predictive accuracy, especially when high order interactions are present.
  • TreeNet is almost always more accurate than Random Forests for large datasets. However, Random Forests is often the product of choice if you are working with wide datasets (typical of biostatistical datasets). Although it would be ideal to be able to compare the model accuracy of these two powerful, drastically different modeling engines, it is not necessary.

In a nutshell, if you can purchase only one tool, TreeNet is recommended. If you can purchase two tools, Salford Systems recommends TreeNet and CART. Random Forests and MARS are the lowest priority if price is a concern.

Check out this on-demand webinar on how to combine the best of the CART and TreeNet methodologies. Also, see this webinar to understand all of the recent advances in TreeNet and post-processing available in SPM v7.0.

About Salford Systems
Founded in 1983, Salford Systems specializes in providing new generation data mining and choice modeling software and consultation services. Applications in both software and consulting span market research segmentation, direct marketing, fraud detection, credit scoring, risk management, bio-medical research and manufacturing quality control. Industries using Salford Systems products and consultation services include telecommunications, transportation, banking, financial services, insurance, health care, manufacturing, retail and catalog sales, and education. Salford Systems software is installed at more than 3,500 sites worldwide, including 300 major universities. Key customers include AT&T Universal Card Services, Pfizer Pharmaceuticals, General Motors, and Sears, Roebuck and Co.

Media Contact
Heather Hinman
Salford Systems
619-543-8880 ext. 130
hhinman@salford-systems.com