Parabon Crush™
Statistical Data Mining

Harnessing the power of the grid
for extreme-scale predictive analytics

Pinpoint the real value in your data

By harnessing the extreme-scale computational power of the Frontier® Grid Platform, Parabon Crush revolutionizes statistical data mining, enabling you to answer deep and valuable questions about your data — fast!

Statistical Modeling

Enhanced with a search algorithm developed for NASA, Crush employs both exhaustive and evolutionary regression analysis to discover optimal statistical models among the vast set of models that are possible over high-dimensional datasets. It is ideal for a variety of statistically challenging problems, from genome-wide association studies to stock price forecasting. By systematically exhausting the space of possible models, or by sampling it smartly and deeply when exhaustive techniques are prohibitively time-consuming, Crush finds solutions that are demonstrably superior to those of traditional heuristic approaches.

Given a set of input data, Crush determines which combinations of independent variables (X1, X2, …) provide the most statistically significant explanatory models of the dataset.
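To make the idea of an exhaustive search over variable combinations concrete, here is a minimal sketch in Python. It is not Crush's implementation: the function names (`solve`, `fit_ols`, `best_subset`), the use of ordinary least squares, the choice of adjusted R² as the ranking score, and the synthetic data are all assumptions made for illustration.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns (coefs, r2)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    coefs = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    """Penalize R^2 for the number of predictors p, given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def best_subset(data, y):
    """Score every non-empty subset of variables; return (names, adjusted R^2) of the best."""
    names = sorted(data)
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            _, r2 = fit_ols([data[v] for v in subset], y)
            score = adjusted_r2(r2, len(y), size)
            if best is None or score > best[1]:
                best = (subset, score)
    return best

# Synthetic data (an assumption for this sketch): y depends on x1 and x3 only.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [5, 3, 8, 1, 7, 2, 6, 4],
    "x3": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [1 + 2 * a - 3 * c for a, c in zip(data["x1"], data["x3"])]

print(best_subset(data, y))
```

On this toy dataset the selected subset should include x1 and x3, the two variables that actually generate y. With only three candidate variables there are seven subsets to score; the same loop becomes impractical on a single machine as the variable count grows.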

Predictive Analytics

Using models generated by Crush, analysts can predict outcomes for a given set of new data. For example, by studying the demographic and genomic characteristics of a sample population, cancer researchers have used Crush to uncover hidden relationships that correlate with a test subject's metabolic response to a particular chemotherapy treatment. Armed with these models, doctors can now predict the likelihood of a given patient's response to this treatment based on his or her age, race, weight and DNA profile.

The Result

With Crush, the result is always the discovery of optimal statistical models that would never be found using traditional approaches.

Why is the Power of the Grid Necessary?

Unlike traditional statistical modeling tools, which use simple heuristics to rapidly produce answers that are often suboptimal, Crush can systematically exhaust the entire space of possible combinations in its search for the best model. And thanks to the power of the Frontier Grid Platform, Crush can perform many thousands of calculations simultaneously, reducing the time an analysis takes from months or years to hours or days.

Opportunistic Evolution Search Algorithm

As the number of independent variables (columns of input) grows, the number of possible models (combinations of variables) grows exponentially. Eventually, the search space reaches a point where it's impractical to exhaust every possible model. For these cases, Crush employs a novel algorithm, called Opportunistic Evolution (OE), which efficiently searches arbitrarily large model spaces deeply and effectively. This optimization technique is part of Parabon's Origin™ Evolutionary Software Development Kit, a suite of genetic algorithms specifically designed to maximally leverage grid-scale capacity.
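The growth of the search space, and the general idea of an evolutionary search over it, can be sketched in a few lines of Python. If each variable is either in or out of a model, n variables give 2^n - 1 non-empty combinations; for 34 variables that is about 17.2 billion. The genetic algorithm below is a generic textbook version (tournament selection, one-point crossover, bit-flip mutation) and is not Opportunistic Evolution, whose internals are not described here. The names `model_count`, `evolve` and `toy_fitness`, and all parameter defaults, are hypothetical.

```python
import random

def model_count(n_variables):
    """Number of non-empty variable subsets: 2^n - 1."""
    return 2 ** n_variables - 1

def evolve(fitness, n_variables, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Generic genetic algorithm over 0/1 masks (requires n_variables >= 2).

    Each mask marks which variables a candidate model includes.
    Returns the fittest mask seen during the run.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_variables))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Tournament selection: the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_variables)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)          # bit-flip mutation
            children.append(child)
        pop = children
        champion = max(pop, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion
    return best

# Toy stand-in for a model score (an assumption): count how many
# include/exclude decisions match a hidden "true" set of variables.
N = 20
hidden = tuple(1 if i % 3 == 0 else 0 for i in range(N))

def toy_fitness(mask):
    return sum(1 for m, h in zip(mask, hidden) if m == h)

print(model_count(34))
print(toy_fitness(evolve(toy_fitness, N)), "of", N, "decisions matched")
```

In a real setting the fitness function would fit and score a regression model for the variables selected by the mask, and those independent evaluations are the natural unit of work to distribute across many machines.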

Easy to Use

  • Crush can be run right from within Microsoft® Excel® or, for larger datasets, from the command line.
  • Progress can be monitored from any browser via the Frontier Dashboard.
  • Jobs that would take years to complete on a single computer can be completed in hours or minutes, depending upon the grid capacity used.

Crush in Action: It's Easy, Fast and Powerful

Crush has been used in a wide variety of domains, from genetic analysis and generating psychometric models of consumer preferences, to predicting contract overruns and determining the causal factors for the spread of West Nile Virus. Its statistical models can be used in practically any domain to:

CRUSH YOUR HIGH-DIMENSIONAL DATASETS


"With use of the [Crush] application on Frontier, we were able to look at 34 different independent variables (producing about 17.2 billion linear combinations) in less than 24 hours of computing time." Dr. Frondorf, US Geological Survey

"Crush revealed significant explanatory models that previous conventional approaches used by experienced, Ph.D. statisticians failed to find." Dr. William Petros, Pharm. D., of the Mary Babb Randolph Cancer Center

"Parabon's computational grid allowed us, for the first time, to [programmatically] examine each of the billions of possible combinations of factors to find the most [statistically] powerful models. As a result, we found relationships that we had not anticipated." Dr. Eddie Reed, MD, Oncology Researcher

"Access to massive computations helped find correlations that we were not otherwise able to consider." Dr. Karen Balzer, Medical Researcher at WVU