Accelerate R&D Cycles with Prospector™

Grid-Powered Sequence Analysis

Prospector is a grid computing software package built specifically to address the widening "data and computation gap" in bioinformatics. It lets users run analyses faster, more easily, and more comprehensively than before. Built by Parabon Computation with patented software technology¹, Prospector is a suite of software tools that focuses the raw computing power of the Frontier Grid Platform on large dataset comparisons, such as genome-to-genome, genome-to-database, and database-to-database.

Until now, the main challenge for researchers running BLAST (Basic Local Alignment Search Tool) has been memory capacity and throughput. Specifically, BLAST processes data so quickly that it soon exhausts what is held in RAM. When the analysis needs more data, jobs get "stuck" waiting for it to be retrieved from hard disk, and the computation becomes slow and cumbersome.

The usual answer has been to avoid disk access by loading entire databases into main memory. But as GenBank exceeds 85 billion nucleotides, this is an increasingly painful and costly solution, especially when hundreds of machines are involved. At the same time, it's essential to repeatedly search against the most current version of GenBank and other databases early in R&D to find the most promising targets.

The Solution: Break the Data Bottleneck

Prospector employs a novel approach to database division and analysis that actually turns the amount of data into an advantage on Frontier. How? Each machine receives a set of query sequences and a set of database sequences of approximately equal total length, which divides the search space into searchable "squares".


Traditionally, entire databases had to be loaded into memory for analysis. Prospector avoids this sticking point by partitioning the databases themselves.

This division minimizes the amount of data that needs to be sent to each computer in order for it to do a useful amount of work. The total amount of main memory needed, which used to be linearly proportional to the number of machines, is now proportional to the square root of the number of machines. Prospector turns the quadratic nature of the problem to your advantage, both in disassembly and reconstruction.
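The arithmetic behind the square-root claim can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Prospector's actual partitioning code: the function names (`split_evenly`, `square_tasks`, `total_memory_replicated`, `total_memory_squares`) are hypothetical, the number of machines is assumed to be a perfect square, and sizes are in arbitrary units.

```python
import math
from itertools import product

def split_evenly(lengths, parts):
    """Split sequence indices into `parts` contiguous groups of roughly equal total length."""
    target = sum(lengths) / parts
    groups, current, running = [], [], 0
    for i, n in enumerate(lengths):
        current.append(i)
        running += n
        # Close the group once the running total reaches the next boundary.
        if running >= target * (len(groups) + 1) and len(groups) < parts - 1:
            groups.append(current)
            current = []
    groups.append(current)
    return groups

def square_tasks(query_lengths, db_lengths, machines):
    """Pair every query group with every database group: one 'square' per machine."""
    side = math.isqrt(machines)  # assumes `machines` is a perfect square
    return list(product(split_evenly(query_lengths, side),
                        split_evenly(db_lengths, side)))

def total_memory_replicated(db_size, machines):
    """Traditional approach: every machine holds the whole database."""
    return db_size * machines

def total_memory_squares(query_size, db_size, machines):
    """Square partitioning: each machine holds 1/side of the queries and 1/side of the database."""
    side = math.isqrt(machines)
    return (query_size / side + db_size / side) * machines

if __name__ == "__main__":
    for machines in (16, 64):
        print(machines,
              total_memory_replicated(1000, machines),
              total_memory_squares(1000, 1000, machines))
```

Under these assumptions, quadrupling the machine count from 16 to 64 quadruples the memory needed when every machine holds the full database, but only doubles it with square partitioning.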

Benefits

  • Don't Sweat the Memory. Because databases are partitioned, massive analyses that were inconceivable before become the norm. Best of all, you get this boost without investing in costly memory upgrades.
  • Use All the Power at Your Disposal. Prospector can use the power of any computer. In addition to clusters or supercomputers, you can wring every drop of power from desktops, servers and other existing computational resources to boost your throughput. If you require additional burst power, the massive Parabon Computation Grid, available online, is only a mouse click away.
  • Validate Quickly. Using tried and true algorithms, Prospector provides results quickly and in a familiar format, making validation painless.
  • Minimize Investment, Maximize Return. Prospector is a software solution that takes advantage of existing IT infrastructure to turbo-charge your analyses.
  • Save Time and Power. Prospector comes with graphical tools to monitor the status of any computation, at either a macro- or micro-level, directly from your desktop or any workstation on the network. With this ability, you can monitor, cancel, and re-launch tasks on the fly by attaching a job controller to the system.
  • Ease of Use. Prospector's online documentation makes development, testing, and implementation an easy process.

Results You Can Count On

Prospector ensures the integrity of results by overlapping Eigen values. Get the results you expect and verify easily:

  • Save Time. Large databases are split automatically across multiple tasks.
  • Monitor Your Way. You determine the granularity of results by selecting task size and overlap.
  • No Superfluous Output. Don't wade through meaningless results: duplicate alignments are removed by the client process.
  • Get the Full Picture. Alignments smaller than overlap size are reported exactly. Alignments that span task boundaries and exceed overlap size may be reported in pieces.
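
The interplay of task size, overlap, and duplicate removal described in the points above can be illustrated with a small Python sketch. It is a simplified stand-in, not Prospector's implementation: exact substring search takes the place of a real alignment algorithm, and the helper names (`overlapping_tasks`, `find_local`, `merge_hits`) are hypothetical.

```python
def overlapping_tasks(seq_len, task_size, overlap):
    """Return (start, end) windows covering [0, seq_len); neighbours share `overlap` positions."""
    if not 0 <= overlap < task_size:
        raise ValueError("overlap must be non-negative and smaller than task_size")
    step = task_size - overlap
    windows, start = [], 0
    while True:
        end = min(start + task_size, seq_len)
        windows.append((start, end))
        if end == seq_len:
            return windows
        start += step

def find_local(seq, pattern, window):
    """Search one task's window and report hits in global coordinates."""
    start, end = window
    chunk = seq[start:end]
    hits, pos = [], chunk.find(pattern)
    while pos != -1:
        hits.append((start + pos, start + pos + len(pattern)))
        pos = chunk.find(pattern, pos + 1)
    return hits

def merge_hits(per_task_hits):
    """Client-side step: drop hits that two neighbouring tasks both reported."""
    return sorted({hit for hits in per_task_hits for hit in hits})

if __name__ == "__main__":
    seq = "C" * 31 + "GATTACA" + "C" * 20 + "GATTACA" + "C" * 35
    windows = overlapping_tasks(len(seq), task_size=40, overlap=10)
    per_task = [find_local(seq, "GATTACA", w) for w in windows]
    print(windows)
    print(per_task)
    print(merge_hits(per_task))
```

In this example the first match lies inside the region shared by the first two tasks, so both report it and the merge step keeps a single copy. Because the pattern is shorter than the overlap, every occurrence falls wholly inside at least one window, which is why short matches are found intact.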

Proven Industry Algorithms

You can use Prospector to compare two libraries in FASTA or GenBank format using the Smith-Waterman or BLAST algorithm. Prospector performs the most valuable bioinformatic comparisons in a new way:

Smith-Waterman

  • Protein-to-Protein
  • DNA-to-Protein (translated in all 6 frames)
  • Protein-to-DNA (translated in all 6 frames)
  • DNA-to-DNA (translated in all 36 frames)

NCBI BLAST

  • blastp: Protein-to-Protein
  • blastx: translated DNA-to-Protein
  • tblastn: Protein-to-translated DNA
  • tblastx: translated DNA-to-translated DNA
  • blastn: DNA-to-DNA
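
For readers unfamiliar with the term, "translated in all 6 frames" means reading a DNA sequence as protein from three starting offsets on each strand. The Python sketch below shows the idea using the standard genetic code. It is illustrative only and says nothing about how Prospector or BLAST implement translation; the function names are hypothetical, and codons containing characters other than A, C, G, or T are rendered as "X" by assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Translate both strands at offsets 0, 1, and 2."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

A translated DNA-to-DNA comparison pairs each of the six query frames with each of the six database frames, which gives the 36 frame combinations listed above.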

Monitor Jobs Anytime, Quickly and Easily

Prospector's job controller provides a simple interface that allows you to quickly check the status of your job.

The Score Histogram provides a snapshot of your job's results. Updated as the results from each task are received, the histogram illustrates the distribution of comparison scores.



-----
1. Blair, et al. Apparatus and method for providing sequence database comparison, US Patent 7,231,390, June 2007.