Structural variation (SV) within the genome is likely a primary mechanism underlying phenotypic diversity. As important drivers of heritable disease, structural variants are of interest to both biomedical and evolutionary researchers. This is particularly evident in the canine genome, where breed demography and selection confer not only unique morphological and behavioral characteristics but also differential susceptibility to disease.
Numerous hurdles stand in the way of examining SVs efficiently, and these may hinder researchers’ ability to understand the underlying mechanisms of disease, morphological variation, or basic biological paradigms. Unlike the detection of single nucleotide variation (SNV), SV detection requires a much more algorithmic, and therefore inferential, approach. Many methods have been developed and implemented across dozens of tools to identify genomic structure, though no definitive gold standard has emerged. These methods use overlapping, though not fully analogous, techniques and therefore produce results with varying levels of concordance. Additionally, a variant caller may specialize in particular characteristics of one or several variant types. Hence, it is prudent to analyze a series of genomes with several algorithms and to base confidence on their consensus. However, each algorithm produces a unique, non-standardized output, implements algorithm-specific support metrics, and reports structural variant categories that do not necessarily overlap with those of other algorithms.
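One common way to make a consensus approach concrete is to keep only calls that a second algorithm supports with sufficient reciprocal overlap. The sketch below illustrates that idea only; it is not a description of any particular tool. The `SVCall` record, the function names, and the thresholds of 50% overlap and two supporting callers are assumptions chosen for the example.

```python
from collections import namedtuple

# Hypothetical minimal record for one SV call; field names are assumptions.
SVCall = namedtuple("SVCall", "caller chrom start end svtype")


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions, or 0.0 if the
    calls differ in chromosome or type, or do not overlap at all."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus_calls(calls, min_overlap=0.5, min_callers=2):
    """Pair each call with the sorted list of callers that support it,
    keeping only calls supported by at least `min_callers` callers."""
    supported = []
    for call in calls:
        callers = {call.caller}
        for other in calls:
            if (other.caller != call.caller
                    and reciprocal_overlap(call, other) >= min_overlap):
                callers.add(other.caller)
        if len(callers) >= min_callers:
            supported.append((call, sorted(callers)))
    return supported
```

The pairwise comparison is quadratic in the number of calls, which is acceptable for a sketch; a real data set would more likely use an interval index or per-chromosome sorting.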
To address these issues, we have developed SVStore, an integrated suite of tools for genomicists interested in the complexity of genomic variation. It is a relational database-driven tool that simplifies the compilation of data from multiple SV detection algorithms. First, a set of scripts converts the heterogeneous output of the most common SV detection algorithms into the standard Variant Call Format (VCF), normalizing algorithm output. Second, records are loaded into a relational database that builds on the simplicity and utility of VCF, allowing for the centralized collection and curation of algorithm output. This database also lets users leverage the power of SQL queries to search, sort, and analyze the vast quantity and diversity of information present in contemporary genomic data sets. Third, to simplify data interaction, a graphical user interface allows common queries to be executed without a deep understanding of the database structure. Results of any query from this web interface can be viewed in a standard browser or downloaded as VCF. Together, these three components make SVStore a powerful tool for the analysis and study of structural variation.
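The loading and querying steps described above can be illustrated with a small sketch that uses Python's built-in `sqlite3` module. This is not SVStore's actual schema or code: the `sv_call` table, its columns, the function names, and the caller labels are hypothetical. Only the VCF column order and the `SVTYPE` and `END` INFO keys come from the VCF specification.

```python
import sqlite3

# Hypothetical VCF body lines from two callers (tab-separated, 8 columns).
CALLER_A_LINES = [
    "chr1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000",
    "chr2\t500\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=900",
]
CALLER_B_LINES = [
    "chr1\t1010\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1990",
]


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def load_records(conn, caller, lines):
    """Insert VCF body lines into an assumed `sv_call` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sv_call ("
        "caller TEXT, chrom TEXT, pos INTEGER, end_pos INTEGER, "
        "svtype TEXT, filter TEXT)"
    )
    for line in lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        cols = line.rstrip("\n").split("\t")[:8]
        chrom, pos, _id, _ref, _alt, _qual, filt, info = cols
        fields = parse_info(info)
        conn.execute(
            "INSERT INTO sv_call VALUES (?, ?, ?, ?, ?, ?)",
            (caller, chrom, int(pos), int(fields.get("END", pos)),
             fields.get("SVTYPE"), filt),
        )
    conn.commit()


def callers_per_svtype(conn):
    """Count calls and distinct callers for each SV type."""
    return conn.execute(
        "SELECT svtype, COUNT(*), COUNT(DISTINCT caller) "
        "FROM sv_call GROUP BY svtype ORDER BY svtype"
    ).fetchall()
```

Once every caller's output is in one table, questions such as "which variant types are reported by more than one algorithm" become a single `GROUP BY` query instead of a custom parser per tool.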