University of Texas at Arlington researchers use supercomputers to study snake evolution, function
Evolution takes eons, but it leaves marks on the genomes of organisms that can be detected with DNA sequencing and analysis.
As methods for studying and comparing genetic data improve, scientists are beginning to decode these marks to reconstruct the evolutionary history of species, as well as how variants of genes give rise to unique traits.
A research team at the University of Texas at Arlington led by assistant professor of biology Todd Castoe has been exploring the genomes of snakes and lizards to answer critical questions about these creatures’ evolutionary history. For instance, how did they develop venom? How do they regenerate their organs? And how do evolutionarily derived variations in genes lead to variations in how organisms look and function?
“Some of the most basic questions drive our research, including ‘why do different organisms look and function differently?’” says Castoe. “Yet trying to understand the genetic explanations of such questions is surprisingly difficult considering most vertebrate genomes, including our own, are made up of literally billions of DNA bases that can determine how an organism looks and functions. Understanding these links between differences in DNA and differences in form and function is central to understanding biology and disease, and investigating these critical links requires massive computing power.”
To uncover new insights that link variation in DNA with variation in vertebrate form and function, Castoe’s group uses supercomputing and data analysis resources at the Texas Advanced Computing Center (TACC), one of the world’s leading centers for computational discovery.
Recently, they used TACC’s supercomputers to understand the mechanisms by which Burmese pythons regenerate their organs after feeding, including the heart, liver, kidney and small intestine.
Burmese pythons (as well as other snakes) massively downregulate their metabolic and physiological functions during extended periods of fasting. During this time their organs atrophy, saving energy. However, upon feeding, the size and function of these organs, along with their ability to generate energy, dramatically increase to accommodate digestion.
Within 48 hours of feeding, Burmese pythons can undergo up to a 44-fold increase in metabolic rate and the mass of their major organs can increase by 40 to 100 percent.
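The two figures in the paragraph above are on different scales. The article does not define “44-fold.” Reading it as the fed metabolic rate being 44 times the fasting rate, and letting x stand for either metabolic rate or organ mass, the conversion between fold change F and percent increase is:

```latex
\begin{aligned}
F &= \frac{x_{\text{fed}}}{x_{\text{fasting}}}, \qquad
  \text{percent increase} = (F - 1) \times 100\% \\
F &= 44 \;\Rightarrow\; (44 - 1) \times 100\% = 4300\% \\
40\% \text{ to } 100\% \text{ increase} &\;\Rightarrow\; F = 1.4 \text{ to } 2.0
\end{aligned}
```

In other words, organ mass at most roughly doubles, while metabolic rate rises by more than an order of magnitude.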
Writing in BMC Genomics in May 2017, the researchers described their efforts to compare gene expression in pythons. They sequenced RNA from tissues of pythons that were fasting, one day post-feeding and four days post-feeding. From those sequences, they identified 1,700 genes whose expression differed significantly before and after feeding. They then performed statistical analyses to identify the key drivers of organ regeneration across different tissue types.
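To call 1,700 genes “significantly different,” researchers repeat a statistical test across thousands of genes. Analyses like this normally correct for multiple testing. The article does not say which statistics the team used. As an illustration only, the Python sketch below implements one widely used correction, the Benjamini-Hochberg false discovery rate procedure. The function name `benjamini_hochberg` and all p-values are hypothetical, not data from the study.

```python
"""Benjamini-Hochberg false discovery rate control (a generic sketch).

The p-values below are hypothetical; they are not data from the python study.
"""


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR.

    Sort p-values ascending, find the largest rank k with
    p_(k) <= (k / m) * fdr, and reject every hypothesis ranked 1..k.
    """
    m = len(p_values)
    # Pair each p-value with its original index, then sort by p-value.
    ranked = sorted(enumerate(p_values), key=lambda pair: pair[1])
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(index for index, _ in ranked[:cutoff_rank])


# Hypothetical per-gene p-values from comparing fasting vs. fed samples.
gene_p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(gene_p_values, fdr=0.05))  # → [0, 1]
```

The procedure is a “step-up” rule. A gene whose own p-value misses its rank threshold is still called significant if a gene ranked after it passes. This rule controls the expected share of false positives rather than the chance of any single false positive.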
They found that a few key sets of genes were driving the wholesale change of pythons’ internal organ structure. A small number of key proteins, produced and regulated by these important genes, activated a cascade of diverse, tissue-specific signals that led to regenerative organ growth.
Mammalian cells have even been shown to respond to serum from pythons that have recently fed. This suggests that the signaling function is conserved across species and could one day be used to improve human health.
“We’re interested in understanding the molecular basis of this phenomenon to see what genes are regulated related to the feeding response,” says Daren Card, a doctoral student in Castoe’s lab and one of the authors of the study. “Our hope is that we can leverage our understanding of how snakes accomplish organ regeneration to one day help treat human diseases.”
The Role of Supercomputing in Genomics Research
The studies performed by members of the Castoe lab rely on advanced computing for several aspects of the research. First, they use advanced computing to create genome assemblies – putting millions of small chunks of DNA in the correct order.
“Vertebrate genomes are typically on the larger side, so it takes a lot of computational power to assemble them,” says Card. “We use TACC a lot for that.”
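The article does not name the assembler the lab uses. To make “putting millions of small chunks of DNA in the correct order” concrete, here is a toy Python sketch of greedy overlap assembly on a few invented reads. Production assemblers rely on overlap-graph or de Bruijn-graph methods and must cope with sequencing errors and repeats. The function names and the `min_len` parameter are hypothetical.

```python
"""Toy greedy overlap assembler (illustrative only, not a real pipeline)."""


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    # Deduplicate, sort for deterministic tie-breaking, and drop reads
    # fully contained in another read (they add no new sequence).
    contigs = [r for r in sorted(set(reads))
               if not any(r != other and r in other for other in reads)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Four short, invented reads drawn from one small stretch of sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # → ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is simple and fast, but it can mis-order reads when a genome contains repeats longer than the reads. This is one reason large vertebrate assemblies call for more elaborate algorithms and substantial computing.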
Next, the researchers use advanced computing to compare the results among many different samples, from multiple lineages, to identify subtle differences and patterns that would not be distinguishable otherwise.
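As a small-scale illustration of comparing samples from multiple lineages, the Python sketch below compares sequences that are already aligned. It locates the variable sites and computes a simple pairwise distance, the proportion of differing positions. The sequences, lineage names and helper functions are hypothetical. Real comparisons span whole genomes and must also handle alignment gaps and missing data, which this sketch ignores.

```python
from itertools import combinations


def variable_sites(alignment):
    """Return column indices where not all aligned sequences share the same base."""
    length = len(next(iter(alignment.values())))
    return [i for i in range(length)
            if len({seq[i] for seq in alignment.values()}) > 1]


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical aligned sequences from three lineages (not real data).
alignment = {
    "lineage_A": "ATGGCGTACG",
    "lineage_B": "ATGGCGTTCG",
    "lineage_C": "ATGACGTTCG",
}
print(variable_sites(alignment))  # → [3, 7]
for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
    print(name1, name2, p_distance(seq1, seq2))
```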
Castoe’s lab has its own in-house computers, but they fall short of what is needed to perform all of the studies the group is interested in working on.
“In terms of genome assemblies and the very intensive analyses we do, accessing larger resources from TACC is advantageous,” Card says. “Certain things benefit substantially from the general output from TACC machines, but they also allow us to run 500 jobs at the same time, which speeds up the research process considerably.”
A third computer-driven approach lets the team simulate genetic evolution over millions of generations. Using synthetic biological data, they aim to deduce the rules of evolution and to identify genes that may be important for adaptation.
For one such project, the team developed a new software tool called GppFst. It allows researchers to distinguish genetic drift from genetic variation that indicates evolutionary change caused by natural selection. Genetic drift is a neutral process in which the frequencies of gene variants in a population change from generation to generation purely by chance.
The tool uses simulations to statistically determine which changes are meaningful and can help biologists better understand the processes that underlie genetic variation. They described the tool in the May 2017 issue of Bioinformatics.
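The article does not describe GppFst’s internals, and the sketch below is not GppFst’s method. It is a simplified Python stand-in for the general idea of telling drift from selection with simulations. Two populations diverge from a common ancestor under neutral Wright-Fisher drift. The sketch builds a null distribution of FST, a standard measure of genetic differentiation, and asks how often drift alone produces differentiation as large as an observed value. The function names, the single-locus setup and every parameter value (ancestral frequency, population size, generations, replicates, seed) are assumptions for illustration.

```python
"""Simulation-based FST outlier test (simplified illustration, not GppFst)."""
import random


def fst(p1, p2):
    """FST for one biallelic locus in two populations, as (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # both populations fixed for the same allele
    return (h_total - h_within) / h_total


def drift(p, pop_size, generations, rng):
    """Wright-Fisher drift: resample 2N gene copies each generation."""
    copies = 2 * pop_size
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(copies)) / copies
    return p


def neutral_fst_p_value(observed, p0=0.5, pop_size=25, generations=10,
                        replicates=500, seed=1):
    """Empirical p-value: share of drift-only simulations with FST >= observed."""
    rng = random.Random(seed)
    simulated = [
        fst(drift(p0, pop_size, generations, rng),
            drift(p0, pop_size, generations, rng))
        for _ in range(replicates)
    ]
    exceed = sum(value >= observed for value in simulated)
    return (exceed + 1) / (replicates + 1)


# A hypothetical locus with allele frequencies 0.05 and 0.95 in the two populations.
print(neutral_fst_p_value(fst(0.05, 0.95)))
```

A locus whose observed FST receives a small p-value is a candidate for selection, because neutral drift alone rarely produces that much differentiation. Even this toy version repeats the simulation hundreds of times for a single scenario. Realistic studies cover many scenarios and loci, which is where shared supercomputers come in.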
Lab members are able to access TACC resources through a unique initiative, the University of Texas Research Cyberinfrastructure. It gives researchers from the UT System’s 14 academic and health institutions access to TACC’s systems and staff expertise.
“It’s been integral to our research,” says Richard Adams, another doctoral student in Castoe’s group and the developer of GppFst. “We simulate large numbers of different evolutionary scenarios. For each, we want to have hundreds of replicates, which are required to fully vet our conclusions. There’s no way to do that on our in-house systems. It would take 10 to 15 years to finish what we would need to do with our own machines – frankly, it would be impossible without the use of TACC systems.”
Though evolutionary biology has its roots in fieldwork and close observation, today the field is deeply tied to computing. Genetic material, tiny in size but vast in volume, cannot be viewed with the naked eye or put in order by an individual.
“The massive scale of genomes, together with rapid advances in gathering genome sequence information, has shifted the paradigm for many aspects of life science research,” says Castoe.
“The bottleneck for discovery is no longer the generation of data, but instead is the analysis of such massive datasets. Data that takes less than a few weeks to generate can easily take years to analyze, and flexible shared supercomputing resources like TACC have become more critical than ever for advancing discovery in our field, and broadly for the life-sciences.”
UT System researchers who would like to learn more about expanding their research by leveraging UT System-supported TACC resources should visit the University of Texas Research Cyberinfrastructure page or email firstname.lastname@example.org.