In an off-the-beaten-path pairing of botany and genetics with big data, scientists at The University of Texas at Austin are harnessing the iPlant Collaborative and the Stampede, Lonestar and Ranger supercomputers of the Texas Advanced Computing Center (TACC) to process evidence of plant climate adaptation.
Using the TACC supercomputers, the researchers found that genes sensitive to cold and drought help plants withstand the disruptions caused by climate change. The findings deepen basic scientific understanding of plant adaptation and could also be applied to crop improvement.
The hybrid, multi-institutional computational biology study, which focused on the flowering mustard weed Arabidopsis thaliana, was published in the September 2014 issue of the journal Molecular Biology and Evolution. Both the iPlant Collaborative and the TACC supercomputers contributed to the research, which was funded by the National Science Foundation (NSF) and the U.S. Department of Agriculture.
“We found pretty good evidence, certainly the best evidence to date, that the evolution of gene expression is an important way that plant populations adapt to local environments,” explains study co-author Jesse Lasky, an Earth Institute fellow at Columbia University.
University of Texas at Austin biology professor Thomas Juenger is another co-author of the paper. The Juenger lab at UT Austin has been studying Arabidopsis thaliana for over a decade, making it an ideal subject for this computational study. “It’s one of the model plants that biologists study,” Dr. Juenger says in a UT Austin release, which also notes that Arabidopsis has one of the smallest plant genomes and that, in 2000, it became the first plant to have its genome fully sequenced.
The release further notes that plant biologists regard Arabidopsis as the “fruit fly” of their field. However, rather than modifying genes through genetic engineering, Dr. Juenger and his team study natural genetic variation, focusing on the interface of ecological and evolutionary processes in natural plant populations. “We want to understand how they’ve evolved in response to the processes of natural selection and gene flow and mutation in the field,” he says, adding that his broader interest is phenotypic evolution, which occurs primarily through mutations in genes that interact with one another during development. The Juenger lab’s current focus is identifying and characterizing the genes underlying variation in drought adaptation among Arabidopsis thaliana ecotypes collected from around the world.
This work is motivated by a desire to understand how climate and habitat variation have influenced the evolution of plant physiology, Dr. Juenger says. “Our approach usually couples quantitative genetic experiments [classic breeding designs and QTL/LD mapping], population genetic approaches, and selection analyses in studies of natural genetic variation. Ultimately, we’d like to understand the forces shaping patterns of genetic variability in natural populations across a variety of selective regimes.”
Scientists’ understanding of how plants adapt to climate change has proved elusive, particularly at the level of gene expression, which can vary widely in a hardy species like Arabidopsis that thrives in environments ranging from Scandinavia to North Africa and Central Asia. Genes, molecular snippets of DNA, carry both the code for making proteins and instructions for how much of each protein to make, or express. Gene expression “is the part of the organism that we show here is strongly involved in local adaptation to environment,” Lasky notes.
Because they are rooted in place, plants must literally stand their ground against temperature changes, too much or too little soil moisture, and insect attacks, to cite a few examples. Dr. Juenger says that one way plants cope with environmental change is by altering their gene expression. “As a plant starts to sense dropping temperatures, a cascade of gene expression can allow the plant to acclimatize to cold temperatures, and in effect prepare itself for the coming freezing conditions,” he explains. The Juenger lab team drew on previous experiments that exposed Arabidopsis seedlings to cold and drought stress in order to measure gene expression across the genome.
The scientists then compared the genes they identified with genomic data from previous studies that sampled Arabidopsis populations throughout Europe and Asia, narrowing that reference data to 1,003 strains of the flowering mustard weed. For the genes whose expression responded to cold or drought, the scientists wanted to know whether those genes also showed DNA sequence changes along environmental gradients. Such a pattern “suggests that there are changes in the DNA sequence that are adapted to those local conditions and that are associated with changes in gene expression,” Lasky said.
The research team statistically tested for associations between climate and single-nucleotide polymorphisms (SNPs), genome positions where strains differ by a single DNA base, against a null hypothesis of no association. To build that null, they shuffled the data and ran permutation tests. “We can randomize climatic variation with respect to SNP polymorphism variation and do that thousands and thousands of times and ask, what sort of test statistic might we observe by chance alone,” Juenger said. “We can compare that to our real, empirical data.”
The computational challenge was daunting: comparing more than a thousand individual Arabidopsis strains across hundreds of thousands of genetic markers while testing a dozen environmental variables. “It’s impossible to do this on a standard desktop computer, and it requires some of the throughput that we can have on a cluster like Stampede or Lonestar,” Dr. Juenger notes. “The computational time on the clusters at TACC allowed us to evaluate the hypothesis [we] generated from the SNP data.”
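A rough count suggests why a desktop falls short. The sketch below is a back-of-envelope estimate under stated assumptions, not the study's actual workload: 250,000 markers stands in for "hundreds of thousands", 12 variables for "a dozen", and 1,000 shuffles per marker-variable pair is a conservative stand-in for "thousands and thousands". The cost of 1 millisecond per test statistic (each computed across roughly 1,000 strains) and the 512-core cluster are hypothetical figures.

```python
# Back-of-envelope workload for a permutation-based genome scan.
# All inputs below are illustrative assumptions, not figures from the study.


def count_tests(markers, variables, permutations):
    """One observed plus `permutations` shuffled statistics per marker-variable pair."""
    return markers * variables * (permutations + 1)


def runtime_seconds(tests, seconds_per_test, cores=1):
    """Wall-clock time assuming perfect scaling (no scheduling or I/O overhead)."""
    return tests * seconds_per_test / cores


tests = count_tests(markers=250_000, variables=12, permutations=1_000)
serial = runtime_seconds(tests, seconds_per_test=1e-3)  # assumed 1 ms per test
parallel = runtime_seconds(tests, seconds_per_test=1e-3, cores=512)
print(f"{tests:,} test statistics")                # 3,003,000,000 test statistics
print(f"one core:  {serial / 86_400:.1f} days")    # one core:  34.8 days
print(f"512 cores: {parallel / 3_600:.1f} hours")  # 512 cores: 1.6 hours
```

Every extra permutation or variable multiplies the work, while the independence of the individual tests is what lets the job split cleanly across cores.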
“To run these models across the genome, you quickly run out of time,” adds Lasky. “It’s really just a problem where you do lots of little things many, many times. It’s much easier to accomplish that when you can run that problem on many cores across a cluster. That was the challenge. I didn’t have any experience with high performance computing before this.” Lasky turned to Weijia Xu, group lead for the TACC Data Mining and Statistics Group, who helped him work out “what kind of problem I had and how to scale that up to run it on some of the clusters.” Dr. Xu also wrote the parametric job launcher that made it easier for Lasky to start his genome-wide data runs.
“It was a code I developed to launch multiple R jobs in parallel using an MPI interface,” Xu says of the launcher.

Dr. Juenger observes that “iPlant, associated with TACC, has certainly been developing lots of new tools, simplifying computational tools for biologists, and giving us access to data storage as well as service units through high performance computing clusters like those at TACC. It’s a helpful, timely program that’s impacting plant biologists in individual labs around the country.”
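The parametric launching Xu describes, many small independent jobs that differ only in their parameters, can be sketched as follows. This is a hypothetical illustration rather than Xu's launcher: his code used MPI to start R jobs, whereas this sketch swaps in Python's `subprocess.run` and `concurrent.futures.ThreadPoolExecutor` on a single machine. The chunking scheme, chunk size, and the `scan_chunk.R` script name are assumptions, and a short Python one-liner stands in for each R job so the example runs anywhere.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def make_chunks(n_markers, chunk_size):
    """Split marker indices [0, n_markers) into half-open (start, end) ranges."""
    return [(s, min(s + chunk_size, n_markers)) for s in range(0, n_markers, chunk_size)]


def run_job(command):
    """Run one independent job and return its exit code and standard output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def launch(commands, max_parallel):
    """Run independent commands, at most `max_parallel` at a time.

    Threads suffice here because each one only waits on a child process;
    results come back in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # On a cluster each command might look like (hypothetical script name):
    #   ["Rscript", "scan_chunk.R", str(start), str(end)]
    # Here a tiny Python one-liner stands in so the sketch runs anywhere.
    stand_in = "import sys; print(f'markers {sys.argv[1]}-{sys.argv[2]} done')"
    commands = [[sys.executable, "-c", stand_in, str(s), str(e)]
                for s, e in make_chunks(n_markers=10_000, chunk_size=2_500)]
    for code, out in launch(commands, max_parallel=4):
        print(code, out)
```

Because each chunk of markers is analyzed independently, adding workers should in principle shorten the run until scheduling and I/O costs dominate; across many cluster nodes, an MPI-based or scheduler-based launcher plays the role the thread pool plays here.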
Lasky cautions that while the results from the Arabidopsis study are promising, they are not yet conclusive: “We have experimental work here, but we haven’t experimentally shown that the genes that we identified are causing localized adaptations.”
The TACC Lonestar supercomputer cluster is funded by The University of Texas at Austin, UT’s Institute for Computational Engineering and Sciences (ICES), the UT System, Texas A&M, Texas Tech, and the National Science Foundation, and serves researchers at all 15 UT System institutions.
The University of Texas at Austin
Texas Advanced Computing Center