The University of Texas at Austin’s Texas Advanced Computing Center (TACC) Stampede and Lonestar supercomputers have helped scientists find a surprising link between cross-shaped (or cruciform) pieces of DNA and human cancer, according to a study at The University of Texas at Austin (UT Austin).
In a podcast, Texas Advanced Computing Center Technology Writer and Editor Jorge Salazar explains that DNA naturally folds itself into cross-shaped structures called cruciforms that protrude along the sprawling length of its double helix, noting that there is an abundance of DNA cruciforms, with scientists estimating that as many as 500,000 cruciform-forming sequences may exist on average in a single normal human genome. Among these, over 80 percent of DNA cruciforms are considered small, meaning under 100 base pairs of DNA, and small cruciforms enable the DNA replication and gene expression that are essential for human life. However, Salazar says scientists also suspect these small cruciforms — an essential structure of DNA itself — may be linked to mutations that can elevate cancer risk.
The UT Vasquez lab Open Access study, entitled “Short Inverted Repeats Are Hotspots for Genetic Instability: Relevance to Cancer Genomes,” is published in the journal Cell Reports. Its co-first authors are Steve Lu and Guliang Wang, with coauthors Albino Bacolla, Junhua Zhao, Scott Spitser, and Karen M. Vasquez, all of the Dell Pediatric Research Institute and the Division of Pharmacology and Toxicology, College of Pharmacy, at The University of Texas at Austin.
High performance computing using UT Austin’s Texas Advanced Computing Center supercomputers Stampede and Lonestar helped the researchers to discover short inverted repeats of 30 base pairs and under in a reference database of mutations in human cancer that are somatic, meaning not inherited.
The research team found that small DNA cruciforms are mutagenic, altering DNA in a way that can increase the risk of cancer in yeast, monkeys, and humans. They note that analyses of chromosomal aberrations in human genetic disorders have revealed that inverted repeat sequences (IRs) often co-localize with endogenous chromosomal instability and breakage hotspots, and that approximately 80 percent of all IRs in the human genome are short. DNA cruciforms are created by short inverted repeats of the nucleotides adenine, thymine, cytosine, and guanine that form the bases of DNA structure. An inverted repeat is a DNA nucleotide sequence followed by its reverse complement, somewhat like a palindrome: a word or phrase that reads the same forwards or backwards (e.g., “A man, a plan, a canal, Panama!” or “Never a foot too far, even”). The coauthors suggest that their discoveries implicate short IRs as endogenous sources of DNA breakage involved in disease etiology and suggest that these repeats represent a feature of genome plasticity that may contribute to the evolution of the human genome by providing a means for diversity within the population.
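To make the idea of an inverted repeat concrete, the short Python sketch below scans a DNA string for a run of bases followed, after an optional spacer, by its reverse complement. It is an illustration only and not the software or method the researchers ran on the TACC systems; the function names and the `min_arm`, `max_arm`, and `max_spacer` parameters are hypothetical choices made for this example.

```python
# Illustrative sketch only: a brute-force search for short inverted repeats.
# This is NOT the pipeline used in the study; names and parameters are assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, min_arm=4, max_arm=15, max_spacer=4):
    """Return (start, arm_length, spacer_length) for every inverted repeat.

    An inverted repeat here is a left arm of `arm_length` bases beginning at
    `start`, then `spacer_length` unpaired bases, then the reverse complement
    of the left arm.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for arm in range(min_arm, max_arm + 1):
            left = seq[start:start + arm]
            # Stop when the arm runs off the end or contains an ambiguous base.
            if len(left) < arm or not set(left) <= VALID_BASES:
                break
            target = reverse_complement(left)
            for spacer in range(max_spacer + 1):
                right_start = start + arm + spacer
                if seq[right_start:right_start + arm] == target:
                    hits.append((start, arm, spacer))
    return hits


if __name__ == "__main__":
    # GAATTC reads the same on both strands, so its two halves form an
    # inverted repeat with 3-base arms and no spacer.
    print(find_inverted_repeats("GAATTC", min_arm=3))
```

A quadratic scan like this is fine for short test strings but would be far too slow for a whole genome, which is one reason searches at that scale call for high performance computing resources.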
DNA strands commonly break in human cells, which have a built-in healing mechanism whereby repair proteins fuse the broken end of one DNA strand to the broken end of another. However, the UT scientists note that when formed in certain ways, these “gene fusions, or translocations” can lead to cancer development.
“We found that short inverted repeats are indeed enriched at translocation breakpoints in human cancer genomes,” supervising coauthor Karen Vasquez told Jorge Salazar. Dr. Vasquez is the James T. Doluisio Regents Professor in the Division of Pharmacology and Toxicology at the UT Austin College of Pharmacy, and has been recognized for pioneering contributions concerning genome instability, particularly by demonstrating that noncanonical DNA structures can be mutagenic, and for discovering new roles for DNA repair factors.
“In many cases, translocations are what turn a normal cell into a cancer cell,” Vasquez Lab research associate and study co-author Albino Bacolla explains in the TACC podcast. “What we found in our study was that the sites of chromosome breaks are not random along the DNA double helix; instead, they occur preferentially at specific locations. Cruciform structures in the DNA, built by the short inverted repeats, mark the spots for chromosome breaks, mutations, and potentially initiate cancer development.”
The Vasquez Lab’s current research efforts are focused on an overall theme of genome instability, DNA damage, and mechanisms of repair. A distinguishing feature of the lab’s approach is an emphasis on the role of DNA structure, including non-canonical structures such as triplex DNA, as recognition sites for repair machinery, as sources of genomic instability, and as a basis for technology to target DNA damage to specific genomic sites.
Dr. Vasquez adds that “DNA double-strand breaks can increase the risk of cancer because they can result in translocations, deletions, and other mutagenic events that disrupt the coding properties of genes [and] these modifications of the DNA can lead to cancer.”
“We have also studied the potential mechanisms that are involved in the interplays among alternative DNA structures and cancer development,” she continues. “Our team has discovered at least two different mechanistic pathways: one involving DNA replication, where these unusual structures cause a roadblock to DNA replication; the other pathway is independent of that, where DNA repair proteins, we think, recognize these alternative DNA structures as damage, even though there is no damage per se. The cells try to process the structures as damage, but they are really processing naturally occurring unusual DNA formations and not actual damage. An abortive error prone repair process can then cause DNA double-strand breaks and lead to serious problems including neoplastic transformation.”
Results of several studies are incorporated in the Cell Reports article, one of which used reporter gene assays to confirm that the short inverted repeat sequences from COS-7 cells, derived from monkey kidney tissue, were mutagenic. “We wanted to confirm that this was a biologically relevant finding,” Dr. Vasquez says. “That’s when we had to do some computational studies and in silico searching. We used the TACC supercomputers for that aspect of the work.”
“It would not have been possible to do this job without the TACC resources,” Albino Bacolla notes. “We have used both the Stampede and the Lonestar Linux clusters. The center is an incredible resource in terms of its capacity and support. We have been using the resources and staff support for some time now. It’s a wonderful opportunity for researchers at UT Austin.”
Stampede is a Dell PowerEdge C8220 Cluster with Intel Xeon Phi coprocessors, and as one of the largest computing systems in the world for open science research the system provides unprecedented computational capabilities to the national research community, enabling breakthrough science that has never before been possible. The scale of Stampede delivers opportunities in computational science and technology research, from highly parallel algorithms to high-throughput computing, and from scalable visualization to next generation programming languages.
Lonestar is a Dell Linux cluster containing 23,184 cores within 1,888 Dell PowerEdge M610 compute blades (nodes), 16 PowerEdge R610 compute-I/O server nodes, and 2 PowerEdge M610 (3.3GHz) login nodes. Each compute node has 24GB of memory, and the login/development nodes have 16GB. Lonestar also provides access to five large memory (1TB) nodes, and eight nodes containing two NVIDIA GPUs, giving users access to high-throughput computing and remote visualization capabilities respectively. Lonestar is funded by The University of Texas at Austin, UT’s Institute of Computational Engineering and Sciences (ICES), UT System, Texas A&M, Texas Tech, and the National Science Foundation, and serves as a unique resource to researchers at all 15 UT System institutions.
The broad strokes of the UT Austin study’s findings are that 1) short inverted repeat (IR) sequences are enriched at human cancer breakpoints; 2) short IRs stimulate DNA double-strand breaks and deletions in mammalian cells; 3) short IRs impede DNA replication forks in mammalian cells; and 4) ERCC1-XPF cleaves IRs and is required for IR-induced chromosome breakage.
Mr. Salazar also cites the program director at the Division of Cancer Biology of the National Cancer Institute, observing: “The focus of Dr. Vasquez’ research on the mechanisms of alternate DNA structure-induced mutations, DNA breaks, and chromosome translocations is a novel and significant aspect of NCI grant supported studies on mechanisms of genomic instability. Dr. Vasquez’ studies on the role of non-B DNA sequences in these mechanisms can contribute to our knowledge of the etiology of human cancer.”
“With TACC’s support, we were able to see that this is at least one plausible explanation in human cancer etiology, because these sequences are enriched at translocation breakpoints,” Dr. Vasquez tells Jorge Salazar. “That gives us hope, inspiration, and enthusiasm to move forward. Our overarching interest is to understand how DNA structure can influence cancer development. With access to TACC, we are more confident that DNA sequences capable of forming particular unusual structures present a plausible explanation for how DNA breaks can lead to translocations in cancer. Our next steps are to go forward with a mouse model that can detect mutations and translocations in the mouse genome using human sequences from these cancer breakpoints.”
“The long term goal for these studies is to develop better prevention or treatment strategies for cancer patients,” Dr. Vasquez comments in the podcast. Questions that remain to be answered in further research include: Does this really occur in the context of chromosomes in living organisms? Is it tissue specific? Does aging make a difference? “If we can help clinical scientists apply mechanistic information such as we hope will be gained from our research to better cancer treatment and cancer prevention strategies, we are benefiting all of us,” Dr. Vasquez concludes. “I think the potential of the computational analysis is mind-blowing. Bioinformatics and computational centers like TACC are critical for the next steps in science. It’s an exciting time.”
The National Cancer Institute, part of the National Institutes of Health, funded this study.
The University of Texas at Austin
Texas Advanced Computing Center
Karen Vasquez, UT Austin