AI and the Cell Cycle

Every time a cell in your body divides, six billion base pairs of DNA have to be replicated.  Errors in this process can have severe consequences: problematic DNA replication is often implicated in cancer and other genetic diseases.  Understanding how a cell is able to quickly and accurately replicate its DNA, and how it copes with any errors along the way, is critical to understanding how cells maintain genome integrity.

Replication begins at sites on a chromosome called origins of replication, each of which can "fire" with a certain probability to start two replication forks that move in opposite directions along the chromosome. This stochasticity means that each cell exhibits a different pattern of replication origin firing. The movement of replication forks also varies between cells: features in the genome such as nucleotide repeats, actively transcribed genes, and DNA-binding proteins make replication forks more likely to stall or pause, but this may not happen at the same time or place from cell to cell. Current methods to study this are largely population-based, allowing us to observe how a population of cells replicates its DNA on average. However, this "averages out" the rare events that are most important for understanding genome integrity. We need an accurate, high-throughput method that can reveal how replication occurred in individual molecules.
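
To make the "averaging out" point concrete, here is a minimal simulation sketch. It assumes three origins that fire independently with made-up probabilities; the function names and numbers are hypothetical and are not taken from any dataset or from DNAscent. The population view recovers the average firing frequency of each origin, while the single-cell view shows the individual firing patterns, including rare ones.

```python
import random

def simulate_cell(origin_probs, rng):
    """Return the set of origin indices that fire in one simulated cell."""
    return {i for i, p in enumerate(origin_probs) if rng.random() < p}

def firing_frequencies(cells, n_origins):
    """Population view: fraction of cells in which each origin fired."""
    return [sum(i in fired for fired in cells) / len(cells)
            for i in range(n_origins)]

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
origin_probs = [0.9, 0.5, 0.1]         # hypothetical firing probabilities
cells = [simulate_cell(origin_probs, rng) for _ in range(10_000)]

# Population average: close to origin_probs, but says nothing about
# which combinations of origins fired together in any one cell.
freqs = firing_frequencies(cells, len(origin_probs))

# Single-cell view: the distinct firing patterns that actually occurred.
patterns = {frozenset(c) for c in cells}

# A rare pattern: only the weakest origin (index 2) fired.
rare = sum(1 for c in cells if c == {2})

print("average firing frequency per origin:", freqs)
print("distinct single-cell patterns:", len(patterns))
print("cells where only the weakest origin fired:", rare)
```

The averages alone cannot distinguish a cell in which only the weakest origin fired from one in which all three fired; only the per-cell records can.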

[Figure: brduIncorporation.PNG]
[Animation from Oxford Nanopore: sequencing-animated_0.gif]

The MinION sequencer from Oxford Nanopore Technologies works by threading DNA through a nanopore such that the bases in the pore produce a characteristic current (see image from Oxford Nanopore above). These current readings can be translated into a sequence of DNA bases (A, T, G, and C). We use current-disrupting DNA base analogues (such as BrdU) as molecular labels: when replication forks incorporate these analogues into newly replicated DNA, they leave bands of analogue incorporation. By detecting the location of the analogues, we can determine replication fork movement and which origins fired in each individual molecule that passes through the nanopore.
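
As an illustration of how per-base analogue calls might be grouped into bands, the sketch below takes a probability of BrdU at each thymidine position on one read and merges nearby positive calls. This is a simplified stand-in and not the method used in our software; the function name `call_bands`, the threshold, the gap tolerance, and the example values are all assumptions made for the example.

```python
def call_bands(positions, probs, threshold=0.5, max_gap=1):
    """Group analogue-positive thymidines into bands.

    positions: sorted thymidine coordinates on the read
    probs:     probability that each of those thymidines is BrdU
    threshold: probability at or above which a thymidine counts as BrdU
    max_gap:   number of consecutive negative thymidines tolerated in a band
    Returns a list of (start, end) coordinates, both inclusive.
    """
    bands = []
    start = end = None
    misses = 0
    for pos, p in zip(positions, probs):
        if p >= threshold:
            if start is None:
                start = pos          # open a new band
            end = pos                # extend the current band
            misses = 0
        elif start is not None:
            misses += 1
            if misses > max_gap:     # too many negatives: close the band
                bands.append((start, end))
                start = end = None
                misses = 0
    if start is not None:            # close a band that reaches the read end
        bands.append((start, end))
    return bands

# Made-up example read: ten thymidines and their BrdU probabilities.
positions = [3, 7, 12, 15, 21, 26, 30, 34, 40, 45]
probs = [0.1, 0.9, 0.8, 0.2, 0.95, 0.1, 0.05, 0.1, 0.9, 0.85]
bands = call_bands(positions, probs)
print(bands)  # [(7, 21), (40, 45)]
```

Tolerating a small number of negative calls inside a band keeps a single low-confidence thymidine from splitting one band into two.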

[Figure: rDNA_short.png]

Our DNAscent software in action
Budding yeast rDNA consists of a series of ~9.1 kb repeats, each of which contains an origin of replication and a replication fork barrier (red lines) that blocks rightward-moving forks. It is therefore an ideal system to model replication stalling in human cancer cells. Here, our software analysed 16 nanopore-sequenced molecules; for each molecule, the BrdU incorporation at each thymidine position is shown as a bar graph. Fork stalling causes a sharp drop-off in BrdU incorporation in the newly replicated DNA, and we can call the location of these stall sites down to the base pair. This improves upon the resolution of previous microscopy-based techniques by about three orders of magnitude.
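
The idea of locating a stall site from a sharp drop-off can be sketched as follows. This is only an illustration under simple assumptions, not the algorithm DNAscent uses: it handles a rightward-moving fork, compares the mean BrdU probability in a fixed window of thymidines on either side of each candidate position, and reports the last thymidine before the largest drop. The function name `find_drop_off`, the window size, and the minimum drop are hypothetical.

```python
def find_drop_off(positions, probs, window=5, min_drop=0.5):
    """Locate the sharpest left-to-right drop in BrdU probability.

    positions: sorted thymidine coordinates on the read
    probs:     probability that each of those thymidines is BrdU
    window:    number of thymidines averaged on each side of a candidate
    min_drop:  smallest difference in means that counts as a drop-off
    Returns the coordinate of the last thymidine before the drop, or None.
    """
    best_pos, best_drop = None, min_drop
    for i in range(window, len(probs) - window + 1):
        left = sum(probs[i - window:i]) / window     # mean just before i
        right = sum(probs[i:i + window]) / window    # mean from i onwards
        drop = left - right
        if drop > best_drop:
            best_pos, best_drop = positions[i - 1], drop
    return best_pos

# Made-up read: thymidines every 10 bp, high BrdU up to 210 and low after it.
positions = list(range(100, 300, 10))
probs = [0.9] * 12 + [0.05] * 8
stall = find_drop_off(positions, probs)
print(stall)  # 210

# A read with uniform incorporation has no drop-off to report.
no_stall = find_drop_off(positions, [0.9] * 20)
print(no_stall)  # None
```

Because the output is a thymidine coordinate on the read, the call is as precise as the spacing of thymidines allows.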

The software we use to do this is called DNAscent, and it is maintained and further developed by the Boemo Group. It uses a residual convolutional neural network to evaluate the probability that each thymidine base in a nanopore read is actually BrdU, and it uses bioinformatics and deep learning approaches to translate these probabilities into replication fork direction and origin calls. The high throughput of both Oxford Nanopore sequencing and the DNAscent software means that we can do genome-wide assays of DNA replication dynamics with single-molecule resolution. In addition to maintaining and pushing the boundaries of the software, we partner with experimental groups to answer a growing list of biological questions.

Corresponding software:

DNAscent is available on GitHub and we maintain a manual with examples.

Corresponding publications:

Totanes, F.I.G., Gockel, J., Chapman, S.E., Bartfai, R., Boemo, M.A.†, Merrick, C.J.† (2022) Replication origin mapping in the malaria parasite Plasmodium falciparum. [bioRxiv]

Mueller, C.A.*, Boemo, M.A.*, Spingardi, P., Kessler, B., Kriaucionis, S., Simpson, J.T., Nieduszynski, C.A.† (2019) Capturing the dynamics of genome replication on individual ultra-long nanopore sequencing reads. Nature Methods 16:429-436. [DOI:10.1038/s41592-019-0394-y]

Boemo, M.A. (2021) DNAscent v2: Detecting replication forks in nanopore sequencing data with deep learning. BMC Genomics 22:430. [DOI:10.1186/s12864-021-07736-6]

Developers and Scientists:
