Poster Presentation 50 Years Shine-Dalgarno Symposium 2023

Enhanced identification of RNA modification sites using deep learning framework (#143)

Korawich Uthayopas 1,2,3, Alex G. C. de Sá 1,2,3,4, David B. Ascher 1,2,3,4
  1. Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  2. Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
  3. School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, 4072, Australia
  4. Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville, Victoria, 3010, Australia

RNA modifications are critical post-transcriptional events that alter RNA activity, location, and stability by modifying a specific nucleotide through the action of RNA-binding proteins [1-2]. To date, over 100 types of RNA modifications have been identified, with some implicated in the development of cancers, cardiovascular disorders, and other diseases [3-5]. Although recent technological advancements have significantly increased our capacity to identify these modifications, existing analysis pipelines are restricted to known modification motifs [6]. In this study, we present a deep learning framework capable of accurately identifying RNA sites likely to undergo one of seven modification types: N6-methyladenosine (m6A), pseudouridine (ψ), 1-methyladenosine (m1A), 2'-O-methyladenosine (Am), 2'-O-methylcytidine (Cm), 2'-O-methylguanosine (Gm), and 2'-O-methyluridine (Um).

We curated publicly available experimental datasets [7-8] and characterised the modification sites from three aspects: RNA sequence, conservation level, and position within the transcript. RNA sequence descriptors were generated using one-hot encoding, iFeature [9], and RNABERT [10], an optimised transformer-based model adapted from natural language processing. PhyloP and PhastCons scores were used to reflect the conservation status of modification sites and their adjacent sites. We also used the position of each modification site relative to transcript structures to improve prediction performance.
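As an illustration of the simplest of these descriptors, the sketch below one-hot encodes a sequence window centred on a candidate site and appends one conservation score per window position. It is a minimal sketch, not the framework's implementation: the function names, the ACGU base order, the flank size, and the zero-padding of positions outside the transcript are all assumptions made for the example, and the conservation values would in practice come from PhyloP or PhastCons tracks.

```python
ALPHABET = "ACGU"  # assumed base order; the abstract does not specify one


def one_hot_window(seq, center, flank):
    """One-hot encode the window of 2*flank+1 bases centred on `center`.

    Positions outside the sequence and bases outside ACGU (e.g. N)
    are encoded as all-zero rows. DNA-style T is read as U.
    """
    rows = []
    for pos in range(center - flank, center + flank + 1):
        row = [0.0] * len(ALPHABET)
        if 0 <= pos < len(seq):
            col = ALPHABET.find(seq[pos].upper().replace("T", "U"))
            if col >= 0:
                row[col] = 1.0
        rows.append(row)
    return rows


def site_features(seq, center, flank, conservation):
    """Flatten the one-hot window and append per-position conservation scores.

    `conservation` is a hypothetical list with one score per window
    position (for example, values read from a PhyloP track).
    """
    if len(conservation) != 2 * flank + 1:
        raise ValueError("expected one conservation score per window position")
    flat = [value for row in one_hot_window(seq, center, flank) for value in row]
    return flat + [float(score) for score in conservation]


# Example: a 5-nt window around the A at index 2 of a toy sequence,
# with made-up conservation scores.
features = site_features("GGACU", center=2, flank=2,
                         conservation=[0.1, 0.4, 0.9, 0.3, 0.2])
```

A fixed-length vector of this kind can be concatenated with other descriptors before being passed to a classifier; the window size is a tunable choice rather than a value taken from the abstract.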

The model performed well in cross-validation and on independent blind tests, offering a potent tool for analysing RNA modification sites and enabling genome-wide predictive mapping. This framework expands our ability to identify RNA modifications and has the potential to facilitate advances in therapeutic applications.

  1. Frye, M., et al., RNA modifications: what have we learned and where are we headed? Nat Rev Genet, 2016. 17(6): p. 365-72.
  2. Schaefer, M., U. Kapoor, and M.F. Jantsch, Understanding RNA modifications: the promises and technological bottlenecks of the 'epitranscriptome'. Open Biol, 2017. 7(5): p. 170077.
  3. Cayir, A., RNA modifications as emerging therapeutic targets. Wiley Interdiscip Rev RNA, 2022. 13(4): p. e1702.
  4. Yanas, A. and K.F. Liu, RNA modifications and the link to human disease. Methods Enzymol, 2019. 626: p. 133-146.
  5. Cui, L., et al., RNA modifications: importance in immune cell biology and related diseases. Signal Transduct Target Ther, 2022. 7(1): p. 334.
  6. Zhang, Y., L. Lu, and X. Li, Detection technologies for RNA modifications. Exp Mol Med, 2022. 54(10): p. 1601-1616.
  7. Xuan, J.J., et al., RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res, 2018. 46(D1): p. D327-D334.
  8. Zhou, Y., et al., SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res, 2016. 44(10).
  9. Chen, Z., et al., iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics, 2018. 34(14): p. 2499-2502.
  10. Akiyama, M. and Y. Sakakibara, Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform, 2022. 4(1): p. lqac012.