Poster Presentation 50 Years Shine-Dalgarno Symposium 2023

Development of a scalable pipeline for RNA editing detection and annotation in RNAseq data (#140)

Jacob E Munro 1,2, Melanie Bahlo 1,2, Brendan RE Ansell 1,2
  1. Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  2. Population Health and Immunity Division, Walter and Eliza Hall Institute, Melbourne, VIC, Australia

RNA editing is a molecular process whereby certain bases in RNA are changed, or ‘edited’, during or after transcription. The most frequent form of RNA editing is adenosine-to-inosine (A>I) conversion, which is catalysed by the ADAR enzymes. A>I editing is detectable in standard RNAseq libraries because inosine is called as guanosine by next-generation sequencing platforms, so edited sites appear as A>G mismatches against the reference. The biological effects of RNA editing result either from alterations to RNA secondary structure or from altered codons that produce amino acid substitutions during translation. RNA editing is required for healthy brain development and is dysregulated in neuropsychiatric diseases.
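Because inosine is called as guanosine, the editing level at a reference adenosine can be estimated from per-base read counts. The following is a minimal Python sketch of that arithmetic, not the pipeline's own code; the function name `editing_level` and the input layout (a mapping of base to read count) are assumptions made for illustration.

```python
# Complement map used to express minus-strand counts in transcript orientation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def editing_level(base_counts, strand="+"):
    """Estimate the A>I editing level at one reference-adenosine site.

    base_counts: mapping of base (genome orientation) -> read count.
    strand: transcript strand; on '-' an A>G edit is observed as T>C.
    Returns G / (A + G), or None when no A or G reads cover the site.
    """
    if strand == "-":
        # Convert genome-orientation bases to transcript orientation.
        base_counts = {COMPLEMENT.get(b, b): n for b, n in base_counts.items()}
    a_reads = base_counts.get("A", 0)
    g_reads = base_counts.get("G", 0)
    total = a_reads + g_reads
    return g_reads / total if total else None


if __name__ == "__main__":
    # Hypothetical counts: 30 unedited (A) and 10 edited (G) reads.
    print(editing_level({"A": 30, "G": 10}))
```

Reads supporting other bases are ignored in this sketch; a purpose-built caller such as JACUSA models sequencing error rather than using a raw ratio.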

RNA editing can be detected by running variant calling software on RNAseq data. The purpose-built RNA editing variant caller JACUSA has demonstrated excellent performance in this regard; however, substantial upstream pre-processing and downstream post-processing are required to use JACUSA effectively and to interrogate RNA editing in a dataset of interest. Here we present our work developing a reproducible, scalable, fast and user-friendly pipeline for quantification of RNA editing, built with Nextflow and Docker/Singularity containers. The pipeline takes a batch of bulk RNAseq data in FASTQ format as input, then i) performs alignment and calls edited sites with JACUSA, ii) applies quality control filters, iii) intersects calls with the REDIportal database of human RNA editing sites, iv) probabilistically categorises ambiguous sites and hyper-edited reads, and v) outputs the resulting callset along with an HTML report that includes summary visualisations. This pipeline will allow researchers to interrogate RNA editing easily, including in the vast and growing collection of publicly available RNAseq datasets, enabling a wealth of new epitranscriptomic biological insights.
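The quality control and database intersection steps can be illustrated with a small Python sketch. This is not the pipeline's implementation: the `Site` record, the function names, and the thresholds (`min_coverage=10`, `min_edited_reads=2`) are hypothetical, and the set of known sites stands in for coordinates that would be loaded from a resource such as REDIportal.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """One candidate editing call (hypothetical record layout)."""
    chrom: str
    pos: int            # 1-based genomic position
    strand: str         # '+' or '-'
    coverage: int       # total reads covering the site
    edited_reads: int   # reads supporting the edited base


def filter_sites(sites, min_coverage=10, min_edited_reads=2):
    """Keep calls meeting illustrative coverage and support thresholds."""
    return [
        s for s in sites
        if s.coverage >= min_coverage and s.edited_reads >= min_edited_reads
    ]


def annotate_known(sites, known):
    """Split calls into (known, novel) by (chrom, pos, strand) membership."""
    known_hits, novel = [], []
    for s in sites:
        key = (s.chrom, s.pos, s.strand)
        (known_hits if key in known else novel).append(s)
    return known_hits, novel


if __name__ == "__main__":
    # Made-up coordinates, used only to exercise the functions.
    calls = [
        Site("chr1", 1000, "+", 40, 8),
        Site("chr1", 2000, "+", 5, 3),    # fails the coverage threshold
        Site("chr2", 3000, "-", 25, 4),
    ]
    known_db = {("chr1", 1000, "+")}
    known_hits, novel = annotate_known(filter_sites(calls), known_db)
    print(len(known_hits), len(novel))
```

Keying on chromosome, position and strand keeps the lookup constant-time per call, which matters when the known-site set is large.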