RNA carries immense amounts of information in any biological material. RNA sequencing is used to capture a snapshot of gene expression, to classify species using structural RNA types, and to investigate the transcription, splicing, translation and epitranscriptome layers of gene control. However, RNA is also subject to rapid chemical and enzymatic degradation. RNA can lose integrity through native cellular nucleic acid metabolism, and it can be further fragmented during sample storage and handling. Variability in the paths and extent of RNA fragmentation is a potential source of bias between compared samples and remains a major unresolved challenge. No existing method provides a transcript-resolved read-out of native RNA length together with a link to the underlying fragmentation rate of the RNA, which makes accurate differential analyses difficult and sometimes impossible.
To address the RNA integrity and fragmentation problems, we employ direct RNA sequencing (DRS) data. DRS has been broadly adopted and has opened entirely new ways to characterise the transcriptome through its unique capacity to investigate RNA molecules in their native configuration. Using an in vitro model of non-specific, limited chemical RNA degradation (by magnesium ions) together with DRS, we show that the results of differential gene and transcript expression analysis, isoform detection and modification detection can all be adversely affected by differences in RNA integrity, leading to false discoveries. Based on mathematical modelling and simulations of RNA fragmentation, we propose Direct RNA Integrity (DRI) as a new DRS-based measure of RNA degradation that accounts for inter- and intra-transcript degradation variability. DRI separates RNA fragmentation from mapping artefacts and uncertainties, and links the observed degradation profile to the underlying fragmentation rate of the RNA. We demonstrate that DRI can be used to correct for potential false discoveries, enabling robust differential expression and feature analysis even across differently degraded samples.
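To illustrate how an observed read-length profile can be tied to an underlying fragmentation rate, the following is a minimal sketch and not the DRI model itself. It assumes that breaks occur uniformly at random along a transcript at a constant per-nucleotide rate, that only the fragment retaining the 3' end is sequenced, and that lengths can be treated as continuous. The function names `simulate_three_prime_lengths` and `estimate_break_rate` are hypothetical, and the estimator is the standard maximum-likelihood estimate for a right-censored exponential distribution.

```python
import math
import random


def simulate_three_prime_lengths(transcript_length, break_rate, n_molecules, seed=0):
    """Simulate observed lengths of 3'-anchored fragments.

    Assumption: breaks follow a Poisson process along the transcript with a
    constant per-nucleotide rate, so the distance from the 3' end to the
    nearest break is exponentially distributed. Intact molecules are read at
    their full length.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_molecules):
        if break_rate > 0:
            distance = rng.expovariate(break_rate)
        else:
            distance = math.inf  # no fragmentation: every molecule is intact
        lengths.append(min(transcript_length, distance))
    return lengths


def estimate_break_rate(lengths, transcript_length):
    """Estimate the per-nucleotide break rate from observed lengths.

    Full-length reads are treated as censored observations, so the estimate
    is (number of broken molecules) / (sum of all observed lengths).
    """
    broken = sum(1 for x in lengths if x < transcript_length)
    return broken / sum(lengths)


if __name__ == "__main__":
    observed = simulate_three_prime_lengths(2000, 0.001, 20000, seed=1)
    print(estimate_break_rate(observed, 2000))
```

Under these assumptions the estimate recovers the simulated rate from the read lengths alone. Real DRS data additionally involve mapping artefacts and transcript-specific effects, which this sketch deliberately ignores.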