Correct termination of RNA synthesis (transcription) is crucial for proper bacterial gene expression. Currently, there are two well-established ‘textbook’ mechanisms of bacterial transcription termination: intrinsic and Rho-dependent. Rho-dependent termination relies on the protein factor Rho, which classically has three conserved domains. A few previous studies also describe bacteria whose Rho factors carry an insertion domain (e.g., Mycobacteria). However, the variation in Rho structure and function among bacteria has not been analysed in detail.
Therefore, this study aims to characterise atypical bacterial Rho termination factors with additional or missing domains. Their distribution, sequence conservation, expression and predicted structure were analysed bioinformatically. Rho domains were detected with HMMER, using Pfam or custom hidden Markov models, in 2730 high-quality bacterial genomes. The bioinformatic analyses of Rho included alignment (MAFFT), phylogeny (FastTree), expression (HISAT2, StringTie, and Ballgown), structure prediction (AlphaFold), and intrinsic disorder prediction (flDPnn).
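As an illustration of how per-genome domain hits could be turned into architecture classes, the following is a minimal sketch and not the pipeline used in this study. The domain labels (`N_term`, `RNA_bind`, `ATPase`), the `Hit` record, the `classify_genome` function, and both length thresholds are hypothetical and chosen only for the example; in practice the hits would be parsed from HMMER output.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One domain hit on a protein (hypothetical record, not a HMMER type)."""
    gene: str
    domain: str  # assumed labels: "N_term", "RNA_bind", "ATPase"
    start: int   # 1-based start of the hit on the protein
    end: int     # 1-based end of the hit on the protein


# Illustrative thresholds in residues; these are assumptions, not study values.
MAX_LEAD = 50  # residues allowed before the first domain
MAX_GAP = 60   # residues allowed between RNA-binding and ATPase domains


def classify_genome(hits):
    """Assign one architecture class to a genome from its Rho domain hits."""
    by_gene = defaultdict(list)
    for hit in hits:
        by_gene[hit.gene].append(hit)

    if not by_gene:
        return "no Rho domain"
    if len(by_gene) > 1:
        return "multiple genes with Rho domains"

    # Exactly one gene carries Rho domains: inspect its layout.
    gene_hits = sorted(next(iter(by_gene.values())), key=lambda h: h.start)
    domains = {h.domain for h in gene_hits}
    if domains != {"N_term", "RNA_bind", "ATPase"}:
        return "other (incomplete domain set)"

    rna = next(h for h in gene_hits if h.domain == "RNA_bind")
    atp = next(h for h in gene_hits if h.domain == "ATPase")
    if atp.start - rna.end > MAX_GAP:
        return "atypical: insertion after RNA-binding domain"
    if gene_hits[0].start > MAX_LEAD:
        return "atypical: extra initial region"
    return "typical"
```

Counting the returned labels over all genomes would give the kind of class distribution reported in the results; the thresholds would need to be calibrated against known typical Rho sequences.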
At least one Rho domain was present in 91% of the genomes. However, fewer than half of these had a typical Rho (49%); the other patterns observed were atypical Rho with an initial domain (9%), atypical Rho with a long insertion after the RNA-binding domain (14%), and multiple Rho domains in different genes (28%). Overall, the ‘extra’ domains showed different enriched motifs depending on genomic GC content, and they were predicted to be intrinsically disordered and to bind protein, DNA, or RNA. Moreover, RNA-seq data show that the atypical Rho genes of Mycolicibacterium smegmatis are expressed.
In conclusion, atypical Rho factors are varied and broadly distributed among bacteria, and may perform alternative cellular functions. Additionally, these factors might be associated with termination at distinct terminator sequences, which could be part of an uncharacterised mechanism of transcription termination. A comparison of 3’-end sequences from available Term-seq data is being performed to investigate this hypothesis.