RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure.

Results: RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The algorithm has two main goals: (1) to provide a robust and effective method for compressing RNA structural data; (2) to design a suitable model for representing RNA secondary structure and for deriving the informational complexity of the structural data from its compression. Our extensive tests have shown that RNACompress achieves a universally better compression ratio than other sequence-specific or general-purpose text compression algorithms, such as GenCompress, WinRAR and gzip. Moreover, a comparison of the activities of distinct GTP-binding RNAs (aptamers) with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results provide an objective means of comparing the functional properties of heteropolymers from an information perspective.

Conclusion: This paper discusses a universal algorithm for compressing RNA secondary structure and for evaluating its informational complexity. We have developed RNACompress as a useful tool for academic users. Extensive tests have shown that RNACompress is a universally efficient algorithm for compressing RNA sequences together with their secondary structures. RNACompress also serves as a good measure of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.
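The idea of reading informational complexity off a compressed size can be illustrated with a minimal sketch. This is not the RNACompress grammar model: it substitutes Python's general-purpose `zlib` compressor as a stand-in, assumes the secondary structure is given in dot-bracket notation, and uses a hypothetical function name, `compression_complexity`, chosen only for this illustration.

```python
import zlib


def compression_complexity(sequence: str, structure: str) -> float:
    """Return compressed bits per nucleotide for a sequence and its structure.

    Assumptions (not taken from RNACompress): the structure is a dot-bracket
    string of the same length as the sequence, and zlib stands in for the
    grammar-based compressor described in the abstract.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # Compress the sequence and structure together as one ASCII payload.
    payload = (sequence + "\n" + structure).encode("ascii")
    compressed = zlib.compress(payload, 9)

    # Normalise by the number of nucleotides so that inputs of different
    # lengths can be compared on the same scale.
    return 8 * len(compressed) / len(sequence)


if __name__ == "__main__":
    # A toy hairpin; the sequence and structure are made up for illustration.
    seq = "GGGAAAUCCC"
    struct = "(((....)))"
    print(compression_complexity(seq, struct))
```

For inputs this short, the fixed overhead of the zlib container dominates the result, so the value is only meaningful for comparing longer molecules of similar length. A structure-aware model, such as the grammar-based one the abstract describes, would be expected to give tighter estimates than a generic text compressor.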

References in zbMATH (referenced in 2 articles)

  1. Giancarlo, R.; Scaturro, D.; Utro, F.: Textual data compression in computational biology: algorithmic techniques (2012)
  2. Dyrka, Witold; Nebel, Jean-Christophe: A stochastic context free grammar based framework for analysis of protein sequences (2009)