LocARNA

LocARNA - Alignment of RNAs

Synopsis


The LocARNA package comprises several tools for producing fast and high-quality pairwise and multiple alignment of RNA sequences of unknown structure. These tools build on the Turner free energy model of RNAs to simultaneously fold and align (or match) RNAs based on their sequence and structure features.


The tools come with many practically relevant features like support of anchor and structure constraints, support of local folding, and effective heuristics.

The package comprises the following tools:

  • LocARNA (i.e., tools locarna and mlocarna) performs global and local alignment of two or multiple RNAs. In local alignment, local motifs are defined as either subsequences (sequence locality) or substructures (structure locality). A tutorial video by Mathias Möhl provides a quick basic overview. For more algorithmic background, see, e.g., these lecture slides. LocARNA-based clustering is supported via RNAclust.
  • LocARNA-P calculates reliabilities (based on partition functions) of global simultaneous folding and alignment (mlocarna --probabilistic) and predicts RNA boundaries. These slides provide more background.
  • SPARSE performs global alignment of two or multiple RNAs similar to LocARNA, but more efficiently. For algorithmic background, see, e.g., these slides.
  • ExpaRNA-P predicts exact local sequence and structure matches between RNAs of unknown structure (simultaneously predicting their structure). The pipeline ExpLoc-P (exploc_p) uses such matches for fast multiple alignment.

This page mostly describes LocARNA and LocARNA-P; for SPARSE and ExpaRNA-P please visit their dedicated webpages.

Some other closely related tools, which however are not distributed with LocARNA, may be of interest:

  • CARNA makes use of the LocARNA library, but computes alignments based on unlimited RNA secondary structures via a constraint programming approach.
  • REAPR predicts novel ncRNA candidates in whole genomes based on structure-based LocARNA realignments of existing (sequence-based) whole genome alignments. Efficient realignment is integrated into LocARNA.

The Freiburg RNA Tools Web Server makes the main package features (LocARNA and LocARNA-P) available online.

Download / Installation


Latest Release

Latest Release: LocARNA 1.9.2.1 (2018-06-27), maintenance patch (source).

For installation from source, follow these installation instructions. At times the source release may be ahead of some or all binary packages; otherwise installation from binary packages (conda) is recommended for offline use.

Conda Installation

Installation on any GNU/Linux distribution and Mac OS X is as simple as:

conda install -c bioconda locarna

This requires setting up Conda first; read more at Bioconda.

Repository

LocARNA resides at GitHub. The releases and the most recent (i.e., not necessarily most robust) code are available there. Clone with:

git clone https://github.com/s-will/LocARNA

Compiling from the repository requires autotools; run autoreconf -i before continuing as usual.
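
Putting the repository steps together, a build from a fresh clone might look like the sketch below. The function name build_locarna_from_git and the default prefix $HOME/locarna are illustrative assumptions and not part of the package; --prefix is the standard autotools option for choosing the installation directory.

```shell
# Sketch: build LocARNA from a fresh clone of the repository.
# Function name and default prefix are illustrative choices.
# autoreconf -i generates ./configure and requires autotools.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_locarna_from_git() (
    prefix=${1:-"$HOME/locarna"}
    git clone https://github.com/s-will/LocARNA &&
    cd LocARNA &&
    autoreconf -i &&
    ./configure --prefix="$prefix" &&
    make &&
    make check &&
    make install
)

# Usage (not executed here): build_locarna_from_git "$HOME/locarna"
```

The steps are chained with &&, so the build stops at the first failing step.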

Other Downloads

Main Publications


  1. Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. Published in PLOS Computational Biology, 3 no. 4, pp. e65, 2007.

  2. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs Sebastian Will, Tejal Joshi, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. Published in RNA, 18 no. 5, pp. 900-14, 2012.

More Publications.

What’s new in and around the package?


Try the novel, very efficient RNA alignment tool SPARSE

SPARSE improves over the original locarna algorithm in terms of speed. Moreover, it implements an advanced lightweight simultaneous alignment and folding model, which improves its structure prediction capabilities. Currently, the tool offers a trade-off between alignment accuracy and speed. Thus, the choice of either algorithm should be based on the specific application requirements. SPARSE-specific code is contributed by Milad Miladi.

ExpaRNA-P predicts exact sequence and structure matches in RNAs of unknown structure

ExpaRNA-P enumerates exactly matching local sequence-structure patterns in RNAs of unknown structure, supporting full structural flexibility according to RNA secondary structure energy models (inheriting from the Vienna RNA package). Based on ExpaRNA-P’s exact matching, the tool ExpLoc-P performs very fast simultaneous alignment and folding of RNAs (think: “like LocARNA, but faster”). For highly efficient prediction, ExpaRNA-P introduces novel ensemble-based sparsification techniques, which are also used by SPARSE. ExpaRNA-P-specific code and the classes for the strong ensemble-based sparsification of ExpaRNA-P and SPARSE are contributed by Christina Otto (née Schmiedl).

Check out the new realignment mode used by REAPR

REAPR applies LocARNA for structure-based alignment of whole genomes to predict structural non-coding RNAs. With REAPR, we introduced a new realignment mode to LocARNA. In this mode, LocARNA aligns very fast within a small distance to a reference multiple alignment (mlocarna options --max-diff-aln and --max-diff).
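
As a rough sketch, a realignment call could look as follows. The file names reference.aln and rnas.fa and the distance value 10 are hypothetical placeholders; the sketch also assumes that --max-diff-aln takes the reference alignment file and --max-diff the maximal distance, so check mlocarna --man for the exact argument forms.

```shell
# Hypothetical inputs: a reference multiple alignment and the sequences to realign.
REFERENCE_ALN=reference.aln
INPUT=rnas.fa
MAX_DIFF=10   # assumed: maximal allowed distance to the reference alignment

if command -v mlocarna >/dev/null 2>&1 && [ -f "$REFERENCE_ALN" ] && [ -f "$INPUT" ]; then
    mlocarna --max-diff-aln "$REFERENCE_ALN" --max-diff "$MAX_DIFF" "$INPUT"
else
    echo "Skipping realignment: needs mlocarna, $REFERENCE_ALN and $INPUT" >&2
fi
```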

Documentation of the LocARNA C++ API is now online


Installation from Source


LocARNA runs on recent GNU/Linux systems (including Cygwin) and Mac OS X. If available for your system, consider installing from a binary package. Installation from source (.tar.gz file) follows the usual “autotools” scheme (configure/make).

tar xzf locarna-xxx.tar.gz
cd locarna-xxx

./configure
make
make check
make install

LocARNA depends on the Vienna RNA Package (recent version).

If the Vienna package is not installed in a standard path, one needs to configure with the option --with-vrna="path to VRNA installation" or set the environment variable PKG_CONFIG_PATH to the pkg-config directory containing RNAlib2.pc (preferred).
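
For the preferred PKG_CONFIG_PATH variant, a minimal sketch is shown below. The installation prefix $HOME/opt/ViennaRNA and the lib/pkgconfig subdirectory are assumptions about a typical layout; adjust them to wherever RNAlib2.pc actually resides on your system.

```shell
# Hypothetical installation prefix of the Vienna RNA Package.
VRNA_PREFIX="$HOME/opt/ViennaRNA"

# Prepend the (assumed) pkg-config directory, keeping any existing entries.
export PKG_CONFIG_PATH="$VRNA_PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

# Check that pkg-config can now locate RNAlib2.pc before running ./configure.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists RNAlib2; then
    echo "RNAlib2 found, version $(pkg-config --modversion RNAlib2)"
else
    echo "RNAlib2.pc not found; check PKG_CONFIG_PATH" >&2
fi
```

Once the check succeeds, ./configure can be run without --with-vrna.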

See ./configure --help for further options.

Usage


The tools have two major uses: pairwise and multiple alignment, and clustering of RNAs. The workhorse of the tool package is the program locarna. However, we recommend the use of our high-level scripts mlocarna, locarnate, and RNAclust.pl.

Multiple Alignment - using mlocarna

Assume an input file rnas.fa in fasta format, containing several RNA sequences.

For computing a multiple alignment of these RNAs, call:

mlocarna rnas.fa

In its default settings, mlocarna will produce a global multiple alignment of your RNAs.

The program writes some output to the screen as well as output to a directory rnas.out, where the name is derived from the input name by default. The output directory can be controlled by the option --tgtdir.

Help on the many options to mlocarna is available via mlocarna --help or, more conveniently, mlocarna --man. The distribution contains some example input in the sub-directory Examples.
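
A small end-to-end sketch of such a call is given below. It writes a two-sequence fasta file (the sequences are taken from the constraint example on this page, without the constraint lines) and selects the output directory with --tgtdir; the directory name rnas_alignment is an arbitrary choice.

```shell
# Write a small input file in fasta format.
cat > rnas.fa <<'EOF'
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
EOF

# Global multiple alignment; results go to rnas_alignment instead of rnas.out.
if command -v mlocarna >/dev/null 2>&1; then
    mlocarna --tgtdir rnas_alignment rnas.fa
else
    echo "mlocarna not found on PATH; install LocARNA first" >&2
fi
```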


Anchor and Structure Constraints

The tool mlocarna provides a convenient interface for user-specified constraints on the alignment, including anchor constraints as well as structure constraints. Constraints are specified in the fasta file as follows:

>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S
.........AAAAAA.BBBCCCC......... #1
.........123456.1231234......... #2
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
..............(((.....xxxxxx......)))........... #S
...........AAAAAA.....BBB.........CCCC.......... #1
...........123456.....123.........1234.......... #2

The structure constraints (lines #S) inherit their semantics from RNAfold. In consequence, the alignment can only be guided by base pair matches that are compatible with the given constraints. The anchor constraints are specified by giving unique names to certain sequence positions, here A1, A2, A3, A4, A5, A6, B1, B2, B3, C1, C2, C3, C4 (lines #1, #2). Positions of the same name in different sequences are aligned. In each sequence, names have to be unique.

A second, slightly larger example of constraints is provided in Examples/haca.snoRNA.fa of the LocARNA package.
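
To illustrate how the #1 and #2 lines are read column by column, the following sketch lists the anchor names of each sequence together with their positions. It also checks the two rules stated above: every constraint line must be as long as its sequence, and anchor names must be unique within a sequence. The helper check_constraints is a hypothetical illustration and not part of LocARNA; it assumes that the #1 line precedes the #2 line, as in the example.

```shell
# Save the constraint example from above to a file.
cat > constraints.fa <<'EOF'
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S
.........AAAAAA.BBBCCCC......... #1
.........123456.1231234......... #2
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
..............(((.....xxxxxx......)))........... #S
...........AAAAAA.....BBB.........CCCC.......... #1
...........123456.....123.........1234.......... #2
EOF

# Hypothetical helper: prints "<sequence> <anchor name> <position>" per anchor
# and returns non-zero if a length or uniqueness check fails.
check_constraints() {
    awk '
    /^>/    { id = substr($1, 2); seqlen = 0; row1 = ""; next }   # new record
    NF == 1 { seqlen += length($1); next }                        # sequence line
    NF == 2 && $2 ~ /^#/ {                                        # constraint line
        if (length($1) != seqlen) {
            printf("%s: line %s has length %d, expected %d\n", id, $2, length($1), seqlen) > "/dev/stderr"
            bad = 1
        }
        if ($2 == "#1") row1 = $1
        if ($2 == "#2") {
            # Combine the characters of lines #1 and #2 in each column into a name.
            for (i = 1; i <= length($1); i++) {
                c = substr(row1, i, 1)
                if (c == "." || c == "") continue
                name = c substr($1, i, 1)
                if ((id, name) in seen) {
                    printf("%s: anchor name %s is used twice\n", id, name) > "/dev/stderr"
                    bad = 1
                }
                seen[id, name] = 1
                print id, name, i
            }
        }
    }
    END { exit (bad ? 1 : 0) }
    ' "$1"
}

check_constraints constraints.fa
```

For the example, the output contains one line per anchor, e.g. "fruA A1 10" and "fdhA A1 12"; these two positions are aligned to each other because they carry the same name.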


Multiple Alignment - using locarnate

Assume again an input file rnas.fa in fasta format, containing several RNA sequences.

For computing a local multiple alignment with locarnate, call:

locarnate rnas.fa

By default, the results are written to the subdirectory test_results. The final alignment is found in test_results/mult/tcoffee.aln.

Help is available by:

locarnate --man

Note that locarnate requires tcoffee.


Clustering - using RNAclust.pl

Please jump to the RNAclust section.


Pairwise Alignment - using locarna

The pairwise alignment tool is called with two input files that specify the input sequences and optionally ensemble probabilities (as generated, e.g., by RNAfold -p). It accepts different file formats, which can be mixed freely. Available input formats are listed in order of increasing expressivity:


Further help

Further help is available for mlocarna, locarnate, and locarna via:

Library / C++ API


LocARNA implements a C++ API to its various algorithms and data structures. The library is installed together with the package, is used by the LocARNA programs themselves, and can be linked as a shared library to other programs.

API HTML Documentation: This documentation can be generated by doxygen from the package sources by running:

make doxygen-doc

Miscellaneous


LocARNA-P

In probabilistic mode (mlocarna option --probabilistic), LocARNA computes more accurate multiple alignments based on a probabilistic consistency transformation, as well as reliability profiles for assessing local alignment quality and localizing RNA motifs. These features are based on computing sequence and structure match probabilities under the LocARNA alignment model.
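
A call in probabilistic mode might look like the sketch below. The input file rnas.fa and the output directory rnas_probabilistic are placeholder names; only the options --probabilistic and --tgtdir are taken from this page.

```shell
INPUT=rnas.fa                # hypothetical input file in fasta format
OUTDIR=rnas_probabilistic    # arbitrary name for the output directory

if command -v mlocarna >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    mlocarna --probabilistic --tgtdir "$OUTDIR" "$INPUT"
else
    echo "Skipping: needs mlocarna on PATH and the input file $INPUT" >&2
fi
```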

RNAclust

RNAclust is a tool for clustering RNAs, which builds on LocARNA. RNAclust is written and copyrighted by Kristin Reiche. It replaces the cluster pipeline used for our paper “Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering”.

The latest release is available at the RNAclust web site. It requires the LocARNA and Vienna RNA packages.

Short usage:

RNAclust.pl --fasta your_sequences --dir output_directory

The full documentation of RNAclust.pl is available as a PDF.

LocARNATE

This tool implements an alternative way to construct multiple alignments using LocARNA. While mlocarna implements various types of progressive and iterative alignment, where sequence-structure alignment is performed in each step, LocARNATE employs T-Coffee for combining pairwise LocARNA alignments into a multiple alignment.

Originally, LocARNATE was written by Wolfgang Otto. The current version was rewritten by Nikolaus Meinzer and has special support for ExpaRNA-P.

Contact


For comments, questions, and suggestions, or in case of unexpected behavior, please contact me (Sebastian Will).

For special questions related to the RNAclust pipeline, please contact Kristin Reiche.