Data

Sequence Data - HP Sequence Classification via Folding Properties


We used a classification procedure based on thermodynamic and kinetic features to identify protein-like sequences in the 3D cubic HP model. The following properties are tested:

These properties ensure a thermodynamically stable native structure (the unique mfe) and the ability to fold into this functional conformation within a short time interval, as required by the short life cycles of biomolecules. Furthermore, the sequential assembly of proteins is considered: there is evidence for co-translational folding during chain elongation, which should restrict the accessible folding space. Thus, we are only interested in sequences that can form their native structure via sequential folding without high energy barriers in the traversed energy landscape.
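To make the notion of an mfe (ground state) structure concrete, the following is a minimal sketch of the standard HP energy function on the 3D cubic lattice: a conformation is a self-avoiding walk with unit steps, and every pair of H residues that are lattice neighbours but not chain neighbours contributes -1. The function names `is_valid_conformation` and `hp_energy` are hypothetical and not taken from the tools used for this data set; the ground state is whichever valid conformation minimizes this energy.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int, int]

# The six unit steps of the 3D cubic lattice.
UNIT_STEPS = {(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)}


def is_valid_conformation(coords: Sequence[Coord]) -> bool:
    """Check that coords form a self-avoiding walk with unit steps on the cubic lattice."""
    if len(set(coords)) != len(coords):
        return False  # two residues occupy the same lattice node
    for a, b in zip(coords, coords[1:]):
        step = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        if step not in UNIT_STEPS:
            return False  # consecutive residues are not lattice neighbours
    return True


def hp_energy(sequence: str, coords: Sequence[Coord]) -> int:
    """HP energy: -1 for every H-H pair that is adjacent on the lattice
    but not consecutive in the chain."""
    if len(sequence) != len(coords) or not is_valid_conformation(coords):
        raise ValueError("invalid conformation")
    position = {c: i for i, c in enumerate(coords)}  # lattice node -> residue index
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in UNIT_STEPS:
            j = position.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each contact once and skips chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# A square in the xy-plane brings the two terminal H residues into contact.
print(hp_energy("HPPH", [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]))  # -1
```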

A sequence fulfilling all criteria is called protein-like. If the ground state cannot be reached sequentially but is reached via global folding at a high rate, the sequence is classified as a good folder. Bad folders are not able to adopt the native structure within a short time interval. All checked sequences are non-degenerate, i.e., they have a unique ground state.
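The three classes can be read as a small decision procedure over boolean folding features. The sketch below encodes one reading of the rules above; the `FoldingFeatures` record and its field names are hypothetical, and how each feature is actually computed (folding-rate thresholds, barrier heights) is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class FoldingClass(Enum):
    PROTEIN_LIKE = "protein-like"
    GOOD_FOLDER = "good folder"
    BAD_FOLDER = "bad folder"


@dataclass
class FoldingFeatures:
    # Hypothetical feature record; names are illustrative only.
    unique_ground_state: bool    # non-degenerate: exactly one mfe structure
    fast_global_folding: bool    # native state reached within a short time interval
    sequentially_foldable: bool  # reachable sequentially without high energy barriers


def classify(f: FoldingFeatures) -> FoldingClass:
    # Only non-degenerate sequences are considered at all.
    if not f.unique_ground_state:
        raise ValueError("degenerate sequence: no unique ground state")
    # Cannot adopt the native structure in a short time interval.
    if not f.fast_global_folding:
        return FoldingClass.BAD_FOLDER
    # All criteria fulfilled.
    if f.sequentially_foldable:
        return FoldingClass.PROTEIN_LIKE
    # Fast global folding, but the ground state is not reachable sequentially.
    return FoldingClass.GOOD_FOLDER


print(classify(FoldingFeatures(True, True, False)).value)  # good folder
```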

Main Publications

HP in Unrestricted 3D-Cubic


Benchmark Set for Protein Chain Lattice Fitting (PCLF) Problem


This is the benchmark set of high-resolution protein structures used to benchmark tools that solve the Protein Chain Lattice Fitting (PCLF) problem (see the publication below).

The test set was taken from the PISCES web server (Wang and Dunbrack, 2005). To derive a high-quality set of proteins to model, we enforced a 40% sequence identity cutoff, a chain length of 50–300 residues, an R-factor ≤ 0.3, and a resolution ≤ 1.5 Å. Since our models require side chains, Cα-only chains were excluded.
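The per-chain selection criteria can be expressed as a simple filter. This is a minimal sketch assuming a hypothetical `Chain` record (it does not reproduce the PISCES output format); the 40% sequence identity culling is performed by PISCES itself and is not reimplemented here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List


@dataclass
class Chain:
    # Hypothetical record; field names are illustrative only.
    chain_id: str
    length: int        # number of residues
    resolution: float  # in Å
    r_factor: float
    ca_only: bool      # True if only C-alpha coordinates are available


def passes_filter(c: Chain) -> bool:
    """Apply the length, resolution, R-factor and side-chain criteria."""
    return (
        50 <= c.length <= 300
        and c.resolution <= 1.5
        and c.r_factor <= 0.3
        and not c.ca_only  # side chains are required
    )


def select_chains(chains: Iterable[Chain]) -> List[Chain]:
    return [c for c in chains if passes_filter(c)]


# Made-up example records, not entries of the benchmark set.
candidates = [
    Chain("A", 120, 1.2, 0.18, False),
    Chain("B", 40, 1.0, 0.15, False),   # too short
    Chain("C", 200, 1.4, 0.20, True),   # C-alpha only
    Chain("D", 300, 1.5, 0.30, False),  # boundary values are accepted
]
selected = select_chains(candidates)
print([c.chain_id for c in selected], mean(c.length for c in selected))  # ['A', 'D'] 210
```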

The resulting benchmark set contains 1198 proteins with a mean length of 160 residues.

Files

Main Publications


Contact


In case of questions, comments, or contributions to this page, please contact Martin Raden (née Mann).