================================================================================ COMPREHENSIVE LITERATURE RESEARCH: PROTEIN FOLDING Compiled: 2026-03-10 Scope: Broad, cross-domain academic literature survey ================================================================================ TABLE OF CONTENTS 1. Fundamentals of Protein Folding 2. Energy Landscapes and Funnel Theory 3. Geometric and Topological Aspects of Protein Structure 4. Secondary Structure: Alpha Helices and Beta Sheets 5. Phi/Psi Angles and Ramachandran Plots 6. Tertiary and Quaternary Structure 7. Collagen and Specialized Structures 8. Self-Assembly and Self-Organization 9. Golden Ratio and Fibonacci Patterns in Protein Structure 10. Lattice Models of Protein Folding 11. Computational Approaches (AlphaFold, Rosetta, MD, Folding@home) 12. Chaperone-Assisted Folding and Quality Control 13. Protein Misfolding Diseases 14. Protein Folding Kinetics and Thermodynamics 15. Allosteric Transitions and Conformational Change 16. Membrane Protein Folding 17. Intrinsically Disordered Proteins 18. Evolutionary Constraints on Protein Structure 19. Symmetry in Protein Assemblies 20. Information Content and Entropy in Protein Sequences 21. Frequency Analysis and Vibrational Spectroscopy 22. Phase Transitions in Protein Solutions 23. Protein Fold Space and Classification 24. Experimental Methods for Structure Determination 25. Open Questions and Active Research Frontiers ================================================================================ 1. FUNDAMENTALS OF PROTEIN FOLDING ================================================================================ 1.1 ANFINSEN'S DOGMA (THE THERMODYNAMIC HYPOTHESIS) ----------------------------------------------------- Anfinsen's dogma, also known as the thermodynamic hypothesis, states that for a small globular protein in its standard physiological environment, the native structure is determined only by the protein's amino acid sequence. 
The native conformation is a unique, stable, and kinetically accessible minimum of the free energy, determined by the totality of interatomic interactions in a given environment. Christian B. Anfinsen championed this principle based on his work with ribonuclease A (RNase A). He shared the 1972 Nobel Prize in Chemistry with Stanford Moore and William Howard Stein "for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation." Source: Nobel Prize in Chemistry 1972 https://www.nobelprize.org/prizes/chemistry/1972/anfinsen/facts/ Source: Anfinsen's dogma, Wikipedia https://en.wikipedia.org/wiki/Anfinsen%27s_dogma THE ANFINSEN EXPERIMENT (RIBONUCLEASE A REFOLDING): In 1961, Anfinsen proved that the amino acid sequence alone determines how the protein chain folds, requiring no additional genetic information. The experiment proceeded as follows: - RNase A was treated with 8 M urea (to disrupt non-covalent interactions) and beta-mercaptoethanol (to break the four disulfide bonds connecting the eight cysteine residues), fully denaturing the enzyme. - Upon slow, careful removal of both urea and beta-mercaptoethanol through dialysis, the protein spontaneously refolded to its native, fully active conformation, recovering 100% enzymatic activity and reforming the correct disulfide bonds. - When beta-mercaptoethanol was removed but urea was retained, only ~1% of activity was recovered. This was attributed to the random formation of disulfide bridges among the 8 cysteines (105 possible pairings), consistent with roughly 1/105 chance of forming the correct native set. - A completely inactive RNase with scrambled disulfides recovered native activity upon incubation with catalytic amounts of beta-mercaptoethanol, which allowed productive reshuffling of the disulfide bonds. THREE CONDITIONS FOR UNIQUE NATIVE STRUCTURE: 1. Uniqueness: The sequence has no other configuration with comparable free energy. 2. 
Stability: Small changes in the environment do not cause large structural changes. 3. Kinetic accessibility: The folding pathway allows reaching the minimum in biologically relevant time. EXCEPTIONS TO ANFINSEN'S DOGMA: - Some proteins require chaperone assistance (though chaperones do not alter the final state, only prevent aggregation during folding). - Prion diseases: PrP can adopt alternative stable conformations (PrPSc) that propagate through template-directed misfolding. - Amyloid diseases (Alzheimer's, Parkinson's) involve proteins adopting stable non-native aggregated conformations. Source: Profiles in Science, NLM https://profiles.nlm.nih.gov/spotlight/kk/feature/protein Source: The Anfinsen Dogma: Intriguing Details Sixty-Five Years Later, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC9318638/ 1.2 LEVINTHAL'S PARADOX ------------------------- In 1969, Cyrus Levinthal formulated a thought experiment: given the many degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. QUANTITATIVE BASIS: - Each amino acid residue has two primary backbone torsion angles (phi, psi). - A 100-residue protein has 200 such angles. - Assuming ~3 staggered conformations per torsion angle: 9 conformers per residue, yielding 9^100 ~ 10^95 total conformations. - Levinthal's original estimate: ~10^300 possible conformations. - At least 2^100 conformations for a typical 100-residue globular domain. - Sampling at picosecond rates: exploring all 2^100 (~1.3 x 10^30) conformations at 10^-12 s each would take ~1.3 x 10^18 s, on the order of 10^10 years (several times the age of the universe). THE PARADOX: Most small proteins fold spontaneously on millisecond or even microsecond timescales. The enormous gap between theoretical random-search time and actual folding time constitutes the paradox. RESOLUTION: Levinthal himself suggested that "protein folding is sped up and guided by the rapid formation of local interactions which then determine the further folding of the peptide."
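As a rough check on the counting arguments in Sections 1.1 and 1.2, the Python sketch below reproduces the quoted orders of magnitude: the 105 disulfide pairings for 8 cysteines, 9^100 ~ 10^95, and the exhaustive-search time for 2^100 conformations at one conformation per picosecond. The function names and the use of a Julian year are illustrative assumptions, not taken from the cited sources.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumed convention)


def disulfide_pairings(n_cysteines: int) -> int:
    """Number of ways to pair all cysteines into disulfide bonds."""
    if n_cysteines % 2:
        raise ValueError("need an even number of cysteines")
    pairs = n_cysteines // 2
    # n! / (2^pairs * pairs!) counts perfect matchings of n items
    return math.factorial(n_cysteines) // (2 ** pairs * math.factorial(pairs))


def log10_conformations(states_per_residue: float, n_residues: int) -> float:
    """log10 of states_per_residue ** n_residues, avoiding huge intermediates."""
    return n_residues * math.log10(states_per_residue)


def exhaustive_search_years(states_per_residue: float, n_residues: int,
                            seconds_per_sample: float = 1e-12) -> float:
    """Years needed to visit every conformation once at the given sampling rate."""
    log10_seconds = (log10_conformations(states_per_residue, n_residues)
                     + math.log10(seconds_per_sample))
    return 10 ** (log10_seconds - math.log10(SECONDS_PER_YEAR))


if __name__ == "__main__":
    print("pairings for 8 cysteines:", disulfide_pairings(8))
    print(f"9^100 ~ 10^{log10_conformations(9, 100):.1f}")
    print(f"2^100 at 1 ps each ~ {exhaustive_search_years(2, 100):.1e} years")
```

Under these assumptions the search time for 2^100 conformations comes out at roughly 4 x 10^10 years, which is the basis for the "order of 10^10 years" figure.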
Modern resolution comes from the energy landscape / funnel theory (Section 2), which shows that proteins do not search randomly but are guided by a funneled energy landscape biased toward the native state. Source: Levinthal's paradox, Wikipedia https://en.wikipedia.org/wiki/Levinthal%27s_paradox Source: Solution of Levinthal's Paradox, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC7072185/ Source: Protein folding problem: enigma, paradox, solution, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC9842845/ 1.3 THE HYDROPHOBIC EFFECT AS THE PRIMARY DRIVING FORCE --------------------------------------------------------- The hydrophobic effect is widely regarded as the major driving force in protein folding: the tendency for hydrophobic (nonpolar) side chains to isolate themselves from contact with water, becoming buried in the protein interior. THERMODYNAMIC MECHANISM: - Water molecules form ordered "cages" (clathrate-like structures) around exposed hydrophobic residues, reducing water's rotational entropy. - When the protein folds and hydrophobic residues become buried, these ordered water cages break apart, releasing water molecules and increasing the entropy of the solvent system. - At room temperature, the hydrophobic effect is primarily entropy-driven. - The enthalpic component is actually favorable (strengthened water-water hydrogen bonds in the solvation shell) but small compared to the entropic contribution. - "Hydrophobic collapse" is often the earliest event in protein folding. Source: Towards a structural biology of the hydrophobic effect, Scientific Reports (2016) https://www.nature.com/articles/srep28285 Source: Hydrophobic Effect, ScienceDirect Topics https://www.sciencedirect.com/topics/materials-science/hydrophobic-effect ================================================================================ 2. 
ENERGY LANDSCAPES AND FUNNEL THEORY ================================================================================ 2.1 THE ENERGY LANDSCAPE PERSPECTIVE -------------------------------------- The energy landscape theory of protein folding, developed primarily by Bryngelson, Wolynes, Onuchic, and colleagues, provides a statistical description of a protein's potential energy surface. FOUNDATIONAL WORK: - 1987: Bryngelson & Wolynes applied the random energy model (REM), developed by Derrida for spin glass systems, to protein folding. - 1995: Bryngelson, Onuchic, Socci, and Wolynes published the landmark synthesis "Funnels, pathways, and the energy landscape of protein folding." KEY CONCEPTS: FUNNEL HYPOTHESIS: The folding energy landscape is shaped like a funnel, with the native state at the bottom. The width of the funnel represents conformational entropy (many unfolded states at the top, few near-native states at the bottom), while the depth represents energy (native contacts stabilize lower-energy conformations). RUGGED LANDSCAPE: A typical random amino acid sequence has a rough energy landscape riddled with deep metastable minima, resembling a spin glass. Conflicts between different choices of favorable interactions inevitably arise in sufficiently long random sequences. FUNNELED LANDSCAPE: Natural proteins have been selected by evolution to have smooth, funnel-shaped landscapes biased toward the native structure. The most realistic model of a protein is a "minimally frustrated heteropolymer with a rugged funnel-like landscape." ENSEMBLE VIEW: Folding occurs through organizing an ensemble of structures rather than through only a few uniquely defined structural intermediates. Source: Funnels, pathways, and the energy landscape of protein folding: a synthesis. Bryngelson et al. 
(1995) https://pubmed.ncbi.nlm.nih.gov/7784423/ Source: Theory of Protein Folding: The Energy Landscape Perspective https://www.researchgate.net/publication/13879984 Source: Evolution, Energy Landscapes and the Paradoxes of Protein Folding, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC4472606/ 2.2 THE PRINCIPLE OF MINIMAL FRUSTRATION ----------------------------------------- Natural proteins correspond to sequences that have been selected by evolution to have more consistently stabilizing interactions throughout the natively structured molecule than a typical random sequence. Natural proteins are "minimally frustrated" compared to random heteropolymers. THE Tf/Tg RATIO: The folding transition temperature (Tf) is compared to the glass transition temperature (Tg). A high Tf/Tg ratio indicates faster folding and fewer kinetic traps. - Wolynes and Onuchic found Tf/Tg ~ 1.6 for funneled protein energy landscapes, indicating the landscape is quite smooth. - Chan has argued the ratio may be much larger, perhaps Tf/Tg ~ 10, corresponding to a very highly funneled landscape. - A Tf/Tg ratio of 1.6 clearly indicates that natural proteins have much smoother energy landscapes than random heteropolymers. EXPERIMENTAL VALIDATION: All studied proteins within a family fold with very similar rates but unfold with rates differing by up to three orders of magnitude. Unfolding rates correlate well with thermodynamic stability. Proteins that unfold slower are more resistant to proteolysis. These results provide direct experimental support for the minimal frustration hypothesis. Source: Evidence for the principle of minimal frustration, PNAS (2017) https://www.pnas.org/doi/10.1073/pnas.1613892114 Source: Fuzziness and Frustration in the Energy Landscape, Accounts of Chemical Research (2021) https://pubs.acs.org/doi/10.1021/acs.accounts.0c00813 ================================================================================ 3. 
GEOMETRIC AND TOPOLOGICAL ASPECTS OF PROTEIN STRUCTURE ================================================================================ 3.1 PROTEIN KNOTS AND TOPOLOGICAL COMPLEXITY ---------------------------------------------- Among proteins of known 3D structure, a small but significant subset possesses complex topological features such as knotted or interlinked (catenated) protein backbones. TYPES OF TOPOLOGICAL FEATURES: - Trefoil knots (3_1): The most common type of knot found in proteins. Most are "deep" trefoil knots. - Slipknots: Unknotted structures containing a knotted subchain. The chain forms a knot that is effectively undone when the terminus doubles back, like a tied shoelace. - More complex knots (4_1, 5_2, 6_1) have been discovered in rare cases. FUNCTIONAL SIGNIFICANCE: - High conservation of knotted motifs and their location (usually) in enzymatic active sites indicates knots are crucial for function. - Knots provide extra stability to protein chains. - The KnotProt database catalogs proteins with knots and slipknots. FOLDING MECHANISM OF KNOTTED PROTEINS: Due to topological constraints, knotted proteins fold through a three-state mechanism containing: 1. A precise nucleation site creating a correctly twisted native loop. 2. A rate-limiting free energy barrier traversed by two parallel knot-forming routes (threading or loop-flipping). 
Source: Knotted and topologically complex proteins, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC2179896/ Source: Topological knots and links in proteins, PNAS (2017) https://www.pnas.org/doi/10.1073/pnas.1615862114 Source: Slipknotting upon native-like loop formation, PNAS (2010) https://www.pnas.org/doi/10.1073/pnas.1009522107 3.2 PROTEIN FOLDING RATE AND GEOMETRY/TOPOLOGY ------------------------------------------------ The folding rate of proteins correlates with geometric and topological properties of the native state: - Contact order: Average sequence separation of residue pairs in contact in the native structure. Higher contact order correlates with slower folding. - Long-range order: Related measure of non-local contacts. - Correlation coefficients of 0.75 or higher observed between contact order and folding/unfolding rates. Source: The protein folding rate and the geometry and topology of the native state, Scientific Reports (2022) https://www.nature.com/articles/s41598-022-09924-0 ================================================================================ 4. SECONDARY STRUCTURE: ALPHA HELICES AND BETA SHEETS ================================================================================ 4.1 HISTORICAL DISCOVERY -------------------------- Linus Pauling, Robert Corey, and Herman Branson proposed the alpha-helix and beta-sheet structures in landmark PNAS papers in the spring of 1951. They deduced these from: - Properties of small molecules from crystal structures. - Pauling's resonance theory predicting planar peptide groups. - Precise bond dimensions from crystallographic data. - The requirement for linear hydrogen bonds of length 2.72 Angstroms. The pivotal insight came in early spring 1948 when Pauling, bedridden with a cold, drew a polypeptide chain on a strip of paper and folded it into a helix maintaining planar peptide bonds. After a few attempts he produced a model with physically plausible hydrogen bonds. 
Source: The discovery of the alpha-helix and beta-sheet, PNAS (2003) https://www.pnas.org/doi/10.1073/pnas.2034522100 4.2 ALPHA HELIX GEOMETRY -------------------------- STRUCTURAL PARAMETERS: - Right-handed helical structure. - 3.6 amino acid residues per turn. - Pitch (rise per turn): 5.4 Angstroms. - Rise per residue: 5.4 / 3.6 = 1.5 Angstroms. - Each residue corresponds to a 100-degree turn in the helix (360 / 3.6). - Typical phi/psi angles: approximately -60 degrees / -45 degrees. HYDROGEN BONDING PATTERN: - Every backbone N-H group forms a hydrogen bond with the C=O group of the amino acid four residues earlier: the i+4 -> i pattern. - Apart from the first four N-H groups and the last four C=O groups at the helix ends, every mainchain C=O and N-H group is hydrogen bonded. - This creates a very regular, stable arrangement. Source: Alpha helix, Wikipedia https://en.wikipedia.org/wiki/Alpha_helix 4.3 BETA SHEET GEOMETRY ------------------------- STRUCTURAL PARAMETERS: - Beta strands: stretches of polypeptide chain typically 3-10 amino acids long in an extended conformation. - Strands connected laterally by at least 2-3 backbone hydrogen bonds. - Overall sheet is generally twisted and pleated. ANTIPARALLEL vs. PARALLEL: - Antiparallel: successive strands alternate direction (N-terminus adjacent to C-terminus of next strand). Optimal phi/psi: -139/+135 degrees. Hydrogen bonds are planar (the preferred orientation), making them stronger and more stable. - Parallel: all N-termini oriented in the same direction. Optimal phi/psi: -119/+113 degrees. Hydrogen bonds are non-planar, making the sheet slightly less stable. STRAND TWIST: - Each residue rotates ~30 degrees in a right-handed sense. - Individual beta strands have a right-handed twist. - The overall beta sheet has a left-handed twist. - Width of a six-stranded sheet: approximately 25 Angstroms. SIDE CHAIN ARRANGEMENT: - Side chains point outward from the folds of the pleats, roughly perpendicular to the plane of the sheet. - Successive residues point outward on alternating faces.
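The alpha-helix parameters in Section 4.2 (3.6 residues per turn, 1.5 Angstrom rise per residue, i+4 -> i hydrogen bonds) can be turned into idealized coordinates with a few lines of Python. This is a minimal sketch: the C-alpha radius of 2.3 Angstroms and the function names are assumptions introduced for illustration and do not come from the text.

```python
import math

RESIDUES_PER_TURN = 3.6   # from Section 4.2
RISE_PER_RESIDUE = 1.5    # Angstroms, from Section 4.2
CA_RADIUS = 2.3           # Angstroms; assumed C-alpha radius, not from the text


def helix_ca_coordinates(n_residues: int):
    """Idealized C-alpha positions (x, y, z) for a helix along the z axis."""
    turn_per_residue = 2 * math.pi / RESIDUES_PER_TURN  # 100 degrees in radians
    coords = []
    for i in range(n_residues):
        # Angle grows counterclockwise as z increases: a right-handed helix
        angle = i * turn_per_residue
        coords.append((CA_RADIUS * math.cos(angle),
                       CA_RADIUS * math.sin(angle),
                       i * RISE_PER_RESIDUE))
    return coords


def hbond_pairs(n_residues: int):
    """(N-H donor index, C=O acceptor index) pairs for the i+4 -> i pattern."""
    return [(i + 4, i) for i in range(n_residues - 4)]


if __name__ == "__main__":
    print("pitch (Angstroms):", RESIDUES_PER_TURN * RISE_PER_RESIDUE)
    print("residue 18 position:", helix_ca_coordinates(19)[18])
    print("hydrogen bonds in a 10-residue helix:", hbond_pairs(10))
```

Residue 18 lies exactly five turns (27 Angstroms) above residue 0, since 18 x 100 degrees is a multiple of 360. In a 10-residue helix only donors 4 to 9 find an acceptor, so the N-H groups of the first four residues stay unpaired.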
Source: Beta sheet, Wikipedia https://en.wikipedia.org/wiki/Beta_sheet Source: Beta-Sheet Geometry, Birkbeck College https://www.cryst.bbk.ac.uk/PPS95/course/3_geometry/sheet.html ================================================================================ 5. PHI/PSI ANGLES AND RAMACHANDRAN PLOTS ================================================================================ 5.1 THE RAMACHANDRAN PLOT --------------------------- Developed in 1963 by G.N. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, the Ramachandran plot is a two-dimensional representation of the backbone dihedral angles phi (rotation around N-Calpha bond) and psi (rotation around Calpha-C bond) for amino acid residues in protein structures. - Horizontal axis: phi values (-180 to +180 degrees). - Vertical axis: psi values (-180 to +180 degrees). - Allowed regions determined primarily by steric constraints (van der Waals radii). - White/disallowed areas: atoms come closer than the sum of their van der Waals radii. SECONDARY STRUCTURE REGIONS ON THE PLOT: - Alpha helices: bottom-left quadrant, phi ~ -60, psi ~ -45. - Beta sheets: upper-left quadrant, phi ~ -135, psi ~ +135. - Left-handed alpha helix: upper-right quadrant (rare, energetically unfavorable for L-amino acids). SPECIAL AMINO ACIDS: - Glycine: Has only a hydrogen as its side chain (smallest van der Waals radius), so the allowable area is considerably larger. Glycine is the least restricted amino acid. - Proline: The cyclic pyrrolidine ring restricts phi to approximately -63 degrees, limiting the number of possible phi/psi combinations. Source: Ramachandran plot, Wikipedia https://en.wikipedia.org/wiki/Ramachandran_plot Source: Tutorial: Ramachandran principle and phi psi angles, Proteopedia https://proteopedia.org/wiki/index.php/Tutorial:Ramachandran_principle_and_phi_psi_angles ================================================================================ 6. 
TERTIARY AND QUATERNARY STRUCTURE ================================================================================ 6.1 FORCES STABILIZING TERTIARY STRUCTURE ------------------------------------------- The three-dimensional fold of a protein is stabilized by multiple types of non-covalent and covalent interactions: HYDROPHOBIC INTERACTIONS: The primary driving force. Nonpolar side chains cluster in the protein interior, away from water. HYDROGEN BONDS: Between backbone groups (defining secondary structure) and between side chains. Typical H-bond length: 2.7-3.1 Angstroms. VAN DER WAALS FORCES: Weak, short-range attractions from transient dipoles. Individually weak but cumulatively substantial due to dense packing in the protein interior. IONIC INTERACTIONS (SALT BRIDGES): Electrostatic attractions between oppositely charged residues. Although rare, they can be strong when buried in the low-dielectric protein interior, but are substantially weakened by solvent screening at the protein surface. DISULFIDE BONDS: Covalent bonds between sulfhydryl groups of two cysteine residues. Substantially stronger than non-covalent interactions. Can connect different regions within a single chain or join separate chains. Source: Biochemistry, Tertiary Protein Structure, StatPearls/NCBI https://www.ncbi.nlm.nih.gov/books/NBK470269/ Source: Forces Stabilizing Proteins, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC4116631/ 6.2 QUATERNARY STRUCTURE -------------------------- Quaternary structure: the arrangement and interaction of two or more folded polypeptide subunits. OLIGOMER TYPES: - Dimer (M2), trimer (M3), tetramer (M4), and higher oligomers. - Most common: dimers and tetramers. - Homo-oligomers (identical subunits) and hetero-oligomers. SYMMETRY: - Cyclic symmetry (Cn): n-fold rotational symmetry about a single axis. - Dihedral symmetry (Dn): an n-fold rotation axis plus n two-fold axes perpendicular to it. - Cubic symmetries (tetrahedral, octahedral, icosahedral). - Tetramers of 222 symmetry: "dimer of dimers."
- Hexamers of 32 point group symmetry: "trimer of dimers" or "dimer of trimers." ASSEMBLY MECHANISM: Direct interaction of two nascent proteins emerging from nearby ribosomes appears to be a general mechanism for oligomer formation. Hundreds of protein oligomers assemble in human cells by such co-translational interaction. Source: Protein quaternary structure, Wikipedia https://en.wikipedia.org/wiki/Protein_quaternary_structure ================================================================================ 7. COLLAGEN AND SPECIALIZED STRUCTURES ================================================================================ 7.1 COLLAGEN TRIPLE HELIX ---------------------------- STRUCTURE: - Three left-handed helical strands twist to form a right-handed triple helix. - Repetitious amino acid sequence: Glycine-X-Y, where X and Y are frequently proline or hydroxyproline. - 3.3 residues per turn. - Synthetic peptides: close to 7/2 helical symmetry (20.0 Angstrom axial repeat). - Native tendon collagen: looser 10/3 helical symmetry (28.6 Angstrom axial repeat). GLYCINE REQUIREMENT: Glycine must be every third residue. Its small hydrogen side chain is the only one that fits in the crowded interior of the triple helix. The three chains are hydrogen bonded via peptide NH groups of glycine residues as donors, with CO groups of residues on adjacent chains as acceptors. PROLINE AND HYDROXYPROLINE: - Pyrrolidine rings of proline and hydroxyproline stabilize each strand via steric repulsion, keeping the chain in the extended helical form. - Hydroxyproline in the Y position increases thermal stability through water-mediated hydrogen bonds that stitch together the triple helix. - Proline ring conformations flip between endo and exo on nanosecond timescales. 
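The collagen sequence rule and helical symmetries above lend themselves to a quick numerical check. The sketch below tests the Gly-X-Y repeat and derives residues per turn and rise per residue from the quoted symmetries. It reads an n/m symmetry as n residues per m turns, uses one-letter codes with "O" for hydroxyproline, and assumes the repeat starts at the first residue; these conventions and the function names are assumptions made for illustration.

```python
def follows_gly_x_y(sequence: str) -> bool:
    """True if glycine ('G') occupies every third position, starting at index 0."""
    return len(sequence) >= 3 and all(res == "G" for res in sequence[0::3])


def helix_parameters(residues: int, turns: int, axial_repeat: float):
    """Residues per turn and rise per residue (Angstroms) for an n/m symmetry."""
    return residues / turns, axial_repeat / residues


if __name__ == "__main__":
    print(follows_gly_x_y("GPOGPOGPO"))   # glycine at positions 0, 3, 6
    print(follows_gly_x_y("GPOAPOGPO"))   # alanine replaces one glycine
    print("7/2 :", helix_parameters(7, 2, 20.0))
    print("10/3:", helix_parameters(10, 3, 28.6))
```

Under this reading the 10/3 symmetry gives about 3.33 residues per turn, matching the quoted 3.3, and both symmetries give a rise of roughly 2.86 Angstroms per residue.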
GOLDEN RATIO CONNECTION: The triple helix of collagen has been noted to relate to the golden gnomon through angles 108 degrees and 36 degrees, seen in the basic structure given by pi/5 + 3pi/5 + 3pi/5 + pi/5 = 8pi/5, where pi/5 (36 degrees) is the acute angle of the golden gnomon and 3pi/5 (108 degrees) is its obtuse angle. Source: Collagen helix, Wikipedia https://en.wikipedia.org/wiki/Collagen_helix Source: COLLAGEN STRUCTURE AND STABILITY, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC2846778/ ================================================================================ 8. SELF-ASSEMBLY AND SELF-ORGANIZATION IN PROTEINS ================================================================================ 8.1 PRINCIPLES OF PROTEIN SELF-ASSEMBLY ----------------------------------------- Protein self-assembly is a fundamental biological process where proteins spontaneously organize into complex functional structures without external direction. DRIVING FORCES: - Non-covalent interactions: hydrogen bonding, metal coordination, hydrophobic forces, van der Waals forces, pi-stacking interactions, electrostatic interactions. - Self-organization: the selective and spontaneous formation of well-ordered structures from within a complex mixture. EXAMPLES IN NATURE: - Tubular alpha-hemolysin. - Helical tobacco mosaic virus (TMV). - Polyhedral carboxysome. - Well-differentiated bacteriophage T4. - Viral capsids (icosahedral assemblies from identical subunits). HIERARCHICAL ASSEMBLY: Most proteins execute their biological functions as protein clusters formed through self-assembly with sophisticated topological structures. Proteins can form various supramolecular assemblies spontaneously or under the influence of self-assembly triggers.
Source: Hierarchical Self-Assembly of Proteins, Frontiers in Bioengineering (2020) https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2020.00295/full Source: Molecular self-assembly, Wikipedia https://en.wikipedia.org/wiki/Molecular_self-assembly 8.2 CO-TRANSLATIONAL FOLDING ------------------------------ Many proteins fold co-translationally as they emerge from the ribosome exit tunnel. VECTORIAL NATURE: - Folding proceeds from N-terminus to C-terminus, matching the direction of translation. - Folding can start before synthesis is complete. - The narrow exit tunnel enhances secondary structure formation. - The ribosome can facilitate compaction, induce intermediates not seen in solution, or delay folding onset. NON-NATIVE INTERMEDIATES: The vectorial nature of protein folding inside the tunnel favors local interactions, inducing co-translational folding intermediates that do not form upon protein refolding in solution. Source: How the ribosome shapes cotranslational protein folding, Current Opinion in Structural Biology (2024) https://www.sciencedirect.com/science/article/pii/S0959440X23002142 Source: Cotranslational Folding of Proteins on the Ribosome, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC7023365/ ================================================================================ 9. GOLDEN RATIO AND FIBONACCI PATTERNS IN PROTEIN STRUCTURE ================================================================================ 9.1 GOLDEN RATIO IN PROTEIN STRUCTURES ----------------------------------------- The golden ratio (phi = (1+sqrt(5))/2 = 1.618033...) has been documented in several contexts within protein and biological macromolecular structure. ALPHA HELIX CONNECTIONS: - Recent research identifies at least five facets relating the alpha helix to the golden ratio concept, involving quantum physics equations and the standard model of physics. 
- The heptad motif in coiled coils shows helix circle angles with a fit to the Golden Ratio Concept: the angle 102.86 degrees (360 / 3.5, from 3.5 residues per turn) relates to pi/5 radians (= arccos(phi/2)) with approximately 4.9% tolerance. - Connections to the Boerdijk-Coxeter helix and 7-vertex tetrablock helix are documented. COLLAGEN TRIPLE HELIX: The triple helix geometry relates to the golden gnomon through angles pi/5 and 3pi/5 (36 degrees and 108 degrees). DNA STRUCTURE: B-DNA contains ratios structured around the golden ratio (1.618) in the ratio of length to width of one turn of the helix, the spacing of the two helices, and in the axial structure with ten-fold rotational symmetry. ENERGY LANDSCAPE CONNECTION: The frustration ratio Tf/Tg for funneled protein energy landscapes is approximately 1.6 (Wolynes and Onuchic), close to the golden ratio. When proteins unfold, the degree of departure is characterized by a ratio of approximately 1.6. PROTEIN DENATURATION: In mathematical models of protein denaturation, the golden ratio emerges as a design principle driving denaturation in early stages of unfolding. ICOSAHEDRAL SYMMETRY: The golden ratio is fundamentally embedded in icosahedral geometry (see Section 19). In a regular pentagon, the ratio of a diagonal to a side equals the golden ratio. The icosahedron, which governs viral capsid geometry, has five-fold symmetry axes through its 12 vertices, each of which is surrounded by five triangular faces. Source: Fibonacci Numbers and the Golden Ratio in Biology, Physics... (arXiv) https://arxiv.org/abs/1801.01369 Source: Conjecture on the Design of Helical Proteins, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC8086193/ Source: The Alpha Helix... Related to the Golden Ratio in at least Five Facets https://www.researchgate.net/publication/380951687 9.2 FIBONACCI PATTERNS IN BIOLOGICAL STRUCTURES ------------------------------------------------- Fibonacci numbers (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...) appear widely in biological systems. The ratio of successive Fibonacci numbers converges to the golden ratio.
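The convergence of successive Fibonacci ratios to the golden ratio, and the identity pi/5 = arccos(phi/2) used in Section 9.1, can both be checked numerically. The following Python sketch does so; the function names are illustrative and not taken from the cited sources.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # phi = 1.618033...


def fibonacci(n_terms: int):
    """First n_terms Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n_terms:
        seq.append(seq[-1] + seq[-2])
    return seq[:n_terms]


def successive_ratios(seq):
    """Ratios F(k+1) / F(k) for consecutive entries of seq."""
    return [b / a for a, b in zip(seq, seq[1:])]


if __name__ == "__main__":
    fib = fibonacci(11)
    for a, ratio in zip(fib, successive_ratios(fib)):
        print(f"after {a:2d}: ratio = {ratio:.6f}  "
              f"error = {abs(ratio - GOLDEN_RATIO):.2e}")
    # cos(pi/5) equals phi/2, i.e. pi/5 = arccos(phi/2)
    print(math.cos(math.pi / 5), GOLDEN_RATIO / 2)
```

With the eleven terms listed above, the last ratio (89/55) already agrees with the golden ratio to about three decimal places.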
PHYLLOTAXIS: - Plant leaves, seeds, and petals arrange in spiral patterns where the numbers of clockwise and counterclockwise spirals are consecutive Fibonacci numbers. - The divergence angle stabilizes near 137.5 degrees (the golden angle). - This pattern arises from local growth rules: each new element is placed where there is the largest gap, naturally producing Fibonacci spacing. - Laboratory reproduction: elastically mismatched bi-layer structures produce Fibonacci spiral patterns, explaining the widespread pattern in plants. PROTEIN MODELS: - AB protein models use Fibonacci series sequence lengths: 13, 21, 34, 55, 89 residues. - Recent research investigates Fibonacci patterns and golden ratio principles in proteinoid-based systems, examining their impact on structural organization. SELF-ASSEMBLY AND ENERGY OPTIMIZATION: Fibonacci numbers and the golden ratio appear in nearly all domains of science where self-organization processes are at play or expressing minimum energy configurations. Source: A new mathematical model of phyllotaxis, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC10764405/ Source: On the Response of Proteinoid Ensembles to Fibonacci Sequences, ACS Omega https://pubs.acs.org/doi/10.1021/acsomega.4c10571 ================================================================================ 10. LATTICE MODELS OF PROTEIN FOLDING ================================================================================ 10.1 THE HP (HYDROPHOBIC-POLAR) MODEL --------------------------------------- The HP model is one of the most popular simplified models for studying protein folding computationally: - Amino acids divided into two classes: Hydrophobic (H) and Polar (P). - The heteropolymer is modeled as a self-avoiding walk on a lattice (typically square or simple cubic). - Hydrophobic interactions are the main driving force for folding. - The optimal conformation maximizes H-H contacts. LATTICE VARIANTS: - 2D square lattice (simplest). - 3D simple cubic lattice. 
- FCC (face-centered cubic) lattice. - 2D hexagonal lattice: alleviates the "sharp turn" problem and models certain aspects of secondary structure more realistically. - Hexagonal lattice with diagonals. COMPUTATIONAL COMPLEXITY: Finding the optimal conformation in the HP model is NP-hard (the corresponding decision problem is NP-complete), even on the 2D square and 3D cubic lattices. COMPUTATIONAL APPROACHES: - Approximation algorithms (partitioning strategies). - Evolutionary algorithms with lattice rotations and generalized move sets. - Monte Carlo methods (Wang-Landau sampling with pull moves). - Hybrid approaches (Cuckoo Search combined with Hill Climbing). Source: The HP model of protein folding: A challenging testing ground, ScienceDirect https://www.sciencedirect.com/science/article/abs/pii/S0010465508000271 Source: An effective evolutionary algorithm for protein folding on 3D FCC HP model, PMC https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908773/ ================================================================================ 11. COMPUTATIONAL APPROACHES ================================================================================ 11.1 ALPHAFOLD ---------------- AlphaFold is an artificial intelligence program developed by DeepMind (a subsidiary of Alphabet) for protein structure prediction using deep learning. ALPHAFOLD 1 (2018) - CASP13: - Placed first in overall rankings. - Best prediction for 25 out of 43 most difficult targets. - Median GDT score of 58.9 (vs. 52.5 and 52.4 for next two teams). ALPHAFOLD 2 (2020) - CASP14: - Achieved scores above 90 GDT for approximately two-thirds of proteins. - Models often indistinguishable from experimental structures. - Widely regarded as having essentially solved the structure prediction problem for single protein chains. - Trained on >170,000 proteins from the Protein Data Bank. - Uses an attention-based deep learning architecture. - Median error: less than 1 Angstrom. ALPHAFOLD 3 (2024): - Announced May 8, 2024.
- Predicts structure of complexes: proteins with DNA, RNA, ligands, ions. - Minimum 50% improvement in accuracy for protein-other molecule interactions compared to existing methods. 2024 NOBEL PRIZE IN CHEMISTRY: - Demis Hassabis and John Jumper of Google DeepMind shared one half "for protein structure prediction." - David Baker of the University of Washington received the other half "for computational protein design." - Predictions available through the AlphaFold Protein Structure Database (used by >2 million scientists in 190 countries). Source: AlphaFold, Wikipedia https://en.wikipedia.org/wiki/AlphaFold Source: Highly accurate protein structure prediction with AlphaFold, Nature (2021) https://www.nature.com/articles/s41586-021-03819-2 Source: Chemistry Nobel goes to developers of AlphaFold AI, Nature (2024) https://www.nature.com/articles/d41586-024-03214-7 11.2 ROSETTA -------------- Rosetta was introduced by the Baker laboratory at the University of Washington in 1998 as an ab initio approach to structure prediction. COMPUTATIONAL METHOD: The distribution of conformations observed for each short sequence segment in known structures is taken as an approximation of the local conformations that segment would sample during folding. The program searches for the combination with lowest overall energy. APPLICATIONS: - Ab initio structure prediction. - Protein design and stabilization. - Protein-protein and protein-ligand docking. - Antibody engineering. - Enzyme design. - Biomolecular materials engineering. LANDMARK: DESIGN OF TOP7: The first computationally created protein with a completely new fold, expanding the scope of protein engineering from modifying natural proteins to creating entirely new ones. ROSETTAFOLD: A three-track neural network architecture for protein structure prediction, making accurate predictions accessible to all. 
Source: Practically Useful: What the Rosetta Protein Modeling Suite Can Do, Biochemistry https://pubs.acs.org/doi/10.1021/bi902153g Source: RoseTTAFold: Accurate protein structure prediction, IPD https://www.ipd.uw.edu/2021/07/rosettafold-accessible-to-all/ 11.3 MOLECULAR DYNAMICS SIMULATIONS -------------------------------------- FORCE FIELDS: - AMBER: Highly effective for protein research, particularly protein-ligand interactions and folding. Parameters from experimental data and quantum chemical calculations. - CHARMM: Widely used for proteins, nucleic acids, and lipids. Detailed parameters for amino acid residues. - OPLS: Optimized Potentials for Liquid Simulations. - Modified versions (CHARMM22*, ff99SB-ILDN*) better reproduce experimental data per Lindorff-Larsen benchmarks. ANTON SUPERCOMPUTER (D.E. SHAW RESEARCH): - Special-purpose hardware for molecular dynamics. - First simulations reaching millisecond timescales (100x beyond previous state of the art). - Lindorff-Larsen, Piana, Dror, and Shaw (2011): Simulated 12 structurally diverse proteins spanning all three major structural classes over 100 microsecond to 1 millisecond timescales. - All 12 proteins spontaneously and repeatedly folded to their experimentally determined native structures using a single physics-based energy function. Source: How Fast-Folding Proteins Fold, Science (2011) https://www.science.org/doi/abs/10.1126/science.1208351 Source: Millisecond-scale molecular dynamics simulations on Anton https://dl.acm.org/doi/10.1145/1654059.1654099 11.4 FOLDING@HOME ------------------- Distributed computing project (est. 2000, Vijay Pande lab, Stanford; now led by Gregory Bowman at UPenn) enabling volunteers to contribute personal computing power for protein folding simulations. KEY CONTRIBUTIONS: - Markov state models (MSMs): Network models of protein conformational ensembles, with states as free energy minima and links as transition probabilities. - First petascale computer (Guinness record). 
- Became world's first exascale computer. - Major contributions to understanding SARS-CoV-2 during the pandemic. Source: Folding@home: Achievements from over 20 years, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC10398258/ ================================================================================ 12. CHAPERONE-ASSISTED FOLDING AND QUALITY CONTROL ================================================================================ 12.1 MOLECULAR CHAPERONE FAMILIES ----------------------------------- Molecular chaperones assist in correct protein folding, prevent aggregation, and facilitate degradation of misfolded proteins. MAJOR FAMILIES: - HSP40 (J-domain proteins): Co-chaperones for HSP70. - HSP60 (Chaperonins): Provide isolated folding chambers. - HSP70: Central hub of protein quality control. - HSP90: Late-stage folding and signaling protein maturation. - HSP100: Disaggregases for clearing aggregates. - Small HSPs (sHSPs): Holdase function, prevent aggregation. - CCT (chaperonin-containing TCP-1): Eukaryotic group II chaperonin. 12.2 HSP70 SYSTEM ------------------- HSP70 is a central hub in the protein quality control network. It collaborates with J-domain co-chaperones and nucleotide exchange factors (BAG1, HSPBP1) to facilitate folding. MECHANISM: HSP70 remodels the energy landscape of its substrate proteins, accelerating folding by smoothing the landscape and preventing kinetic traps. Source: Energy landscape remodeling mechanism of Hsp70, PMC (2021) https://pmc.ncbi.nlm.nih.gov/articles/PMC8204389/ 12.3 GroEL/GroES CHAPERONIN SYSTEM ------------------------------------- The most widely studied chaperonin system in E. coli: REACTION CYCLE: 1. Non-native substrate protein binds to the trans ring of GroEL. 2. ATP-dependent GroES binding encapsulates the substrate within the cavity. 3. Inside the cavity, the protein folds in a protected environment. 4. 7 ATP molecules are hydrolyzed in the cis ring. 5. Upon hydrolysis, ADP and GroES release. 6. 
Folded protein exits the cavity. FUNCTIONAL ROLE: GroEL/ES can accelerate the folding rate and rescue proteins that would otherwise be trapped in misfolded states. It makes the otherwise impossible folding of certain proteins possible. Source: GroEL-Mediated Protein Folding: Making the Impossible, Possible, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC3783267/ 12.4 UBIQUITIN-PROTEASOME SYSTEM (PROTEIN QUALITY CONTROL) ------------------------------------------------------------- Approximately one-third of newly synthesized polypeptides never reach the mature protein stage. UBIQUITIN-PROTEASOME PATHWAY (UPP): - Misfolded proteins are tagged with polyubiquitin chains. - Polyubiquitinated proteins are directed to the 26S proteasome for degradation and amino acid recycling. - Provides degradation of dysfunctional and misfolded proteins to maintain cellular homeostasis. UNFOLDED PROTEIN RESPONSE (UPR): - Activated when misfolded/unfolded proteins accumulate in the endoplasmic reticulum (ER). - Triggered by heat stress, oxidative stress, hypoxic stress, DNA damage. - ERAD pathway (ER-associated degradation): transports unfolded proteins out of the ER by retro-translocation for degradation. Source: Protein quality control and elimination of protein waste, ScienceDirect https://www.sciencedirect.com/science/article/pii/S0167488913002656 ================================================================================ 13. PROTEIN MISFOLDING DISEASES ================================================================================ 13.1 OVERVIEW -------------- Protein misfolding and aggregation is closely associated with the accumulation of toxic proteins causing many neurodegenerative and systemic diseases. 13.2 PRION DISEASES --------------------- Diseases: Creutzfeldt-Jakob disease (CJD), bovine spongiform encephalopathy (BSE), scrapie, chronic wasting disease, kuru. 
MECHANISM: - The cellular prion protein PrPC (alpha-helix-rich) converts to the disease-associated PrPSc (beta-sheet-rich). - PrPSc acts as an autocatalytic template, inducing PrPC to adopt the pathogenic conformation. - PrPC structure: disordered N-terminal tail (residues 23-124) and a structured C-terminal region (125-230) with three alpha-helices and a short antiparallel beta-sheet. - Conversion involves antiparallel beta-strand addition near helix 1. - Alternative model: parallel beta-helix requiring disintegration of the antiparallel beta-sheet. Source: Prion Protein Misfolding, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC3330701/ 13.3 ALZHEIMER'S DISEASE -------------------------- Two hallmark misfolded proteins: AMYLOID BETA (A-beta): - Cleaved from amyloid precursor protein (APP). - Forms extracellular amyloid plaques. - Aggregates via the cross-beta spine / steric zipper mechanism (see 13.5). TAU PROTEIN: - Normally stabilizes microtubules. - Hyperphosphorylation causes tau to dissociate from microtubules. - Free hyperphosphorylated tau aggregates into paired helical filaments (PHFs) and straight filaments (SFs). - PHF structure: two identical protofilaments with a C-shaped fold, packed base-to-base with C2 symmetry. - SFs: protofilaments pack back-to-back in a nonsymmetrical arrangement. - Neurofibrillary tangles (NFTs) form intracellularly, disrupting axonal transport and leading to cell death. Source: The Role of Tau in Alzheimer's Disease, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC4072215/ 13.4 PARKINSON'S DISEASE -------------------------- - Characterized by accumulation of alpha-synuclein protein in the brain. - Alpha-synuclein misfolds and forms Lewy bodies (intracellular aggregates). - Prion-like propagation: pathogenic conformations can spread cell-to-cell. 
13.5 AMYLOID FIBRIL STRUCTURE ------------------------------- All amyloid fibrils share a common cross-beta spine structure: CROSS-BETA SPINE: - A pair of beta-sheets, with side chains from the two sheets interdigitated in a dry "steric zipper." - Each sheet formed from parallel segments stacked in register. - The dry interface has unusually high shape complementarity (Sc = 0.86). - 13 different steric zipper structures resolved at atomic resolution. PROTEINS WITH KNOWN STERIC ZIPPERS: Amyloid-beta, tau, PrP prion protein, insulin, islet amyloid polypeptide (IAPP), lysozyme, myoglobin, alpha-synuclein, beta2-microglobulin. Source: Atomic structures of amyloid cross-beta spines, Nature (2007) https://www.nature.com/articles/nature05695 Source: Structure of the cross-beta spine, Nature (2005) https://www.nature.com/articles/nature03680 ================================================================================ 14. PROTEIN FOLDING KINETICS AND THERMODYNAMICS ================================================================================ 14.1 TWO-STATE FOLDING KINETICS ---------------------------------- Many small proteins fold via a simple two-state mechanism (unfolded <-> folded) with no populated intermediates. - The "unfolded state" is actually an equilibrium distribution of many unfolded or partially folded conformations. - The Foldon Funnel Model: proteins fold in units of secondary structures that form sequentially along the folding pathway. Predicts two-state behavior when secondary structures are intrinsically unstable. - Correctly describes the 9-order-of-magnitude dependence of folding rates on protein size for a set of 93 proteins. 14.2 CONTACT ORDER AND FOLDING RATES -------------------------------------- - Contact order: average sequence distance between contacting residues in the native state. - Both folding and unfolding rates correlate with contact order and long-range order, with correlation coefficients >= 0.75. 
- Proteins with more local contacts (lower contact order) fold faster. Source: General Mechanism of Two-State Protein Folding Kinetics, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC5104671/ 14.3 PHI-VALUE ANALYSIS AND THE NUCLEATION-CONDENSATION MECHANISM -------------------------------------------------------------------- ALAN FERSHT'S PHI-VALUE ANALYSIS: For a given point mutation, Phi is the ratio of the mutation-induced change in the activation free energy of folding to the mutation-induced change in the equilibrium free energy of folding. It scores the extent of native-like structure formation around the mutated residue in the transition state (Phi = 0: as unstructured as in the denatured state; Phi = 1: as structured as in the native state). NUCLEATION-CONDENSATION MECHANISM: - Folding initiated by formation of a marginally stable nucleus containing some correct secondary and tertiary structure interactions. - The nucleus serves as a template for rapid condensation of further structure. - Secondary and tertiary structures form in parallel (unlike the framework model where secondary structure forms first). - Evidence from CI2 (chymotrypsin inhibitor 2): moderate Phi values at the nucleus center, falling off with distance. FRAMEWORK MODEL (ALTERNATIVE): Secondary-structure elements fold first, then coalesce to form tertiary structure. Supported in some proteins. TRANSITION STATE ENSEMBLE: - Formation of the transition state is rate-determining in protein folding. - The transition state resembles an expanded, distorted native structure built around an extended nucleus. - The nucleus is weakly formed in the denatured state but develops in the transition state. Source: Phi-Value analysis and the nature of protein-folding transition states, PNAS (2004) https://www.pnas.org/doi/10.1073/pnas.0402684101 Source: How General Is The Nucleation-Condensation Mechanism? PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC2776727/ 14.4 THERMODYNAMIC STABILITY AND MARGINAL STABILITY ------------------------------------------------------ - Proteins are only "marginally stable": free energies of unfolding typically 5-15 kcal/mol (approximately 0.4 kJ/mol per amino acid).
- A 100-residue protein: about -10 kcal/mol total stability. ENTHALPY-ENTROPY COMPENSATION: - The change in enthalpy during folding is almost entirely compensated by a corresponding change in entropy, resulting in a small net free energy. - Conformational entropy is reduced upon folding (unfavorable). - Solvent entropy increases (hydrophobic effect, favorable). - Internal energy from hydrogen bonds, van der Waals, etc. (favorable enthalpy). - The narrow range of delta-G values guarantees functionality and stability. COOPERATIVITY: - Cooperativity is a key indicator of protein stability. - DSC (differential scanning calorimetry) directly measures heat capacity Cp(T) and is the method of choice for thermodynamic studies. - Protein unfolding is a cooperative process with many short-lived intermediates. Source: Thermodynamics of Protein Folding and Stability, Alan Cooper https://www.chem.gla.ac.uk/staff/alanc/Protfold.pdf Source: Entropy-Enthalpy Compensations Fold Proteins in Precise Ways, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC8431812/ 14.5 THE MOLTEN GLOBULE INTERMEDIATE -------------------------------------- A compact intermediate in which tertiary structure is lost but secondary structure is intact or even strengthened. CHARACTERISTICS: - Considerable conformational mobility compared to the native state. - Lies on the kinetic (and sometimes thermodynamic) pathway between native and unfolded states. - For some proteins (e.g., cytochrome c), represents a distinct third equilibrium state. - Observable at acid pH (~2), mild denaturant, or high temperature. - Detected by ANS fluorescence enhancement (binds partially folded states). Source: A look back at the molten globule state, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC6557940/ 14.6 FOLDING SPEED LIMITS AND ULTRAFAST FOLDING -------------------------------------------------- SPEED LIMITS: Proteins are subject to physical limits from Brownian diffusion. 
FASTEST-FOLDING PROTEINS: - Trp-cage (20 residues): folds in ~4 microseconds at room temperature. The smallest polypeptide with truly cooperative folding behavior. - Variants: folding times range from 0.9-7.5 microseconds at 300 K. - WW domains: fold in microseconds, 10-fold range between fastest and slowest variants. - Pressure-jump experiments: refolding times of 2.1 +/- 0.7 microseconds, approaching the speed limit. DOWNHILL (BARRIERLESS) FOLDING: - Occurs when the free energy barrier vanishes. - Gradual melting of native structure permits resolving folding mechanisms at atomic resolution. - BBL protein: single-molecule experiments reveal a single conformational ensemble at all denaturing conditions, with gradual shift. - Downhill folding is not necessarily associated with ultrafast kinetics. Source: Smaller and Faster: The 20-Residue Trp-Cage, JACS (2002) https://pubs.acs.org/doi/10.1021/ja0279141 Source: Downhill, Ultrafast and Fast Folding Proteins Revised, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC7589632/ 14.7 KRAMERS THEORY AND DIFFUSION MODELS ------------------------------------------- - Protein folding dynamics can be represented by one-dimensional diffusion along a reaction coordinate. - Kramers' rate theory accounts for frictional effects from solvents. - "Kramers turnover": a maximum in folding rate at intermediate friction. - The underlying energy landscape is multidimensional, so diffusion is position-dependent (coordinate-dependent). - Configuration-dependent diffusion can shift the kinetic transition state and barrier height. Source: Diffusive model of protein folding dynamics, Physical Review Letters (2006) https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.96.228104 Source: Diffusion models of protein folding, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC3457642/ ================================================================================ 15. 
ALLOSTERIC TRANSITIONS AND CONFORMATIONAL CHANGE ================================================================================ 15.1 THE MONOD-WYMAN-CHANGEUX (MWC) MODEL -------------------------------------------- Conceived in 1965 to account for signal transduction and cooperative properties of regulatory enzymes and hemoglobin. CORE POSTULATE: Regulated proteins exist in different interconvertible states in the absence of any regulator. Two distinct symmetric states: "R" (relaxed, high affinity) and "T" (tense, low affinity). MECHANISM: - Equilibrium between T and R states of the unbound protein. - Ligand binding shifts the equilibrium toward R. - Produces sigmoidal binding curves (positive cooperativity). - A small change in ligand concentration causes a large shift in the R/T ratio. APPLICATIONS: Hemoglobin (paradigmatic case), allosteric enzymes, ion channels, G protein-coupled receptors (GPCRs). Source: On the Nature of Allosteric Transitions: A Plausible Model, Monod, Wyman, Changeux (1965) https://pubmed.ncbi.nlm.nih.gov/14343300/ Source: Allostery and the MWC model after 50 years https://pubmed.ncbi.nlm.nih.gov/22224598/ ================================================================================ 16. MEMBRANE PROTEIN FOLDING ================================================================================ 16.1 MECHANISMS OF INSERTION AND FOLDING ------------------------------------------ Membrane proteins are equilibrium structures that fold along thermodynamically controlled pathways, integrating hydrophobic transmembrane segments into the lipid bilayer. TRANSLOCON-MEDIATED INSERTION: - Nearly all membrane proteins require translocons (protein-conducting channels) for proper insertion. - Central component: Sec61alpha (eukaryotes) / SecY (prokaryotes). - 10 transmembrane helices arranged around a central pore with a lateral gate. - Sufficiently hydrophobic nascent chain segments cause the lateral gate to open, allowing access to the lipid bilayer. 
CO-TRANSLATIONAL MECHANISM: - In vivo, most alpha-helical membrane proteins fold during translation. - SRP (signal recognition particle) delivers nascent chains to the translocon. - The translocon assists insertion while translation continues. TRANSLOCON AS ACTIVE CHAPERONE: Beyond being a passive channel, the SecY translocon actively serves as a chaperone. Cytoplasmic and extracellular cavities create distinct environments for promoting unfolding and folding of transmembrane segments. SPONTANEOUS INSERTION (IN VITRO): Without a translocon, transmembrane peptides can partition into the bilayer spontaneously, driven by favorable hydrophobic interactions. LIPID BILAYER EFFECTS: Both the bilayer and the protein must adjust structurally to minimize the total free energy of the protein-plus-bilayer system. Source: Mechanisms of integral membrane protein insertion and folding, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC4339636/ Source: SecY translocon chaperones protein folding, ScienceDirect (2025) https://www.sciencedirect.com/science/article/abs/pii/S0092867425001060 ================================================================================ 17. INTRINSICALLY DISORDERED PROTEINS (IDPs) ================================================================================ 17.1 DEFINITION AND CHARACTERISTICS -------------------------------------- IDPs lack a well-defined, stable three-dimensional fold. They populate ensembles of dynamically exchanging conformations, separated by low energy barriers. SEQUENCE FEATURES: - Low content of bulky hydrophobic amino acids. - High proportion of polar and charged amino acids (low hydrophobicity). - High net charge and low mean hydrophobicity prevent stable folding. CONFORMATIONAL ENSEMBLE: The sequence of an IDP encodes a vast ensemble of spatial conformations that specify its biological function. 
Unlike folded proteins described by a single structure, IDPs are described by a conformational ensemble of many structural states (conformers) that interconvert rapidly. 17.2 FUNCTIONAL ROLES ----------------------- - Cell signaling processes. - Decision-making circuits in cells. - Transcriptional regulation. - Malfunction is often associated with human disease. STRUCTURAL ELEMENTS MEDIATING FUNCTION: 1. Short Linear Motifs (SLiMs): 3-10 residue functional peptide sequences. 2. Molecular Recognition Features (MoRFs): regions that fold upon binding. 3. Intrinsically Disordered Domains (IDDs): larger functional disordered regions. Source: Intrinsically disordered proteins, Wikipedia https://en.wikipedia.org/wiki/Intrinsically_disordered_proteins Source: Introduction to Intrinsically Disordered Proteins, Chemical Reviews https://pubs.acs.org/doi/10.1021/cr500288y ================================================================================ 18. EVOLUTIONARY CONSTRAINTS ON PROTEIN STRUCTURE ================================================================================ 18.1 STRUCTURE CONSERVATION ------------------------------ - Protein folds remain conserved well past the point at which the sequence signal saturates, that is, after sequence similarity is no longer detectable. - Tertiary structures are relatively robust to sequence perturbations and evolve much more slowly than amino acid sequences. - Amino acid substitutions reflect both Darwinian selection and neutral evolution within structural and functional constraints. 18.2 FOLD SPACE IS FINITE ---------------------------- - Unlike sequence space (practically infinite), fold space is finite and small. - Unrelated domains may share the same fold due to physical constraints, functional selection, or chance (convergent evolution). - Estimated total number of protein folds: 400-10,000 (studies vary widely). - Most commonly cited estimates: ~650-2,000 naturally occurring folds. - This discrepancy reflects different definitions of "fold" and sampling biases.
18.3 STRUCTURAL PHYLOGENETICS ------------------------------- - FoldTree method: compares proteins by shape to construct evolutionary family trees over long timescales. - Side-chain rotamer states reveal that evolutionary properties depend strongly on side-chain geometry. - Protein folds are more susceptible to evolutionary convergence than amino acid sequences. Source: Structural and functional constraints in the evolution of protein families, Nature Reviews MCB https://www.nature.com/articles/nrm2762 Source: Progress towards mapping the universe of protein folds, Genome Biology https://genomebiology.biomedcentral.com/articles/10.1186/gb-2004-5-5-107 ================================================================================ 19. SYMMETRY IN PROTEIN ASSEMBLIES ================================================================================ 19.1 VIRAL CAPSID SYMMETRY ----------------------------- ICOSAHEDRAL SYMMETRY: - 5-fold, 3-fold, and 2-fold symmetry axes, yielding 60-fold redundancy. - Maximizes enclosed volume for a given subunit size. - Many viruses self-assemble spontaneously from components in vitro. CASPAR-KLUG THEORY (1962): - Quasi-equivalence principle: identical subunits occupy quasi-equivalent positions, as part of hexamers or pentamers, with minor distortions. - Triangulation number T = h^2 + hk + k^2, where h and k are non-negative integers, not both zero. Possible T values: 1, 3, 4, 7, 9, 12, 13, ... - An icosahedral shell with triangulation number T comprises: * 12 pentamers * 10(T-1) hexamers * 10T + 2 total capsomers * 60T total capsid protein copies GOLDEN RATIO IN ICOSAHEDRAL GEOMETRY: The golden ratio (phi = 1.618...) is fundamentally embedded in icosahedral symmetry: - The ratio of a regular pentagon's diagonal to its side = phi. - Five of the icosahedron's 20 triangular faces meet at each vertex, and the five neighboring vertices form a regular pentagon. - The 12 vertices of an icosahedron are the corners of three mutually perpendicular golden rectangles.
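The Caspar-Klug relations in Section 19.1 are simple enough to check numerically. The following is a minimal Python sketch, not taken from any of the cited sources; the function names are hypothetical and chosen only for illustration. It enumerates the triangulation numbers reachable from non-negative integer pairs (h, k) and derives the capsomer and subunit counts for a given T.

```python
from math import isqrt


def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + hk + k^2."""
    return h * h + h * k + k * k


def allowed_t_numbers(limit: int) -> list[int]:
    """All T values <= limit reachable from non-negative (h, k), not both zero."""
    values = set()
    # h^2 <= T <= limit, so h never needs to exceed isqrt(limit).
    for h in range(isqrt(limit) + 1):
        for k in range(h + 1):  # T is symmetric in h and k, so k <= h suffices
            t = triangulation_number(h, k)
            if 0 < t <= limit:
                values.add(t)
    return sorted(values)


def capsid_composition(t: int) -> dict[str, int]:
    """Capsomer and subunit counts for an icosahedral shell with T = t."""
    composition = {
        "pentamers": 12,
        "hexamers": 10 * (t - 1),
        "capsomers": 10 * t + 2,
        "subunits": 60 * t,
    }
    # Consistency check: 5 subunits per pentamer, 6 per hexamer.
    assert 5 * composition["pentamers"] + 6 * composition["hexamers"] == composition["subunits"]
    return composition


print(allowed_t_numbers(13))   # [1, 3, 4, 7, 9, 12, 13]
print(capsid_composition(3))
```

For T = 3 the sketch gives 12 pentamers and 20 hexamers, that is, 32 capsomers built from 180 protein copies. The internal assertion confirms that the pentamer and hexamer counts always add up to 60T subunits.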
Source: Caspar and Klug, ViralZone https://viralzone.expasy.org/8577 Source: Origin of icosahedral symmetry in viruses, PNAS (2004) https://www.pnas.org/doi/10.1073/pnas.0405844101 Source: Minimal Design Principles for Icosahedral Virus Capsids, ACS Nano https://pubs.acs.org/doi/10.1021/acsnano.1c04952 19.2 COILED COIL GEOMETRY ---------------------------- Coiled coils are unique among protein folds in being fully described by parametric equations (Crick equations). PARAMETERS: - omega_0: supercoil twist - omega_1: alpha-helical twist - R_0: supercoil radius - phi_1...phi_n: phases of individual helices - z_2...z_n: offsets along superhelical axis - Pitch, pitch angle alpha, pairwise helix-crossing angle Omega DESIGN: Combinatorial design calculations identify low-energy sequences for alternative helix supercoil arrangements, producing hyperstable proteins with precisely tuned geometries. Source: Understanding a protein fold: alpha-helical coiled coils, JBC (2023) https://www.jbc.org/article/S0021-9258(23)00221-1/fulltext ================================================================================ 20. INFORMATION CONTENT AND ENTROPY IN PROTEIN SEQUENCES ================================================================================ 20.1 SHANNON ENTROPY IN PROTEIN SEQUENCES -------------------------------------------- Shannon information entropy is widely used to measure residue diversity and conservation in protein sequence alignments. METHODOLOGY: - Calculated for each column in a multiple sequence alignment. - Incorporates both frequencies and number of possibilities. - An invariant (perfectly conserved) column has entropy = 0. - Maximum entropy for 20 amino acids: log2(20) = 4.32 bits/residue. OBSERVED VALUES: - Typical Shannon entropy: approximately 2.5 bits/amino acid (from Zipf analysis and k-tuplet analysis). - This is much smaller than the 4.18 bits/amino acid expected from the non-uniform amino acid composition alone. 
- The difference implies significant constraints on protein sequences beyond simple composition. CORRELATION WITH STRUCTURE: - Strong linear correlation between sequence entropy and inverse packing density at associated residue positions. - Low entropy (high conservation) correlates with buried, tightly packed residues. - High entropy (high variability) correlates with surface-exposed residues. ALTERNATIVE MEASURES: Jensen-Shannon divergence has been shown to outperform Shannon entropy for identifying functionally important residues. Source: The Shannon information entropy of protein sequences, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC1233466/ Source: Entropy Analysis of Protein Sequences Reveals a Hierarchical Organization, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC8700119/ ================================================================================ 21. FREQUENCY ANALYSIS AND VIBRATIONAL SPECTROSCOPY ================================================================================ 21.1 NORMAL MODE ANALYSIS (NMA) --------------------------------- NMA is a fast method to calculate vibrational modes and protein flexibility. ELASTIC NETWORK MODELS (ENM): - The workhorse model for protein NMA. - Typically considers only C-alpha atoms, with every pair of atoms closer than a cutoff distance connected by a harmonic spring (not only chemically bonded neighbors). - Diagonalization of the resulting topology (connectivity) matrix yields normal modes of vibration. - Low-frequency modes describe large-scale biological motions. - Any conformation near equilibrium can be represented as a weighted combination of normal modes. ADVANTAGES: - Computationally inexpensive compared to molecular dynamics. - Quick, systematic investigation of flexibility and dynamics. - Applicable to large proteins and protein complexes.
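To make the "topology matrix" of an elastic network model concrete, the following is a minimal Python sketch under stated assumptions: it uses the common cutoff-based variant (a Gaussian-network-style connectivity, or Kirchhoff, matrix), a 7.0 Angstrom cutoff chosen only for illustration, and a hypothetical four-bead toy chain rather than real C-alpha coordinates. The function name kirchhoff_matrix is likewise hypothetical. Diagonalizing the resulting matrix with a standard symmetric eigensolver would give the modes; that step is not shown here.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the connectivity (Kirchhoff) matrix for a set of C-alpha positions.

    Off-diagonal entry (i, j) is -1 if beads i and j lie within `cutoff`
    (in the same length unit as the coordinates), else 0. Each diagonal
    entry counts the springs attached to that bead.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical toy chain: four beads on a line, 3.8 Angstrom apart
# (roughly the spacing of consecutive C-alpha atoms).
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

gamma = kirchhoff_matrix(toy_chain)
for row in gamma:
    print(row)

# Every row sums to zero, so the uniform displacement vector is a
# zero-frequency mode (rigid-body motion of the whole network).
print(all(sum(row) == 0 for row in gamma))  # True
```

With this cutoff only sequence neighbors in the toy chain are connected, so the matrix reduces to the graph Laplacian of a four-node path. In a real protein, residues that are distant in sequence but close in space would also be linked, which is what lets the low-frequency modes capture collective motions.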
Source: Normal mode analysis as a method to derive protein dynamics, PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC5711701/ Source: A Fresh Look at Normal Mode Analysis: Allosteric Co-Vibrational Modes, JACS Au (2024) https://pubs.acs.org/doi/10.1021/jacsau.4c00109 21.2 VIBRATIONAL SPECTROSCOPY FOR SECONDARY STRUCTURE ------------------------------------------------------- AMIDE BANDS: - Amide I band (1600-1700 cm^-1): mainly C=O stretching vibration (70-85%). Highly sensitive to secondary structure. - Amide II band: N-H bending coupled with C-N stretching. - Amide III band: C-N stretching coupled with N-H bending. INFRARED (IR) SPECTROSCOPY: - FTIR (Fourier Transform IR) used for detailed secondary structure estimation from amide I spectra. - Each secondary structure type has characteristic amide I frequencies. RAMAN SPECTROSCOPY: - UV resonance Raman (exciting within amide pi->pi* transitions at 206.5 nm) directly determines pure alpha-helix, beta-sheet, and unordered secondary structure spectra. - Calibrated against 13 proteins with known X-ray crystal structures. - Complementary and sometimes superior to CD, VCD, and absorption spectroscopy. Source: UV resonance Raman-selective amide vibrational enhancement https://pubmed.ncbi.nlm.nih.gov/9485436/ Source: Determination of the secondary structure from the amide I band, ScienceDirect https://www.sciencedirect.com/science/article/abs/pii/0022283681901273 ================================================================================ 22. PHASE TRANSITIONS IN PROTEIN SOLUTIONS ================================================================================ 22.1 LIQUID-LIQUID PHASE SEPARATION (LLPS) --------------------------------------------- LLPS occurs when a homogeneous protein solution demixes into two distinct phases, forming biomolecular condensates without lipid membrane barriers. DRIVING FORCES: - Multiple weak, multivalent interactions between macromolecules. 
- Strong enough to bring molecules together but not so strong as to freeze dynamics within the condensate.
- Mediated by hydrophobic and non-ionic interactions.

PHASE BEHAVIOR:
- Upper critical solution temperature (UCST): phase separation below Tc.
- Lower critical solution temperature (LCST): phase separation above Tc.
- Mechanism for cells to buffer acute temperature changes.

BIOLOGICAL FUNCTIONS:
- Organizes cellular compartments without membrane barriers.
- Forms structures like stress granules, P-bodies, nucleoli.
- Recognized as a fundamental mechanism of cellular organization.

PATHOLOGICAL IMPLICATIONS:
- Liquid-to-solid transition of condensates can lead to pathological aggregates.
- Anomalies in condensate formation linked to cancer and neurodegenerative diseases.
- Connects to amyloid formation: proteins undergo LLPS then liquid-to-solid transition.

Source: Formation of biological condensates via phase separation, PMC
  https://pmc.ncbi.nlm.nih.gov/articles/PMC6779427/
Source: Reentrant liquid condensate phase of proteins, Nature Communications
  https://www.nature.com/articles/s41467-021-21181-9

================================================================================
23. PROTEIN FOLD SPACE AND CLASSIFICATION
================================================================================

23.1 SCOP (STRUCTURAL CLASSIFICATION OF PROTEINS)
----------------------------------------------------

Largely manual classification of protein structural domains.

HIERARCHY:
1. Class: Types of folds (all-alpha, all-beta, alpha/beta, alpha+beta).
2. Fold: Different shapes of domains within a class.
3. Superfamily: Distant common ancestor (structural similarity).
4. Family: More recent common ancestor (sequence + structural similarity).

23.2 CATH (CLASS ARCHITECTURE TOPOLOGY HOMOLOGY)
---------------------------------------------------

Semi-automatic classification followed by manual curation.

HIERARCHY:
1. Class: Overall secondary structure content.
2. Architecture: Arrangement of secondary structures.
3. Topology: Connectivity and shape.
4. Homologous superfamily: Evolutionary relationship.

COMPARISON:
Despite different methodologies, SCOP and CATH identify similar numbers of fold groups and superfamilies. FunFams (functional families) provide subclassification within CATH superfamilies.

Source: SCOP database in 2020, Nucleic Acids Research
  https://academic.oup.com/nar/article/48/D1/D376/5625529
Source: The history of the CATH structural classification, PMC
  https://pmc.ncbi.nlm.nih.gov/articles/PMC4678953/

================================================================================
24. EXPERIMENTAL METHODS FOR STRUCTURE DETERMINATION
================================================================================

24.1 X-RAY CRYSTALLOGRAPHY
-----------------------------

- Most powerful traditional tool: resolves structures at atomic resolution.
- Requires protein crystallization (often the bottleneck).
- Primary data: X-ray diffraction patterns.
- Phasing methods: molecular replacement, Se-Met, SAD/MAD.
- Limitations: crystal contacts may influence structure; provides a static picture.

24.2 NUCLEAR MAGNETIC RESONANCE (NMR)
----------------------------------------

- Non-destructive technique, works in solution.
- Provides information on local conformation and interatomic distances.
- Captures protein dynamics.
- Size limitation: typically < ~40 kDa (though advances extend this).

24.3 CRYO-ELECTRON MICROSCOPY (CRYO-EM)
------------------------------------------

- "Resolution revolution" beginning ~2013 with improved electron detection and image processing.
- Single protein molecules embedded in vitrified ice.
- Images reconstructed into 3D density maps.
- Resolution now rivals X-ray crystallography.
- No crystallization needed; works for large complexes.
INTEGRATIVE APPROACHES:
These three techniques are increasingly used in combination: X-ray for high-resolution structure, NMR for dynamics in solution, cryo-EM for large complexes.

Source: PDB-101: Methods for Determining Structure
  https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/methods-for-determining-structure
Source: How cryo-EM and X-ray crystallography complement each other, PMC
  https://pmc.ncbi.nlm.nih.gov/articles/PMC5192981/

================================================================================
25. OPEN QUESTIONS AND ACTIVE RESEARCH FRONTIERS
================================================================================

25.1 WHAT ALPHAFOLD HAS NOT SOLVED
-------------------------------------

Despite AlphaFold's success at predicting static protein structures, major challenges remain:

1. PROTEIN DYNAMICS: AlphaFold provides one prediction per protein; it cannot simulate how proteins change through time or model conformational ensembles.
2. CELLULAR CONTEXT: Proteins cannot yet be modeled in the context of the cell (crowding, post-translational modifications, membrane environment).
3. CONFORMATIONAL ENSEMBLES: The frontier is predicting conformational ensembles, not just single structures.
4. FOLDING MECHANISM: Knowing a protein's structure does not explain how it folds (the folding pathway).
5. MISFOLDING: Understanding why and how proteins misfold remains a critical open question.
6. PROTEIN-PROTEIN INTERACTIONS: Predicting transient and dynamic complexes.
7. INTRINSICALLY DISORDERED PROTEINS: AlphaFold produces low-confidence predictions for IDPs/IDRs, which are inherently dynamic.

Source: AlphaFold and protein folding: Not dead yet! PMC (2025)
  https://pmc.ncbi.nlm.nih.gov/articles/PMC11892350/
Source: How AI Revolutionized Protein Science, but Didn't End It, Quanta Magazine (2024)
  https://www.quantamagazine.org/how-ai-revolutionized-protein-science-but-didnt-end-it-20240626/

25.2 ACTIVE RESEARCH FRONTIERS
---------------------------------

- Phase separation and biomolecular condensates as a new organizing principle in cell biology.
- Co-translational folding and how the ribosome shapes protein structure.
- De novo protein design (beyond natural fold space).
- Multi-scale modeling: connecting atomic simulations to cellular-scale behavior.
- Machine learning approaches beyond structure prediction (dynamics, function, evolution).
- Understanding the role of water in protein folding at atomic resolution.
- Protein engineering for therapeutics, materials, and catalysis.
- Connecting protein misfolding to neurodegeneration for therapeutic intervention.

================================================================================
KEY RESEARCHERS AND INSTITUTIONS
================================================================================

FOUNDATIONAL CONTRIBUTORS:
- Christian B. Anfinsen (NIH) - Thermodynamic hypothesis, 1972 Nobel Prize
- Cyrus Levinthal (Columbia) - Levinthal's paradox
- G.N. Ramachandran (IISc Bangalore) - Ramachandran plot
- Linus Pauling, Robert Corey, Herman Branson (Caltech) - Alpha helix, beta sheet discovery
- Jacques Monod, Jeffries Wyman, Jean-Pierre Changeux - MWC allosteric model
- Donald Caspar, Aaron Klug - Quasi-equivalence theory of viral capsids

ENERGY LANDSCAPE THEORY:
- Peter Wolynes (Rice University)
- Jose Onuchic (Rice University)
- Joseph Bryngelson

FOLDING MECHANISMS:
- Alan Fersht (Cambridge) - Phi-value analysis, nucleation-condensation
- Ken Dill (Stony Brook) - HP model, lattice models
- Hue Sun Chan - Frustration ratios

COMPUTATIONAL PIONEERS:
- David Baker (University of Washington) - Rosetta, 2024 Nobel Prize
- Demis Hassabis, John Jumper (DeepMind) - AlphaFold, 2024 Nobel Prize
- David E. Shaw (D.E. Shaw Research) - Anton supercomputer
- Kresten Lindorff-Larsen (University of Copenhagen) - Force field validation, millisecond MD
- Vijay Pande (Stanford/a16z) - Folding@home
- Gregory Bowman (UPenn) - Markov state models

CHAPERONE BIOLOGY:
- F. Ulrich Hartl (Max Planck) - GroEL/ES mechanism
- Arthur Horwich (Yale) - Chaperonin function

AMYLOID AND MISFOLDING:
- David Eisenberg (UCLA) - Steric zipper structures
- Stanley Prusiner (UCSF) - Prion discovery, 1997 Nobel Prize

================================================================================
QUANTITATIVE REFERENCE TABLE
================================================================================

Alpha helix pitch:                    5.4 Angstroms
Alpha helix residues per turn:        3.6
Alpha helix rise per residue:         1.5 Angstroms
Alpha helix phi/psi angles:           -60 / -50 degrees
Alpha helix H-bond pattern:           i -> i+4
Beta sheet phi/psi (antiparallel):    -139 / +135 degrees
Beta sheet phi/psi (parallel):        -119 / +113 degrees
Beta strand twist per residue:        ~30 degrees (right-handed)
Collagen residues per turn:           3.3
Collagen repeat:                      Gly-X-Y
Collagen axial repeat (synthetic):    20.0 Angstroms (7/2 symmetry)
Collagen axial repeat (native):       28.6 Angstroms (10/3 symmetry)
Protein marginal stability:           5-15 kcal/mol (typical)
Per-residue stability:                ~0.4 kJ/mol
Tf/Tg frustration ratio:              ~1.6 (Wolynes/Onuchic)
Golden ratio (phi):                   1.618033...
Levinthal conformations (100 res):    ~10^95 to 10^300
Fastest folding protein:              Trp-cage, ~4 microseconds (20 residues)
Protein folds in nature:              ~650 to ~2,000 (estimates vary)
Maximum Shannon entropy (20 aa):      4.32 bits/residue
Observed Shannon entropy:             ~2.5 bits/amino acid
PDB structures solved by method:      ~85% X-ray, ~8% NMR, ~7% cryo-EM
                                      (approximate as of mid-2020s)
Icosahedral capsid T numbers:         1, 3, 4, 7, 9, 12, 13...
Capsid proteins per T number:         60T copies
Pentamers per capsid:                 12 (always)
Hexamers per capsid:                  10(T-1)
Steric zipper shape complementarity:  0.86

================================================================================
END OF RESEARCH DOCUMENT
Compiled from academic literature, peer-reviewed publications, and established reference sources. All sources cited inline.
================================================================================
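
WORKED CHECK OF REFERENCE TABLE ARITHMETIC:
Several entries in the quantitative reference table are related by simple formulas, so they can be cross-checked with a few lines of arithmetic. The Python sketch below is an illustration only, not taken from the cited sources. It assumes the standard Caspar-Klug relation T = h^2 + h*k + k^2 for the triangulation number, which the table itself does not spell out, and the function names are hypothetical.

```python
import math


def triangulation_numbers(limit):
    """Return the sorted T values up to `limit`.

    Assumption: T = h^2 + h*k + k^2 with integers h >= 1 and 0 <= k <= h
    (the k <= h restriction only removes mirror-image duplicates).
    """
    values = set()
    h = 1
    while h * h <= limit:
        for k in range(h + 1):
            t = h * h + h * k + k * k
            if t <= limit:
                values.add(t)
        h += 1
    return sorted(values)


def capsid_counts(t):
    """Subunit bookkeeping for an icosahedral capsid, using the table's formulas."""
    return {
        "subunits": 60 * t,         # 60T copies of the capsid protein
        "pentamers": 12,            # always 12
        "hexamers": 10 * (t - 1),   # 10(T-1)
    }


# Alpha helix pitch = residues per turn * rise per residue (Angstroms).
helix_pitch = 3.6 * 1.5

# Maximum Shannon entropy for 20 equally likely amino acids (bits/residue).
max_entropy_bits = math.log2(20)

if __name__ == "__main__":
    print(triangulation_numbers(13))
    for t in triangulation_numbers(13):
        counts = capsid_counts(t)
        # 12 pentamers (5 subunits each) plus the hexamers (6 each)
        # should account for every one of the 60T subunits.
        total = 5 * counts["pentamers"] + 6 * counts["hexamers"]
        print(t, counts, total == counts["subunits"])
    print(round(helix_pitch, 2), round(max_entropy_bits, 2))
```

Under this relation the enumeration up to 13 gives T = 1, 3, 4, 7, 9, 12, 13 (T = 9 comes from h = 3, k = 0). The pentamer and hexamer counts add up to 60T subunits for every T, which is a quick consistency check on the 60T and 10(T-1) entries.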