-
J. Jumper et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
-
K. A. Dill, J. L. MacCallum, The protein-folding problem, 50 years on. Science 338, 1042–1046 (2012).
-
C. M. Dobson, Protein folding and misfolding. Nature 426, 884–890 (2003).
-
J. N. Onuchic, P. G. Wolynes, Theory of protein folding. Curr. Opin. Struct. Biol. 14, 70–75 (2004).
-
M. K. Higgins, Can we AlphaFold our way out of the next pandemic? J. Mol. Biol. 433, 167093 (2021).
-
H. Park, P. Patel, R. Haas, E. Huerta, APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics. Proc. Natl. Acad. Sci. U.S.A. 121, e2311888121 (2024).
-
N. Qian, T. J. Sejnowski, Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol. 202, 865–884 (1988).
-
M. Minsky, S. A. Papert, Perceptrons: An Introduction to Computational Geometry (Reissue of the 1988 expanded edition with a new foreword by Léon Bottou, MIT Press, 2017).
-
D. H. Ackley, G. E. Hinton, T. J. Sejnowski, A learning algorithm for Boltzmann machines. Cognit. Sci. 9, 147–169 (1985).
-
D. E. Rumelhart, J. L. McClelland, PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations (MIT Press, 1986).
-
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
-
P. Baldi, “Autoencoders, unsupervised learning, and deep architectures” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning (JMLR Workshop and Conference Proceedings, 2012), pp. 37–49.
-
J. Gui, Z. Sun, Y. Wen, D. Tao, J. Ye, A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Trans. Knowl. Data Eng. 35, 3313–3332 (2021).
-
J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U.S.A. 79, 2554–2558 (1982).
-
J. Martin, M. Lequerica-Mateos, J. Onuchic, I. Coluzza, F. Morcos, Machine learning in biological physics: From biomolecular prediction to design. Proc. Natl. Acad. Sci. U.S.A. 121, e2311807121 (2024).
-
wwPDB consortium, Protein Data Bank: The single global archive for 3D macromolecular structure data. Nucleic Acids Res. 47, D520–D528 (2019).
-
J. I. Sułkowska, F. Morcos, M. Weigt, T. Hwa, J. N. Onuchic, Genomics-aided structure prediction. Proc. Natl. Acad. Sci. U.S.A. 109, 10340–10345 (2012).
-
D. De Juan, F. Pazos, A. Valencia, Emerging methods in protein co-evolution. Nat. Rev. Genet. 14, 249–261 (2013).
-
A. Davtyan et al., AWSEM-MD: Protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J. Phys. Chem. B 116, 8494–8503 (2012).
-
J. D. Bryngelson, P. G. Wolynes, Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. U.S.A. 84, 7524–7528 (1987).
-
J. D. Bryngelson, J. N. Onuchic, N. D. Socci, P. G. Wolynes, Funnels, pathways, and the energy landscape of protein folding: A synthesis. Proteins: Struct. Funct. Bioinf. 21, 167–195 (1995).
-
S. Yang et al., Domain swapping is a consequence of minimal frustration. Proc. Natl. Acad. Sci. U.S.A. 101, 13786–13791 (2004).
-
R. D. Hills Jr, C. L. Brooks III, Insights from coarse-grained Gō models for protein folding and dynamics. Int. J. Mol. Sci. 10, 889–905 (2009).
-
S. Tripathi, D. A. Kessler, H. Levine, Biological networks regulating cell fate choice are minimally frustrated. Phys. Rev. Lett. 125, 088101 (2020).
-
K. M. Ruff, R. V. Pappu, Alphafold and implications for intrinsically disordered proteins. J. Mol. Biol. 433, 167208 (2021).
-
M. Di Pierro, B. Zhang, E. L. Aiden, P. G. Wolynes, J. N. Onuchic, Transferable model for chromosome architecture. Proc. Natl. Acad. Sci. U.S.A. 113, 12168–12173 (2016).
-
M. A. Marti-Renom, L. A. Mirny, Bridging the resolution gap in structural modeling of 3D genome organization. PLoS Comput. Biol. 7, e1002125 (2011).
-
U. Lupo, D. Sgarbossa, A. F. Bitbol, Pairing interacting protein sequences using masked language modeling. Proc. Natl. Acad. Sci. U.S.A. 121, e2311887121 (2024).
-
B. Meynard-Piganeau, C. Feinauer, M. Weigt, A. M. Walczak, T. Mora, TULIP: A transformer-based unsupervised language model for interacting peptides and T cell receptors that generalizes to unseen epitopes. bioRxiv [Preprint] (2023). https://www.biorxiv.org/content/10.1101/2023.07.19.549669v1 (Accessed 10 January 2024).
-
B. P. Kwee et al., STAPLER: Efficient learning of TCR-peptide specificity prediction from full-length TCR-peptide data. bioRxiv [Preprint] (2023). https://www.biorxiv.org/content/10.1101/2023.04.25.538237v1 (Accessed 10 January 2024).
-
A. T. Wang et al., RACER-m leverages structural features for sparse T cell specificity prediction. bioRxiv [Preprint] (2023). https://www.biorxiv.org/content/10.1101/2023.08.06.552190v1 (Accessed 3 January 2024).
-
B. A. Camley, W. J. Rappel, Physical models of collective cell motility: From cell to tissue. J. Phys. D: Appl. Phys. 50, 113002 (2017).
-
M. Basan, J. Elgeti, E. Hannezo, W. J. Rappel, H. Levine, Alignment of cellular motility forces with tissue flow as a mechanism for efficient wound healing. Proc. Natl. Acad. Sci. U.S.A. 110, 2452–2459 (2013).
-
V. Hakim, P. Silberzan, Collective cell migration: A physics perspective. Rep. Prog. Phys. 80, 076601 (2017).
-
J. LaChance, K. Suh, J. Clausen, D. J. Cohen, Learning the rules of collective cell migration using deep attention networks. PLoS Comput. Biol. 18, e1009293 (2022).
-
S. U. Hirway, S. H. Weinberg, A review of computational modeling, machine learning and image analysis in cancer metastasis dynamics. Comput. Syst. Oncol. 3, e1044 (2023).
-
S. Al-Janabi, A. Huisman, P. J. Van Diest, Digital pathology: Current status and future perspectives. Histopathology 61, 1–9 (2012).
-
D. B. Brückner et al., Stochastic nonlinear dynamics of confined cell migration in two-state systems. Nat. Phys. 15, 595–601 (2019).
-
R. Yu, R. Wang, Learning dynamical systems from data: An introduction to physics-guided deep learning. Proc. Natl. Acad. Sci. U.S.A. 121, e2311808121 (2024).
-
D. Kochkov et al., Machine learning-accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. U.S.A. 118, e2101784118 (2021).
-
E. M. King, C. X. Du, Q.-Z. Zhu, S. S. Schoenholz, M. P. Brenner, Programming patchy particles for materials assembly design. Proc. Natl. Acad. Sci. U.S.A. 121, e2311891121 (2024).
-
R. E. Wengert, A simple automatic derivative evaluation program. Commun. ACM 7, 463–464 (1964).
-
F. Ruehle, Data science applications to string theory. Phys. Rep. 839, 1–117 (2020).
-
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015).
-
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016).
-
K. He, X. Zhang, S. Ren, J. Sun, “Deep residual learning for image recognition” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.
-
Y. Wu et al., Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv [Preprint] (2016). http://arxiv.org/abs/1609.08144 (Accessed 3 January 2024).
-
D. Silver et al., Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489 (2016).
-
W. S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 127–147 (1943).
-
D. J. Amit, H. Gutfreund, H. Sompolinsky, Spin-glass models of neural networks. Phys. Rev. A 32, 1007 (1985).
-
F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958).
-
P. W. Anderson, More is different. Science 177, 393–396 (1972).
-
S. Ambrose, M. Bridges, M. Lovett, How Learning Works: 7 Research-Based Principles for Smart Teaching (John Wiley and Sons, San Francisco, 2010).
-
H. Robbins, S. Monro, A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951).
-
L. Bottou, “Large-scale machine learning with stochastic gradient descent” in Proceedings of COMPSTAT 2010, Y. Lechevallier, G. Saporta Eds. (Physica-Verlag HD, Heidelberg, 2010), pp. 177–186.
-
P. Chaudhari, S. Soatto, “Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks” in 2018 Information Theory and Applications Workshop (ITA) (2018). http://dx.doi.org/10.1109/ita.2018.8503224.
-
Y. Feng, Y. Tu, The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima. Proc. Natl. Acad. Sci. U.S.A. 118 (2021).
-
N. Yang, C. Tang, Y. Tu, Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions. Phys. Rev. Lett. 130, 237101 (2023).
-
G. E. Hinton, D. van Camp, “Keeping the neural networks simple by minimizing the description length of the weights” in Proceedings of the Sixth Annual Conference on Computational Learning Theory, COLT 1993 (ACM, New York, NY, USA, 1993), pp. 5–13.
-
S. Hochreiter, J. Schmidhuber, Flat minima. Neural Comput. 9, 1–42 (1997).
-
C. Baldassi et al., Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes. Proc. Natl. Acad. Sci. U.S.A. 113, E7655–E7662 (2016).
-
P. Chaudhari et al., Entropy-SGD: Biasing gradient descent into wide valleys. ICLR (2017).
-
Y. Zhang, A. M. Saxe, M. S. Advani, A. A. Lee, Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning. Mol. Phys. 116, 3214–3223 (2018).
-
S. Mei, A. Montanari, P. M. Nguyen, A mean field view of the landscape of two-layer neural networks. Proc. Natl. Acad. Sci. U.S.A. 115, E7665–E7671 (2018).
-
C. Baldassi, F. Pittorino, R. Zecchina, Shaping the learning landscape in neural networks around wide flat minima. Proc. Natl. Acad. Sci. U.S.A. 117, 161–170 (2020).
-
I. Goodfellow et al., “Generative adversarial nets” in Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Weinberger, Eds. (Curran Associates, Inc., 2014), vol. 27.
-
S. Durr, Y. Mroueh, Y. Tu, S. Wang, Effective dynamics of generative adversarial networks. Phys. Rev. X 13, 041004 (2023).
-
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics” in Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, F. Bach, D. Blei, Eds. (PMLR, Lille, France, 2015), vol. 37, pp. 2256–2265.
-
D. Ghio, Y. Dandi, F. Krzakala, L. Zdeborová, Sampling with flows, diffusion and autoregressive neural networks from a spin-glass perspective. Proc. Natl. Acad. Sci. U.S.A. 121, e2311810121 (2024).
-
K. A. Dill, J. L. MacCallum, The protein-folding problem, 50 years on. Science 338, 1042–1046 (2012).
-
Y. Jiang, B. Neyshabur, H. Mobahi, D. Krishnan, S. Bengio, Fantastic generalization measures and where to find them. ICLR (2020).
-
N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang, On large-batch training for deep learning: Generalization gap and sharp minima. ICLR (2017).
-
L. Dinh, R. Pascanu, S. Bengio, Y. Bengio, “Sharp minima can generalize for deep nets” in Proceedings of the 34th International Conference on Machine Learning (2017), vol. 70, pp. 1019–1028.
-
Y. Feng, W. Zhang, Y. Tu, Activity-weight duality in feed-forward neural networks reveals two co-determinants for generalization. Nat. Mach. Intell. 5, 908–918 (2023).
-
C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, Understanding deep learning requires rethinking generalization. ICLR (2017).
-
M. Belkin, D. Hsu, S. Ma, S. Mandal, Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl. Acad. Sci. U.S.A. 116, 15849–15854 (2019).
-
A. Brutzkus, A. Globerson, E. Malach, S. Shalev-Shwartz, SGD learns over-parameterized networks that provably generalize on linearly separable data. ICLR (2018).
-
Y. Li, Y. Liang, Learning overparameterized neural networks via stochastic gradient descent on structured data. Adv. Neural Inf. Process. Syst. 31, 8157–8166 (2018).
-
Z. Allen-Zhu, Y. Li, Z. Song, “A convergence theory for deep learning via over-parameterization” in International Conference on Machine Learning (2019), pp. 242–252.
-
A. Jacot, F. Gabriel, C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks. Adv. Neural Inf. Process. Syst. 31, 8571–8580 (2018).
-
M. Geiger et al., Scaling description of generalization with number of parameters in deep learning. J. Stat. Mech.: Theory Exp. 2020, 023401 (2020).
-
S. Mei, A. Montanari, The generalization error of random features regression: Precise asymptotics and the double descent curve. Commun. Pure Appl. Math. 75, 667–766 (2022).
-
F. Gerace, B. Loureiro, F. Krzakala, M. Mézard, L. Zdeborová, “Generalisation error in learning with random features and the hidden manifold model” in International Conference on Machine Learning (PMLR, 2020), pp. 3452–3462.
-
Y. Bahri, E. Dyer, J. Kaplan, J. Lee, U. Sharma, Explaining neural scaling laws. Proc. Natl. Acad. Sci. U.S.A. 121, e2311878121 (2024).
-
Q. Li, B. Sorscher, H. Sompolinsky, Representations and generalization in artificial and brain neural networks. Proc. Natl. Acad. Sci. U.S.A. 121, e2311805121 (2024).
-
J. Moore et al., The neuron as a direct data-driven controller. Proc. Natl. Acad. Sci. U.S.A. 121, e2311893121 (2024).