🧠🤖

W. Yang, G. Buzsáki. (2024). Interpretability of LLMs Deception: Universal Motif. ICLR (under review). [Paper link] [GitHub link]


W. Yang, C. Sun, G. Buzsáki. (2024). Interpretability for Safe AI: Jailbreak as a case study (in preparation). [Blog post link]


W. Yang, C. Sun, R. Huszár, T. Hainmueller, K. Kiselev, G. Buzsáki. (2024). Selection of experience for memory by hippocampal sharp wave ripple. Science 383, 1478-1483. [Paper link] [Project website]


I. Zutshi, A. Apostolelli, W. Yang, Z. Zheng, T. Dohi, E. Balzani, A. H. Williams, C. Savin, G. Buzsáki. (2024). Hippocampal neuronal activity is aligned with action plans. Nature (in press). [Preprint link]


C. Sun, W. Yang, T. Jiralerspong, D. Malenfant, B. Alsbury-Nealy, Y. Bengio, B. Richards. (2023). Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL. NeurIPS. [Paper link]


W. Yang, C. Sun, R. Huszár, G. Buzsáki. (2023). Changes in the geometry of hippocampal representations across brain states. Symmetry and Geometry in Neural Representations Workshop, NeurIPS. [Paper link]


E. Y. Kimchi, A. Burgos-Robles, G. A. Matthews, T. Chakoma, M. Patarino, J. Weddington, C. A. Siciliano, W. Yang, S. Foutch, R. Simons, M. Fong, M. Jing, Y. Li, D. B. Polley, K. M. Tye. (2023). Reward contingency gates selective cholinergic suppression of amygdala neurons. eLife. [Paper link]


S. Tennant, I. Hawes, H. Clark, W. Tam, J. Hua, W. Yang, K. Gerlei, E. Wood, M. Nolan. (2022). Analogue representation of a spatial memory by ramp-like neural activity in retrohippocampal cortex. Current Biology. [Paper link]


C. Sun, W. Yang, J. Martin, S. Tonegawa. (2020). Hippocampal neurons represent events as transferable units of experience. Nature Neuroscience. [Paper link] [PDF]