
Wannan (Winnie) Yang
Welcome to My Corner of the Web!
I'm Winnie, a PhD candidate in the Buzsakilab at New York University (NYU), where I decode the content of "dreams" (replay) to unravel the mechanisms of memory consolidation during sleep (my work was recently published in Science).
From Neuroscience to AI
In early 2024 I began my transition from neuroscience to AI. By then, AI progress had already been accelerating for several years, bringing with it a great deal of promise and hope. But what struck me in 2024 was that the rate of progress was continuing unabated, and my optimism about open-ended benefit for humanity turned to worry about our extinction as we barrel unprepared toward ASI. My main motivation is this: for humanity to reap the many benefits that AI can bring, we must make sure these systems are aligned, and we must build such safe systems NOW. This is a goal to which I am wholly committed as my life's work.
Within the broad goal of advancing superalignment, my current research interests revolve around building automated alignment research assistants, perhaps by better leveraging interpretability tools to monitor their internals. One area I am currently excited about, for instance, is using interpretability tools to monitor and evaluate the trustworthiness (or deceptiveness) of both the alignment assistant (the supervisor) and the supervisee. Delving into the black box draws on my existing expertise as a neuroscientist (expertise in "internal oversight" of the brain) as well as my recent research experience in LLM deception.
Current Focus
I've started the exciting journey of transitioning from neuroscience to AI research. My current interests include:
- Monitoring and steering LLMs to be honest
- Scalable Oversight
- Adversarial robustness