1 code implementation • 31 May 2024 • Martina Contisciani, Marius Hobbhahn, Eleanor A. Power, Philipp Hennig, Caterina De Bacco
In this paper, we develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information.
1 code implementation • 17 May 2024 • Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn
We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB).
no code implementations • 17 May 2024 • Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn
We propose that a representation of a neural network that is invariant to the reparameterizations exploiting these degeneracies is likely to be more interpretable, and we provide some evidence that such a representation tends to have sparser interactions.
no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
External audits of AI systems are increasingly recognized as a key mechanism for AI governance.
1 code implementation • 9 Nov 2023 • Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn
We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so.
1 code implementation • 26 Oct 2022 • Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Marius Hobbhahn
We investigate the potential constraints on LLM scaling posed by the availability of public human-generated text data.
no code implementations • 5 Jul 2022 • Pablo Villalobos, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Anson Ho, Marius Hobbhahn
From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude.
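A seven-order-of-magnitude increase over 68 years pins down the average growth rate. The following illustrative arithmetic (not from the paper itself) makes the implied annual factor explicit:

```python
# Illustrative arithmetic, not taken from the paper: the average annual
# growth factor implied by a 10^7x increase in model size over 1950-2018.
years = 2018 - 1950               # 68 years
orders_of_magnitude = 7
annual_factor = 10 ** (orders_of_magnitude / years)
print(f"~{annual_factor:.2f}x per year")  # roughly 1.27x, i.e. ~27% annual growth
```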
1 code implementation • 11 Feb 2022 • Jaime Sevilla, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn, Pablo Villalobos
Since the advent of Deep Learning in the early 2010s, the scaling of training compute has accelerated, doubling approximately every 6 months.
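A doubling time converts directly into an annual growth factor; this quick sketch (my own illustration, not the paper's code) shows that a ~6-month doubling time corresponds to roughly 4x growth per year:

```python
# Hypothetical illustration: converting the reported ~6-month compute
# doubling time into an annual growth factor.
doubling_time_years = 0.5
annual_factor = 2 ** (1 / doubling_time_years)
print(annual_factor)  # 4.0: training compute grows ~4x per year
```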
1 code implementation • 7 May 2021 • Marius Hobbhahn, Philipp Hennig
The method can be thought of as a pre-processing step that can be implemented in fewer than five lines of code and runs in under a second.
1 code implementation • 2 Mar 2020 • Marius Hobbhahn, Agustinus Kristiadi, Philipp Hennig
We revisit older work (the Laplace Bridge) to construct a Dirichlet approximation of this softmax output distribution, yielding an analytic map between Gaussian distributions in logit space and Dirichlet distributions (the conjugate prior of the Categorical distribution) in output space.
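The forward direction of this map can be sketched in a few lines. The formula below is reproduced from memory of the Laplace Bridge literature and should be verified against the paper before use; `mu` and `sigma2` denote the mean and diagonal variances of the Gaussian over the K logits:

```python
import numpy as np

# Sketch of the forward Laplace Bridge map: given a diagonal Gaussian
# N(mu, diag(sigma2)) over the K logits, return Dirichlet concentration
# parameters alpha analytically (no sampling required).
# Formula reproduced from memory; check against the paper before relying on it.
def laplace_bridge(mu, sigma2):
    K = mu.shape[0]
    s = np.sum(np.exp(-mu))
    alpha = (1.0 / sigma2) * (1.0 - 2.0 / K + np.exp(mu) * s / K**2)
    return alpha

mu = np.zeros(3)       # symmetric logits
sigma2 = np.ones(3)
alpha = laplace_bridge(mu, sigma2)
print(alpha)           # equal concentrations, reflecting the symmetry of the input
```

For zero-mean, unit-variance logits the resulting concentrations are equal across classes, as the symmetry of the input Gaussian demands.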