Arjun Yadav

This page serves as my machine learning/AI safety notebook. I hope it can be a more unstructured knowledge hub of everything I have discovered in ML/AIS!


What I'm Learning

  • Feb-March 2024: Re-building a large language model (such as GPT-2) and self-studying LLMs more generally
  • Jan 2024: Final month of interpretability self-studying (shifting focus to more recent work)
  • Nov-Dec 2023: Applications + Interpretability (focus on Neel Nanda's work and lectures)

What I Want To Learn


  • Sparse Coding

Evaluation and Analysis

  • Independent Component Analysis


  • How Gato by DeepMind truly works

What I've learnt thus far


  • Variational Autoencoders


  • UAE's AIS Space (or at least the most recent parts of it)


  • Principal Component Analysis
  • The underlying principle behind transformers


Nov 2023

  • Applied to the Inspirit AI Program for Summer 2024 (July 15 - July 26, 5-7:30 PT)!
  • Applied to OpenAI's Red Teaming Network!
  • Applied to GovAI's blog editor post!
  • Applied to AGISF 2024!
  • Applied to AI Safety Camp 2024!

Summer 2023

  • Taught high school seniors about machine learning and AI safety!
  • Worked with my brother on variational auto-encoders and mechanistic interpretability while he was setting up BAISC.
  • Research assistantship - got to learn a lot about transformers!

Papers + Notes (if open access)

Mechanistic Interpretability

  • Variational Sparse Coding:

    The Problem: Unsupervised discovery of interpretable features and controllable generation with high-dimensional data are currently major challenges in machine learning.

    Proposal: A model based on variational auto-encoders (VAEs) in which interpretation is induced through latent space sparsity with a mixture of Spike and Slab distributions as prior.
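
    To get a feel for what a Spike and Slab prior does to a latent space, here is a minimal sketch of sampling from one. It assumes the common form where each latent dimension is exactly zero (the spike) with probability 1 - alpha and standard normal (the slab) otherwise; `alpha=0.2` and the function name are illustrative choices of mine, not values from the paper.

```python
import numpy as np

def sample_spike_and_slab(n_samples, latent_dim, alpha=0.2, seed=0):
    """Draw latent vectors from a spike-and-slab prior.

    Assumption: the slab is a standard normal and the spike sits exactly at 0.
    alpha is the probability that a dimension is "on" (illustrative value).
    """
    rng = np.random.default_rng(seed)
    active = rng.random((n_samples, latent_dim)) < alpha   # Bernoulli(alpha) mask
    slab = rng.standard_normal((n_samples, latent_dim))    # values for active dims
    return np.where(active, slab, 0.0)                     # inactive dims are exactly 0

z = sample_spike_and_slab(n_samples=1000, latent_dim=50)
sparsity = np.mean(z == 0.0)   # fraction of latent entries that are switched off
```

    With alpha = 0.2, roughly 80% of the entries in `z` come out as exact zeros, which is the sparsity that the interpretability argument relies on.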

Practical Applications of Transformers

Multi-headed Attention and Transformers

  • Vision Transformers - Origin:

    Background: While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place.

    Purpose: How can we use a transformer on its own, without leaning on convolutions, to achieve great results in CV?

    Enter ViT: it obtains results similar to CNNs by slightly tweaking the original transformer model presented in "Attention Is All You Need", pre-training on large datasets, and later fine-tuning on smaller downstream tasks.

    An important aspect of this paper was inspecting how the attention can be visualized and how the positional embeddings appear once visualized.
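
    The main tweak is on the input side: the image is cut into non-overlapping patches that play the role of tokens. Below is a rough sketch of just that step in NumPy. The function name is hypothetical, the 224x224 image and 16x16 patch size are illustrative, and the learned linear projection and positional embeddings that follow in the real model are left out.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

# Illustrative sizes: a 224x224 RGB image cut into 16x16 patches.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = image_to_patches(image, patch_size=16)   # shape (196, 768)
```

    Each of the 196 rows is one "token"; a transformer would then apply a learned projection to each row before running self-attention over them.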

  • EfficientFormer: Vision Transformers at MobileNet Speed:

    Purpose: Can transformers (specifically Vision Transformers - ViT) run as fast as MobileNet while obtaining high performance?

    Introduction: Dosovitskiy et al. adapt the attention mechanism to 2D images and propose Vision Transformer (ViT): the input image is divided into non-overlapping patches, and the inter-patch representations are learned through MHSA (multi-head self-attention) without inductive bias. However, many bottlenecks exist when using transformers for computer vision applications - and this is the focus of the paper: using latency analysis to see what's bottlenecking their performance and hence, deliver a model that addresses these issues.

    Observation 1: Patch embedding with large kernel and stride is a speed bottleneck on mobile devices.

    Observation 2: Consistent feature dimension is important for the choice of token mixer. MHSA is not necessarily a speed bottleneck.

    Observation 3: CONV-BN is more latency-favorable than LN (GN)-Linear and the accuracy drawback is generally acceptable.

    Observation 4: The latency of nonlinearity is hardware and compiler dependent.

    The rest of the paper covers the implementation of (and the hairy mathematics behind) EfficientFormer, written in PyTorch 1.11 using the timm library, which addresses each of the bottlenecks. This is done mainly through latency-driven slimming of their supernet, using a MetaPath (MP) and a Gumbel Softmax (rather than a regular softmax) during the search to get the importance score of the blocks within each MP for (I believe) the self-attention aspect of the model.
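
    For my own reference, this is what a Gumbel Softmax sample looks like in isolation. It is a generic NumPy sketch of the technique, not the paper's search code: the logits standing in for "block importance" and the temperature are made-up values.

```python
import numpy as np

def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw one relaxed (soft) categorical sample from unnormalised logits."""
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via the inverse-CDF trick; u is kept away from 0.
    u = rng.uniform(low=1e-10, high=1.0, size=np.shape(logits))
    gumbel_noise = -np.log(-np.log(u))
    y = (np.asarray(logits) + gumbel_noise) / temperature
    y = y - y.max(axis=-1, keepdims=True)          # for numerical stability
    exp_y = np.exp(y)
    return exp_y / exp_y.sum(axis=-1, keepdims=True)

# Hypothetical importance logits for three candidate blocks.
block_logits = np.array([2.0, 0.5, -1.0])
weights = gumbel_softmax(block_logits, temperature=0.5, rng=np.random.default_rng(0))
```

    Lower temperatures push `weights` towards a one-hot choice while keeping the operation differentiable with respect to the logits, which is why it is handy for architecture search.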

  • Attention is all you need:


Terms and Concepts

(in a rough decreasing order of "high-levelness", these terms tend to get updated as I learn more!)

  • Monosemanticity: The fact that some neurons only do one thing, making them easier to interpret. Check this out!

  • Pareto frontier: In mathematics, it's just a set of solutions that represents the best trade-off between all the objective functions. Need to learn more about this 'Pareto' guy.
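
    A small sketch of picking out a Pareto frontier, assuming every objective is to be minimised. The (latency, error) pairs are made-up numbers, chosen only because the speed/accuracy trade-off is where I keep running into this term.

```python
def pareto_frontier(points):
    """Return the points that no other point dominates (all objectives minimised).

    q dominates p if q is no worse than p on every objective and strictly
    better on at least one.
    """
    frontier = []
    for i, p in enumerate(points):
        dominated = any(
            all(q_k <= p_k for q_k, p_k in zip(q, p))
            and any(q_k < p_k for q_k, p_k in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Hypothetical (latency in ms, error rate) pairs for five models.
models = [(1.0, 0.30), (2.0, 0.20), (3.0, 0.25), (4.0, 0.10), (2.5, 0.20)]
best_tradeoffs = pareto_frontier(models)
# best_tradeoffs == [(1.0, 0.30), (2.0, 0.20), (4.0, 0.10)]
```

    The two dropped models are each beaten by (2.0, 0.20), which is at least as accurate and faster than both.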

  • Multimodal interface - "Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data."

  • Risk Awareness Moment: Risk Awareness Moments are more retroactive in nature: they are moments in our history where major national and/or international bodies come together to get their head straight on a non-partisan issue (e.g.: the ozone layer hole of Antarctica) (this article does a great job explaining RAMs for AI).

  • Mechanistic Interpretability: Essentially, it's a series of techniques that one can employ to try to reverse engineer a neural network. On paper, the inner workings of a neural network may seem decipherable, but they are typically obfuscated by a lot of factors, the main one being dimensionality (Anthropic).

  • AI Plans and Strategies: A collection of compilations of AI plans and strategies for alignment: 1, 2

  • The AI Pause Debate:

AI Pause Debate

  • Generative Adversarial Network: A GAN consists of a generator and a discriminator that essentially battle it out: the discriminator tries to tell real images from fake ones, which pushes the generator to produce more convincing fake images. See an example here.

  • Sparse Coding: A technique to set an encoding function in such a way that it can exploit a high-dimensional space to model a large number of possible features, while being encouraged to use a small subset of non-zero elements to describe each individual observation.
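
    One classic way to compute such a sparse code is ISTA (iterative shrinkage-thresholding), sketched below. Everything here is an assumption for illustration: the dictionary is random and fixed rather than learned, and the penalty and step count are arbitrary.

```python
import numpy as np

def soft_threshold(x, threshold):
    """Shrink values towards zero; anything within the threshold becomes exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def sparse_code(x, dictionary, l1_penalty=0.1, n_steps=200):
    """ISTA: minimise 0.5 * ||x - D z||^2 + l1_penalty * ||z||_1 over z."""
    step = 1.0 / np.linalg.norm(dictionary, ord=2) ** 2   # 1 / Lipschitz constant
    z = np.zeros(dictionary.shape[1])
    for _ in range(n_steps):
        grad = dictionary.T @ (dictionary @ z - x)        # gradient of the squared error
        z = soft_threshold(z - step * grad, step * l1_penalty)
    return z

def objective(x, dictionary, z, l1_penalty=0.1):
    return 0.5 * np.sum((x - dictionary @ z) ** 2) + l1_penalty * np.sum(np.abs(z))

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((20, 100))               # 100 features in 20 dimensions
dictionary /= np.linalg.norm(dictionary, axis=0)          # unit-norm columns
true_code = np.zeros(100)
true_code[[3, 42, 77]] = [1.5, -2.0, 1.0]                 # only three active features
x = dictionary @ true_code
z = sparse_code(x, dictionary)
```

    The dictionary has more features (100) than input dimensions (20), and the L1 penalty is what encourages each observation to be described by only a few of them.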

  • Variational auto-encoder (VAE): A VAE is an autoencoder whose encoding distribution is regularised during training to ensure that its latent space has "good" properties, allowing us to generate new data (and, in a sense, visualize the latent space in a better manner).

    It does this by having the encoder output a mean and a standard deviation for each latent dimension. The latent vector is then computed as mean + standard deviation × epsilon, where epsilon is separately sampled standard-normal noise, so that backpropagation can pass through the sampling step (the reparameterization trick).
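
    A minimal NumPy sketch of that sampling step, plus the usual KL regulariser towards a standard normal. One assumption to flag: the encoder here is taken to output a log-variance instead of a standard deviation directly, which is a common convention but not something the notes above specify. The function names are my own.

```python
import numpy as np

def reparameterize(mean, log_var, rng):
    """Sample z = mean + std * epsilon with epsilon ~ N(0, I).

    The randomness lives in epsilon, so gradients can flow through mean and std.
    """
    std = np.exp(0.5 * log_var)
    epsilon = rng.standard_normal(np.shape(mean))
    return mean + std * epsilon

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, std^2) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mean ** 2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mean = np.array([[0.5, -1.0, 0.0]])       # made-up encoder outputs for one input
log_var = np.array([[0.0, -2.0, 0.3]])
z = reparameterize(mean, log_var, rng)
kl = kl_to_standard_normal(mean, log_var)
```

    The KL term is zero only when the encoder outputs exactly a standard normal, and it is what does the "regularising" of the latent space during training.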

    If you wish to see a VAE in action, check out this repository.

    Credit for below image.


  • Auto-encoder: An autoencoder is a type of neural network (often, though not necessarily, a convolutional one) that compresses a high-dimensional input into a low-dimensional representation (i.e. a latent vector), and then reconstructs the original input as faithfully as possible. It consists of both an encoder and a decoder. An example of its use is removing noise from a dataset (Paperspace Blog).
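
    To make the encode/decode/denoise idea concrete without training anything, here is a sketch of a linear autoencoder whose weights are set in closed form from PCA (via SVD). This is a deliberate shortcut, not how a real (non-linear, trained) autoencoder is built, and the data is synthetic: 10-dimensional points that secretly live on a 2-dimensional subspace, plus noise.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (an assumption for illustration): 2 hidden factors, 10 observed dims.
clean = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 10))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

# "Fit" the linear autoencoder: the top-2 principal directions act as the weights.
mean = noisy.mean(axis=0)
_, _, vt = np.linalg.svd(noisy - mean, full_matrices=False)
components = vt[:2]                       # shape (2, 10)

def encode(x):
    """High-dimensional input -> 2-dimensional latent vector."""
    return (x - mean) @ components.T

def decode(z):
    """Latent vector -> reconstruction in the original 10 dimensions."""
    return z @ components + mean

latent = encode(noisy)
reconstruction = decode(latent)

noisy_error = np.mean((noisy - clean) ** 2)              # error before denoising
denoised_error = np.mean((reconstruction - clean) ** 2)  # error after the bottleneck
```

    Squeezing the data through the 2-dimensional bottleneck throws away most of the noise, because the noise is spread over all 10 dimensions while the signal is not.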

People I've Met in AI Safety

(anonymized - of course)

Meeting with an eval researcher

  • Incredibly useful and provided a lot of clarity. I couldn't have asked for a better meeting to come out of a Slack message I had posted in an AI safety workspace.

Meeting with a (former) Berkeley Ph.D. student

  • "Honesty is everything" is what I remember most vividly. I wonder if they still hold that opinion...

Related Projects and Posts