Project Sprint: Transformer from scratch

Tasks:
- Pick which architecture I want to replicate (e.g. decoder-only)
  - Find a good paper/blog which implements a Transformer (e.g. Neel Nanda’s, or Alan Cooney’s repo; or the ARENA section on Transformers)
- Create public repo for the build
- Plan the project, including task breakdown (GH issues)
  - Draw on prior art for the project-planning phase
- Build the Transformer in PyTorch
  - Make notes as I go, including mathematical derivations (possibly for blog post; though lots of people have done this)
  - Write tests (possibly do TDD)
  - [optional] Ensure it’s runnable on an M1 MacBook
    - Maybe just choose small hyperparams for the beginning
    - Run in a container
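
A minimal sketch of what "small hyperparams" could look like, assuming a decoder-only model with learned positional embeddings, biases on every linear map, two LayerNorms per block plus a final one, and an untied unembedding. The names `TinyConfig` and `count_parameters` and all the default values are hypothetical placeholders, not taken from any of the references above. Counting parameters up front is one rough way to check that a configuration is small enough for a laptop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyConfig:
    """Hypothetical small hyperparameters for a first decoder-only run."""
    d_vocab: int = 1000   # vocabulary size (assumed toy tokenizer)
    n_ctx: int = 64       # maximum sequence length
    d_model: int = 64     # residual stream width
    n_heads: int = 4      # attention heads per layer
    n_layers: int = 2     # number of Transformer blocks
    d_mlp: int = 256      # MLP hidden width (here 4 * d_model)

    def __post_init__(self) -> None:
        # Each head gets an equal slice of the residual stream width.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")


def count_parameters(cfg: TinyConfig) -> int:
    """Estimate trainable parameters under the assumptions stated above."""
    embed = cfg.d_vocab * cfg.d_model
    pos_embed = cfg.n_ctx * cfg.d_model
    # Q, K, V and output projections: four d_model x d_model maps with biases.
    attn = 4 * (cfg.d_model * cfg.d_model + cfg.d_model)
    # MLP: d_model -> d_mlp -> d_model, with biases.
    mlp = (cfg.d_model * cfg.d_mlp + cfg.d_mlp) + (cfg.d_mlp * cfg.d_model + cfg.d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * cfg.d_model)
    block = attn + mlp + layer_norms
    final_ln = 2 * cfg.d_model
    unembed = cfg.d_model * cfg.d_vocab + cfg.d_vocab
    return embed + pos_embed + cfg.n_layers * block + final_ln + unembed


print(count_parameters(TinyConfig()))  # 233192
```

With these assumed defaults the model has roughly 0.23M parameters, so scaling up later only means changing the config values.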
Extensions:
- Write a blog post about the project
  - What I’ve learned
  - What I found hard
  - What made intuitive sense
  - etc.
- Replicate some safety-relevant results
  - Use visualisations to inspect the model
    - CircuitsVis/TransformerLens
    - Feature visualisation (sparse autoencoder, dictionary learning)
What are you going to try to deliver at the end of the project?
- A Transformer I’ve built from scratch in PyTorch
If you deliver this, how well does it actually achieve the goals you set out above?
- Goals are:
  - gain familiarity with at least one ML framework
  - gain familiarity with implementing a safety-relevant model architecture
  - dive deep on one topic, answering emergent questions as they arise (don’t know what I don’t know about implementation yet!)
- Achieves all of these, I think
Are there any other ideas that might actually achieve your goals better? Set a 5 minute timer to think this through.
- Implement a simple MLP architecture
  - Decided against this: a Transformer contains MLP layers, so I will still gain this experience (the Transformer seems manageable and a more efficient way to learn)
How might you achieve this deliverable in the time that you have? Are there smaller milestones you could use to check you'll be on track, or other ways to break down the task into smaller pieces?
- This will become clear when I plan the project; there is a lot of prior art, so I can draw on it if I get stuck
- Simplest deliverable is an MLP — maybe start with this, and re-use it in the Transformer?
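
A rough sketch of the MLP milestone, written so that the same module could later be dropped into a Transformer block. It assumes the common Linear, GELU, Linear layout; the class name `MLP`, the attribute names, and the sizes are hypothetical choices for illustration, and the referenced implementations may differ.

```python
import torch
from torch import nn


class MLP(nn.Module):
    """Position-wise feed-forward block: Linear -> GELU -> Linear."""

    def __init__(self, d_model: int, d_mlp: int) -> None:
        super().__init__()
        self.fc_in = nn.Linear(d_model, d_mlp)
        self.act = nn.GELU()
        self.fc_out = nn.Linear(d_mlp, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, position, d_model); nn.Linear acts on the last dimension,
        # so every position is transformed independently.
        return self.fc_out(self.act(self.fc_in(x)))


mlp = MLP(d_model=64, d_mlp=256)
out = mlp(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because the output shape matches the input shape, the block can be added to a residual stream unchanged, and a shape check like the one above is an easy first test to write.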
Imagine the project hasn't worked out. What do you think is most likely to have gone wrong? How could you have prevented this, or how could you at least reduce your uncertainty about this risk? Repeat this a few times, maybe considering:
- How you'll make sure you spend sufficient time on the project
  - Allocate 2 evenings and half a weekend to project work
- What other resources you might need, like compute or access to models
  - Nothing up front
- Uncertainties about how difficult technical work might be, or how software libraries work
  - I do have uncertainties here, but there’s prior art to learn from
- Whether you'll have version control and backups of your work
  - GitHub
How would you want your cohort group to support you? Would you like a kind of accountability mechanism from the group?
- If anyone is doing something similar, that would be great to know — we could help with technical issues etc.

Other project ideas

Investigate auxiliary confidence loss

Agentic systems investigation

How easy is it to replicate OS fine-tuning results?