zhengxuanzenwu committed on
Commit 6030781 · verified · 1 Parent(s): 52eb696

Update README.md

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -1,3 +1,34 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ ---
+
+ # 1. AxBench
+
+ AxBench evaluates interpretability methods on two tasks: concept detection and model steering. AxBench releases two supervised dictionary learning methods that outperform existing methods, including sparse autoencoders (SAEs). These dictionaries contain 1D subspaces that map to high-level concepts.
+
+ # 2. What is `gemma-diffmean-9b-it-res`?
+
+ - `gemma-`: Refers to the Gemma 2 model family.
+ - `diffmean-`: The dictionary is learned by taking the difference in means between two contrastive groups.
+ - `9b-it-`: The dictionary is for the Gemma 2 9B instruction-tuned model.
+ - `res`: The dictionary is trained on the model's residual stream.
+ - We release the weights as well as the annotated concepts for all subspaces.
+
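Editor's sketch (not part of the commit): to make the `diffmean-` bullet concrete, the snippet below computes a difference-in-means direction from two contrastive groups of activation vectors, then uses that single direction for detection (a projection) and for steering (adding a scaled copy). It uses only the Python standard library. The function names `mean_vector`, `diff_in_means`, `concept_score`, and `steer`, the toy vectors, and the `alpha` scale are all hypothetical illustrations; they are not AxBench or pyvene APIs, and the released dictionary may be built differently (for example in normalization, layer, or token positions).

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    if not rows:
        raise ValueError("need at least one vector")
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def diff_in_means(positive: Sequence[Sequence[float]],
                  negative: Sequence[Sequence[float]]) -> Vector:
    """Mean of the concept-present group minus mean of the concept-absent group."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def concept_score(activation: Sequence[float], direction: Sequence[float]) -> float:
    """Detection (assumed form): project an activation onto the 1D direction."""
    return sum(a * d for a, d in zip(activation, direction))


def steer(activation: Sequence[float], direction: Sequence[float],
          alpha: float) -> Vector:
    """Steering (assumed form): add a scaled copy of the direction."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy 3-dimensional "activations"; real residual-stream vectors are much larger.
positive = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # examples that express the concept
negative = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]  # examples that do not

direction = diff_in_means(positive, negative)
score = concept_score([1.0, 5.0, 7.0], direction)
steered = steer([1.0, 5.0, 7.0], direction, alpha=0.5)
```

In this toy setup the two groups differ only in their first coordinate, so the learned direction is nonzero only there, and both the score and the steering update depend on that coordinate alone.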
+ # 3. How can I use these dictionaries straight away?
+
+ ```python
+ import pyvene as pv
+
+ ```
+
+ # 4. Point of Contact
+
+ Point of contact: Zhengxuan Wu or Aryaman Arora
+
+ Contact by email:
+
+ {wuzhengx, aryamana}@stanford.edu
+
+ # 5. Citation
+
+ Paper: