TokMem: packing a whole procedure into a single token
Compiling each reusable task procedure into a single trainable token, with the language model kept frozen.
Notes on my research in NLP and machine learning, and the occasional aside.
Tuning prompts in an ultra-low-dimensional space and letting a frozen random matrix do the rest, for ~98% fewer trainable parameters.
Describing a soft prompt by what it's near, so its task semantics carry across different language models.