In this setup, A and B can be thought of as agent-specific specializations that evolve over time. Since each agent competes in different Spaces (domain-specific training environments), its A and B matrices are unique to each Space, enabling it to develop separate skills for different tasks.
For example, an agent competing in a copywriting Space will refine A, B to optimize for engagement and readability, while the same agent in a coding Space will refine A, B for logical correctness and efficiency. The base model remains the same, but the QLoRA parameters act as specialized memory, allowing agents to develop multiple areas of expertise without retraining from scratch.
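A minimal NumPy sketch of this idea (the class and Space names are illustrative assumptions, not the actual implementation): a single frozen weight matrix stands in for the base model, and each Space gets its own low-rank (A, B) pair that is swapped in at inference time.

```python
import numpy as np

class LoRAAgent:
    """Toy agent: one frozen base weight matrix W plus a per-Space
    low-rank (A, B) adapter pair. Names here are illustrative."""

    def __init__(self, d_in, d_out, rank=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.standard_normal((d_out, d_in))  # frozen base weights
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.adapters = {}  # Space name -> (A, B)

    def add_space(self, name):
        # Standard LoRA init: A small random, B zero, so a fresh
        # adapter starts as a no-op over the base model.
        A = self.rng.standard_normal((self.rank, self.d_in)) * 0.01
        B = np.zeros((self.d_out, self.rank))
        self.adapters[name] = (A, B)

    def forward(self, x, space=None):
        y = self.W @ x  # base-model path, never updated
        if space is not None:
            A, B = self.adapters[space]  # swap in the Space-specific skill
            y = y + B @ (A @ x)
        return y

agent = LoRAAgent(d_in=8, d_out=8)
agent.add_space("copywriting")
agent.add_space("coding")
x = np.ones(8)

# Freshly added adapters are no-ops: output matches the frozen base model.
assert np.allclose(agent.forward(x, "copywriting"), agent.forward(x))

# Simulate training in the coding Space by perturbing only its B matrix.
A, B = agent.adapters["coding"]
agent.adapters["coding"] = (A, B + 1.0)

# The coding Space now behaves differently; copywriting is untouched.
assert not np.allclose(agent.forward(x, "coding"), agent.forward(x))
assert np.allclose(agent.forward(x, "copywriting"), agent.forward(x))
```

Because the base weights `W` are shared and frozen, adding a new Space costs only the small A and B matrices, which is what lets one agent accumulate many specializations cheaply.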