Improved Equivariant Registration tech
Strategies:
- Add cross attention
- Do attention on patches instead of pixels
- Do relaxations between displacement prediction layers
Cross attention: empirically a bust; not sure why.
Attention on patches: a huge performance boost and a huge improvement in training time.
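A minimal sketch of what patch-level attention could look like in PyTorch: collapse each p × p block of the feature map into a single token, run self-attention over the patch tokens instead of over individual pixels, then scatter the result back. The module name, channel counts, and patch size are illustrative assumptions, not the actual network.

```python
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    """Self-attention over non-overlapping patches instead of pixels (illustrative)."""
    def __init__(self, in_channels=64, patch_size=4, embed_dim=128, num_heads=4):
        super().__init__()
        # Turn each patch_size x patch_size block into one token.
        self.to_tokens = nn.Conv2d(in_channels, embed_dim,
                                   kernel_size=patch_size, stride=patch_size)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Map the attended tokens back to a per-patch feature map.
        self.from_tokens = nn.ConvTranspose2d(embed_dim, in_channels,
                                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.to_tokens(x)                       # (B, D, H/p, W/p)
        hp, wp = tokens.shape[-2:]
        tokens = tokens.flatten(2).transpose(1, 2)       # (B, num_patches, D)
        attended, _ = self.attn(tokens, tokens, tokens)  # attention over patches
        attended = attended.transpose(1, 2).reshape(b, -1, hp, wp)
        return x + self.from_tokens(attended)            # residual connection
```

Since attention cost grows quadratically with the number of tokens, replacing H·W pixel tokens with (H/p)·(W/p) patch tokens cuts that cost by roughly a factor of p⁴, which is consistent with the large training-time improvement.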
Relaxations between displacement prediction layers: also great; they help training time and make it possible to stably add more layers to the network.
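I'm reading "relaxation" here as some smoothing applied to the running displacement field between prediction stages; the sketch below makes that assumption concrete with a separable Gaussian blur. Everything in it (stage structure, kernel size, channel counts) is a guess for illustration, not the actual mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_smooth(disp, sigma=1.0, kernel_size=5):
    """Separable Gaussian blur of each displacement channel (the assumed 'relaxation')."""
    coords = torch.arange(kernel_size, device=disp.device).float() - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    c = disp.shape[1]
    kx = g.view(1, 1, 1, -1).repeat(c, 1, 1, 1)  # blur along width
    ky = g.view(1, 1, -1, 1).repeat(c, 1, 1, 1)  # blur along height
    disp = F.conv2d(disp, kx, padding=(0, kernel_size // 2), groups=c)
    disp = F.conv2d(disp, ky, padding=(kernel_size // 2, 0), groups=c)
    return disp

class StackedDisplacementPredictor(nn.Module):
    """Several displacement-prediction blocks with a relaxation step between them."""
    def __init__(self, num_stages=4):
        super().__init__()
        # Each stage sees the (assumed single-channel) moving and fixed images plus
        # the current 2-channel displacement field, and predicts a 2-channel residual.
        self.stages = nn.ModuleList(
            nn.Conv2d(4, 2, kernel_size=3, padding=1) for _ in range(num_stages)
        )

    def forward(self, moving, fixed):
        b, _, h, w = moving.shape
        disp = torch.zeros(b, 2, h, w, device=moving.device)
        for stage in self.stages:
            x = torch.cat([moving, fixed, disp], dim=1)
            disp = disp + stage(x)          # residual displacement prediction
            disp = gaussian_smooth(disp)    # relaxation between prediction layers
        return disp
```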
New record for rotation-equivariant MNIST training time: ~30 minutes last week, ~30 seconds now. The relaxation layers are the most important change.
Allows removal of the diffusion-regularized pretraining; bless any reduction in training complexity. Great sign.
Now we can train GradICON end to end.
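For reference, the GradICON regularizer penalizes the Jacobian of the composed map Φ_AB ∘ Φ_BA for deviating from the identity. Below is a rough 2-D finite-difference version of that loss in plain PyTorch, assuming displacement fields in normalized [-1, 1] coordinates; this is a sketch, not the implementation from the icon_registration library, and the function names are mine.

```python
import torch
import torch.nn.functional as F

def compose(disp_ab, disp_ba):
    """Coordinate map for Phi_AB(Phi_BA(x)), with displacements in [-1, 1] coords."""
    b, _, h, w = disp_ba.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=disp_ba.device),
        torch.linspace(-1, 1, w, device=disp_ba.device),
        indexing="ij",
    )
    identity = torch.stack([xs, ys]).unsqueeze(0).expand(b, -1, -1, -1)  # (B, 2, H, W)
    phi_ba = identity + disp_ba
    # Resample u_AB at the points Phi_BA(x); grid_sample wants (B, H, W, 2) in xy order.
    u_ab_at_phi_ba = F.grid_sample(disp_ab, phi_ba.permute(0, 2, 3, 1), align_corners=True)
    return phi_ba + u_ab_at_phi_ba

def gradicon_loss(disp_ab, disp_ba):
    """Penalize || Jacobian(Phi_AB o Phi_BA) - Identity ||^2 via finite differences."""
    comp = compose(disp_ab, disp_ba)                        # (B, 2, H, W)
    _, _, h, w = comp.shape
    # Scale the differences so the identity map has Jacobian exactly I.
    dx = (comp[:, :, :, 1:] - comp[:, :, :, :-1]) * (w - 1) / 2
    dy = (comp[:, :, 1:, :] - comp[:, :, :-1, :]) * (h - 1) / 2
    id_x = torch.tensor([1.0, 0.0], device=comp.device).view(1, 2, 1, 1)
    id_y = torch.tensor([0.0, 1.0], device=comp.device).view(1, 2, 1, 1)
    return ((dx - id_x) ** 2).mean() + ((dy - id_y) ** 2).mean()
```

In an end-to-end setup this regularizer would simply be added to the image similarity term and backpropagated through the whole displacement-prediction stack.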