✨ Orthogonium: Improved implementations of orthogonal layers

This library aims to centralize, standardize and improve methods to build orthogonal layers, with a focus on convolutional layers. We noticed that a layer's implementation plays a significant role in the final performance: a more efficient implementation allows larger networks and more training steps within the same compute budget. Our implementations therefore differ from the original papers in order to be faster, consume less memory, or be more flexible.

📃 What is included in this library?

| Layer name | Description | Orthogonal? | Usage | Status |
| --- | --- | --- | --- | --- |
| AOC (Adaptive-BCOP) | The most scalable method to build orthogonal convolutions. Allows control of kernel size, stride, groups, dilation and transposed convolutions. | Orthogonal | A flexible method for complex architectures. Preserves orthogonality and works on large-scale images. | done |
| Adaptive-SC-Fac | Same as the previous layer but based on SC-Fac instead of BCOP, which claims a complete parametrization of separable convolutions. | Orthogonal | Same as above | pending |
| Adaptive-SOC | SOC modified to be i) faster and more memory efficient, ii) able to handle stride, groups, dilation & transposed convolutions. | Orthogonal | Good for depthwise convolutions and cases where control over the kernel size is not required. | in progress |
| SLL | The original SLL layer, which is already quite efficient. | 1-Lipschitz | Well suited for residual blocks; it also contains ReLU activations. | done |
| SLL-AOC | SLL-AOC is to the downsampling block what SLL is to the residual block (see the ResNet paper). | 1-Lipschitz | Allows constructing a "strided" residual block that can change the number of channels. It adds a convolution in the residual path. | done |
| Sandwich-AOC | Sandwich convolution that uses AOC to replace the FFT, allowing it to scale to large images. | 1-Lipschitz | | pending |
| Adaptive-ECO | ECO modified to i) handle stride, groups & transposed convolutions. | Orthogonal | | (low priority) |

Directory structure

orthogonium
├── layers
│   ├── conv
│   │   ├── AOC
│   │   │   ├── ortho_conv.py # contains AdaptiveOrthoConv2d layer
│   │   ├── AdaptiveSOC
│   │   │   ├── ortho_conv.py # contains AdaptiveSOCConv2d layer (untested)
│   │   ├── SLL
│   │   │   ├── sll_layer.py # contains SDPBasedLipschitzConv, SDPBasedLipschitzDense, SLLxAOCLipschitzResBlock
│   ├── legacy
│   │   ├── original code of BCOP, SOC, Cayley etc.
│   ├── linear
│   │   ├── ortho_linear.py # contains OrthoLinear layer (can be used with BB, QR and Exp parametrization)
│   ├── normalization.py # contains batch centering and layer centering
│   ├── custom_activations.py # contains custom activations for 1-Lipschitz networks
│   ├── channel_shuffle.py # contains channel shuffle layer
├── model_factory.py # factory function to construct various models for the zoo
├── losses # loss functions, VRA estimation

AOC:

AOC is a method to build orthogonal convolutions with an explicit kernel that supports stride, transposed convolution, grouped convolution and dilation (and any composition of these parameters). This approach is highly scalable and can be applied to problems like ImageNet-1K.
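As a quick sanity check, an orthogonal convolution preserves the L2 norm of its input (for stride 1, circular padding and equal channel counts). The snippet below is a minimal sketch of such a check; it assumes the constructor mirrors torch.nn.Conv2d, as in the usage section further down.

import torch
from orthogonium.layers.conv.AOC import AdaptiveOrthoConv2d

# minimal sketch: norm preservation of an orthogonal convolution
# (stride 1, "same" circular padding, in_channels == out_channels, no bias)
conv = AdaptiveOrthoConv2d(
    in_channels=64,
    out_channels=64,
    kernel_size=3,
    stride=1,
    padding="same",
    padding_mode="circular",
    bias=False,
)
x = torch.randn(4, 64, 32, 32)
y = conv(x)
# each ratio should be close to 1.0 for an orthogonal operator
print(torch.linalg.vector_norm(y.flatten(1), dim=1)
      / torch.linalg.vector_norm(x.flatten(1), dim=1))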

Adaptive-SC-Fac:

Since AOC is built on top of the BCOP method, an equivalent method can be constructed on top of SC-Fac instead. This will allow comparing the performance of the two methods, given that they have very similar parametrizations. (See our paper for a discussion of the similarities and differences between the two methods.)

Adaptive-SOC:

Adaptive-SOC blends the approaches of AOC and SOC. It differs from SOC in that it is more memory efficient and sometimes faster, and it handles stride, groups, dilation and transposed convolutions. However, it does not allow explicit control of the kernel size: the resulting kernel is larger than the requested one, because computing the exponential of a kernel increases the kernel size at each iteration.

Its development is still in progress, so extra testing is still required to ensure exact orthogonality.
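To make the kernel-size caveat concrete, the sketch below inspects the shape of the generated kernel. It is only a sketch: the constructor is assumed to mirror AdaptiveOrthoConv2d, and the layer is assumed to expose its explicit kernel through a weight attribute like the other convolutions in the library.

from orthogonium.layers.conv.AdaptiveSOC.ortho_conv import AdaptiveSOCConv2d

# sketch only: Adaptive-SOC is still in progress, arguments assumed to
# mirror AdaptiveOrthoConv2d
conv = AdaptiveSOCConv2d(in_channels=64, out_channels=64, kernel_size=3)
# the spatial extent of the explicit kernel may exceed the requested kernel_size of 3
print(conv.weight.shape)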

SLL:

SLL is a method to construct small residual blocks with ReLU activations. We kept most of the original implementation and added SLLxAOCLipschitzResBlock, which constructs a down-sampling residual block by fusing SLL with AOC.
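A hedged sketch of how these blocks could be stacked into a small 1-Lipschitz stage; the argument names below are hypothetical, check sll_layer.py for the actual constructor parameters.

import torch.nn as nn
from orthogonium.layers.conv.SLL.sll_layer import (
    SDPBasedLipschitzConv,
    SLLxAOCLipschitzResBlock,
)

# sketch under assumed signatures (argument names are hypothetical)
stage = nn.Sequential(
    # 1-Lipschitz residual block with ReLU activations, keeps 64 channels
    SDPBasedLipschitzConv(cin=64, inner_dim_factor=2, kernel_size=3),
    # down-sampling residual block fusing SLL with AOC, changes 64 -> 128 channels
    SLLxAOCLipschitzResBlock(cin=64, cout=128, inner_dim_factor=2, kernel_size=3, stride=2),
)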

More layers are coming soon!

🏠 Install the library:

The library is available via pip, so you can install it by running the following command:

pip install orthogonium

If you wish to dive deeper into the code and edit your local version, you can clone the repository and run the following commands to install it locally:

git clone git@github.com:deel-ai/orthogonium.git
pip install -e .

Use the layer:

from orthogonium.layers.conv.AOC import AdaptiveOrthoConv2d, AdaptiveOrthoConvTranspose2d
from orthogonium.reparametrizers import DEFAULT_ORTHO_PARAMS

# use AdaptiveOrthoConv2d with the same params as torch.nn.Conv2d
kernel_size = 3
conv = AdaptiveOrthoConv2d(
    kernel_size=kernel_size,
    in_channels=256,
    out_channels=256,
    stride=2,
    groups=16,
    dilation=2,
    padding_mode="circular",
    ortho_params=DEFAULT_ORTHO_PARAMS,
)
# conv.weight can be assigned to a torch.nn.Conv2d

# this works similarly for ConvTranspose2d:
conv_transpose = AdaptiveOrthoConvTranspose2d(
    in_channels=256,
    out_channels=256,
    kernel_size=kernel_size,
    stride=2,
    dilation=2,
    groups=16,
    ortho_params=DEFAULT_ORTHO_PARAMS,
)
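As noted in the comments above, the explicit kernel can be copied into a plain torch.nn.Conv2d once training is done, which removes the parametrization overhead at inference time. A minimal sketch (it assumes the hyper-parameters of the target layer match the AdaptiveOrthoConv2d defined above):

import torch
import torch.nn as nn

# export sketch: copy the explicit orthogonal kernel into a standard Conv2d
exported = nn.Conv2d(
    in_channels=256,
    out_channels=256,
    kernel_size=kernel_size,
    stride=2,
    groups=16,
    dilation=2,
    padding_mode="circular",
    bias=conv.bias is not None,
)
with torch.no_grad():
    exported.weight.copy_(conv.weight)
    if conv.bias is not None:
        exported.bias.copy_(conv.bias)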

🐯 Model Zoo

Stay tuned, a model zoo will be available soon!

💥 Disclaimer

Given the great quality of the original implementations, orthogonium does not focus on reproducing the results of the original papers exactly, but rather on providing more efficient implementations. Some degradation in the final provable accuracy may be observed when reproducing the results of the original papers; we consider this acceptable if the gain in scalability is worth it. This library aims to provide more scalable and versatile implementations for people who want to use orthogonal layers in a larger-scale setting.

🔭 Resources

1-Lipschitz CNNs and orthogonal CNNs

Lipschitz constant evaluation

📖 Citations

If you find this repository useful for your research, please cite:

@misc{boissin2025adaptiveorthogonalconvolutionscheme,
      title={An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures}, 
      author={Thibaut Boissin and Franck Mamalet and Thomas Fel and Agustin Martin Picard and Thomas Massena and Mathieu Serrurier},
      year={2025},
      eprint={2501.07930},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.07930}, 
}

🍻 Contributing

This library is still at a very early stage, so expect some bugs and missing features. Also, before version 1.0.0 the API may change and no backward compatibility will be ensured (code is expected to keep working under minor changes, but loading a parametrized network could fail). This allows rapid integration of new features. If you plan to release a trained architecture, exporting the convolutions to torch.nn.Conv2d is advised (by saving the weight attribute of each layer). If you plan to release a training script, pin the version in your requirements. In order to prioritize development, we will focus on the most used layers and models. If you have a specific need, please open an issue and we will try to address it as soon as possible.

Also, if you have a model that you would like to share, please open a PR with the model and the training script. We will be happy to include it in the zoo.

If you want to contribute, please open a PR with the new feature or bug fix. We will review it as soon as possible.

Ongoing developments

Layers:
- SOC:
    - remove channel padding to handle ci != co efficiently
    - enable groups
    - enable support for native stride, transposition and dilation
- AOL:
    - torch implementation of AOL
- Sandwich:
    - import code
    - plug AOC into the Sandwich conv

ZOO:
- models from the paper