
# ✨ Orthogonium: Improved implementations of orthogonal layers
This library aims to centralize, standardize and improve methods to build orthogonal layers, with a focus on convolutional layers. We noticed that a layer's implementation plays a significant role in its final performance: a more efficient implementation allows larger networks and more training steps within the same compute budget. Our implementations therefore differ from the original papers in order to be faster, consume less memory, or be more flexible.
## 📦 What is included in this library?
| Layer name | Description | Orthogonal? | Usage | Status |
|---|---|---|---|---|
| AOC (Adaptive-BCOP) | The most scalable method to build orthogonal convolutions. Allows control of kernel size, stride, groups, dilation and transposed convolutions. | Orthogonal | A flexible method for complex architectures. Preserves orthogonality and works on large-scale images. | done |
| Adaptive-SC-Fac | Same as the previous layer but based on SC-Fac instead of BCOP, which claims a complete parametrization of separable convolutions. | Orthogonal | Same as above. | pending |
| Adaptive-SOC | SOC modified to be i) faster and more memory-efficient and ii) handle stride, groups, dilation and transposed convolutions. | Orthogonal | Good for depthwise convolutions and cases where control over the kernel size is not required. | in progress |
| SLL | The original SLL layer, which is already quite efficient. | 1-Lipschitz | Well suited for residual blocks; it also contains ReLU activations. | done |
| SLL-AOC | SLL-AOC is to the downsampling block what SLL is to the residual block (see the ResNet paper). | 1-Lipschitz | Allows constructing a "strided" residual block that can change the number of channels. It adds a convolution in the residual path. | done |
| Sandwich-AOC | Sandwich convolutions that use AOC to replace the FFT, allowing them to scale to large images. | 1-Lipschitz | | pending |
| Adaptive-ECO | ECO modified to handle stride, groups and transposed convolutions. | Orthogonal | | pending (low priority) |
## Directory structure
```
orthogonium
├── layers
│   ├── conv
│   │   ├── AOC
│   │   │   └── ortho_conv.py       # contains AdaptiveOrthoConv2d layer
│   │   ├── AdaptiveSOC
│   │   │   └── ortho_conv.py       # contains AdaptiveSOCConv2d layer (untested)
│   │   └── SLL
│   │       └── sll_layer.py        # contains SDPBasedLipschitzConv, SDPBasedLipschitzDense, SLLxAOCLipschitzResBlock
│   ├── legacy
│   │   └── original code of BCOP, SOC, Cayley etc.
│   ├── linear
│   │   └── ortho_linear.py         # contains OrthoLinear layer (can be used with BB, QR and Exp parametrization)
│   ├── normalization.py            # contains Batch centering and Layer centering
│   ├── custom_activations.py       # contains custom activations for 1-Lipschitz networks
│   └── channel_shuffle.py          # contains channel shuffle layer
├── model_factory.py                # factory function to construct various models for the zoo
└── losses                          # loss functions, VRA estimation
```
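The module paths above map directly to imports. The following sketch assumes the file-level paths shown in the tree (only the class names are documented here; the package-level import used later in this README is the shorter route):

```python
# Import sketch inferred from the directory tree above; the exact module
# paths are assumptions, only the class names appear in the tree comments.
from orthogonium.layers.conv.AOC.ortho_conv import AdaptiveOrthoConv2d
from orthogonium.layers.conv.SLL.sll_layer import SLLxAOCLipschitzResBlock
from orthogonium.layers.linear.ortho_linear import OrthoLinear
```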
### AOC
AOC is a method to build orthogonal convolutions with an explicit kernel that supports stride, transposed convolutions, grouped convolutions and dilation (and any composition of these parameters). This approach is highly scalable and can be applied to problems like ImageNet-1K.
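As a quick sanity check of orthogonality, a stride-1, channel-preserving layer with circular padding should preserve the L2 norm of its input. A minimal sketch, assuming `padding="same"` is accepted with the same semantics as `torch.nn.Conv2d` (the usage section below states the layer takes the same parameters):

```python
import torch
from orthogonium.layers.conv.AOC import AdaptiveOrthoConv2d

# With stride 1, equal in/out channels and circular padding, the layer acts
# as an orthogonal operator on the flattened input, so norms are preserved.
conv = AdaptiveOrthoConv2d(
    in_channels=32,
    out_channels=32,
    kernel_size=3,
    stride=1,
    padding="same",          # assumption: same semantics as torch.nn.Conv2d
    padding_mode="circular",
)
x = torch.randn(8, 32, 16, 16)
with torch.no_grad():
    y = conv(x)
print(x.norm().item(), y.norm().item())  # should match up to numerical error
```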
### Adaptive-SC-Fac
As AOC is built on top of the BCOP method, we can construct an equivalent method built on top of SC-Fac instead. This will allow comparing the performance of the two methods, given that they have very similar parametrizations. (See our paper for a discussion of the similarities and differences between the two methods.)
### Adaptive-SOC
Adaptive-SOC blends the approaches of AOC and SOC. It differs from SOC in that it is more memory-efficient and sometimes faster. It also handles stride, groups, dilation and transposed convolutions. However, it does not allow explicit control of the kernel size: the resulting kernel is larger than the requested one, because computing the exponential of a kernel increases the kernel size at each iteration.
Its development is still in progress, so extra testing is still required to ensure exact orthogonality.
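The kernel growth can be illustrated with plain PyTorch, independently of the library: composing two convolutions is itself a convolution whose kernel is the full convolution of the two kernels, so each term of the exponential series enlarges the implicit kernel.

```python
import torch
import torch.nn.functional as F

# The kernel of the composition of two 3x3 convolutions is 5x5. Since the
# exponential exp(L) = I + L + L^2/2! + ... stacks such compositions, the
# effective kernel always exceeds the requested size.
k = torch.randn(1, 1, 3, 3)
composed = F.conv2d(k, k.flip(-1, -2), padding=2)  # full convolution of k with itself
print(k.shape[-2:], composed.shape[-2:])           # (3, 3) -> (5, 5)
```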
### SLL
SLL is a method to construct small residual blocks with ReLU activations. We kept most of the original implementation and added SLLxAOCLipschitzResBlock, which constructs a down-sampling residual block by fusing SLL with AOC.
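Since SLL blocks are 1-Lipschitz rather than orthogonal, a cheap empirical check is to verify that output distances never exceed input distances. A generic sketch that works for any `nn.Module` (the helper name is ours, not part of the library):

```python
import torch

def check_lipschitz(block: torch.nn.Module, x_shape, n_pairs=100):
    # Empirical lower bound on the Lipschitz constant: for a 1-Lipschitz
    # block the ratio ||f(a) - f(b)|| / ||a - b|| stays <= 1 for every pair.
    worst = 0.0
    with torch.no_grad():
        for _ in range(n_pairs):
            a, b = torch.randn(2, *x_shape)
            ratio = (block(a[None]) - block(b[None])).norm() / (a - b).norm()
            worst = max(worst, ratio.item())
    return worst  # should not exceed 1 (up to numerical error)
```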
### More layers are coming soon!
## 🚀 Install the library:
The library is available on PyPI, so you can install it by running the following command:
```bash
pip install orthogonium
```
If you wish to dive deep into the code and edit your local version, you can clone the repository and run the following command to install it locally:
```bash
git clone git@github.com:deel-ai/orthogonium.git
pip install -e .
```
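A quick way to confirm the install succeeded:

```bash
python -c "import orthogonium"
```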
## Use the layer:
```python
from orthogonium.layers.conv.AOC import AdaptiveOrthoConv2d, AdaptiveOrthoConvTranspose2d
from orthogonium.reparametrizers import DEFAULT_ORTHO_PARAMS

# use AdaptiveOrthoConv2d with the same params as torch.nn.Conv2d
kernel_size = 3
conv = AdaptiveOrthoConv2d(
    kernel_size=kernel_size,
    in_channels=256,
    out_channels=256,
    stride=2,
    groups=16,
    dilation=2,
    padding_mode="circular",
    ortho_params=DEFAULT_ORTHO_PARAMS,
)
# conv.weight can be assigned to a torch.nn.Conv2d

# this works similarly for ConvTranspose2d:
conv_transpose = AdaptiveOrthoConvTranspose2d(
    in_channels=256,
    out_channels=256,
    kernel_size=kernel_size,
    stride=2,
    dilation=2,
    groups=16,
    ortho_params=DEFAULT_ORTHO_PARAMS,
)
```
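As noted in the comment above (and advised in the Contributing section below), the explicit kernel can be exported to a plain `torch.nn.Conv2d` for release. A minimal sketch continuing the example, assuming the layer exposes `weight` and `bias` like `torch.nn.Conv2d`:

```python
import torch

# Copy the explicit kernel of the parametrized layer into a vanilla Conv2d
# so the released model no longer depends on this library.
plain = torch.nn.Conv2d(
    in_channels=256,
    out_channels=256,
    kernel_size=kernel_size,
    stride=2,
    groups=16,
    dilation=2,
    padding_mode="circular",
    bias=getattr(conv, "bias", None) is not None,
)
with torch.no_grad():
    plain.weight.copy_(conv.weight)
    if plain.bias is not None:
        plain.bias.copy_(conv.bias)
```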
## 🎯 Model Zoo
Stay tuned, a model zoo will be available soon!
## 🔥 Disclaimer
Given the great quality of the original implementations, orthogonium does not focus on exactly reproducing the results of the original papers, but rather on providing more efficient implementations. Some degradation in final provable accuracy may be observed relative to the original papers; we consider this acceptable if the gain in scalability is worth it. This library aims to provide more scalable and versatile implementations for people who seek to use orthogonal layers in a larger-scale setting.
## 📚 Resources
### 1-Lipschitz CNNs and orthogonal CNNs
- 1-Lipschitz Layers Compared: github and paper
- BCOP: github and paper
- SC-Fac: paper
- ECO: paper
- Cayley: github and paper
- LOT: github and paper
- ProjUNN-T: github and paper
- SLL: github and paper
- Sandwich: github and paper
- AOL: github and paper
- SOC: github and paper 1, paper 2
### Lipschitz constant evaluation
- Spectral Norm of Convolutional Layers with Circular and Zero Paddings
- Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration
- github of the two papers
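For a quick numerical check, standard power iteration on the convolution operator gives a lower estimate of its spectral norm. This is a plain baseline, not the Gram iteration from the papers above, and it assumes a stride-1, zero-padded, ungrouped convolution:

```python
import torch
import torch.nn.functional as F

def conv_spectral_norm(weight, input_hw=(32, 32), n_iter=100):
    # Power iteration on W^T W, where W is the linear operator of a
    # zero-padded stride-1 convolution: alternately apply the conv and its
    # adjoint (conv_transpose), then read off ||W x|| for a unit-norm x.
    # For an orthogonal convolution this should return a value close to 1.
    pad = weight.shape[-1] // 2
    x = torch.randn(1, weight.shape[1], *input_hw)
    for _ in range(n_iter):
        x = x / x.norm()
        y = F.conv2d(x, weight, padding=pad)
        x = F.conv_transpose2d(y, weight, padding=pad)
    x = x / x.norm()
    return F.conv2d(x, weight, padding=pad).norm().item()
```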
## 🙏 Citations
If you find this repository useful for your research, please cite:
```bibtex
@misc{boissin2025adaptiveorthogonalconvolutionscheme,
  title={An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures},
  author={Thibaut Boissin and Franck Mamalet and Thomas Fel and Agustin Martin Picard and Thomas Massena and Mathieu Serrurier},
  year={2025},
  eprint={2501.07930},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2501.07930},
}
```
## 💻 Contributing
This library is still at a very early stage, so expect some bugs and missing features. Also, before version 1.0.0, the API may change and no backward compatibility will be ensured (code is expected to keep working under minor changes, but the loading of parametrized networks could fail). This will allow rapid integration of new features. If you plan to release a trained architecture, exporting the convolutions to torch.nn.Conv2d is advised (by saving the weight attribute of a layer, as sketched above). If you plan to release a training script, pin the version in your requirements.
In order to prioritize development, we will focus on the most used layers and models. If you have a specific need, please open an issue, and we will try to address it as soon as possible.
Also, if you have a model that you would like to share, please open a PR with the model and the training script. We will be happy to include it in the zoo.
If you want to contribute, please open a PR with the new feature or bug fix. We will review it as soon as possible.
## Ongoing developments
Layers:
- SOC:
  - remove channel padding to handle ci != co efficiently
  - enable groups
  - enable support for native stride, transposition and dilation
- AOL:
  - torch implementation of AOL
- Sandwich:
  - import code
  - plug AOC into the Sandwich conv

Zoo:
- models from the paper