convolutions

AdaptiveOrthoConv2d(in_channels, out_channels, kernel_size, stride=1, padding='same', dilation=1, groups=1, bias=True, padding_mode='circular', ortho_params=OrthoParams())

Factory function to create an orthogonal convolutional layer, selecting the appropriate class based on kernel size and stride. This is the implementation of the Adaptive Orthogonal Convolution scheme [1]. It aims to be scalable to large networks and large image sizes, while enforcing orthogonality in the convolutional layers. This layer is also intended to be compatible with all the features of the nn.Conv2d class (e.g., striding, dilation, grouping, etc.). This method has an explicit kernel, which means that the forward operation is equivalent to a standard convolutional layer, but the weights are constrained to be orthogonal.

Key Features:

- Enforces orthogonality, preserving gradient norms.
- Supports native striding, dilation, grouped convolutions, and flexible padding.

Behavior:

- When kernel_size == stride, the layer is an `RKOConv2d`.
- When stride == 1, the layer is a `FastBlockConv2d`.
- Otherwise, the layer is a `BcopRkoConv2d`.
Note
  • This implementation also works under zero padding; its Lipschitz constant remains tight, but orthogonality is lost on the image border.
  • Unit testing validated a tolerance of 1e-4 under various orthogonalization schemes (see reparametrizers). Only Cholesky-based methods required a looser tolerance of 5e-2.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_channels` | `int` | Number of input channels. | *required* |
| `out_channels` | `int` | Number of output channels. | *required* |
| `kernel_size` | `_size_2_t` | Size of the convolution kernel. | *required* |
| `stride` | `_size_2_t` | Stride of the convolution. | `1` |
| `padding` | `str` or `_size_2_t` | Padding mode or size. | `'same'` |
| `dilation` | `_size_2_t` | Dilation rate. | `1` |
| `groups` | `int` | Number of blocked connections from input to output channels. | `1` |
| `bias` | `bool` | Whether to include a learnable bias. | `True` |
| `padding_mode` | `str` | Padding mode. | `'circular'` |
| `ortho_params` | `OrthoParams` | Parameters to control orthogonality. | `OrthoParams()` |

Returns:

| Type | Description |
| --- | --- |
| `Conv2d` | A configured instance of `nn.Conv2d` (one of `RKOConv2d`, `FastBlockConv2d`, or `BcopRkoConv2d`). |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `kernel_size < stride`, as orthogonality cannot be enforced. |

References
  • [1] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025). An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures. https://arxiv.org/abs/2501.07930
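
Usage sketch (not part of the original docs): the snippet below imports the factory from the module listed in the source path underneath; the package may also re-export it at a higher level. With equal channel counts, stride 1 and circular padding the resulting operator is square-orthogonal, so it should preserve the norm of its input up to the tolerances given in the note above.

```python
import torch

# Import path taken from the source location shown below (assumption: no shorter re-export is used).
from orthogonium.layers.conv.AOC.ortho_conv import AdaptiveOrthoConv2d

conv = AdaptiveOrthoConv2d(
    in_channels=64,
    out_channels=64,
    kernel_size=3,
    stride=1,
    padding="same",
    padding_mode="circular",
    bias=False,
)

x = torch.randn(4, 64, 32, 32)
y = conv(x)
# Square orthogonal operator: input and output norms should match up to ~1e-4.
print(torch.linalg.vector_norm(x), torch.linalg.vector_norm(y))
```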
Source code in orthogonium\layers\conv\AOC\ortho_conv.py
def AdaptiveOrthoConv2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: Union[str, _size_2_t] = "same",
    dilation: _size_2_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "circular",
    ortho_params: OrthoParams = OrthoParams(),
) -> nn.Conv2d:
    """
    Factory function to create an orthogonal convolutional layer, selecting the appropriate class based on kernel
    size and stride. This is the implementation for the `Adaptive Orthogonal Convolution` scheme [1]. It aims to be
    scalable to large networks and large image sizes, while enforcing orthogonality in the convolutional layers.
    This layer is also intended to be compatible with all the features of the `nn.Conv2d` class (e.g., striding, dilation,
    grouping, etc.). This method has an explicit kernel, which means that the forward operation is equivalent to a
    standard convolutional layer, but the weights are constrained to be orthogonal.

    Key Features:
    -------------
        - Enforces orthogonality, preserving gradient norms.
        - Supports native striding, dilation, grouped convolutions, and flexible padding.

    Behavior:
    ---------
        - When kernel_size == stride, the layer is an `RKOConv2d`.
        - When stride == 1, the layer is a `FastBlockConv2d`.
        - Otherwise, the layer is a `BcopRkoConv2d`.

    Note:
        - This implementation also works under zero padding; its Lipschitz constant remains tight, but orthogonality
            is lost on the image border.
        - Unit testing validated a tolerance of 1e-4 under various orthogonalization schemes (see
            reparametrizers). Only Cholesky-based methods required a looser tolerance of 5e-2.

    Arguments:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (_size_2_t): Size of the convolution kernel.
        stride (_size_2_t, optional): Stride of the convolution. Default is 1.
        padding (str or _size_2_t, optional): Padding mode or size. Default is "same".
        dilation (_size_2_t, optional): Dilation rate. Default is 1.
        groups (int, optional): Number of blocked connections from input to output channels. Default is 1.
        bias (bool, optional): Whether to include a learnable bias. Default is True.
        padding_mode (str, optional): Padding mode. Default is "circular".
        ortho_params (OrthoParams, optional): Parameters to control orthogonality. Default is `OrthoParams()`.

    Returns:
        A configured instance of `nn.Conv2d` (one of `RKOConv2d`, `FastBlockConv2d`, or `BcopRkoConv2d`).

    Raises:
        `ValueError`: If kernel_size < stride, as orthogonality cannot be enforced.


    References:
        - [1] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
        An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
        <https://arxiv.org/abs/2501.07930>
    """

    if kernel_size < stride:
        raise ValueError(
            "kernel size must be at least as large as stride. The set of orthogonal convolutions is empty in this setting."
        )
    if kernel_size == stride:
        convclass = RKOConv2d
    elif (stride == 1) or ((in_channels >= out_channels) and (dilation > 1)):
        convclass = FastBlockConv2d
    else:
        convclass = BcopRkoConv2d
    return convclass(
        in_channels=in_channels,
        out_channels=out_channels,
        kernel_size=kernel_size,
        stride=stride,
        padding=padding,
        dilation=dilation,
        groups=groups,
        bias=bias,
        padding_mode=padding_mode,
        ortho_params=ortho_params,
    )

AdaptiveOrthoConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros', ortho_params=OrthoParams())

Factory function to create an orthogonal transposed convolutional layer, selecting the appropriate class based on kernel size and stride. This is the implementation of the Adaptive Orthogonal Convolution scheme [1]. It aims to be scalable to large networks and large image sizes, while enforcing orthogonality in the convolutional layers. This layer is also intended to be compatible with all the features of the nn.ConvTranspose2d class (e.g., striding, dilation, grouping, etc.). This method has an explicit kernel, which means that the forward operation is equivalent to a standard transposed convolutional layer, but the weights are constrained to be orthogonal.

Key Features:

- Ensures orthogonality in transpose convolutions for stable gradient propagation.
- Supports dilation, grouped operations, and efficient kernel construction.

Behavior:

- When kernel_size == stride, the layer is an `RkoConvTranspose2d`.
- When stride == 1, the layer is a `FastBlockConvTranspose2D`.
- Otherwise, the layer is a `BcopRkoConvTranspose2d`.
Note
  • This implementation also works under zero padding; its Lipschitz constant remains tight, but orthogonality is lost on the image border.
  • The current implementation of torch.nn.ConvTranspose2d does not support circular padding. One can implement circular padding manually by adding a padding layer before this one and setting padding = (0, 0).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_channels` | `int` | Number of input channels. | *required* |
| `out_channels` | `int` | Number of output channels. | *required* |
| `kernel_size` | `_size_2_t` | Size of the convolution kernel. | *required* |
| `stride` | `_size_2_t` | Stride of the transposed convolution. | `1` |
| `padding` | `_size_2_t` | Padding size. | `0` |
| `output_padding` | `_size_2_t` | Additional size added to the output shape. | `0` |
| `groups` | `int` | Number of groups. | `1` |
| `bias` | `bool` | Whether to include a learnable bias. | `True` |
| `dilation` | `_size_2_t` | Dilation rate. | `1` |
| `padding_mode` | `str` | Padding mode. | `'zeros'` |
| `ortho_params` | `OrthoParams` | Parameters to control orthogonality. | `OrthoParams()` |

Returns:

| Type | Description |
| --- | --- |
| `ConvTranspose2d` | A configured instance of `nn.ConvTranspose2d` (one of `RkoConvTranspose2d`, `FastBlockConvTranspose2D`, or `BcopRkoConvTranspose2d`). |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `kernel_size < stride`, as orthogonality cannot be enforced. |

References
  • [1] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025). An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures. https://arxiv.org/abs/2501.07930
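
Usage sketch (not part of the original docs; the import path follows the source location below): a strided orthogonal upsampling layer. With `kernel_size == stride` the factory returns the `RkoConvTranspose2d` variant, and the spatial resolution is doubled.

```python
import torch

from orthogonium.layers.conv.AOC.ortho_conv import AdaptiveOrthoConvTranspose2d

# kernel_size == stride, so the RKO-based transposed variant is selected (see Behavior above).
up = AdaptiveOrthoConvTranspose2d(
    in_channels=64,
    out_channels=32,
    kernel_size=2,
    stride=2,
    bias=False,
)

x = torch.randn(4, 64, 16, 16)
y = up(x)
print(y.shape)  # torch.Size([4, 32, 32, 32])
```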
Source code in orthogonium\layers\conv\AOC\ortho_conv.py
def AdaptiveOrthoConvTranspose2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: _size_2_t = 0,
    output_padding: _size_2_t = 0,
    groups: int = 1,
    bias: bool = True,
    dilation: _size_2_t = 1,
    padding_mode: str = "zeros",
    ortho_params: OrthoParams = OrthoParams(),
) -> nn.ConvTranspose2d:
    """
    Factory function to create an orthogonal transposed convolutional layer, selecting the appropriate class based on kernel
    size and stride. This is the implementation for the `Adaptive Orthogonal Convolution` scheme [1]. It aims to be
    scalable to large networks and large image sizes, while enforcing orthogonality in the convolutional layers.
    This layer is also intended to be compatible with all the features of the `nn.ConvTranspose2d` class (e.g., striding, dilation,
    grouping, etc.). This method has an explicit kernel, which means that the forward operation is equivalent to a
    standard transposed convolutional layer, but the weights are constrained to be orthogonal.

    Key Features:
    -------------
        - Ensures orthogonality in transpose convolutions for stable gradient propagation.
        - Supports dilation, grouped operations, and efficient kernel construction.

    Behavior:
    ---------
        - When kernel_size == stride, the layer is an `RkoConvTranspose2d`.
        - When stride == 1, the layer is a `FastBlockConvTranspose2D`.
        - Otherwise, the layer is a `BcopRkoConvTranspose2d`.


    Note:
        - This implementation also works under zero padding; its Lipschitz constant remains tight, but orthogonality
            is lost on the image border.
        - The current implementation of torch.nn.ConvTranspose2d does not support circular padding. One can
            implement circular padding manually by adding a padding layer before this one and setting padding = (0, 0).

    Arguments:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (_size_2_t): Size of the convolution kernel.
        stride (_size_2_t, optional): Stride of the transpose convolution. Default is 1.
        padding (_size_2_t, optional): Padding size. Default is 0.
        output_padding (_size_2_t, optional): Additional size for output. Default is 0.
        groups (int, optional): Number of groups. Default is 1.
        bias (bool, optional): Whether to include a learnable bias. Default is True.
        dilation (_size_2_t, optional): Dilation rate. Default is 1.
        padding_mode (str, optional): Padding mode. Default is "zeros".
        ortho_params (OrthoParams, optional): Parameters to control orthogonality. Default is `OrthoParams()`.

    Returns:
        A configured instance of `nn.ConvTranspose2d` (one of `RkoConvTranspose2d`, `FastBlockConvTranspose2D`, or `BcopRkoConvTranspose2d`).

    **Raises:**
    - `ValueError`: If kernel_size < stride, as orthogonality cannot be enforced.


    References:
        - [1] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
        An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
        <https://arxiv.org/abs/2501.07930>
    """

    if kernel_size < stride:
        raise ValueError(
            "kernel size must be at least as large as stride. The set of orthogonal convolutions is empty in this setting."
        )
    if kernel_size == stride:
        convclass = RkoConvTranspose2d
    elif stride == 1:
        convclass = FastBlockConvTranspose2D
    else:
        convclass = BcopRkoConvTranspose2d
    return convclass(
        in_channels=in_channels,
        out_channels=out_channels,
        kernel_size=kernel_size,
        stride=stride,
        padding=padding,
        output_padding=output_padding,
        groups=groups,
        bias=bias,
        dilation=dilation,
        padding_mode=padding_mode,
        ortho_params=ortho_params,
    )

SSL derived 1-Lipschitz Layers

This module implements several 1-Lipschitz residual blocks, inspired by and extending the SDP-based Lipschitz Layers (SLL) from [1]. Specifically:

  • SDPBasedLipschitzResBlock
    The original version of the 1-Lipschitz convolutional residual block. It enforces Lipschitz constraints by rescaling activation outputs according to an estimate of the operator norm.

  • SLLxAOCLipschitzResBlock
    An extended version of the SLL approach described in [1], combined with additional orthogonal convolutions to handle stride, kernel-size, or channel-dimension changes. It fuses multiple convolutions via the block convolution, thereby preserving the 1-Lipschitz property while enabling strided downsampling or modifying input/output channels.

  • AOCLipschitzResBlock
    A variant of the original Lipschitz block where the core convolution is replaced by an AdaptiveOrthoConv2d. It maintains the 1-Lipschitz property with orthogonal weight parameterization while providing efficient convolution implementations.

References

[1] Alexandre Araujo, Aaron J. Havens, Blaise Delattre, Alexandre Allauzen, and Bin Hu. A Unified Algebraic Perspective on Lipschitz Neural Networks. In The Eleventh International Conference on Learning Representations, 2023. https://arxiv.org/abs/2303.03169

[2] Thibaut Boissin, Franck Mamalet, Thomas Fel, Agustin Martin Picard, Thomas Massena, and Mathieu Serrurier. An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures, 2025. https://arxiv.org/abs/2501.07930

Notes on the SLL approach

In [1], the SLL layer for convolutions is a 1-Lipschitz residual operation defined approximately as:

\[ y = x - \mathbf{K}^T \star (t \times \sigma(\mathbf{K} \star x + b)), \]

where \(\mathbf{K}\) is a Toeplitz (convolution) matrix representing a 1-Lipschitz operator. In practice, this is achieved by computing a normalization vector \(\mathbf{t}\) and rescaling the activation outputs by \(\mathbf{t}\).

By default, the SLL formulation does not allow strides or changes in the number of channels.
To address these issues, SLLxAOCLipschitzResBlock adds extra orthogonal convolutions before and/or after the main SLL operation. These additional convolutions can be merged via block convolution (Proposition 1 in [2]) to maintain 1-Lipschitz behavior while enabling stride and/or channel changes.

When \(\mathbf{K}\), \(\mathbf{K}_{pre}\), and \(\mathbf{K}_{post}\) each correspond to 2×2 convolutions, the resulting block effectively contains two 3×3 convolutions in one branch and a single 4×4 stride-2 convolution in the skip branch—quite similar to typical ResNet blocks.
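
As a quick sanity check on these sizes (a sketch assuming every convolution involved has stride 1), fusing two kernels with the block convolution follows the usual support-size rule for composed convolutions:

\[ k_{\text{fused}} = k_1 + k_2 - 1, \]

so fusing two 2×2 kernels indeed yields the 3×3 convolutions mentioned above.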

AOCLipschitzResBlock

Bases: Module

Source code in orthogonium\layers\conv\SLL\sll_layer.py
class AOCLipschitzResBlock(nn.Module):
    def __init__(
        self,
        in_channels: int,
        inner_dim_factor: int,
        kernel_size: _size_2_t,
        dilation: _size_2_t = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = "circular",
        ortho_params: OrthoParams = OrthoParams(),
    ):
        """
        A Lipschitz residual block in which the main convolution is replaced by
        `AdaptiveOrthoConv2d` (AOC). This preserves 1-Lipschitz (or lower) behavior through
        an orthogonal parameterization, without explicitly computing a scaling factor `t`.

        $$
        y = x - \mathbf{K}^T \\star (\sigma(\\mathbf{K} \\star x + b)),
        $$

        **Args**:
          - `in_channels` (int): Number of input channels.
          - `inner_dim_factor` (int): Multiplier for internal representation size.
          - `kernel_size` (_size_2_t): Convolution kernel size.
          - `dilation` (_size_2_t, optional): Default is 1.
          - `groups` (int, optional): Default is 1.
          - `bias` (bool, optional): If True, adds a learnable bias. Default is True.
          - `padding_mode` (str, optional): `'circular'` or `'zeros'`. Default is `'circular'`.
          - `ortho_params` (OrthoParams, optional): Orthogonal parameterization settings. Default is `OrthoParams()`.


        References:
            - [1] Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B.
            A Unified Algebraic Perspective on Lipschitz Neural Networks.
            In The Eleventh International Conference on Learning Representations.
            <https://arxiv.org/abs/2303.03169>
            - [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
            An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
            <https://arxiv.org/abs/2501.07930>
        """
        super().__init__()

        inner_dim = int(in_channels * inner_dim_factor)
        self.activation = nn.ReLU()

        if padding_mode not in ["circular", "zeros"]:
            raise ValueError("padding_mode must be either 'circular' or 'zeros'")
        if padding_mode == "circular":
            self.padding = 0  # will be handled by the padding function
        else:
            self.padding = kernel_size // 2

        self.in_conv = AdaptiveOrthoConv2d(
            in_channels,
            inner_dim,
            kernel_size=kernel_size,
            stride=1,
            padding="same",
            dilation=dilation,
            groups=groups,
            bias=bias,
            padding_mode=padding_mode,
            ortho_params=ortho_params,
        )
        self.kernel_size = kernel_size
        self.dilation = dilation
        self.groups = groups
        self.bias = bias
        self.padding_mode = padding_mode

    def forward(self, x):
        kernel = self.in_conv.weight
        # conv
        res = x
        if self.padding_mode == "circular":
            res = F.pad(
                res,
                (self.padding,) * 4,
                mode="circular",
                value=0,
            )
        res = F.conv2d(
            res,
            kernel,
            bias=self.in_conv.bias,
            padding=0,
            groups=self.groups,
        )
        # activation
        res = self.activation(res)
        # conv transpose
        if self.padding_mode == "circular":
            res = F.pad(
                res,
                (self.padding,) * 4,
                mode="circular",
                value=0,
            )
        res = 2 * F.conv_transpose2d(res, kernel, padding=0, groups=self.groups)
        # residual
        out = x - res
        return out

__init__(in_channels, inner_dim_factor, kernel_size, dilation=1, groups=1, bias=True, padding_mode='circular', ortho_params=OrthoParams())

A Lipschitz residual block in which the main convolution is replaced by AdaptiveOrthoConv2d (AOC). This preserves 1-Lipschitz (or lower) behavior through an orthogonal parameterization, without explicitly computing a scaling factor t.

\[ y = x - \mathbf{K}^T \star (\sigma(\mathbf{K} \star x + b)), \]

Args:

- in_channels (int): Number of input channels.
- inner_dim_factor (int): Multiplier for internal representation size.
- kernel_size (_size_2_t): Convolution kernel size.
- dilation (_size_2_t, optional): Default is 1.
- groups (int, optional): Default is 1.
- bias (bool, optional): If True, adds a learnable bias. Default is True.
- padding_mode (str, optional): 'circular' or 'zeros'. Default is 'circular'.
- ortho_params (OrthoParams, optional): Orthogonal parameterization settings. Default is OrthoParams().

References
  • [1] Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B. A Unified Algebraic Perspective on Lipschitz Neural Networks. In The Eleventh International Conference on Learning Representations. https://arxiv.org/abs/2303.03169
  • [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025). An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures. https://arxiv.org/abs/2501.07930
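
Usage sketch (not part of the original docs; the import path follows the source location below): the block keeps channel count and spatial size, so it can be dropped into a residual backbone wherever no downsampling is needed.

```python
import torch

from orthogonium.layers.conv.SLL.sll_layer import AOCLipschitzResBlock

block = AOCLipschitzResBlock(
    in_channels=64,
    inner_dim_factor=2,
    kernel_size=3,
    padding_mode="circular",
)

x = torch.randn(4, 64, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([4, 64, 32, 32]); the mapping is 1-Lipschitz
```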
Source code in orthogonium\layers\conv\SLL\sll_layer.py
def __init__(
    self,
    in_channels: int,
    inner_dim_factor: int,
    kernel_size: _size_2_t,
    dilation: _size_2_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "circular",
    ortho_params: OrthoParams = OrthoParams(),
):
    """
    A Lipschitz residual block in which the main convolution is replaced by
    `AdaptiveOrthoConv2d` (AOC). This preserves 1-Lipschitz (or lower) behavior through
    an orthogonal parameterization, without explicitly computing a scaling factor `t`.

    $$
    y = x - \mathbf{K}^T \\star (\sigma(\\mathbf{K} \\star x + b)),
    $$

    **Args**:
      - `in_channels` (int): Number of input channels.
      - `inner_dim_factor` (int): Multiplier for internal representation size.
      - `kernel_size` (_size_2_t): Convolution kernel size.
      - `dilation` (_size_2_t, optional): Default is 1.
      - `groups` (int, optional): Default is 1.
      - `bias` (bool, optional): If True, adds a learnable bias. Default is True.
      - `padding_mode` (str, optional): `'circular'` or `'zeros'`. Default is `'circular'`.
      - `ortho_params` (OrthoParams, optional): Orthogonal parameterization settings. Default is `OrthoParams()`.


    References:
        - [1] Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B.
        A Unified Algebraic Perspective on Lipschitz Neural Networks.
        In The Eleventh International Conference on Learning Representations.
        <https://arxiv.org/abs/2303.03169>
        - [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
        An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
        <https://arxiv.org/abs/2501.07930>
    """
    super().__init__()

    inner_dim = int(in_channels * inner_dim_factor)
    self.activation = nn.ReLU()

    if padding_mode not in ["circular", "zeros"]:
        raise ValueError("padding_mode must be either 'circular' or 'zeros'")
    if padding_mode == "circular":
        self.padding = 0  # will be handled by the padding function
    else:
        self.padding = kernel_size // 2

    self.in_conv = AdaptiveOrthoConv2d(
        in_channels,
        inner_dim,
        kernel_size=kernel_size,
        stride=1,
        padding="same",
        dilation=dilation,
        groups=groups,
        bias=bias,
        padding_mode=padding_mode,
        ortho_params=ortho_params,
    )
    self.kernel_size = kernel_size
    self.dilation = dilation
    self.groups = groups
    self.bias = bias
    self.padding_mode = padding_mode

SDPBasedLipschitzDense

Bases: Module

Source code in orthogonium\layers\conv\SLL\sll_layer.py
class SDPBasedLipschitzDense(nn.Module):
    def __init__(self, in_features, out_features, inner_dim, **kwargs):
        """
        A 1-Lipschitz fully-connected layer (dense version). Similar to the convolutional
        SLL approach, but operates on vectors:

        $$
        y = x - K^T \\times (t \\times \sigma(K \\times x + b)),
        $$

        **Args**:
          - `in_features` (int): Input size.
          - `out_features` (int): Output size (must match `in_features` to remain 1-Lipschitz).
          - `inner_dim` (int): The internal dimension used for the transform.


        References:
            - Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B.
            A Unified Algebraic Perspective on Lipschitz Neural Networks.
            In The Eleventh International Conference on Learning Representations.
            <https://arxiv.org/abs/2303.03169>
        """
        super().__init__()

        inner_dim = inner_dim if inner_dim != -1 else in_features
        self.activation = nn.ReLU()

        self.weight = nn.Parameter(torch.empty(inner_dim, in_features))
        self.bias = nn.Parameter(torch.empty(1, inner_dim))
        self.q = nn.Parameter(torch.randn(inner_dim))

        nn.init.xavier_normal_(self.weight)
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / np.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)  # bias init

    def compute_t(self):
        q = torch.exp(self.q)
        q_inv = torch.exp(-self.q)
        t = torch.abs(
            torch.einsum("i,ik,kj,j -> ij", q_inv, self.weight, self.weight.T, q)
        ).sum(1)
        t = safe_inv(t)
        return t

    def forward(self, x):
        t = self.compute_t()
        res = F.linear(x, self.weight)
        res = res + self.bias
        res = t * self.activation(res)
        res = 2 * F.linear(res, self.weight.T)
        out = x - res
        return out

__init__(in_features, out_features, inner_dim, **kwargs)

A 1-Lipschitz fully-connected layer (dense version). Similar to the convolutional SLL approach, but operates on vectors:

\[ y = x - K^T \times (t \times \sigma(K \times x + b)), \]

Args:

- in_features (int): Input size.
- out_features (int): Output size (must match in_features to remain 1-Lipschitz).
- inner_dim (int): The internal dimension used for the transform.

References
  • Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B. A Unified Algebraic Perspective on Lipschitz Neural Networks. In The Eleventh International Conference on Learning Representations. https://arxiv.org/abs/2303.03169
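
Usage sketch (not part of the original docs; the import path follows the source location below): a 1-Lipschitz dense residual layer on 128-dimensional features.

```python
import torch

from orthogonium.layers.conv.SLL.sll_layer import SDPBasedLipschitzDense

dense = SDPBasedLipschitzDense(in_features=128, out_features=128, inner_dim=256)

x = torch.randn(8, 128)
y = dense(x)
print(y.shape)  # torch.Size([8, 128]); out_features must equal in_features
```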
Source code in orthogonium\layers\conv\SLL\sll_layer.py
def __init__(self, in_features, out_features, inner_dim, **kwargs):
    """
    A 1-Lipschitz fully-connected layer (dense version). Similar to the convolutional
    SLL approach, but operates on vectors:

    $$
    y = x - K^T \\times (t \\times \sigma(K \\times x + b)),
    $$

    **Args**:
      - `in_features` (int): Input size.
      - `out_features` (int): Output size (must match `in_features` to remain 1-Lipschitz).
      - `inner_dim` (int): The internal dimension used for the transform.


    References:
        - Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B.
        A Unified Algebraic Perspective on Lipschitz Neural Networks.
        In The Eleventh International Conference on Learning Representations.
        <https://arxiv.org/abs/2303.03169>
    """
    super().__init__()

    inner_dim = inner_dim if inner_dim != -1 else in_features
    self.activation = nn.ReLU()

    self.weight = nn.Parameter(torch.empty(inner_dim, in_features))
    self.bias = nn.Parameter(torch.empty(1, inner_dim))
    self.q = nn.Parameter(torch.randn(inner_dim))

    nn.init.xavier_normal_(self.weight)
    fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weight)
    bound = 1 / np.sqrt(fan_in)
    nn.init.uniform_(self.bias, -bound, bound)  # bias init

SDPBasedLipschitzResBlock

Bases: Module

Source code in orthogonium\layers\conv\SLL\sll_layer.py
class SDPBasedLipschitzResBlock(nn.Module):
    def __init__(self, cin, inner_dim_factor, kernel_size=3, groups=1, **kwargs):
        """
         Original 1-Lipschitz convolutional residual block, based on the SDP-based Lipschitz
        layer (SLL) approach [1]. It has a structure akin to:

        out = x - 2 * ConvTranspose( t * ReLU(Conv(x) + bias) )

        where `t` is a channel-wise scaling factor ensuring a Lipschitz constant ≤ 1.

        !!! note
            By default, `SDPBasedLipschitzResBlock` assumes `cin == cout` and does **not** handle
            stride changes outside the skip connection (i.e., typically used when stride=1 or 2
            for downsampling in a standard residual architecture).

        **Args**:
          - `cin` (int): Number of input channels.
          - `cout` (int): Number of output channels.
          - `inner_dim_factor` (float): Multiplier for the intermediate dimensionality.
          - `kernel_size` (int, optional): Size of the convolution kernel. Default is 3.
          - `groups` (int, optional): Number of groups for the convolution. Default is 1.
          - `**kwargs`: Additional keyword arguments (unused).


        References:
            - Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B.
            A Unified Algebraic Perspective on Lipschitz Neural Networks.
            In The Eleventh International Conference on Learning Representations.
            <https://arxiv.org/abs/2303.03169>
        """
        super().__init__()

        inner_dim = int(cin * inner_dim_factor)
        self.activation = nn.ReLU()
        self.groups = groups

        self.padding = kernel_size // 2

        self.kernel = nn.Parameter(
            torch.randn(inner_dim, cin // groups, kernel_size, kernel_size)
        )
        parametrize.register_parametrization(
            self,
            "kernel",
            AOLReparametrizer(
                inner_dim,
                groups=groups,
            ),
        )
        self.bias = nn.Parameter(torch.empty(1, inner_dim, 1, 1))
        self.q = nn.Parameter(torch.ones(inner_dim, 1, 1, 1))

        nn.init.xavier_normal_(self.kernel)
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.kernel)
        bound = 1 / np.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)  # bias init

    def forward(self, x):
        res = F.conv2d(x, self.kernel, padding=self.padding, groups=self.groups)
        res = res + self.bias
        res = self.activation(res)
        with parametrize.cached():
            res = 2 * F.conv_transpose2d(
                res, self.kernel, padding=self.padding, groups=self.groups
            )
        out = x - res
        return out

__init__(cin, inner_dim_factor, kernel_size=3, groups=1, **kwargs)

Original 1-Lipschitz convolutional residual block, based on the SDP-based Lipschitz layer (SLL) approach [1]. It has a structure akin to:

out = x - 2 * ConvTranspose( t * ReLU(Conv(x) + bias) )

where t is a channel-wise scaling factor ensuring a Lipschitz constant ≤ 1.

Note

By default, SDPBasedLipschitzResBlock assumes cin == cout and does not handle stride changes outside the skip connection (i.e., typically used when stride=1 or 2 for downsampling in a standard residual architecture).

Args:

- cin (int): Number of input channels.
- inner_dim_factor (float): Multiplier for the intermediate dimensionality.
- kernel_size (int, optional): Size of the convolution kernel. Default is 3.
- groups (int, optional): Number of groups for the convolution. Default is 1.
- **kwargs: Additional keyword arguments (unused).

References
  • Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B. A Unified Algebraic Perspective on Lipschitz Neural Networks. In The Eleventh International Conference on Learning Representations. https://arxiv.org/abs/2303.03169
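
Usage sketch (not part of the original docs; the import path follows the source location below): with input and output channel counts equal, the block is a drop-in 1-Lipschitz residual stage.

```python
import torch

from orthogonium.layers.conv.SLL.sll_layer import SDPBasedLipschitzResBlock

block = SDPBasedLipschitzResBlock(cin=64, inner_dim_factor=2, kernel_size=3)

x = torch.randn(4, 64, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([4, 64, 32, 32])
```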
Source code in orthogonium\layers\conv\SLL\sll_layer.py
def __init__(self, cin, inner_dim_factor, kernel_size=3, groups=1, **kwargs):
    """
     Original 1-Lipschitz convolutional residual block, based on the SDP-based Lipschitz
    layer (SLL) approach [1]. It has a structure akin to:

    out = x - 2 * ConvTranspose( t * ReLU(Conv(x) + bias) )

    where `t` is a channel-wise scaling factor ensuring a Lipschitz constant ≤ 1.

    !!! note
        By default, `SDPBasedLipschitzResBlock` assumes `cin == cout` and does **not** handle
        stride changes outside the skip connection (i.e., typically used when stride=1 or 2
        for downsampling in a standard residual architecture).

    **Args**:
      - `cin` (int): Number of input channels.
      - `cout` (int): Number of output channels.
      - `inner_dim_factor` (float): Multiplier for the intermediate dimensionality.
      - `kernel_size` (int, optional): Size of the convolution kernel. Default is 3.
      - `groups` (int, optional): Number of groups for the convolution. Default is 1.
      - `**kwargs`: Additional keyword arguments (unused).


    References:
        - Araujo, A., Havens, A. J., Delattre, B., Allauzen, A., & Hu, B.
        A Unified Algebraic Perspective on Lipschitz Neural Networks.
        In The Eleventh International Conference on Learning Representations.
        <https://arxiv.org/abs/2303.03169>
    """
    super().__init__()

    inner_dim = int(cin * inner_dim_factor)
    self.activation = nn.ReLU()
    self.groups = groups

    self.padding = kernel_size // 2

    self.kernel = nn.Parameter(
        torch.randn(inner_dim, cin // groups, kernel_size, kernel_size)
    )
    parametrize.register_parametrization(
        self,
        "kernel",
        AOLReparametrizer(
            inner_dim,
            groups=groups,
        ),
    )
    self.bias = nn.Parameter(torch.empty(1, inner_dim, 1, 1))
    self.q = nn.Parameter(torch.ones(inner_dim, 1, 1, 1))

    nn.init.xavier_normal_(self.kernel)
    fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.kernel)
    bound = 1 / np.sqrt(fan_in)
    nn.init.uniform_(self.bias, -bound, bound)  # bias init

SLLxAOCLipschitzResBlock

Bases: Module

Source code in orthogonium\layers\conv\SLL\sll_layer.py
class SLLxAOCLipschitzResBlock(nn.Module):
    def __init__(
        self, cin, cout, inner_dim_factor, kernel_size=3, stride=2, groups=1, **kwargs
    ):
        """
        Extended SLL-based convolutional residual block. Supports arbitrary kernel sizes,
        strides, and changes in the number of channels by integrating additional
        orthogonal convolutions *and* fusing them via `\mathbconv` [1].

        The forward pass follows:

        $$
        y = (\mathbf{K}_{post} \circledast \mathbf{K}_{pre}) \\star x - (\mathbf{K}_{post} \circledast \mathbf{K}^T) \\star (t \\times  \sigma(( \mathbf{K} \circledast \mathbf{K}_{pre}) \\star x + b)),
        $$

        where $\mathbf{K}_{pre}$ and $\mathbf{K}_{post}$ are obtained with AOC.


        <img src="../../assets/SLL_3.png" alt="illustration of SLL x AOC" width="600">



        where the kernel `\kernel{K}` may effectively be expanded by pre/post AOC layers to
        handle stride and channel changes. This approach is described in "Improving
        SDP-based Lipschitz Layers" of [1].

        **Args**:
          - `cin` (int): Number of input channels.
          - `inner_dim_factor` (float): Multiplier for the internal channel dimension.
          - `kernel_size` (int, optional): Base kernel size for the SLL portion. Default is 3.
          - `stride` (int, optional): Stride for the skip connection. Default is 2.
          - `groups` (int, optional): Number of groups for the convolution. Default is 1.
          - `**kwargs`: Additional options (unused).



        References:
            - Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
            An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
            <https://arxiv.org/abs/2501.07930>
        """
        super().__init__()
        inner_kernel_size = kernel_size - (stride - 1)
        self.skip_kernel_size = stride + (stride // 2)
        inner_dim = int(cout * inner_dim_factor)
        self.activation = nn.ReLU()
        self.stride = stride
        self.groups = groups
        self.padding = kernel_size // 2
        self.kernel = nn.Parameter(
            torch.randn(
                inner_dim, cin // self.groups, inner_kernel_size, inner_kernel_size
            )
        )
        parametrize.register_parametrization(
            self,
            "kernel",
            AOLReparametrizer(
                inner_dim,
                groups=groups,
            ),
        )
        self.bias = nn.Parameter(torch.empty(1, inner_dim, 1, 1))
        self.q = nn.Parameter(torch.ones(inner_dim, 1, 1, 1))

        nn.init.xavier_normal_(self.kernel)
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.kernel)
        bound = 1 / np.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)  # bias init

        self.pre_conv = AdaptiveOrthoConv2d(
            cin, cin, kernel_size=stride, stride=1, bias=False, padding=0, groups=groups
        )
        self.post_conv = AdaptiveOrthoConv2d(
            cin,
            cout,
            kernel_size=stride,
            stride=stride,
            bias=False,
            padding=0,
            groups=groups,
        )

    def forward(self, x):
        # compute t
        # print(self.pre_conv.weight.shape, self.kernel.shape, self.post_conv.weight.shape)
        kernel_1a = fast_matrix_conv(
            self.pre_conv.weight, self.kernel, groups=self.groups
        )
        with parametrize.cached():
            kernel_1b = fast_matrix_conv(
                transpose_kernel(self.kernel, groups=self.groups),
                self.post_conv.weight,
                groups=self.groups,
            )
            kernel_2 = fast_matrix_conv(
                self.pre_conv.weight, self.post_conv.weight, groups=self.groups
            )
            # first branch
            # fuse pre conv with kernel
            res = F.conv2d(x, kernel_1a, padding=self.padding, groups=self.groups)
            res = res + self.bias
            res = self.activation(res)
            res = 2 * F.conv2d(
                res,
                kernel_1b,
                padding=self.padding,
                stride=self.stride,
                groups=self.groups,
            )
            # residual branch
            x = F.conv2d(
                x,
                kernel_2,
                padding=self.skip_kernel_size // 2,
                stride=self.stride,
                groups=self.groups,
            )
        # skip connection
        out = x - res
        return out

__init__(cin, cout, inner_dim_factor, kernel_size=3, stride=2, groups=1, **kwargs)

Extended SLL-based convolutional residual block. Supports arbitrary kernel sizes, strides, and changes in the number of channels by integrating additional orthogonal convolutions and fusing them via block convolution [1].

The forward pass follows:

\[ y = (\mathbf{K}_{post} \circledast \mathbf{K}_{pre}) \star x - (\mathbf{K}_{post} \circledast \mathbf{K}^T) \star (t \times \sigma(( \mathbf{K} \circledast \mathbf{K}_{pre}) \star x + b)), \]

where \(\mathbf{K}_{pre}\) and \(\mathbf{K}_{post}\) are obtained with AOC.

[Figure: illustration of SLL x AOC]

The kernel \(\mathbf{K}\) may effectively be expanded by pre/post AOC layers to handle stride and channel changes. This approach is described in the "Improving SDP-based Lipschitz Layers" section of [1].

Args:

- cin (int): Number of input channels.
- cout (int): Number of output channels.
- inner_dim_factor (float): Multiplier for the internal channel dimension.
- kernel_size (int, optional): Base kernel size for the SLL portion. Default is 3.
- stride (int, optional): Stride for the skip connection. Default is 2.
- groups (int, optional): Number of groups for the convolution. Default is 1.
- **kwargs: Additional options (unused).

References
  • Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025). An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures. https://arxiv.org/abs/2501.07930
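
Usage sketch (not part of the original docs; the import path follows the source location below): a stride-2 block that also doubles the channel count, something the plain SLL block cannot do.

```python
import torch

from orthogonium.layers.conv.SLL.sll_layer import SLLxAOCLipschitzResBlock

block = SLLxAOCLipschitzResBlock(
    cin=64,
    cout=128,
    inner_dim_factor=2,
    kernel_size=3,
    stride=2,
)

x = torch.randn(4, 64, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([4, 128, 16, 16]): downsampled and widened
```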
Source code in orthogonium\layers\conv\SLL\sll_layer.py
def __init__(
    self, cin, cout, inner_dim_factor, kernel_size=3, stride=2, groups=1, **kwargs
):
    """
    Extended SLL-based convolutional residual block. Supports arbitrary kernel sizes,
    strides, and changes in the number of channels by integrating additional
    orthogonal convolutions *and* fusing them via `\mathbconv` [1].

    The forward pass follows:

    $$
    y = (\mathbf{K}_{post} \circledast \mathbf{K}_{pre}) \\star x - (\mathbf{K}_{post} \circledast \mathbf{K}^T) \\star (t \\times  \sigma(( \mathbf{K} \circledast \mathbf{K}_{pre}) \\star x + b)),
    $$

    where $\mathbf{K}_{pre}$ and $\mathbf{K}_{post}$ are obtained with AOC.


    <img src="../../assets/SLL_3.png" alt="illustration of SLL x AOC" width="600">



    where the kernel `\kernel{K}` may effectively be expanded by pre/post AOC layers to
    handle stride and channel changes. This approach is described in "Improving
    SDP-based Lipschitz Layers" of [1].

    **Args**:
      - `cin` (int): Number of input channels.
      - `inner_dim_factor` (float): Multiplier for the internal channel dimension.
      - `kernel_size` (int, optional): Base kernel size for the SLL portion. Default is 3.
      - `stride` (int, optional): Stride for the skip connection. Default is 2.
      - `groups` (int, optional): Number of groups for the convolution. Default is 1.
      - `**kwargs`: Additional options (unused).



    References:
        - Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
        An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
        <https://arxiv.org/abs/2501.07930>
    """
    super().__init__()
    inner_kernel_size = kernel_size - (stride - 1)
    self.skip_kernel_size = stride + (stride // 2)
    inner_dim = int(cout * inner_dim_factor)
    self.activation = nn.ReLU()
    self.stride = stride
    self.groups = groups
    self.padding = kernel_size // 2
    self.kernel = nn.Parameter(
        torch.randn(
            inner_dim, cin // self.groups, inner_kernel_size, inner_kernel_size
        )
    )
    parametrize.register_parametrization(
        self,
        "kernel",
        AOLReparametrizer(
            inner_dim,
            groups=groups,
        ),
    )
    self.bias = nn.Parameter(torch.empty(1, inner_dim, 1, 1))
    self.q = nn.Parameter(torch.ones(inner_dim, 1, 1, 1))

    nn.init.xavier_normal_(self.kernel)
    fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.kernel)
    bound = 1 / np.sqrt(fan_in)
    nn.init.uniform_(self.bias, -bound, bound)  # bias init

    self.pre_conv = AdaptiveOrthoConv2d(
        cin, cin, kernel_size=stride, stride=1, bias=False, padding=0, groups=groups
    )
    self.post_conv = AdaptiveOrthoConv2d(
        cin,
        cout,
        kernel_size=stride,
        stride=stride,
        bias=False,
        padding=0,
        groups=groups,
    )

AOLConv2D

Bases: Conv2d

Source code in orthogonium\layers\conv\AOL\aol.py
class AOLConv2D(nn.Conv2d):

    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        stride=1,
        padding=0,
        dilation=1,
        groups=1,
        bias=True,
        padding_mode="zeros",
        device=None,
        dtype=None,
        niter=1,
    ):
        """
        Almost-Orthogonal Convolution layer. This layer implements the method proposed in [1] to enforce
        almost-orthogonality. While orthogonality is not enforced, the lipschitz constant of the layer
        is guaranteed to be less than 1.

        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            kernel_size (int or tuple): Size of the convolution kernel.
            stride (int or tuple, optional): Stride of the convolution. Default is 1.
            padding (int or tuple, optional): Padding size. Default is 0.
            dilation (int or tuple, optional): Dilation rate. Default is 1.
            groups (int, optional): Number of groups. Default is 1.
            bias (bool, optional): Whether to include a learnable bias. Default is True.
            padding_mode (str, optional): Padding mode. Default is "zeros".
            device (torch.device, optional): Device to store the layer parameters. Default is None.
            dtype (torch.dtype, optional): Data type to store the layer parameters. Default is None.


        References:
            `[1] Prach, B., & Lampert, C. H. (2022).
                   "Almost-orthogonal layers for efficient general-purpose lipschitz networks."
                   ECCV.`<https://arxiv.org/abs/2208.03160>`_
        """
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=bias,
            padding_mode=padding_mode,
            device=device,
            dtype=dtype,
        )
        self.niter = niter

        parametrize.register_parametrization(
            self,
            "weight",
            MultiStepAOLReparametrizer(
                min(out_channels, in_channels),
                groups=groups,
                niter=niter,
            ),
        )

    def reset_parameters(self) -> None:
        r"""Resets parameters of the module. This includes the weight and bias
        parameters, if they are used.
        """
        super().reset_parameters()
        # # Reset the parametrization
        # init kernel using the orthogonal kernel
        if not (
            self.in_channels // self.groups == 0
            and self.out_channels // self.groups == 0
        ):
            self.kernel = conv_orthogonal_(
                self.weight,
                stride=self.stride,
                groups=self.groups,
            )

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None, niter=1)

Almost-Orthogonal Convolution layer. This layer implements the method proposed in [1] to enforce almost-orthogonality. While orthogonality is not strictly enforced, the Lipschitz constant of the layer is guaranteed to be at most 1.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_channels` | `int` | Number of input channels. | *required* |
| `out_channels` | `int` | Number of output channels. | *required* |
| `kernel_size` | `int` or `tuple` | Size of the convolution kernel. | *required* |
| `stride` | `int` or `tuple` | Stride of the convolution. | `1` |
| `padding` | `int` or `tuple` | Padding size. | `0` |
| `dilation` | `int` or `tuple` | Dilation rate. | `1` |
| `groups` | `int` | Number of groups. | `1` |
| `bias` | `bool` | Whether to include a learnable bias. | `True` |
| `padding_mode` | `str` | Padding mode. | `'zeros'` |
| `device` | `torch.device` | Device to store the layer parameters. | `None` |
| `dtype` | `torch.dtype` | Data type to store the layer parameters. | `None` |
References

[1] Prach, B., & Lampert, C. H. (2022). "Almost-orthogonal layers for efficient general-purpose Lipschitz networks." ECCV. https://arxiv.org/abs/2208.03160
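
Usage sketch (not part of the original docs; the import path follows the source location below): because the layer is 1-Lipschitz, with `bias=False` the output norm can never exceed the input norm, even though orthogonality (exact norm preservation) is not guaranteed.

```python
import torch

from orthogonium.layers.conv.AOL.aol import AOLConv2D

conv = AOLConv2D(
    in_channels=64,
    out_channels=64,
    kernel_size=3,
    padding=1,
    bias=False,
    niter=2,
)

x = torch.randn(4, 64, 32, 32)
y = conv(x)
# 1-Lipschitz and bias-free, so ||y|| <= ||x|| (not necessarily equal).
print(torch.linalg.vector_norm(y) <= torch.linalg.vector_norm(x) + 1e-5)
```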

Source code in orthogonium\layers\conv\AOL\aol.py
def __init__(
    self,
    in_channels,
    out_channels,
    kernel_size,
    stride=1,
    padding=0,
    dilation=1,
    groups=1,
    bias=True,
    padding_mode="zeros",
    device=None,
    dtype=None,
    niter=1,
):
    """
    Almost-Orthogonal Convolution layer. This layer implements the method proposed in [1] to enforce
    almost-orthogonality. While orthogonality is not enforced, the lipschitz constant of the layer
    is guaranteed to be less than 1.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (int or tuple): Size of the convolution kernel.
        stride (int or tuple, optional): Stride of the convolution. Default is 1.
        padding (int or tuple, optional): Padding size. Default is 0.
        dilation (int or tuple, optional): Dilation rate. Default is 1.
        groups (int, optional): Number of groups. Default is 1.
        bias (bool, optional): Whether to include a learnable bias. Default is True.
        padding_mode (str, optional): Padding mode. Default is "zeros".
        device (torch.device, optional): Device to store the layer parameters. Default is None.
        dtype (torch.dtype, optional): Data type to store the layer parameters. Default is None.


    References:
        `[1] Prach, B., & Lampert, C. H. (2022).
               "Almost-orthogonal layers for efficient general-purpose lipschitz networks."
               ECCV.`<https://arxiv.org/abs/2208.03160>`_
    """
    super().__init__(
        in_channels=in_channels,
        out_channels=out_channels,
        kernel_size=kernel_size,
        stride=stride,
        padding=padding,
        dilation=dilation,
        groups=groups,
        bias=bias,
        padding_mode=padding_mode,
        device=device,
        dtype=dtype,
    )
    self.niter = niter

    parametrize.register_parametrization(
        self,
        "weight",
        MultiStepAOLReparametrizer(
            min(out_channels, in_channels),
            groups=groups,
            niter=niter,
        ),
    )

reset_parameters()

Resets parameters of the module. This includes the weight and bias parameters, if they are used.

Source code in orthogonium\layers\conv\AOL\aol.py
def reset_parameters(self) -> None:
    r"""Resets parameters of the module. This includes the weight and bias
    parameters, if they are used.
    """
    super().reset_parameters()
    # # Reset the parametrization
    # init kernel using the orthogonal kernel
    if not (
        self.in_channels // self.groups == 0
        and self.out_channels // self.groups == 0
    ):
        self.kernel = conv_orthogonal_(
            self.weight,
            stride=self.stride,
            groups=self.groups,
        )

AOLConvTranspose2D

Bases: ConvTranspose2d

Source code in orthogonium\layers\conv\AOL\aol.py
class AOLConvTranspose2D(nn.ConvTranspose2d):

    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        stride=1,
        padding=0,
        output_padding=0,
        groups=1,
        bias=True,
        dilation=1,
        padding_mode="zeros",
        device=None,
        dtype=None,
        niter=1,
    ):
        """
        Almost-Orthogonal Convolution layer. This layer implements the method proposed in [1] to enforce
        almost-orthogonality. While orthogonality is not enforced, the lipschitz constant of the layer
        is guaranteed to be less than 1.

        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            kernel_size (int or tuple): Size of the convolution kernel.
            stride (int or tuple, optional): Stride of the convolution. Default is 1.
            padding (int or tuple, optional): Padding size. Default is 0.
            output_padding (int or tuple, optional): Additional size added to the output shape. Default is 0.
            groups (int, optional): Number of groups. Default is 1.
            bias (bool, optional): Whether to include a learnable bias. Default is True.
            dilation (int or tuple, optional): Dilation rate. Default is 1.
            padding_mode (str, optional): Padding mode. Default is "zeros".
            device (torch.device, optional): Device to store the layer parameters. Default is None.
            dtype (torch.dtype, optional): Data type to store the layer parameters. Default is None.


        References:
            `[1] Prach, B., & Lampert, C. H. (2022).
                   "Almost-orthogonal layers for efficient general-purpose lipschitz networks."
                   ECCV.`<https://arxiv.org/abs/2208.03160>`_
        """
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            output_padding=output_padding,
            groups=groups,
            bias=bias,
            dilation=dilation,
            padding_mode=padding_mode,
            device=device,
            dtype=dtype,
        )
        self.niter = niter

        # Register the same AOLReparametrizer
        parametrize.register_parametrization(
            self,
            "weight",
            MultiStepAOLReparametrizer(
                min(out_channels, in_channels), groups=groups, niter=niter
            ),
        )

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros', device=None, dtype=None, niter=1)

Almost-Orthogonal transposed convolution layer. This layer implements the method proposed in [1] to enforce almost-orthogonality. While orthogonality is not strictly enforced, the Lipschitz constant of the layer is guaranteed to be at most 1.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_channels` | `int` | Number of input channels. | required |
| `out_channels` | `int` | Number of output channels. | required |
| `kernel_size` | `int` or `tuple` | Size of the convolution kernel. | required |
| `stride` | `int` or `tuple` | Stride of the convolution. | `1` |
| `padding` | `int` or `tuple` | Padding size. | `0` |
| `output_padding` | `int` or `tuple` | Additional size added to the output shape. | `0` |
| `groups` | `int` | Number of groups. | `1` |
| `bias` | `bool` | Whether to include a learnable bias. | `True` |
| `dilation` | `int` or `tuple` | Dilation rate. | `1` |
| `padding_mode` | `str` | Padding mode. | `'zeros'` |
| `device` | `torch.device` | Device to store the layer parameters. | `None` |
| `dtype` | `torch.dtype` | Data type to store the layer parameters. | `None` |
| `niter` | `int` | Number of iterations used by the AOL reparametrizer. | `1` |
References

[1] Prach, B., & Lampert, C. H. (2022). "Almost-orthogonal layers for efficient general-purpose Lipschitz networks." ECCV. https://arxiv.org/abs/2208.03160
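For illustration, here is a minimal usage sketch. The import path is inferred from the source file shown below and may differ in your install; the output shape follows the standard transposed-convolution arithmetic.

```python
import torch
from orthogonium.layers.conv.AOL.aol import AOLConvTranspose2D  # import path inferred from the source file

# A 2x upsampling transposed convolution whose Lipschitz constant is bounded by 1.
layer = AOLConvTranspose2D(
    in_channels=64,
    out_channels=32,
    kernel_size=3,
    stride=2,
    padding=1,
    output_padding=1,
    niter=4,  # number of iterations used by the AOL reparametrizer
)

x = torch.randn(8, 64, 16, 16)
y = layer(x)
print(y.shape)  # torch.Size([8, 32, 32, 32])
```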

Source code in orthogonium\layers\conv\AOL\aol.py
def __init__(
    self,
    in_channels,
    out_channels,
    kernel_size,
    stride=1,
    padding=0,
    output_padding=0,
    groups=1,
    bias=True,
    dilation=1,
    padding_mode="zeros",
    device=None,
    dtype=None,
    niter=1,
):
    """
    Almost-Orthogonal transposed convolution layer. This layer implements the method proposed in [1] to
    enforce almost-orthogonality. While orthogonality is not strictly enforced, the Lipschitz constant of the
    layer is guaranteed to be at most 1.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (int or tuple): Size of the convolution kernel.
        stride (int or tuple, optional): Stride of the convolution. Default is 1.
        padding (int or tuple, optional): Padding size. Default is 0.
        output_padding (int or tuple, optional): Additional size added to the output shape. Default is 0.
        groups (int, optional): Number of groups. Default is 1.
        bias (bool, optional): Whether to include a learnable bias. Default is True.
        dilation (int or tuple, optional): Dilation rate. Default is 1.
        padding_mode (str, optional): Padding mode. Default is "zeros".
        device (torch.device, optional): Device to store the layer parameters. Default is None.
        dtype (torch.dtype, optional): Data type to store the layer parameters. Default is None.
        niter (int, optional): Number of iterations used by the AOL reparametrizer. Default is 1.


    References:
        [1] Prach, B., & Lampert, C. H. (2022).
               "Almost-orthogonal layers for efficient general-purpose Lipschitz networks."
               ECCV. https://arxiv.org/abs/2208.03160
    """
    super().__init__(
        in_channels=in_channels,
        out_channels=out_channels,
        kernel_size=kernel_size,
        stride=stride,
        padding=padding,
        output_padding=output_padding,
        groups=groups,
        bias=bias,
        dilation=dilation,
        padding_mode=padding_mode,
        device=device,
        dtype=dtype,
    )
    self.niter = niter

    # Register the MultiStepAOLReparametrizer as a parametrization of the weight
    parametrize.register_parametrization(
        self,
        "weight",
        MultiStepAOLReparametrizer(
            min(out_channels, in_channels), groups=groups, niter=niter
        ),
    )

MultiStepAOLReparametrizer

Bases: Module

Source code in orthogonium\layers\conv\AOL\aol.py
class MultiStepAOLReparametrizer(nn.Module):
    def __init__(self, nb_features, groups, niter=4):
        super(MultiStepAOLReparametrizer, self).__init__()
        self.groups = groups
        self.nb_features = nb_features
        self.niter = niter
        self.q = nn.Parameter(torch.ones(nb_features, 1, 1, 1))

    def forward(self, kernel):
        co, cig, ks, ks2 = kernel.shape
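        # Work on the transposed kernel when it has at least as many outputs as inputs (per group).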
        if co // self.groups >= cig:
            kernel = transpose_kernel(kernel, self.groups, flip=True)
        kkt = kernel
        log_curr_norm = 0
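        # Repeatedly convolve the kernel with its transpose: each pass squares the Gram kernel,
        # with the running norm accumulated in log space for numerical stability. The absolute
        # sums of the result then yield the per-channel AOL rescaling factor applied below.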
        for i in range(self.niter):
            kkt_norm = kkt.norm().detach()
            kkt = kkt / kkt_norm
            log_curr_norm = 2 * (log_curr_norm + kkt_norm.log())
            kkt = fast_matrix_conv(
                transpose_kernel(kkt, self.groups, flip=True), kkt, self.groups
            )

        inverse_power = 2 ** (-self.niter)
        t = torch.abs(kkt)
        q = torch.exp(self.q)
        q_inv = torch.exp(-self.q)
        t = q_inv * t * q
        t = t.sum((1, 2, 3)).pow(inverse_power)
        norm = torch.exp(log_curr_norm * inverse_power)
        t = t * norm
        t = t.reshape(-1, 1, 1, 1)
        kernel = kernel / t
        if co // self.groups >= cig:
            kernel = transpose_kernel(kernel, self.groups, flip=True)
        return kernel

    def right_inverse(self, kernel):
        return kernel

    def reset_parameters(self):
        """
        Resets the parameters of the reparametrizer.
        """
        # Reset the q parameter to its initial value
        self.q.data.fill_(1.0)

reset_parameters()

Resets the parameters of the reparametrizer.

Source code in orthogonium\layers\conv\AOL\aol.py
def reset_parameters(self):
    """
    Resets the parameters of the reparametrizer.
    """
    # Reset the q parameter to its initial value
    self.q.data.fill_(1.0)
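As a sketch of how this reparametrizer is wired up (mirroring the `register_parametrization` call in `AOLConvTranspose2D` above; the import path is inferred from the source file, and the plain `nn.Conv2d` is used purely for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

from orthogonium.layers.conv.AOL.aol import MultiStepAOLReparametrizer  # import path inferred

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)

# Rescale the weight so that the convolution is (almost) 1-Lipschitz.
parametrize.register_parametrization(
    conv,
    "weight",
    MultiStepAOLReparametrizer(min(32, 16), groups=1, niter=4),
)

x = torch.randn(4, 16, 8, 8)
y = conv(x)  # the forward pass uses the rescaled weight
```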

AdaptiveSOCConv2d(in_channels, out_channels, kernel_size, stride=1, padding='same', dilation=1, groups=1, bias=True, padding_mode='circular', ortho_params=OrthoParams())

Factory function to create an orthogonal convolutional layer, selecting the appropriate class based on kernel size and stride. This is a modified implementation of the Skew Orthogonal Convolution [1], with significant modifications from the original paper:

  • This implementation provides an explicit kernel (larger than the original kernel size), so the forward pass is done in a single iteration, as described in [2].
  • It avoids channel padding to handle the case where cin != cout. Similarly, stride is handled natively using the adaptive scheme.
  • The Fantastic Four method is replaced by AOL, which reduces the number of iterations required to converge.

It aims to be more scalable to large networks and large image sizes, while enforcing orthogonality in the convolutional layers. This layer also intends to be compatible with all the features of the nn.Conv2d class (e.g., striding, dilation, grouping, etc.). This method has an explicit kernel, which means that the forward operation is equivalent to a standard convolutional layer, but the weights are constrained to be orthogonal.

Note
  • This implementation changes the size of the kernel, which also changes the padding semantics. Please adjust the padding according to the kernel size and the number of iterations.
  • Current unit testing uses a tolerance of 8e-2, so this layer can be expected to be 1.08-Lipschitz. Similarly, the stable rank is evaluated loosely (it must be greater than 0.5).

Key Features:

- Enforces orthogonality, preserving gradient norms.
- Supports native striding, dilation, grouped convolutions, and flexible padding.

Behavior:

- When kernel_size == stride, the layer is an `RKOConv2d`.
- When stride == 1, the layer is a `FastSOC`.
- Otherwise, the layer is a `SOCRkoConv2d`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_channels` | `int` | Number of input channels. | required |
| `out_channels` | `int` | Number of output channels. | required |
| `kernel_size` | `_size_2_t` | Size of the convolution kernel. | required |
| `stride` | `_size_2_t` | Stride of the convolution. | `1` |
| `padding` | `str` or `_size_2_t` | Padding mode or size. | `'same'` |
| `dilation` | `_size_2_t` | Dilation rate. | `1` |
| `groups` | `int` | Number of blocked connections from input to output channels. | `1` |
| `bias` | `bool` | Whether to include a learnable bias. | `True` |
| `padding_mode` | `str` | Padding mode. | `'circular'` |
| `ortho_params` | `OrthoParams` | Parameters to control orthogonality. | `OrthoParams()` |

Returns:

| Type | Description |
| --- | --- |
| `Conv2d` | A configured instance of `nn.Conv2d` (one of `RKOConv2d`, `FastSOC`, or `SOCRkoConv2d`). |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `kernel_size < stride`, as orthogonality cannot be enforced. |

References
  • [1] Singla, S., & Feizi, S. (2021, July). Skew orthogonal convolutions. In International Conference on Machine Learning (pp. 9756-9766). PMLR. https://arxiv.org/abs/2105.11417
  • [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025). An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures. https://arxiv.org/abs/2501.07930
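A minimal usage sketch, assuming the package is importable as shown (the import path is inferred from the source file below) and keeping in mind the padding caveat from the note above:

```python
import torch
from orthogonium.layers.conv.adaptiveSOC.ortho_conv import AdaptiveSOCConv2d  # import path inferred

# stride == 1 -> FastSOC branch (explicit, enlarged kernel; circular "same" padding by default).
conv = AdaptiveSOCConv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding="same")

# kernel_size == stride -> RKOConv2d branch (patch-embedding style downsampling).
down = AdaptiveSOCConv2d(in_channels=32, out_channels=64, kernel_size=2, stride=2, padding=0)

x = torch.randn(4, 16, 32, 32)
y = conv(x)   # spatial size expected to be preserved by the "same" circular padding
z = down(y)   # spatial size divided by the stride
```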
Source code in orthogonium\layers\conv\adaptiveSOC\ortho_conv.py
def AdaptiveSOCConv2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: Union[str, _size_2_t] = "same",
    dilation: _size_2_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "circular",
    ortho_params: OrthoParams = OrthoParams(),
) -> nn.Conv2d:
    """
    Factory function to create an orthogonal convolutional layer, selecting the appropriate class based on kernel
    size and stride. This is a modified implementation of the `Skew orthogonal convolution` [1], with significant
    modifications from the original paper:

    - This implementation provides an explicit kernel (larger than the original kernel size), so the forward pass is
        done in a single iteration, as described in [2].
    - It avoids channel padding to handle the case where cin != cout. Similarly, stride is handled natively using
        the adaptive scheme.
    - The Fantastic Four method is replaced by AOL, which reduces the number of iterations required to converge.

    It aims to be more scalable to large networks and large image sizes, while enforcing orthogonality in the
    convolutional layers. This layer also intends to be compatible with all the features of the `nn.Conv2d` class
    (e.g., striding, dilation, grouping, etc.). This method has an explicit kernel, which means that the forward
    operation is equivalent to a standard convolutional layer, but the weights are constrained to be orthogonal.

    Note:
        - This implementation changes the size of the kernel, which also changes the padding semantics. Please adjust
            the padding according to the kernel size and the number of iterations.
        - Current unit testing uses a tolerance of 8e-2, so this layer can be expected to be 1.08-Lipschitz.
            Similarly, the stable rank is evaluated loosely (it must be greater than 0.5).

    Key Features:
    -------------
        - Enforces orthogonality, preserving gradient norms.
        - Supports native striding, dilation, grouped convolutions, and flexible padding.

    Behavior:
    -------------
        - When kernel_size == stride, the layer is an `RKOConv2d`.
        - When stride == 1, the layer is a `FastSOC`.
        - Otherwise, the layer is a `SOCRkoConv2d`.

    Arguments:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (_size_2_t): Size of the convolution kernel.
        stride (_size_2_t, optional): Stride of the convolution. Default is 1.
        padding (str or _size_2_t, optional): Padding mode or size. Default is "same".
        dilation (_size_2_t, optional): Dilation rate. Default is 1.
        groups (int, optional): Number of blocked connections from input to output channels. Default is 1.
        bias (bool, optional): Whether to include a learnable bias. Default is True.
        padding_mode (str, optional): Padding mode. Default is "circular".
        ortho_params (OrthoParams, optional): Parameters to control orthogonality. Default is `OrthoParams()`.

    Returns:
        A configured instance of `nn.Conv2d` (one of `RKOConv2d`, `FastSOC`, or `SOCRkoConv2d`).

    Raises:
        `ValueError`: If kernel_size < stride, as orthogonality cannot be enforced.


    References:
        - [1] Singla, S., & Feizi, S. (2021, July). Skew orthogonal convolutions. In International Conference
        on Machine Learning (pp. 9756-9766). PMLR. <https://arxiv.org/abs/2105.11417>
        - [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
        An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
        <https://arxiv.org/abs/2501.07930>
    """
    if kernel_size < stride:
        raise ValueError(
            "kernel size must be smaller than stride. The set of orthonal convolutions is empty in this setting."
        )
    if kernel_size == stride:
        convclass = RKOConv2d
    elif stride == 1:
        convclass = FastSOC
    else:
        convclass = SOCRkoConv2d
    return convclass(
        in_channels,
        out_channels,
        kernel_size,
        stride,
        padding,
        dilation,
        groups,
        bias,
        padding_mode,
        # ortho_params=ortho_params,
    )

AdaptiveSOCConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros', ortho_params=OrthoParams())

Factory function to create an orthogonal transposed convolutional layer, selecting the appropriate class based on kernel size and stride. This is a modified implementation of the Skew Orthogonal Convolution [1], with significant modifications from the original paper:

  • This implementation provides an explicit kernel (larger than the original kernel size), so the forward pass is done in a single iteration, as described in [2].
  • It avoids channel padding to handle the case where cin != cout. Similarly, stride is handled natively using the adaptive scheme.
  • The Fantastic Four method is replaced by AOL, which reduces the number of iterations required to converge.

It aims to be more scalable to large networks and large image sizes, while enforcing orthogonality in the convolutional layers. This layer also intends to be compatible with all the features of the nn.ConvTranspose2d class (e.g., striding, dilation, grouping, etc.). This method has an explicit kernel, which means that the forward operation is equivalent to a standard transposed convolutional layer, but the weights are constrained to be orthogonal.

Note
  • This implementation changes the size of the kernel, which also changes the padding semantics. Please adjust the padding according to the kernel size and the number of iterations.
  • Current unit testing uses a tolerance of 8e-2, so this layer can be expected to be 1.08-Lipschitz. Similarly, the stable rank is evaluated loosely (it must be greater than 0.5).

Key Features:

- Enforces orthogonality, preserving gradient norms.
- Supports native striding, dilation, grouped convolutions, and flexible padding.

Behavior:

- When kernel_size == stride, the layer is an `RkoConvTranspose2d`.
- When stride == 1, the layer is a `SOCTranspose`.
- Otherwise, the layer is a `SOCRkoConvTranspose2d`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_channels` | `int` | Number of input channels. | required |
| `out_channels` | `int` | Number of output channels. | required |
| `kernel_size` | `_size_2_t` | Size of the convolution kernel. | required |
| `stride` | `_size_2_t` | Stride of the convolution. | `1` |
| `padding` | `_size_2_t` | Padding size. | `0` |
| `output_padding` | `_size_2_t` | Additional size added to the output shape. | `0` |
| `dilation` | `_size_2_t` | Dilation rate. | `1` |
| `groups` | `int` | Number of blocked connections from input to output channels. | `1` |
| `bias` | `bool` | Whether to include a learnable bias. | `True` |
| `padding_mode` | `str` | Padding mode. | `'zeros'` |
| `ortho_params` | `OrthoParams` | Parameters to control orthogonality. | `OrthoParams()` |

Returns:

| Type | Description |
| --- | --- |
| `ConvTranspose2d` | A configured instance of `nn.ConvTranspose2d` (one of `RkoConvTranspose2d`, `SOCTranspose`, or `SOCRkoConvTranspose2d`). |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `kernel_size < stride`, as orthogonality cannot be enforced. |

References
  • [1] Singla, S., & Feizi, S. (2021, July). Skew orthogonal convolutions. In International Conference on Machine Learning (pp. 9756-9766). PMLR. https://arxiv.org/abs/2105.11417
  • [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025). An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures. https://arxiv.org/abs/2501.07930
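A minimal usage sketch (the import path is inferred from the source file below; the output size follows the usual transposed-convolution arithmetic for this configuration):

```python
import torch
from orthogonium.layers.conv.adaptiveSOC.ortho_conv import AdaptiveSOCConvTranspose2d  # import path inferred

# kernel_size == stride -> RkoConvTranspose2d branch, a common choice for 2x upsampling.
up = AdaptiveSOCConvTranspose2d(in_channels=64, out_channels=32, kernel_size=2, stride=2)

x = torch.randn(8, 64, 16, 16)
y = up(x)
print(y.shape)  # torch.Size([8, 32, 32, 32])
```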
Source code in orthogonium\layers\conv\adaptiveSOC\ortho_conv.py
def AdaptiveSOCConvTranspose2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: _size_2_t = 0,
    output_padding: _size_2_t = 0,
    groups: int = 1,
    bias: bool = True,
    dilation: _size_2_t = 1,
    padding_mode: str = "zeros",
    ortho_params: OrthoParams = OrthoParams(),
) -> nn.ConvTranspose2d:
    """
    Factory function to create an orthogonal transposed convolutional layer, selecting the appropriate class based on
    kernel size and stride. This is a modified implementation of the `Skew orthogonal convolution` [1], with significant
    modifications from the original paper:

    - This implementation provides an explicit kernel (larger than the original kernel size), so the forward pass is
        done in a single iteration, as described in [2].
    - It avoids channel padding to handle the case where cin != cout. Similarly, stride is handled natively using
        the adaptive scheme.
    - The Fantastic Four method is replaced by AOL, which reduces the number of iterations required to converge.

    It aims to be more scalable to large networks and large image sizes, while enforcing orthogonality in the
    convolutional layers. This layer also intends to be compatible with all the features of the `nn.ConvTranspose2d`
    class (e.g., striding, dilation, grouping, etc.). This method has an explicit kernel, which means that the forward
    operation is equivalent to a standard transposed convolutional layer, but the weights are constrained to be orthogonal.

    Note:
        - This implementation changes the size of the kernel, which also changes the padding semantics. Please adjust
            the padding according to the kernel size and the number of iterations.
        - Current unit testing uses a tolerance of 8e-2, so this layer can be expected to be 1.08-Lipschitz.
            Similarly, the stable rank is evaluated loosely (it must be greater than 0.5).

    Key Features:
    -------------
        - Enforces orthogonality, preserving gradient norms.
        - Supports native striding, dilation, grouped convolutions, and flexible padding.

    Behavior:
    -------------
        - When kernel_size == stride, the layer is an `RkoConvTranspose2d`.
        - When stride == 1, the layer is a `SOCTranspose`.
        - Otherwise, the layer is a `SOCRkoConvTranspose2d`.

    Arguments:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (_size_2_t): Size of the convolution kernel.
        stride (_size_2_t, optional): Stride of the convolution. Default is 1.
        padding (_size_2_t, optional): Padding size. Default is 0.
        output_padding (_size_2_t, optional): Additional size added to the output shape. Default is 0.
        groups (int, optional): Number of blocked connections from input to output channels. Default is 1.
        bias (bool, optional): Whether to include a learnable bias. Default is True.
        dilation (_size_2_t, optional): Dilation rate. Default is 1.
        padding_mode (str, optional): Padding mode. Default is "zeros".
        ortho_params (OrthoParams, optional): Parameters to control orthogonality. Default is `OrthoParams()`.

    Returns:
        A configured instance of `nn.ConvTranspose2d` (one of `RkoConvTranspose2d`, `SOCTranspose`, or `SOCRkoConvTranspose2d`).

    Raises:
        `ValueError`: If kernel_size < stride, as orthogonality cannot be enforced.


    References:
        - [1] Singla, S., & Feizi, S. (2021, July). Skew orthogonal convolutions. In International Conference
        on Machine Learning (pp. 9756-9766). PMLR. <https://arxiv.org/abs/2105.11417>
        - [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
        An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
        <https://arxiv.org/abs/2501.07930>
    """
    if kernel_size < stride:
        raise ValueError(
            "kernel size must be smaller than stride. The set of orthonal convolutions is empty in this setting."
        )
    if kernel_size == stride:
        convclass = RkoConvTranspose2d
    elif stride == 1:
        convclass = SOCTranspose
    else:
        convclass = SOCRkoConvTranspose2d
    return convclass(
        in_channels,
        out_channels,
        kernel_size,
        stride,
        padding,
        output_padding,
        groups,
        bias,
        dilation,
        padding_mode,
        # ortho_params=ortho_params,
    )