
👋 Getting started 1: Creating a 1-Lipschitz neural network

The goal of this series of tutorials is to show the different usages of deel-lip.

In this first notebook, our objective is to show how to create 1-Lipschitz neural networks with deel-lip.

In particular, we will cover the following:

1. 📚 Theoretical background
   A brief theoretical background on Lipschitz continuous functions. This section can be safely skipped if one is not interested in the theory.
2. 🧱 Creating a 1-Lipschitz neural network with deel-lip and keras
   An example of how to create a 1-Lipschitz neural network with deel-lip and keras.
3. 🔨 Design rules for 1-Lipschitz neural networks with deel-lip
   A set of neural network design rules that one must respect in order to enforce the 1-Lipschitz constraint.

📚 Theoretical background

What is the Lipschitz constant?

The deel-lip package makes it possible to control the Lipschitz constant of a layer or of a whole neural network. The Lipschitz constant is a mathematical property of a function (in our context, a layer or a model) that characterizes how much the output of the function can change with respect to changes in its input.

In mathematical terms, a function \(f\) is Lipschitz continuous with Lipschitz constant \(L\) (or, more simply, \(L\)-Lipschitz) if for any given pair of points \(x_1,x_2\), \(L\) provides a bound on the rate of change of \(f\):

\[||f(x_1)-f(x_2)||\leq L||x_1-x_2||.\]

For instance, given a 1-Lipschitz dense layer (a.k.a. fully connected layer) with a weight matrix \(W\) and a bias vector \(b\), we have for any two inputs \(x_1\) and \(x_2\):

\[||(W.x_1+b)-(W.x_2+b)|| \leq 1||x_1-x_2||.\]

💡 The norm we refer to throughout our notebooks is the Euclidean norm (L2). This is because deel-lip operates with this norm. You will find more information about the role of the norm in the context of adversarially robust 1-Lipschitz deep learning models in the notebook titled 'Getting Started 2'.
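As a small numerical illustration of this definition (a plain numpy sketch, independent of deel-lip): for a linear map \(x \mapsto W.x\), the tightest Lipschitz constant with respect to the L2 norm is the largest singular value of \(W\), so the rate of change between any two points never exceeds it.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))
L = np.linalg.svd(W, compute_uv=False)[0]  # largest singular value = L2 Lipschitz constant of x -> W @ x

x1, x2 = rng.normal(size=64), rng.normal(size=64)
ratio = np.linalg.norm(W @ x1 - W @ x2) / np.linalg.norm(x1 - x2)
print(ratio <= L)  # True: the rate of change is bounded by L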

A simple requirement for creating 1-Lipschitz neural networks

The composition property of Lipschitz continuous functions states that if a function \(f\) is \(L_1\)-Lipschitz and another function \(g\) is \(L_2\)-Lipschitz, then their composition \(h = f \circ g\), which applies \(f\) after \(g\), is also Lipschitz continuous, with a Lipschitz constant \(L \leq L_1 L_2\).

A feed-forward or sequential neural network is essentially a stack of layers, where each layer transforms the output of the previous layer(s) and feeds its output to the next ones.

By the composition property of Lipschitz functions, it suffices that each of the \(n\) individual layers of a neural network model be 1-Lipschitz for the whole model to be 1-Lipschitz.

For instance, given a 1-Lipschitz dense layer parametrized by \((W,b)\), and a ReLU (Rectified Linear Unit) activation layer, which is naturally 1-Lipschitz, the combination of the two is also 1-Lipschitz.
This is shown in the equations below, where we have for any two inputs \(x_1\) and \(x_2\):

\[||(W.x_1+b)-(W.x_2+b)||\leq 1||x_1-x_2||,\]
\[||ReLU(x_1)-ReLU(x_2)||\leq 1||x_1-x_2||,\]
and:
\[||ReLU(W.x_1+b)-ReLU(W.x_2+b)||\leq 1||(W.x_1+b)-(W.x_2+b)||\leq 1^2||x_1-x_2||.\]
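These inequalities can be checked numerically. The sketch below (plain numpy, independent of deel-lip) rescales a random weight matrix so that its largest singular value equals 1, which makes the dense layer 1-Lipschitz, and verifies that composing it with ReLU keeps the rate of change at most 1:

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(32, 64))
W /= np.linalg.svd(W, compute_uv=False)[0]   # largest singular value is now 1: x -> W @ x + b is 1-Lipschitz
b = rng.normal(size=32)

relu = lambda z: np.maximum(z, 0.0)          # ReLU is 1-Lipschitz
f = lambda x: relu(W @ x + b)                # composition of two 1-Lipschitz functions

x1, x2 = rng.normal(size=64), rng.normal(size=64)
print(np.linalg.norm(f(x1) - f(x2)) / np.linalg.norm(x1 - x2))  # always <= 1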

The deel-lip package makes it possible to create 1-Lipschitz neural networks by providing the user with the means to constrain the Lipschitz constant of a selected set of layers (such as dense layers) to one. It also ensures that 1-Lipschitz continuity is retained during training.

🧱 Creating a 1-Lipschitz neural network with deel-lip and keras

keras is an open-source high-level deep learning API written in Python. It allows users to build, train, and deploy deep learning models.

One can produce a neural network architecture using keras with a few lines of code, as shown in the toy multi-layer perceptron (MLP) example below:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

input_shape = (28, 28, 1)
num_classes = 10

# a basic model that does not follow any Lipschitz constraint
model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Flatten(),
        layers.Dense(64),
        layers.Activation('relu'),
        layers.Dense(32),
        layers.Activation('relu'),
        layers.Dense(num_classes)
    ])


model.compile(optimizer='adam',
          loss=keras.losses.CategoricalCrossentropy(from_logits=True),
          metrics=['accuracy'])

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         

 dense (Dense)               (None, 64)                50240     

 activation (Activation)     (None, 64)                0         

 dense_1 (Dense)             (None, 32)                2080      

 activation_1 (Activation)   (None, 32)                0         

 dense_2 (Dense)             (None, 10)                330       

=================================================================
Total params: 52650 (205.66 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Alternatively, it is equivalent to write:

inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
x = layers.Dense(64)(x)
x = layers.Activation('relu')(x)
x = layers.Dense(32)(x)
x = layers.Activation('relu')(x)
y = layers.Dense(num_classes)(x)
model = Model(inputs=inputs, outputs=y)
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         

 flatten_1 (Flatten)         (None, 784)               0         

 dense_3 (Dense)             (None, 64)                50240     

 activation_2 (Activation)   (None, 64)                0         

 dense_4 (Dense)             (None, 32)                2080      

 activation_3 (Activation)   (None, 32)                0         

 dense_5 (Dense)             (None, 10)                330       

=================================================================
Total params: 52650 (205.66 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

deel-lip extends keras' capabilities by introducing custom layers and model submodules, which provide the ability to control the Lipschitz constant of individual layers or of complete neural networks, while keeping a user-friendly interface.

Below is a 1-Lipschitz replication of the previous MLP toy-example, using deel-lip:

import deel
from deel import lip

Lip_model = lip.model.Sequential([
        keras.layers.Input(shape=input_shape),
        keras.layers.Flatten(),
        lip.layers.SpectralDense(64),
        lip.layers.GroupSort2(),
        lip.layers.SpectralDense(32),
        lip.layers.GroupSort2(),
        lip.layers.SpectralDense(num_classes)
    ])

Lip_model.compile(optimizer='adam',
          loss=keras.losses.CategoricalCrossentropy(from_logits=True),
          metrics=['accuracy'])

Lip_model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_2 (Flatten)         (None, 784)               0         

 spectral_dense (SpectralDe  (None, 64)                100481    
 nse)                                                            

 group_sort2 (GroupSort2)    (None, 64)                0         

 spectral_dense_1 (Spectral  (None, 32)                4161      
 Dense)                                                          

 group_sort2_1 (GroupSort2)  (None, 32)                0         

 spectral_dense_2 (Spectral  (None, 10)                661       
 Dense)                                                          

=================================================================
Total params: 105303 (411.34 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 52653 (205.68 KB)
_________________________________________________________________


Alternatively, it is equivalent to write:

inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
x = lip.layers.SpectralDense(64)(x)
x = lip.layers.GroupSort2()(x)
x = lip.layers.SpectralDense(32)(x)
x = lip.layers.GroupSort2()(x)
y = lip.layers.SpectralDense(num_classes)(x)
Lip_model = lip.model.Model(inputs=inputs, outputs=y)
Lip_model.summary()
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_4 (InputLayer)        [(None, 28, 28, 1)]       0         

 flatten_3 (Flatten)         (None, 784)               0         

 spectral_dense_3 (Spectral  (None, 64)                100481    
 Dense)                                                          

 group_sort2_2 (GroupSort2)  (None, 64)                0         

 spectral_dense_4 (Spectral  (None, 32)                4161      
 Dense)                                                          

 group_sort2_3 (GroupSort2)  (None, 32)                0         

 spectral_dense_5 (Spectral  (None, 10)                661       
 Dense)                                                          

=================================================================
Total params: 105303 (411.34 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 52653 (205.68 KB)
_________________________________________________________________

💡 Keep in mind that all the classes above inherit from their respective keras equivalents (e.g. SpectralDense inherits from Dense).
As a result, these objects conveniently use the same interface and the same parameters as their keras equivalents.
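For example, the usual Dense arguments can be passed to SpectralDense unchanged. The two lines below are only a sketch of this shared interface (default argument values may differ between deel-lip versions):

dense_layer = keras.layers.Dense(64, use_bias=True, activation=None)
lip_layer = lip.layers.SpectralDense(64, use_bias=True, activation=None)  # same arguments, 1-Lipschitz constrained weights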

🔨 Design rules for 1-Lipschitz neural networks with deel-lip

Layer selection: deel-lip vs keras

In our 1-Lipschitz MLP examples above, we used a mixture of objects from both the keras and the deel-lip layers submodules (e.g. the Input layer from keras, the SpectralDense layer from deel-lip).

More generally, for the layer types that do not affect the Lipschitz constant of the neural network they belong to, no alternative is provided in deel-lip and the existing keras layer object can be used directly.

This is the case for the following keras layers: MaxPooling, GlobalMaxPooling, Flatten and Input.

Below is the full list of keras layers for which deel-lip provides a Lipschitz equivalent. If one wants to ensure a model's Lipschitz continuity, the alternative deel-lip layers must be employed instead of the original keras counterparts.

| tensorflow.keras.layers | deel.lip.layers              |
|-------------------------|------------------------------|
| Dense                   | SpectralDense                |
| Conv2D                  | SpectralConv2D               |
| AveragePooling2D        | ScaledAveragePooling2D       |
| GlobalAveragePooling2D  | ScaledGlobalAveragePooling2D |
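As an illustration of these substitutions, here is a minimal 1-Lipschitz convolutional sketch (the layer arguments are assumed to mirror their keras counterparts; exact defaults may vary across deel-lip versions):

cnn_model = lip.model.Sequential([
        keras.layers.Input(shape=(28, 28, 1)),
        lip.layers.SpectralConv2D(16, (3, 3)),      # replaces keras.layers.Conv2D
        lip.layers.GroupSort2(),
        lip.layers.ScaledAveragePooling2D((2, 2)),  # replaces keras.layers.AveragePooling2D
        keras.layers.Flatten(),                     # Flatten does not affect the Lipschitz constant
        lip.layers.SpectralDense(num_classes)       # replaces keras.layers.Dense
    ])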


💡 Although there are additional Lipschitz continuous layers available in deel-lip, the ones mentioned above are perfectly suitable and recommended for practical use. Interested readers can find information about the other layers here.


🚨 Note: When creating a 1-Lipschitz neural network, one should avoid using the following layers:
- Dropout: our current recommendation is to avoid it, since it can induce a modification of the Lipschitz constant of the model.
- BatchNormalization: it is not 1-Lipschitz (see the sketch below).
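To see why BatchNormalization breaks the constraint, here is a small numerical sketch (plain numpy, for illustration only): the layer rescales each feature by gamma / sqrt(var + eps), a factor that exceeds 1 whenever the feature variance is small, so distances between inputs can be amplified.

import numpy as np

x1, x2 = np.array([0.10]), np.array([0.12])
mean, var, gamma, eps = 0.11, 1e-4, 1.0, 1e-5
bn = lambda x: gamma * (x - mean) / np.sqrt(var + eps)   # affine rescaling applied by batch normalization at inference
print(np.abs(bn(x1) - bn(x2)) / np.abs(x1 - x2))         # about 95, far above 1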

Activation function selection:

The ReLU activation function is Lipschitz continuous with a Lipschitz constant of 1.

However, the 'GroupSort2' activation function provided in the layers submodule of deel-lip has additional properties that can enhance the adversarial robustness of 1-Lipschitz neural networks.

💡 Interested readers can find information relevant to other 1-Lipschitz activation functions that exist within deel-lip here.
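To build intuition, GroupSort2 sorts each consecutive pair of activations; since it acts as a permutation of its inputs, it is 1-Lipschitz. Below is a small sketch of this behaviour (reusing the tf and lip imports from the cells above; the printed values are the expected result, assuming each pair is sorted in ascending order):

x = tf.constant([[3.0, 1.0, 4.0, 2.0]])
print(lip.layers.GroupSort2()(x))   # expected: [[1. 3. 2. 4.]] (pairs (3, 1) and (4, 2) are each sorted)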

Loss function selection:

One can use keras loss functions to train 1-Lipschitz neural networks. Doing so will not interfere with the 1-Lipschitz continuity of the model.

💡 deel-lip also has a losses submodule that contains several loss functions. They have been developed to enhance the adversarial robustness of the learnt 1-Lipschitz models.
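For instance, the model above could be compiled with one of these losses instead of the keras cross-entropy. The lines below are only a sketch: TauCategoricalCrossentropy and its tau parameter are assumed to be available in your version of deel-lip, and the value of tau would typically need tuning.

Lip_model.compile(optimizer='adam',
          loss=lip.losses.TauCategoricalCrossentropy(tau=10.0),
          metrics=['accuracy'])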

🎉 Congratulations

You now know how to create 1-Lipschitz neural networks!

In the next tutorial, we will see how to train and assess adversarially robust 1-Lipschitz neural networks on the classification task, using deel-lip's losses submodule.