👋 Getting started 1: Creating a 1-Lipschitz neural network
The goal of this series of tutorials is to show the different usages of `deel-lip`.
In this first notebook, our objective is to show how to create 1-Lipschitz neural networks with `deel-lip`.
In particular, we will cover the following:
1. 📚 Theoretical background: a brief theoretical background on Lipschitz continuous functions. This section can be safely skipped if one is not interested in the theory.
2. 🧱 Creating a 1-Lipschitz neural network with `deel-lip` and `keras`: an example of how to create a 1-Lipschitz neural network with `deel-lip` and `keras`.
3. 🔨 Design rules for 1-Lipschitz neural networks with `deel-lip`: a set of neural network design rules that one must respect in order to enforce the 1-Lipschitz constraint.
📚 Theoretical background
What is the Lipschitz constant?
The `deel-lip` package allows one to control the Lipschitz constant of a layer or of a whole neural network. The Lipschitz constant is a mathematical property of a function (in our context, a layer or a model) that characterizes how much the output of the function can change with respect to changes in its input.
In mathematical terms, a function \(f\) is Lipschitz continuous with a Lipschitz constant \(L\), or more simply L-Lipschitz, if for any given pair of points \(x_1, x_2\), \(L\) provides a bound on the rate of change of \(f\):

$$||f(x_1)-f(x_2)|| \leq L \, ||x_1-x_2||.$$
For instance, given a 1-Lipschitz dense layer (a.k.a. fully connected layer) with a weight matrix \(W\) and a bias vector \(b\), we have for any two inputs \(x_1\) and \(x_2\):

$$||(W \cdot x_1+b)-(W \cdot x_2+b)|| \leq 1 \cdot ||x_1-x_2||.$$
💡 The norm we refer to throughout our notebooks is the Euclidean norm (L2). This is because `deel-lip` operates with this norm. You will find more information about the role of the norm in the context of adversarially robust 1-Lipschitz deep learning models in the notebook titled 'Getting Started 2'.
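As a quick sanity check of the inequality above, here is a minimal sketch in plain NumPy, independent of `deel-lip` (the matrix and vectors are arbitrary illustrative values): a dense layer whose weight matrix has spectral norm 1 satisfies the 1-Lipschitz bound for the Euclidean norm.

```python
# Minimal numerical check (plain NumPy, not deel-lip): a dense layer
# x -> W.x + b whose weight matrix has spectral norm 1 is 1-Lipschitz
# for the Euclidean norm.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))
W /= np.linalg.norm(W, ord=2)        # rescale so the largest singular value is 1
b = rng.normal(size=32)

x1, x2 = rng.normal(size=64), rng.normal(size=64)
lhs = np.linalg.norm((W @ x1 + b) - (W @ x2 + b))
rhs = np.linalg.norm(x1 - x2)
assert lhs <= rhs + 1e-9             # the 1-Lipschitz bound holds
```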
A simple requirement for creating 1-Lipschitz neural networks
The composition property of Lipschitz continuous functions states that if a function \(f\) is \(L_1\)-Lipschitz and another function \(g\) is \(L_2\)-Lipschitz, then their composition \(h = f \circ g\), which applies \(f\) after \(g\), is also Lipschitz continuous with a Lipschitz constant \(L \leq L_1 \cdot L_2\).
A feed-forward or sequential neural network is essentially a stack of layers, where each layer transforms the output of the previous layer(s) and feeds its output to the next ones.
By the composition property of Lipschitz functions, it suffices that each individual layer of a neural network is 1-Lipschitz for the whole model to be 1-Lipschitz.
For instance, given a 1-Lipschitz dense layer parametrized by \((W,b)\), and a ReLU (Rectified Linear Unit) activation layer, which is naturally 1-Lipschitz, the combination of the two is also 1-Lipschitz.
This is shown in the inequalities below, where we have for any two inputs \(x_1\) and \(x_2\):

$$||\mathrm{ReLU}(W \cdot x_1+b)-\mathrm{ReLU}(W \cdot x_2+b)|| \leq ||(W \cdot x_1+b)-(W \cdot x_2+b)|| \leq ||x_1-x_2||.$$
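A minimal numerical sketch of this chain of inequalities (plain NumPy again, with purely illustrative values):

```python
# Composing a 1-Lipschitz linear map with the 1-Lipschitz ReLU:
# the overall map remains 1-Lipschitz.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))
W /= np.linalg.norm(W, ord=2)                    # 1-Lipschitz linear map
b = rng.normal(size=16)
relu = lambda z: np.maximum(z, 0.0)              # 1-Lipschitz activation

x1, x2 = rng.normal(size=16), rng.normal(size=16)
lhs = np.linalg.norm(relu(W @ x1 + b) - relu(W @ x2 + b))
assert lhs <= np.linalg.norm(x1 - x2) + 1e-9     # composition is still 1-Lipschitz
```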
The `deel-lip` package allows one to create 1-Lipschitz neural networks by providing the user with the means to enforce a Lipschitz constant of one on a selected set of layers (such as dense layers). It also ensures that 1-Lipschitz continuity is retained during training.
🧱 Creating a 1-Lipschitz neural network with `deel-lip` and `keras`
`keras` is an open-source, high-level deep learning API written in Python. It allows one to build, train, and deploy deep learning models.
One can produce a neural network architecture using `keras` with a few lines of code, as shown with the toy multi-layer perceptron (MLP) example below:
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

input_shape = (28, 28, 1)
num_classes = 10

# a basic model that does not follow any Lipschitz constraint
model = keras.Sequential([
    layers.Input(shape=input_shape),
    layers.Flatten(),
    layers.Dense(64),
    layers.Activation('relu'),
    layers.Dense(32),
    layers.Activation('relu'),
    layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.summary()
```
Alternatively, it is equivalent to write:
```python
inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
x = layers.Dense(64)(x)
x = layers.Activation('relu')(x)
x = layers.Dense(32)(x)
x = layers.Activation('relu')(x)
y = layers.Dense(num_classes)(x)

model = Model(inputs=inputs, outputs=y)
model.summary()
```
`deel-lip` extends the capabilities of `keras` by introducing custom `layers` and `model` modules, providing the ability to control the Lipschitz constant of layer objects or of complete neural networks, while keeping a user-friendly interface.
Below is a 1-Lipschitz replication of the previous toy MLP example, using `deel-lip`:
```python
import deel
from deel import lip

Lip_model = lip.model.Sequential([
    keras.layers.Input(shape=input_shape),
    keras.layers.Flatten(),
    lip.layers.SpectralDense(64),
    lip.layers.GroupSort2(),
    lip.layers.SpectralDense(32),
    lip.layers.GroupSort2(),
    lip.layers.SpectralDense(num_classes)
])

Lip_model.compile(optimizer='adam',
                  loss=keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

Lip_model.summary()
```
Alternatively, it is equivalent to write:
```python
inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
x = lip.layers.SpectralDense(64)(x)
x = lip.layers.GroupSort2()(x)
x = lip.layers.SpectralDense(32)(x)
x = lip.layers.GroupSort2()(x)
y = lip.layers.SpectralDense(num_classes)(x)

Lip_model = lip.model.Model(inputs=inputs, outputs=y)
Lip_model.summary()
```
💡 Keep in mind that all the `deel-lip` classes above inherit from their respective `keras` equivalents (e.g. `SpectralDense` inherits from `Dense`). As a result, these objects conveniently use the same interface and the same parameters as their `keras` equivalents.
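For instance, the usual `Dense` constructor arguments can be passed unchanged. The small sketch below relies on that inheritance and on the earlier imports; the specific arguments shown are only illustrative:

```python
# Because SpectralDense inherits from keras' Dense, the standard Dense
# arguments (units, use_bias, kernel_initializer, ...) are accepted as-is.
spectral_layer = lip.layers.SpectralDense(
    64,
    use_bias=True,
    kernel_initializer="orthogonal",
)
plain_layer = keras.layers.Dense(
    64,
    use_bias=True,
    kernel_initializer="orthogonal",
)
```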
🔨 Design rules for 1-Lipschitz neural networks with `deel-lip`
Layer selection: `deel-lip` vs `keras`
In our 1-Lipschitz MLP examples above, we have used a mixture of objects from both `keras` and the `layers` submodule of `deel-lip` (e.g. the `Input` layer from `keras`, the `SpectralDense` layer from `deel-lip`).
More generally, for the particular types of layers that do not interfere with the Lipschitz property of the neural network they belong to, no alternative has been coded in `deel-lip`, and the existing `keras` layer objects can be used directly.
This is the case for the following `keras` layers: `MaxPooling`, `GlobalMaxPooling`, `Flatten` and `Input`.
Below is the full list of `keras` layers for which `deel-lip` provides a Lipschitz equivalent. If one wants to ensure a model's Lipschitz continuity, the alternative `deel-lip` layers must be employed instead of their original `keras` counterparts.
| `tensorflow.keras.layers` | `deel.lip.layers` |
| --- | --- |
| `Dense` | `SpectralDense` |
| `Conv2D` | `SpectralConv2D` |
| `AveragePooling2D`, `GlobalAveragePooling2D` | `ScaledAveragePooling2D`, `ScaledGlobalAveragePooling2D` |
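Putting the table and the directly usable `keras` layers together, here is a hedged sketch of a small 1-Lipschitz convolutional model; the architecture and layer sizes are purely illustrative, and it reuses `input_shape` and `num_classes` from the earlier cells:

```python
# Illustrative 1-Lipschitz convolutional model: Conv2D and global average
# pooling are replaced by their deel-lip equivalents, while the Input and
# MaxPooling2D layers from keras can be used as-is.
conv_lip_model = lip.model.Sequential([
    keras.layers.Input(shape=input_shape),
    lip.layers.SpectralConv2D(16, (3, 3)),
    lip.layers.GroupSort2(),
    keras.layers.MaxPooling2D((2, 2)),
    lip.layers.SpectralConv2D(32, (3, 3)),
    lip.layers.GroupSort2(),
    lip.layers.ScaledGlobalAveragePooling2D(),
    lip.layers.SpectralDense(num_classes),
])
conv_lip_model.summary()
```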
💡 Although there are additional Lipschitz continuous layers available in `deel-lip`, the ones mentioned above are perfectly suitable and recommended for practical use. Interested readers can find information about the other layers here.
🚨 Note: When creating a 1-Lipschitz neural network, one should avoid using the following layers:
- `Dropout`: our current recommendation is to avoid using it, since it can induce a modification of the Lipschitz constant of the model.
- `BatchNormalization`: it is not 1-Lipschitz.
Activation function selection:
The ReLU activation function is Lipschitz continuous with a Lipschitz constant of 1.
However, the `GroupSort2` activation function provided in the `layers` submodule of `deel-lip` has additional properties that can enhance the adversarial robustness of 1-Lipschitz neural networks.
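To get a feel for what `GroupSort2` does, one can apply it to a small tensor: it sorts the activations within consecutive pairs of units. The tiny sketch below assumes this pairwise-sorting behaviour and reuses the `tf` and `lip` imports from the earlier cells:

```python
# GroupSort2 sorts each consecutive pair of units in ascending order,
# which is a 1-Lipschitz operation.
x = tf.constant([[3.0, 1.0, 4.0, 2.0]])
print(lip.layers.GroupSort2()(x))  # expected: [[1., 3., 2., 4.]]
```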
💡 Interested readers can find information relevant to other 1-Lipschitz activation functions that exist within `deel-lip` here.
Loss function selection:
One can use `keras` loss functions to train 1-Lipschitz neural networks. Doing so will not interfere with the 1-Lipschitz continuity of the model.
💡 `deel-lip` also has a `losses` submodule that contains several loss functions. They have been developed to enhance the adversarial robustness of the learnt 1-Lipschitz models.
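As an illustration only (the choice of loss and its parameters is the topic of the next tutorial), compiling the model with one of these losses looks exactly like the `keras` version; the use of `TauCategoricalCrossentropy` and the value of its temperature parameter `tau` below are assumptions for the sake of the example:

```python
# Hedged sketch: same compile() call as before, but with a loss taken from
# deel.lip.losses; the loss choice and tau value are illustrative only.
Lip_model.compile(
    optimizer='adam',
    loss=lip.losses.TauCategoricalCrossentropy(tau=10.0),
    metrics=['accuracy'],
)
```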
🎉 Congratulations
You now know how to create 1-Lipschitz neural networks!
In the next tutorial, we will see how to train and assess adversarially robust 1-Lipschitz neural networks on a classification task, using `deel-lip`'s `losses` submodule.