Rethinking Lipschitz Neural Networks

For a neural network $\mathbf{f}$ with Lipschitz constant $L$ under the $\ell_p$ norm $||\cdot||_p$, define the resulting classifier $g$ as $g(x) := \text{argmax}_k f_k(x)$ for an input $x$. Then $g$ is provably robust to any perturbation $\mathbf{\delta}$ with $||\mathbf{\delta}||_p < \frac{c}{L}\cdot\text{margin}(\mathbf{f}(x))$, i.e.,

$$g(x+\delta)=g(x) \quad \forall \delta \ \text{s.t.} \ ||\delta||_p < \frac{c}{L}\cdot \text{margin}(\mathbf{f}(x))$$

Here $c = \frac{\sqrt[p]{2}}{2}$, and $\text{margin}(\mathbf{f}(x))$ denotes the gap between the largest and second-largest output logits, i.e., $\text{margin}(\mathbf{f}(x)) := f_{k^*}(x) - \max_{k \neq k^*} f_k(x)$ with $k^* = g(x)$. The constant follows from the fact that for any two coordinates, $|\Delta f_i| + |\Delta f_j| \le 2^{1-1/p}\,||\Delta \mathbf{f}||_p \le 2^{1-1/p} L\,||\delta||_p$, so the top two logits can close the margin only when $||\delta||_p \ge \frac{\sqrt[p]{2}}{2}\cdot\frac{\text{margin}(\mathbf{f}(x))}{L}$.
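The certified radius above is straightforward to compute from the logits. The sketch below is a minimal illustration, assuming (hypothetically) that the Lipschitz constant `lip_const` of the full logit map under the chosen $\ell_p$ norm is known; the function name and signature are illustrative, not from any particular codebase.

```python
import numpy as np

def certified_radius(logits: np.ndarray, lip_const: float, p: float = 2.0) -> float:
    """Certified l_p robust radius (c / L) * margin, with c = 2**(1/p) / 2.

    Assumes `lip_const` is the Lipschitz constant of the whole logit
    vector under the l_p norm (an assumption for this sketch).
    """
    top2 = np.sort(logits)[::-1][:2]          # largest and second-largest logits
    margin = float(top2[0] - top2[1])         # margin(f(x))
    c = 2.0 ** (1.0 / p) / 2.0                # c = p-th root of 2, divided by 2
    return (c / lip_const) * max(margin, 0.0)

# Example: 3-class logits with margin 2.0, L = 5, l_2 norm.
logits = np.array([3.0, 1.0, -0.5])
r = certified_radius(logits, lip_const=5.0, p=2.0)
# c = sqrt(2)/2 ~ 0.707, so r = 0.707... / 5 * 2.0 ~ 0.283
```

Any input perturbation with $||\delta||_2 < r$ is then guaranteed not to change the predicted class.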