Dec 05, 2024 |
Information and Inference: A Journal of the IMA!
Global Convergence of SGD On Two Layer Neural Nets
This is one of our most interesting lines of work, and it has opened up a whole new mechanism for proving the convergence of noisy gradient-based algorithms for neural nets. We have already had a TMLR paper on this theme - and more drafts on this topic are on the way.
The crux of this work is that we isolate neural net loss functions - at constant regularization, at arbitrary size and for any data - that are "Villani functions", and hence their corresponding Gibbs measures satisfy Poincaré inequalities. Thus we have a first-of-its-kind result: finding neural loss setups that induce isoperimetric inequalities. Very critically, in this work the amount of regularization needed is independent of the number of gates and depends only on the bounds on the data.
Given this identification of isoperimetry in neural setups, various recent results in mathematics can be used to obtain convergence of noisy gradient methods to the global minima of such losses. In this work, we build on the observations in this JMLR paper. Thus we have a first proof of convergent training of neural nets beyond the NTK regime.
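For readers new to this terminology, here is a schematic of the key property (our notation, not the paper's): writing the regularized loss in the weights $w$ as $\tilde{L}(w)$, its Gibbs measure at inverse temperature $\beta$ is $\mu_\beta \propto e^{-\beta \tilde{L}}$, and $\mu_\beta$ satisfies a Poincaré inequality if, for all sufficiently smooth test functions $f$,

$$ \mathrm{Var}_{\mu_\beta}(f) \;\le\; C_{\mathrm{P}} \, \mathbb{E}_{\mu_\beta}\!\left[ \|\nabla f\|^2 \right] $$

for some finite constant $C_{\mathrm{P}}$. Isoperimetry-type inequalities of this kind are exactly what let convergence-to-equilibrium results for Langevin-type dynamics be transferred into global convergence guarantees for noisy gradient methods on $\tilde{L}$.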
|
Dec 02, 2024 |
TMLR! (for the 3rd time in 2024!)
Towards Size-Independent Generalization Bounds for Deep Operator Nets
This work is the end result of a long project that was completed by my PhD student Dibyakanti Kumar. It was started with Pulkit Gopalani (PhD student at the University of Michigan) and Sayar Karmakar (faculty in statistics at the University of Florida). The summary is this: use the Huber loss to solve PDEs - it is likely to perform better, and this hope now has significant mathematical grounding. This work is a tour de force in Rademacher theory for non-Lipschitz predictors. We showed that a very natural and relevant loss class for DeepONets has a size-independent generalization bound for well-tuned Huber losses. This arises from showing that the underlying Rademacher complexity of the DeepONets is itself width-independent and instead scales with a rather complicated capacity measure. We believe that this work motivates a whole lot of follow-up work on extending these results and testing their implications for the entire range of neural operators being thought about these days.
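As a reminder of the loss in question (this is the standard definition, not anything specific to the paper), the Huber loss at parameter $\delta > 0$, applied to a residual $r$, is

$$ \ell_\delta(r) \;=\; \begin{cases} \tfrac{1}{2}\, r^2 & \text{if } |r| \le \delta, \\ \delta\left(|r| - \tfrac{1}{2}\delta\right) & \text{if } |r| > \delta, \end{cases} $$

i.e. quadratic near zero and linear in the tails; the "well-tuned" choice of $\delta$ is what the size-independent generalization bound is stated for.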
|
Nov 22, 2024 |
London Mathematical Society (LMS)
Got elected as a member of the LMS.
|
Sep 16, 2024 |
Sebastien Andre-Sloan joins our group as the second PhD student.
He did a pretty large third-year undergraduate project with us before taking this leap :)
|
Aug 01, 2024 |
With Dr. Jakob Zech at the University of Heidelberg we secured a grant from the Manchester–Heidelberg Research Fund for our project proposal titled “Optimization Errors in Scientific Machine Learning”.
|
Jul 01, 2024 |
Our amazing co-author Dibyakanti Kumar joins our group as a PhD student.
He will be co-advised by Prof. Alex Frangi and myself.
Dibyakanti becomes the first student to begin their doctoral studies in our group!
|
Jun 11, 2024 |
IOP-MLST!
Investigating the Ability of PINNs to Solve Burgers’ PDE Near Finite-Time Blow Up
What is the relationship between an ML model's error in approximating the PDE solution and the risk of its PINN loss function? Recall that it is only the latter that the codes try to minimize, and the relationship between the two is in general quite unclear. In this latest work with my amazing collaborator Dibyakanti Kumar, we try to prove relationships between these two critical quantities for non-viscous, pressure-less fluids (Burgers' PDE in d dimensions), while allowing for divergence of the flow. This is an interesting edge case because it allows for PDE solutions that blow up in finite time while starting from smooth initial conditions. Our way of analyzing this population risk gives indications for why penalizing the gradients of the surrogate (the net) has previously helped in such experiments.
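For orientation (a schematic in our notation, not the paper's exact setup), the inviscid, pressure-less Burgers' equation for a velocity field $u(x,t)$ in $d$ dimensions is

$$ \partial_t u + (u \cdot \nabla)\, u = 0, $$

and a PINN surrogate $u_\theta$ is trained on an empirical version of a population risk that penalizes this residual (together with the initial/boundary mismatch), something like $\mathbb{E}\big[\|\partial_t u_\theta + (u_\theta \cdot \nabla)\, u_\theta\|^2\big]$. The question above is then: how does a small value of such a trainable risk constrain the true approximation error $\|u_\theta - u\|$, especially near a finite-time blow-up?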
|
May 27, 2024 |
Got the UniCS undergraduate project advising award!
|
Feb 25, 2024 |
TMLR!
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
Here we give a first-of-its-kind proof of SGD convergence on finitely large neural nets, for logistic loss in the binary classification setting. This continues our investigation into how neural loss functions can be "Villani functions", and how uncovering this almost magical mathematical property can help prove convergence of gradient-based algorithms to the global minima of such losses. This is work with Pulkit Gopalani (PhD student at UMichigan) and Samyak Jha (undergrad at IIT-Bombay).
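Schematically (our notation, simplifying the paper's setup), for a width-$m$ two layer net $f_{\mathbf{w}}$ on labelled data $(x_i, y_i)$ with $y_i \in \{-1,+1\}$, the objective being minimized is of the form

$$ \hat{L}_\lambda(\mathbf{w}) \;=\; \frac{1}{n} \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i f_{\mathbf{w}}(x_i)}\right) \;+\; \lambda \|\mathbf{w}\|^2, $$

and the result is that, at appropriate regularization $\lambda$, SGD on such an objective converges to its global minima; the Villani-function property of $\hat{L}_\lambda$ is what makes this provable.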
|
Feb 01, 2024 |
TMLR!
Size Lowerbounds for Deep Operator Networks
This work is with Amartya Roy @ Bosch, India. As far as we know, this is among the very few proofs of any kind of architectural constraint needed for training performance that has ever been derived for any neural architecture. And it is almost entirely data-independent, hence a "universal lower bound"; in particular, the lower bounds we derive do not depend on the partial differential equation that the DeepONet is targeted to solve. This analysis also leads to certain "scaling law" conjectures for DeepONets, as we touch upon in the short experimental section.
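For context (this is the standard DeepONet parameterization, not a statement of the exact bound), a DeepONet predicts the value of the output function at a query point $y$ from samples of the input function $u$ via a branch net $b_\theta$ and a trunk net $\tau_\theta$ combined as

$$ G_\theta(u)(y) \;=\; \sum_{k=1}^{p} b_{\theta,k}(u)\, \tau_{\theta,k}(y), $$

and the lower bounds in the paper concern how large such architectures need to be for training to succeed, independently of which PDE the operator is being fit to.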
|