09:20 - 09:30 Intro
09:30 - 10:00 What could a theory of deep learning look like? Olivier Bousquet, Google
10:00 - 10:30 Expressiveness of ConvNets as a function of Network Architecture Amnon Shashua, Hebrew University of Jerusalem
10:30 - 11:00 Expressivity of Deep Neural Networks Gitta Kutyniok, TU Berlin
11:00 - 11:30 Coffee Break
11:30 - 12:00 Explaining Neural Network Predictions with Deep Taylor Decompositions Gregoire Montavon, TU Berlin
12:00 - 12:30 Learning in the presence of label noise Raja Giryes, Tel Aviv University
12:30 - 13:00 TBA Luigi Malago, RIST
13:00 - 18:00 Lunch
18:00 - 18:30 Why can Deep Networks avoid the curse of dimensionality and two other theoretical puzzles Tomaso Poggio, MIT
18:45 - 19:15 Discussion
What could a theory of deep learning look like?
While there is wide consensus that 'we don't understand Deep Learning', and in particular that we don't understand the generalization of Deep Networks, there is no clear consensus on what 'understanding' would even mean in this context. We thus propose to review what we 'understand' about other machine learning algorithms, to see whether any of that understanding can be transferred to Deep Learning. We will also try to frame some of the questions that remain unanswered as precisely as possible, so as to help guide the development of a theory of Deep Learning.
Expressiveness of ConvNets as a function of Network Architecture
I will go through a series of recent works from my lab on how depth, connectivity, pooling geometry, number of channels per layer, and kernel size - in short, the architectural decisions of ConvNet design - affect expressivity and inductive bias in a super-linear (mostly exponential) manner.
Expressivity of Deep Neural Networks
In this talk, we will be concerned with the general question of how well a function can be approximated by a structured neural network, such as one with sparse connectivity. In particular, we will present results on the minimal sparse connectivity required for a prescribed approximation accuracy, which lead to a construction of memory-optimal neural networks. Moreover, we will discuss a comprehensive approximation-theory-driven approach to understanding the impact of the depth of a neural network.
Explaining Neural Network Predictions with Deep Taylor Decompositions
Deep neural networks (DNNs) have been highly successful at extracting complex nonlinear predictive models from large amounts of data. In practical applications, it is important to verify that the high predictive accuracy reflects a proper problem representation and not the exploitation of artefacts in the data. Methods for visualizing and interpreting how the model predicts have therefore received growing attention. In this talk, a recently proposed framework for explaining machine learning predictions, deep Taylor decomposition (DTD), will be presented. The framework applies robustly and systematically to complex DNNs. It can be understood as performing a Taylor decomposition at each neuron of the DNN and recomposing the results of these analyses. In practice, the method is implemented as a backward propagation pass in the DNN. Emphasis will be placed on describing peculiarities of the explanation problem and how the DTD framework relates to them, in particular: (1) global vs. local explanations, (2) explaining function value vs. function variation, (3) the gradient shattering effect in DNNs, and (4) the need to combine the prediction function and the input domain to produce meaningful explanations.
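To make the "backward propagation pass" concrete, here is a minimal sketch of relevance propagation on a tiny random ReLU network, using a z+-style rule (one of the propagation rules arising from deep Taylor decomposition). The network, its weights, and the tiny stabilizer are illustrative assumptions, not the talk's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer ReLU network with random weights (illustrative only).
W1 = rng.standard_normal((4, 6)); b1 = np.zeros(6)
W2 = rng.standard_normal((6, 3)); b2 = np.zeros(3)

def forward(x):
    a1 = np.maximum(0, x @ W1 + b1)   # hidden activations
    a2 = np.maximum(0, a1 @ W2 + b2)  # output activations
    return a1, a2

def dtd_backward(a_in, W, R_out):
    """Propagate relevance one layer back with a z+-style rule:
    each input unit receives relevance in proportion to its
    positive contribution z_ij = a_i * max(W_ij, 0)."""
    Wp = np.maximum(W, 0)
    z = a_in @ Wp + 1e-9              # positive pre-activations (stabilized)
    s = R_out / z                     # relevance per unit of contribution
    return a_in * (s @ Wp.T)          # relevance assigned to the input units

x = np.abs(rng.standard_normal(4))    # nonnegative, pixel-like input
a1, a2 = forward(x)
R2 = a2                               # initialize relevance at the output
R1 = dtd_backward(a1, W2, R2)
R0 = dtd_backward(x, W1, R1)

# The rule conserves total relevance layer by layer (up to the stabilizer),
# so R0 distributes the output score over the input dimensions.
print(R2.sum(), R1.sum(), R0.sum())
```

The backward pass mirrors the forward pass layer by layer, which is what makes the method cheap: one additional sweep through the network per explanation.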
Learning in the presence of label noise
Training deep neural networks requires a large amount of labeled data. Since these labels are produced by human annotators, some of them may be wrongly annotated. An important question is therefore how robust neural networks are when trained with noisy labels. In this work, we study the attributes of deep neural networks that make them robust to label noise.
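The abstract does not specify a noise model, but a common way to study this question empirically is to inject symmetric label noise: each label is flipped with some probability to a uniformly chosen different class. A minimal sketch of that corruption step (names and rates are my own, hypothetical):

```python
import numpy as np

def corrupt_labels(y, noise_rate, num_classes, rng):
    """Flip a fraction `noise_rate` of the labels to a uniformly
    drawn *different* class (symmetric label noise)."""
    y = y.copy()
    flip = rng.random(len(y)) < noise_rate
    # an offset in 1..num_classes-1 guarantees the new label differs
    offset = rng.integers(1, num_classes, size=len(y))
    y[flip] = (y[flip] + offset[flip]) % num_classes
    return y

rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=1000)          # clean labels, 10 classes
y_noisy = corrupt_labels(y, noise_rate=0.3, num_classes=10, rng=rng)
print((y != y_noisy).mean())                # fraction actually flipped
```

Training on `y_noisy` while evaluating on clean labels is the standard protocol for measuring how gracefully a network's test accuracy degrades as the noise rate grows.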
Why can Deep Networks avoid the curse of dimensionality and two other theoretical puzzles
A mathematical theory of deep networks, and of why they work as well as they do, is now emerging. I will review recent theoretical results on the approximation power of deep networks, including conditions under which they can be exponentially better than shallow networks. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. I will also discuss two other puzzles associated with deep networks:
* the unreasonable ease of optimizing the training error;
* the strange lack of overfitting despite large overparametrization in the absence of explicit regularization.