## Archive for neural network

## reproducibility check [Nature]

Posted in Statistics with tags Bayesian Analysis, federated learning, machine learning, MCMC convergence, Nature, neural network, prior selection, reproducibility, swarm learning on September 1, 2021 by xi'an

**W**hile reading the Nature article Swarm Learning, by Warnat-Herresthal et [many] al., which goes beyond federated learning by removing the need for a central coordinator [if resorting to naïve averaging of the neural network parameters], I came across this reporting summary on the statistics checks made by the authors. With a specific box on Bayesian analysis and MCMC implementation!

## statistical analysis of GANs

Posted in Books, Statistics with tags Annals of Statistics, asymptotics, GANs, generative discriminative networks, image analysis, Kullback-Leibler divergence, machine learning, neural network, Université Paris-La Sorbonne on May 24, 2021 by xi'an

**M**y friend Gérard Biau and his coauthors published a paper in the Annals of Statistics last year on the theoretical [statistical] analysis of GANs, which I had missed and recently read with definite interest in the issues. (With no image example!)

If the discriminator is unrestricted, the unique optimal solution is the Bayes posterior probability

D*(x) = p*(x) / (p*(x) + p_θ(x))

when the model density p_θ is everywhere positive. And the optimal parameter θ corresponds to the closest model in terms of Kullback-Leibler divergence, i.e., the pseudo-true value of the parameter. This is however the ideal situation, while in practice D is restricted to a parametric family. In that case, if the family is wide enough to approximate the ideal discriminator in the sup norm, with an error of order ε, and if the parameter space Θ is compact, the optimal parameter found under the restricted family approximates the pseudo-true value, in the sense of the GAN loss, at order ε². Under a stronger assumption on the ability of the family to approximate any discriminator, the same property holds for the empirical version (and in expectation). (As an aside, the figure illustrating this property confusingly uses a histogramesque rectangle to indicate the expectation of the discriminator loss!) And both parameter estimators (of θ and α) converge to the optimal ones with the sample size. An interesting foray by statisticians into a method whose statistical properties are rarely if ever investigated. Missing a comparison with alternative approaches, like MLE, though.
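A minimal numerical sketch of this unrestricted optimum, under a hypothetical one-dimensional Gaussian setting (the densities and the θ grid are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# hypothetical toy setting: true density p* = N(0,1), model p_theta = N(theta,1)
def npdf(x, mu):
    return np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)

def optimal_discriminator(x, theta):
    """Unrestricted optimum D*(x) = p*(x) / (p*(x) + p_theta(x)),
    the Bayes posterior probability that x was drawn from p*."""
    return npdf(x, 0.0) / (npdf(x, 0.0) + npdf(x, theta))

# when the model matches the truth (theta = 0), D* equals 1/2 everywhere
x = np.linspace(-3, 3, 7)
assert np.allclose(optimal_discriminator(x, 0.0), 0.5)

# for unit-variance Gaussians, KL(p* || p_theta) = theta^2 / 2, so the
# pseudo-true value minimising the divergence over a grid is theta = 0
thetas = np.linspace(-2, 2, 401)
pseudo_true = thetas[np.argmin(thetas ** 2 / 2)]
print(pseudo_true)
```

In this toy case the pseudo-true value coincides with the true parameter, since the model family contains the true density.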

## simulation-based inference for neuroscience [One World ABC seminar]

Posted in Books, pictures, Statistics, University life with tags ABC, Approximate Bayesian computation, neural network, neurosciences, numerical simulator, One World ABC Seminar, University of Warwick, webinar on April 26, 2021 by xi'an

**T**he next One World ABC seminar will take place on Thursday at 11:30, UK time, and will broadcast a talk by Jakob Macke on *Simulation-based inference for neuroscience*. Here is the abstract:

Neuroscience research makes extensive use of mechanistic models of neural dynamics; these models are often implemented through numerical simulators, requiring the use of simulation-based approaches to statistical inference. I will talk about our recent work on developing simulation-based inference methods using flexible density estimators parameterised with neural networks, our efforts on benchmarking these approaches, and applications to modelling problems in neuroscience.

Remember you need to register beforehand to receive the access code!

## mathematical understanding of neural networks through mean-field analysis [PhD studentship]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags ANR, Auvergne, Clermont-Ferrand, mean field analysis, neural network, PhD fellowship on June 26, 2020 by xi'an

Arnaud Guillin and Manon Michel from the Université Clermont-Auvergne are currently looking for PhD candidates interested in the mathematical analysis of neural networks via the tool of mean-field analysis. With full funding available. Candidates can contact Arnaud Guillin at uca.fr.

## frontier of simulation-based inference

Posted in Books, Statistics, University life with tags ABC, Bayesian deep learning, classification, deep learning, GANs, kernel density estimator, National Academy of Science, neural network, neural networks and learning machines, PNAS, simulation-based inference, Statistics, summary statistics, Wasserstein distance on June 11, 2020 by xi'an

“This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, ‘The Science of Deep Learning,’ held March 13–14, 2019, at the National Academy of Sciences in Washington, DC.”

**A** paper by Kyle Cranmer, Johann Brehmer, and Gilles Louppe just appeared in PNAS on the frontier of simulation-based inference. Sounding more like a tribune than a research paper producing new input. Or at least like a review. Providing a quick introduction to simulators, inference, ABC. Stating the shortcomings of simulation-based inference as threefold:

- costly, since it requires a large number of simulated samples
- losing information through the use of insufficient summary statistics or of poor non-parametric approximations of the sampling density
- wasteful, as requiring new computational efforts for each new dataset, primarily so for ABC, since learning the likelihood function (as a function of both the parameter θ and the data x) need only be done once

And the difficulties increase with the dimension of the data. While the points made above are correct, I want to note that, ideally, ABC (and Bayesian inference as a whole) only depends on a one-dimensional summary of the observation, namely the likelihood value. Or, more practically, that it only depends on the distance from the observed data to the simulated data. (Possibly the Wasserstein distance between the cdfs.) And that, somewhat unrealistically, ABC could store the reference table once for all. Point 3 can also be debated, in that the effort of learning an approximation can only be amortized when exactly the same model is re-employed with new data, which is likely in industrial applications but less so in scientific investigations, I would think. About point 2, the paper misses part of the ABC literature on selecting summary statistics, e.g., the culling afforded by random forest ABC, or the earlier use of the score function in Martin et al. (2019).
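As a concrete illustration of inference driven solely by a distance between observed and simulated data, here is a minimal ABC rejection sketch on a toy Gaussian location model, using the one-dimensional Wasserstein distance between empirical cdfs; the simulator, prior, and tolerance are all illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n):
    """Toy simulator: n draws from N(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)

def wasserstein1(x, y):
    """1-d W1 distance between the empirical cdfs of two equal-size
    samples, i.e. the mean absolute gap between order statistics."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

x_obs = simulate(1.0, 200)          # pseudo-observed data, true theta = 1

def abc_rejection(x_obs, n_prior=5000, eps=0.25):
    """Keep prior draws whose simulated data land within eps of x_obs."""
    prior = rng.normal(0.0, 3.0, size=n_prior)      # N(0, 3^2) prior
    return np.array([t for t in prior
                     if wasserstein1(simulate(t, len(x_obs)), x_obs) < eps])

post = abc_rejection(x_obs)
print(len(post), post.mean())       # ABC posterior concentrates near 1
```

Note that no summary statistic is computed: the tolerance acts directly on the distance between full datasets, which sidesteps the information-loss issue of point 2 at the cost of a low acceptance rate.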

The paper then makes a case for using machine-, active-, and deep-learning advances to overcome those blocks. Recouping other recent publications and talks (like Dennis' on the One World ABC'minar!). Once again presenting machine-learning techniques such as normalizing flows as more efficient than traditional non-parametric estimators. Of which I remain unconvinced, absent deeper arguments [than the repeated mention of powerful machine-learning techniques] on the convergence rates of these estimators (rather than extolling the super-powers of neural nets).
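To make the convergence-rate question concrete, here is a small experiment (with illustrative sample sizes and bandwidths, not from the paper) exhibiting the classical n^(-4/5) MISE rate of a one-dimensional kernel density estimator, the baseline any neural alternative should be compared against:

```python
import numpy as np

rng = np.random.default_rng(2)

def kde(x_eval, sample, h):
    """Gaussian-kernel density estimate with bandwidth h."""
    z = (x_eval[:, None] - sample[None, :]) / h
    return np.exp(-z ** 2 / 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def true_density(x):                      # target: standard normal
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

grid = np.linspace(-4, 4, 801)
dx = grid[1] - grid[0]

def ise(n):
    """Integrated squared error, using the h ~ n^(-1/5) bandwidth
    that yields the n^(-4/5) MISE rate in one dimension."""
    sample = rng.normal(size=n)
    h = sample.std() * n ** (-1 / 5)
    return np.sum((kde(grid, sample, h) - true_density(grid)) ** 2) * dx

e_small, e_large = ise(100), ise(10_000)
print(e_small, e_large)   # error drops sharply as n grows
```

The point of the exercise: the non-parametric rate is explicit and provable, whereas claims of superior efficiency for neural density estimators are rarely accompanied by comparable rates.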

“A classifier is trained using supervised learning to discriminate two sets of data, although in this case both sets come from the simulator and are generated for different parameter points θ⁰ and θ¹. The classifier output function can be converted into an approximation of the likelihood ratio between θ⁰ and θ¹ (…) learning the likelihood or posterior is an unsupervised learning problem, whereas estimating the likelihood ratio through a classifier is an example of supervised learning and often a simpler task.”

The above comment is highly connected to the approach set by Geyer in 1994 and expanded by Gutmann and Hyvärinen in 2012. Interestingly, at least from my narrow statistician viewpoint!, the discussion about using these different types of approximations to the likelihood, and hence to the resulting Bayesian inference, never engages in a quantification of the approximation, or even broaches the potential for inconsistent inference induced by using fake likelihoods. While insisting on the information loss brought by using summary statistics.
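The classifier-to-ratio conversion in the quote can be sketched in a few lines. This toy version (a Gaussian simulator and a plain logistic classifier, both illustrative assumptions) recovers the likelihood ratio between θ⁰ = 0 and θ¹ = 1 from the trained classifier output d(x), via d(x)/(1−d(x)):

```python
import numpy as np

rng = np.random.default_rng(1)

# simulate from the two parameter points (toy simulator: N(theta, 1))
x0 = rng.normal(0.0, 1.0, size=20_000)     # class label 0, theta0 = 0
x1 = rng.normal(1.0, 1.0, size=20_000)     # class label 1, theta1 = 1
X = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])

# logistic classifier d(x) = sigmoid(a x + b), fitted by gradient ascent;
# at the optimum, d(x) / (1 - d(x)) estimates p(x | theta1) / p(x | theta0)
a, b = 0.0, 0.0
for _ in range(1000):
    d = 1.0 / (1.0 + np.exp(-(a * X + b)))
    a += 2.0 * np.mean((y - d) * X)
    b += 2.0 * np.mean(y - d)

# the exact log likelihood ratio for these Gaussians is x - 1/2,
# so the fit should return a close to 1 and b close to -1/2
print(a, b)
```

This is exactly the supervised task the quote refers to: the classifier never sees the densities, only labelled simulations, yet its logit recovers the log likelihood ratio, here known in closed form for checking.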

“Can the outcome be trusted in the presence of imperfections such as limited sample size, insufficient network capacity, or inefficient optimization?”

Interestingly [all the more because the paper is classified as statistics], the above shows that the statistical question is instead set in terms of numerical error(s). With proposals to address it ranging from (unrealistic) parametric bootstrap to some forms of GANs.