Hi. My name is Louis. I am a machine learning researcher and PhD candidate at the University of Sydney. My main research interests lie at the intersection of Bayesian deep learning, approximate inference, and probabilistic models with intractable likelihoods.

Previously, I was a research software engineer at NICTA (now incorporated under CSIRO as Data61) in the inference systems engineering group, working on scalable probabilistic machine learning. Prior to that, I studied computer science at the University of New South Wales, with a major emphasis on algorithm design and analysis, theoretical computer science, programming language theory, artificial intelligence, and machine learning, and a minor emphasis on mathematics and statistics.

Ph.D. in Computer Science

University of Sydney

B.Sc. (Honours 1st Class) in Computer Science

University of New South Wales

Bayesian optimization (BO) is among the most effective and widely-used blackbox optimization methods. BO proposes solutions according to an explore-exploit trade-off criterion encoded in an acquisition function, many of which are derived from the posterior predictive of a probabilistic surrogate model. Prevalent among these is the expected improvement (EI). Naturally, the need to ensure analytical tractability in the model poses limitations that can ultimately hinder the efficiency and applicability of BO. In this paper, we cast the computation of EI as a binary classification problem, building on the well-known link between class probability estimation (CPE) and density-ratio estimation (DRE), and the lesser-known link between density-ratios and EI. By circumventing the tractability constraints imposed on the model, this reformulation provides several natural advantages, not least in scalability, increased flexibility, and greater representational capacity.
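The classification view described above can be illustrated with a minimal sketch. It assumes a toy one-dimensional objective, a k-nearest-neighbour classifier as a stand-in for whatever classifier is actually used, and a fixed candidate grid; the names `objective`, `knn_probability`, and `suggest_next` are hypothetical and are not taken from the paper or its code.

```python
def objective(x):
    # Hypothetical black-box function to minimize (minimizer at x = 0.3).
    return (x - 0.3) ** 2


def knn_probability(query, xs, labels, k=5):
    """Fraction of the k nearest observed inputs that carry label 1."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    return sum(labels[i] for i in nearest) / k


def suggest_next(xs, ys, gamma=0.25, k=5, n_candidates=101):
    """Propose the candidate the classifier rates most likely to be 'good'."""
    # Threshold tau: roughly the gamma-quantile of the observed values.
    n_top = max(1, int(gamma * len(ys)))
    tau = sorted(ys)[n_top - 1]
    # Binary labels: 1 if the observation falls in the best gamma fraction.
    labels = [1 if y <= tau else 0 for y in ys]
    # The class probability plays the role of the acquisition function.
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    return max(candidates, key=lambda c: knn_probability(c, xs, labels, k))


# Assumed initial design: 40 evenly spaced evaluations on [0, 1].
xs = [i / 39 for i in range(40)]
ys = [objective(x) for x in xs]
x_next = suggest_next(xs, ys)
print(x_next)
```

No surrogate posterior is computed here: the only model is a classifier separating the best observations from the rest, and its predicted probability is maximized directly.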

We propose a framework that lifts the capabilities of graph convolutional networks (GCNs) to scenarios where no input graph is given and increases their robustness to adversarial attacks. We formulate a joint probabilistic model that considers a prior distribution over graphs along with a GCN-based likelihood and develop a stochastic variational inference algorithm to estimate the graph posterior and the GCN parameters jointly. To address the problem of propagating gradients through latent variables drawn from discrete distributions, we use their continuous relaxations known as Concrete distributions. We show that, on real datasets, our approach can outperform state-of-the-art Bayesian and non-Bayesian graph neural network algorithms on the task of semi-supervised classification in the absence of graph data and when the network structure is subjected to adversarial perturbations.
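As a rough illustration of the continuous relaxation mentioned above, the sketch below draws a binary Concrete (relaxed Bernoulli) sample for a single hypothetical graph edge. The function name, the edge probability, and the temperature are assumptions made for illustration, and plain Python is used only to show the reparameterization; an actual implementation would rely on an automatic differentiation library.

```python
import math
import random


def sample_binary_concrete(prob, temperature, rng):
    """Draw a relaxed Bernoulli(prob) sample lying between 0 and 1."""
    # Clamp the uniform draw away from 0 and 1 so the logs stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    logit = math.log(prob) - math.log(1.0 - prob)
    # The sample is a smooth function of the logit, which is what allows
    # gradients to pass through it; lower temperatures push it towards 0 or 1.
    z = (logit + logistic_noise) / temperature
    return 0.5 * (1.0 + math.tanh(0.5 * z))  # numerically stable sigmoid


rng = random.Random(0)
samples = [sample_binary_concrete(0.3, 0.5, rng) for _ in range(20000)]
# Thresholding at 0.5 recovers a Bernoulli(0.3) variable for any temperature.
fraction_on = sum(s > 0.5 for s in samples) / len(samples)
print(fraction_on)
```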

We introduce a model-based asynchronous multi-fidelity method for hyperparameter and neural architecture search that combines the strengths of asynchronous Hyperband and Gaussian process-based Bayesian optimization. At the heart of our method is a probabilistic model that can simultaneously reason across hyperparameters and resource levels, and supports decision-making in the presence of pending evaluations. We demonstrate the effectiveness of our method on a wide range of challenging benchmarks, for tabular data, image classification and language modelling, and report substantial speed-ups over current state-of-the-art methods. Our new methods, along with asynchronous baselines, are implemented in a distributed framework which will be open sourced along with this publication.

Research Experience

Conducted research in AutoML, contributing to
the AWS SageMaker Automatic Model Tuning service.
I had the good fortune of working with Matthias Seeger and Cédric Archambeau, and together, we tackled the challenges of extending *multi-fidelity Bayesian optimization* with *asynchronous parallelism*.
The research developed during my internship culminated in a paper and the release of our code as part of the open-source AutoGluon library.

After NICTA was subsumed under CSIRO (Australia’s national science agency), I continued as a member of the Inference Systems Engineering team, working to apply probabilistic machine learning to a multitude of problem domains, including spatial inference and Bayesian experimental design, with an emphasis on scalability. In this time, I led the design and implementation of new microservices and contributed to the development of open-source libraries for Bayesian deep learning.

During this period, I also served a brief stint with the Graph Analytics Engineering
team (the team behind StellarGraph),
where I contributed to research into *graph representation learning* from a
probabilistic perspective. These efforts culminated in a research paper
that went on to be awarded a spotlight presentation at the field’s
premier conference.

As a software engineer with a specialization in machine learning,
I was a member of a team of machine learning researchers and engineers
engaged in an interdisciplinary collaboration with leading researchers from
multiple areas of the natural sciences, as part of the
Big Data Knowledge Discovery
initiative sponsored by the
Science Industry Endowment Fund (SIEF).
During this time I helped lead the development and release of numerous
open-source libraries for applying
Bayesian machine learning at scale.

I joined CSIRO’s Language and Social Computing team as a
Summer Vacation Scholar for the summer of 2013-14 and worked on applying
machine learning and natural language processing (NLP) techniques to develop
a text classification system for automated *sentiment analysis*.

AutoGluon is a library that implements numerous state-of-the-art methods for asynchronously distributed hyperparameter optimization (HPO) and neural architecture search (NAS). I was a core developer of the Gaussian process-based multi-fidelity searcher module.

Expanding the scope and applicability of variational inference to encompass implicit probabilistic models.

Aboleth is a minimalistic TensorFlow framework for scalable Bayesian deep learning and Gaussian process approximation.

Determinant is a software service that makes predictions from sparse data, and learns what data it needs to optimise its performance.

Revrand is a full-featured Python library for Bayesian generalized linear models, with random basis kernels for large-scale Gaussian process approximations.

The course has a primary focus on probabilistic machine learning methods, covering the topics of exact and approximate inference in directed and undirected probabilistic graphical models - continuous latent variable models, structured prediction models, and non-parametric models based on Gaussian processes.

This course has a major emphasis on maintaining a good balance between theory and practice. As the teaching assistant (TA) for this course, my primary responsibility was to create lab exercises that aid students in gaining hands-on experience with these methods, specifically applying them to real-world data using the most current tools and libraries. The labs were Python-based, and relied heavily on the Python scientific computing and data analysis stack (NumPy, SciPy, Matplotlib, Seaborn, Pandas, IPython/Jupyter notebooks), and the popular machine learning libraries scikit-learn and TensorFlow.

Students were given the chance to experiment with a broad range of methods on various problems, such as Markov chain Monte Carlo (MCMC) for Bayesian logistic regression, probabilistic PCA (PPCA), factor analysis (FA) and independent component analysis (ICA) for dimensionality reduction, hidden Markov models (HMMs) for speech recognition, conditional random fields (CRFs) for named-entity recognition, and Gaussian processes (GPs) for regression and classification.

One weird trick to make exact inference in Bayesian logistic regression tractable.

A short illustrated reference guide to the Knowledge Gradient acquisition function with an implementation from scratch in TensorFlow Probability.

This series uses Python to explore market data provided by the official API of Binance, one of the world’s largest cryptocurrency exchanges. In this post we examine various useful ways to visualize the order book.

A summary of notation, identities and derivations for the sparse variational Gaussian process (SVGP) framework.

This post demonstrates how to approximate the KL divergence (in fact, any f-divergence) between implicit distributions, using density ratio estimation by probabilistic classification.
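A minimal sketch of the idea, under stated assumptions: two Gaussians stand in for the implicit distributions (only their samples are used), and a two-parameter logistic regression fitted by Newton's method serves as the probabilistic classifier. With equal sample sizes, the classifier's logit estimates the log density ratio, and averaging it over samples from the first distribution estimates the KL divergence. The helper names are hypothetical and are not taken from the post.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, n_iter=25):
    """Fit logit(x) = w0 + w1 * x by Newton's method."""
    w0, w1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w0 + w1 * x)
            r, s = p - y, p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += s
            h01 += s * x
            h11 += s * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: w <- w - H^{-1} g, with the 2x2 inverse written out.
        w0 -= (h11 * g0 - h01 * g1) / det
        w1 -= (h00 * g1 - h01 * g0) / det
    return w0, w1


rng = random.Random(0)
n = 5000
p_samples = [rng.gauss(1.0, 1.0) for _ in range(n)]  # label 1
q_samples = [rng.gauss(0.0, 1.0) for _ in range(n)]  # label 0
w0, w1 = fit_logistic(p_samples + q_samples, [1.0] * n + [0.0] * n)

# KL(p || q) is the expectation under p of log p(x)/q(x), here the logit.
kl_estimate = sum(w0 + w1 * x for x in p_samples) / n
print(kl_estimate)  # analytic value for these two Gaussians is 0.5
```

A linear logit suffices here only because the true log ratio of these two Gaussians is linear in x; other pairs of distributions would need a more flexible classifier.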

We illustrate how to build complicated probability distributions in a modular fashion using the Bijector API from TensorFlow Probability.

BORE: Bayesian Optimization by Density-Ratio Estimation.
In *NeurIPS2020* Meta-Learn. Accepted as **Contributed Talk** (Awarded to Best 3 Papers).

(2020).

Variational Inference for Graph Convolutional Networks in the Absence of Graph Data and Adversarial Settings.
In *Advances in Neural Information Processing Systems 33 (NeurIPS2020)*. Accepted as **Spotlight Presentation** (Awarded to Top 3% of Papers).

(2020).
Variational Graph Convolutional Networks.
In *NeurIPS2019* Graph Representation Learning. Accepted as **Outstanding Contribution Talk** (Awarded to Best 3 Papers).

(2019).
Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference.
In *ICML2018* Theoretical Foundations and Applications of Deep Generative Models. Accepted as **Contributed Talk**.

(2018).