Hi. My name is Louis. I am a budding machine learning researcher and PhD candidate at the University of Sydney, working with Edwin Bonilla and Fabio Ramos. My main research interests lie at the intersection of Bayesian deep learning, approximate inference, and probabilistic models with intractable likelihoods.
Until 2017, I was a software engineer at NICTA (now incorporated under CSIRO as Data61) in the inference systems group, working on scalable Bayesian machine learning. I now work at Data61 on a part-time basis when I am not teaching.
Prior to that, I studied computer science at the University of New South Wales, with a major emphasis on algorithm design and analysis, theoretical computer science, programming language theory, artificial intelligence, and machine learning, and a minor emphasis on mathematics and statistics. I undertook my final-year thesis under Aleksandar Ignjatovic, and graduated with first-class honours in 2015.
This post demonstrates how to approximate the KL divergence (in fact, any f-divergence) between implicit distributions, using density ratio estimation by probabilistic classification.
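To give a flavour of the approach, here is a toy sketch in which arbitrary Gaussians stand in for the implicit distributions and scikit-learn's logistic regression plays the role of the probabilistic classifier. The classifier's logit estimates the log density ratio log p(x) - log q(x), and its average over samples from p approximates KL(p || q):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Two "implicit" distributions we pretend we can only sample from:
# p = N(1, 1) and q = N(0, 4) (chosen arbitrarily for illustration).
x_p = rng.normal(1.0, 1.0, size=(5000, 1))
x_q = rng.normal(0.0, 2.0, size=(5000, 1))

# Train a probabilistic classifier to distinguish samples of p (label 1)
# from samples of q (label 0). Quadratic features let logistic regression
# represent the exact log-ratio of two Gaussians.
X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])
clf = LogisticRegression().fit(np.hstack([X, X**2]), y)

# With balanced classes, the classifier logit approximates
# log p(x) - log q(x), so KL(p || q) ~= E_p[logit(x)].
logits_under_p = clf.decision_function(np.hstack([x_p, x_p**2]))
print("estimated KL(p || q):", logits_under_p.mean())

# Closed form for comparison: KL(N(1,1) || N(0,4)).
print("true KL(p || q):", np.log(2.0) + 2.0 / 8.0 - 0.5)
```

With enough samples, the estimate should land close to the closed-form value (about 0.443) computed in the last line.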
We illustrate how to build complicated probability distributions in a modular fashion using the Bijector API from TensorFlow Probability.
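A minimal sketch of what this looks like in code (assuming a recent TFP release, where tfb.Shift and tfb.Scale are available): bijectors compose right-to-left inside a Chain, and the transformed distribution inherits sampling and log-density via the change-of-variables formula.

```python
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

# Chain applies bijectors right-to-left: here x = exp(2 * z + 1),
# turning a standard normal into a scaled, shifted log-normal.
bijector = tfb.Chain([tfb.Exp(), tfb.Shift(1.0), tfb.Scale(2.0)])
dist = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0.0, scale=1.0),
    bijector=bijector)

x = dist.sample(3)
print(x, dist.log_prob(x))  # densities come for free via change of variables
```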
An in-depth practical guide to variational autoencoders from a probabilistic perspective.
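Concretely, the probabilistic perspective centres on the evidence lower bound (ELBO), with encoder q(z|x) and decoder p(x|z):

```latex
\log p_\theta(\mathbf{x})
  \geq \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}
       \left[ \log p_\theta(\mathbf{x} \mid \mathbf{z}) \right]
     - \mathrm{KL}\left( q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z}) \right)
```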
The meshgrid function is useful for creating coordinate arrays to vectorize function evaluations over a grid. Experienced NumPy users will have noticed some discrepancy between meshgrid and mgrid, a function that is used just as often, for exactly the same purpose.
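A quick, self-contained illustration of the overlap between the two: mgrid uses slice notation (with a complex "step" meaning number of points) and matrix-style 'ij' ordering, while meshgrid defaults to Cartesian 'xy' ordering.

```python
import numpy as np

# meshgrid: takes 1-D coordinate arrays, returns broadcastable grids.
x = np.linspace(0.0, 1.0, 3)
y = np.linspace(0.0, 1.0, 5)
X1, Y1 = np.meshgrid(x, y)         # shapes (5, 3): Cartesian 'xy' indexing

# mgrid: dense grid from slice notation; complex step => number of points.
Y2, X2 = np.mgrid[0:1:5j, 0:1:3j]  # 'ij' (matrix) ordering

assert np.allclose(X1, X2) and np.allclose(Y1, Y2)

# Vectorized evaluation of f(x, y) = sin(pi x) * cos(pi y) over the grid.
Z = np.sin(np.pi * X1) * np.cos(np.pi * Y1)
print(Z.shape)  # (5, 3)
```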
We introduce a model-based asynchronous multi-fidelity hyperparameter optimization (HPO) method, combining the strengths of asynchronous Hyperband and Gaussian process-based Bayesian optimization. Our method obtains substantial speed-ups in wall-clock time over both synchronous and asynchronous Hyperband, as well as over a prior model-based extension of the former. Candidate hyperparameters to evaluate are selected by a novel jointly dependent Gaussian process-based surrogate model over all resource levels, allowing evaluations at one level to be informed by evaluations gathered at all others. We benchmark several covariance functions and conduct extensive experiments on hyperparameter tuning for multi-layer perceptrons on tabular data, convolutional networks on image classification, and recurrent networks on language modelling, demonstrating the benefits of our approach.
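The joint surrogate idea can be sketched schematically. The toy example below (a scikit-learn GP on a made-up objective, not the implementation from the paper) fits a single GP over (configuration, resource) pairs, so that cheap low-fidelity evaluations shape predictions at the full resource level:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical objective: validation error of a config x in [0, 1],
# observed at resource levels r in {1, ..., 9} (e.g. training epochs).
def val_error(x, r):
    return (x - 0.6)**2 + 1.0 / r + 0.01 * rng.normal()

# Observations gathered asynchronously at mixed fidelities.
X = rng.uniform(size=20)
R = rng.integers(1, 10, size=20)
y = np.array([val_error(x, r) for x, r in zip(X, R)])

# One GP over the joint (config, resource) input lets low-fidelity
# evaluations inform predictions at the highest resource level.
Z = np.column_stack([X, R / 9.0])  # normalize resource to [0, 1]
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(Z, y)

# Score fresh candidates at the highest fidelity (a crude acquisition rule).
cand = rng.uniform(size=500)
mu, sigma = gp.predict(np.column_stack([cand, np.ones(500)]), return_std=True)
print("next candidate:", cand[np.argmin(mu - sigma)])
```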
We propose a framework that lifts the capabilities of graph convolutional networks (GCNs) to scenarios where no input graph is given and increases their robustness to adversarial attacks. We formulate a joint probabilistic model that considers a prior distribution over graphs along with a GCN-based likelihood and develop a stochastic variational inference algorithm to estimate the graph posterior and the GCN parameters jointly. To address the problem of propagating gradients through latent variables drawn from discrete distributions, we use their continuous relaxations known as Concrete distributions. We show that, on real datasets, our approach can outperform state-of-the-art Bayesian and non-Bayesian graph neural network algorithms on the task of semi-supervised classification in the absence of graph data and when the network structure is subjected to adversarial perturbations.
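The relaxation at the core of this trick is easy to sketch. Below is a toy NumPy version of a reparameterized sample from a binary Concrete (relaxed Bernoulli) distribution, applied to a hypothetical latent adjacency matrix; it is illustrative only, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)

def binary_concrete_sample(logits, temperature, rng):
    """Reparameterized sample from a binary Concrete distribution.

    Uses the logistic trick: sigmoid((logits + L) / temperature), where
    L is standard logistic noise, so gradients can flow through the sample.
    """
    u = rng.uniform(size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))

# Hypothetical: relax each entry of a latent adjacency matrix over 4 nodes.
edge_logits = rng.normal(size=(4, 4))
soft_adjacency = binary_concrete_sample(edge_logits, temperature=0.5, rng=rng)
print(np.round(soft_adjacency, 2))  # entries in (0, 1), near-binary at low temperature
```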
The course focuses primarily on probabilistic machine learning methods, covering exact and approximate inference in directed and undirected probabilistic graphical models, continuous latent variable models, structured prediction models, and non-parametric models based on Gaussian processes.
The course places strong emphasis on maintaining a good balance between theory and practice. As the teaching assistant (TA) for the course, my primary responsibility was to create lab exercises that help students gain hands-on experience with these methods, applying them to real-world data using current tools and libraries. The labs were Python-based, and relied heavily on the Python scientific computing and data analysis stack (NumPy, SciPy, Matplotlib, Seaborn, Pandas, IPython/Jupyter notebooks), as well as the popular machine learning libraries scikit-learn and TensorFlow.
Students were given the chance to experiment with a broad range of methods on various problems, such as Markov chain Monte Carlo (MCMC) for Bayesian logistic regression, probabilistic PCA (PPCA), factor analysis (FA) and independent component analysis (ICA) for dimensionality reduction, hidden Markov models (HMMs) for speech recognition, conditional random fields (CRFs) for named-entity recognition, and Gaussian processes (GPs) for regression and classification.
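To give a taste of the style of these exercises, here is a minimal hand-rolled sketch (not the actual lab material) of random-walk Metropolis, the simplest flavour of MCMC, applied to Bayesian logistic regression on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D logistic regression data (made up for illustration).
X = rng.normal(size=(100, 1))
w_true = np.array([2.0])
y = rng.uniform(size=100) < 1.0 / (1.0 + np.exp(-(X @ w_true)))

def log_posterior(w):
    """Log joint: standard normal prior plus Bernoulli likelihood."""
    logits = X @ w
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * (w @ w)
    return log_lik + log_prior

# Random-walk Metropolis: propose a perturbed weight, accept with
# probability min(1, posterior ratio).
w, samples = np.zeros(1), []
lp = log_posterior(w)
for _ in range(5000):
    w_prop = w + 0.3 * rng.normal(size=1)
    lp_prop = log_posterior(w_prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        w, lp = w_prop, lp_prop
    samples.append(w.copy())

# Discard burn-in and summarize the posterior.
print("posterior mean of w:", np.mean(samples[1000:], axis=0))
```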