<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hyperparameter Optimization |</title><link>https://tiao.io/tags/hyperparameter-optimization/</link><atom:link href="https://tiao.io/tags/hyperparameter-optimization/index.xml" rel="self" type="application/rss+xml"/><description>Hyperparameter Optimization</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 01 Sep 2025 00:00:00 +0000</lastBuildDate><image><url>https://tiao.io/media/icon_hu_9c2a75fde2335590.png</url><title>Hyperparameter Optimization</title><link>https://tiao.io/tags/hyperparameter-optimization/</link></image><item><title>Ax: A Platform for Adaptive Experimentation</title><link>https://tiao.io/publications/ax-platform/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/ax-platform/</guid><description/></item><item><title>Ax</title><link>https://tiao.io/projects/ax/</link><pubDate>Thu, 01 Aug 2024 00:00:00 +0000</pubDate><guid>https://tiao.io/projects/ax/</guid><description>&lt;p&gt;
Ax is an open-source platform for adaptive
experimentation, developed by Meta&amp;rsquo;s Adaptive
Experimentation team. It provides a unified interface for
Bayesian optimization, bandit optimization, multi-objective, and
constrained optimization, built on top of
BoTorch.&lt;/p&gt;
&lt;p&gt;I contribute to Ax as part of my work at Meta, with a particular focus on
sample-efficient methods for hyperparameter optimization, capacity management, and
scaling-law-based modeling. Co-first author on the Ax platform paper
(AutoML 2025).&lt;/p&gt;</description></item><item><title>Batch Bayesian Optimisation via Density-ratio Estimation with Guarantees</title><link>https://tiao.io/publications/batch-bore-guarantees/</link><pubDate>Thu, 01 Dec 2022 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/batch-bore-guarantees/</guid><description/></item><item><title>📄 One paper accepted to NeurIPS 2022</title><link>https://tiao.io/posts/one-paper-accepted-to-neurips2022/</link><pubDate>Sat, 15 Oct 2022 18:36:59 +0000</pubDate><guid>https://tiao.io/posts/one-paper-accepted-to-neurips2022/</guid><description>&lt;p&gt;Our paper
&lt;em&gt;Batch Bayesian Optimisation via Density-ratio Estimation with Guarantees&lt;/em&gt; was accepted
to NeurIPS 2022. Led by Rafael Oliveira, this is the batch extension of
BORE with theoretical convergence
guarantees for parallel Bayesian optimization. Joint work with Rafael
Oliveira, Edwin Bonilla, and Fabio Ramos.&lt;/p&gt;</description></item><item><title>Long Talk: BORE — Bayesian Optimization by Density-Ratio Estimation</title><link>https://tiao.io/events/icml2021-bore/</link><pubDate>Wed, 21 Jul 2021 14:00:00 +0000</pubDate><guid>https://tiao.io/events/icml2021-bore/</guid><description/></item><item><title>BORE</title><link>https://tiao.io/projects/bore/</link><pubDate>Thu, 01 Jul 2021 00:00:00 +0000</pubDate><guid>https://tiao.io/projects/bore/</guid><description>&lt;p&gt;
BORE is the reference implementation of
&lt;em&gt;BORE: Bayesian Optimization by Density-Ratio Estimation&lt;/em&gt; (Tiao et al., ICML 2021).
It recasts the acquisition function in
Bayesian optimization as a probabilistic classification
problem via density-ratio estimation,
sidestepping the analytical-tractability constraints of conventional
surrogate-based methods.&lt;/p&gt;
&lt;p&gt;Developed with Aaron Klein.&lt;/p&gt;</description></item><item><title>Invited Talk: BORE — Bayesian Optimization by Density-Ratio Estimation</title><link>https://tiao.io/events/ellis-automl-seminars-2021/</link><pubDate>Wed, 12 May 2021 16:00:00 +0000</pubDate><guid>https://tiao.io/events/ellis-automl-seminars-2021/</guid><description/></item><item><title>📄 One paper accepted to ICML 2021</title><link>https://tiao.io/posts/one-paper-accepted-to-icml2021/</link><pubDate>Sat, 08 May 2021 00:00:00 +0000</pubDate><guid>https://tiao.io/posts/one-paper-accepted-to-icml2021/</guid><description>&lt;p&gt;Our paper
&lt;em&gt;BORE: Bayesian Optimization by Density-Ratio Estimation&lt;/em&gt; was
accepted to ICML 2021 as a &lt;strong&gt;Long Talk&lt;/strong&gt; (awarded to the top 3% of submissions).
This is joint work with Aaron Klein, Cédric Archambeau, Edwin Bonilla, Matthias
Seeger, and Fabio Ramos — much of it carried out during my AWS Berlin
internship.&lt;/p&gt;</description></item><item><title>BORE: Bayesian Optimization by Density-Ratio Estimation</title><link>https://tiao.io/publications/bore-2/</link><pubDate>Sat, 08 May 2021 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/bore-2/</guid><description>&lt;p&gt;&lt;strong&gt;B&lt;/strong&gt;ayesian &lt;strong&gt;O&lt;/strong&gt;ptimization (BO) by Density-&lt;strong&gt;R&lt;/strong&gt;atio &lt;strong&gt;E&lt;/strong&gt;stimation (DRE),
or &lt;strong&gt;BORE&lt;/strong&gt;, is a simple, yet effective framework for the optimization of
blackbox functions.
BORE is built upon the correspondence between &lt;em&gt;expected improvement (EI)&lt;/em&gt;&amp;mdash;arguably
the predominant &lt;em&gt;acquisition function&lt;/em&gt; used in BO&amp;mdash;and the &lt;em&gt;density-ratio&lt;/em&gt;
between two unknown distributions.&lt;/p&gt;
&lt;p&gt;One of the far-reaching consequences of this correspondence is that we can
reduce the computation of EI to a &lt;em&gt;probabilistic classification&lt;/em&gt; problem&amp;mdash;a
problem we are well-equipped to tackle, as evidenced by the broad range of
streamlined, easy-to-use and, perhaps most importantly, battle-tested
tools and frameworks at our disposal.
Notable among these are TensorFlow/Keras and PyTorch for Deep Learning,
XGBoost for Gradient Tree Boosting,
not to mention scikit-learn for just about
everything else.
The BORE framework lets us take direct advantage of these tools.&lt;/p&gt;
&lt;h2 id="code-example"&gt;Code Example&lt;/h2&gt;
&lt;p&gt;We provide a simple example with Keras to give you a taste of how BORE can
be implemented using a feed-forward &lt;em&gt;neural network (NN)&lt;/em&gt; classifier.
A useful class that the &lt;code&gt;bore&lt;/code&gt; package provides is
&lt;code&gt;MaximizableSequential&lt;/code&gt;,
a subclass of the &lt;code&gt;Sequential&lt;/code&gt; model from
Keras that inherits all of its existing functionality and provides just
one additional method.
We can build and compile a feed-forward NN classifier as usual:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;bore.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MaximizableSequential&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# build model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MaximizableSequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;relu&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;relu&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;sigmoid&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# compile model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;adam&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;binary_crossentropy&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See
the Keras documentation if this seems unfamiliar to
you.&lt;/p&gt;
&lt;p&gt;The additional method provided is &lt;code&gt;argmax&lt;/code&gt;, which returns the &lt;em&gt;maximizer&lt;/em&gt; of
the network, i.e. the input $\mathbf{x}$ that maximizes the final output of
the network:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;x_argmax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;L-BFGS-B&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_start_points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Since the network is differentiable end-to-end with respect to the input $\mathbf{x}$, this
method can be implemented efficiently using a &lt;em&gt;multi-started quasi-Newton
hill-climber&lt;/em&gt; such as L-BFGS-B.
We will see the pivotal role this method plays in the next section.&lt;/p&gt;
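&lt;p&gt;To make the idea concrete, here is a minimal sketch of such a multi-started optimizer built on SciPy. This is illustrative only; it is not the actual &lt;code&gt;bore&lt;/code&gt; implementation, and the function name &lt;code&gt;multi_start_argmax&lt;/code&gt; is ours:&lt;/p&gt;

```python
# Sketch: multi-started quasi-Newton maximization over box bounds,
# in the spirit of the `argmax` method described above.
# (Illustrative only; not the actual `bore` implementation.)
import numpy as np
from scipy.optimize import minimize


def multi_start_argmax(func, bounds, num_start_points=3, seed=0):
    """Maximize `func` by minimizing its negation from several random starts."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    best_x, best_val = None, -np.inf
    for _ in range(num_start_points):
        x0 = rng.uniform(low, high)  # random restart inside the box
        res = minimize(lambda x: -func(x), x0, method="L-BFGS-B",
                       bounds=list(zip(low, high)))
        if -res.fun > best_val:
            best_x, best_val = res.x, -res.fun
    return best_x


# Example: a concave quadratic whose maximum is at (0.5, 0.5).
f = lambda x: -np.sum((x - 0.5) ** 2)
x_opt = multi_start_argmax(f, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Restarting from several random points guards against getting stuck in a poor local optimum of a multi-modal surface.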
&lt;hr&gt;
&lt;p&gt;Using this classifier, the BO loop in BORE looks as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;targets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# initialize design&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features_initial_design&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targets_initial_design&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# construct classification problem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vstack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hstack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tau&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;less&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# update classifier&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# suggest new candidate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;x_next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;L-BFGS-B&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_start_points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# evaluate blackbox&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;y_next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blackbox&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# update dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;p&gt;Let&amp;rsquo;s break this down a bit:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;At the start of the loop, we construct the classification problem&amp;mdash;by labeling
instances $\mathbf{x}$ whose corresponding target value $y$ falls in the lowest (i.e. best-performing)
&lt;code&gt;q=0.25&lt;/code&gt; quantile of all target values as &lt;em&gt;positive&lt;/em&gt;, and the rest as &lt;em&gt;negative&lt;/em&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Next, we train the classifier to discriminate between these instances. This
classifier should converge towards
&lt;/p&gt;
$$
\pi^{*}(\mathbf{x}) = \frac{\gamma \ell(\mathbf{x})}{\gamma \ell(\mathbf{x}) + (1-\gamma) g(\mathbf{x})},
$$&lt;p&gt;
where $\ell(\mathbf{x})$ and $g(\mathbf{x})$ are the unknown distributions of
instances belonging to the positive and negative classes, respectively, and
$\gamma$ is the class balance-rate and, by construction, simply the quantile
we specified (i.e. $\gamma=0.25$).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once the classifier is a decent approximation to $\pi^{*}(\mathbf{x})$, we
propose the maximizer of this classifier as the next input to evaluate.
In other words, we are now using the classifier &lt;em&gt;itself&lt;/em&gt; as the acquisition
function.&lt;/p&gt;
&lt;p&gt;How is it justifiable to use this in lieu of EI, or some other acquisition
function we&amp;rsquo;re used to?
And what is so special about $\pi^{*}(\mathbf{x})$?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Well, as it turns out, $\pi^{*}(\mathbf{x})$ is equivalent to EI, up to some
constant factors.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The remainder of the loop should now be self-explanatory. Namely, we&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;evaluate the blackbox function at the suggested point, and&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;update the dataset.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
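&lt;p&gt;A quick numerical sanity check of the claim in step 3: $\pi^{*}(\mathbf{x})$ is a monotonically increasing function of the density-ratio $\ell(\mathbf{x})/g(\mathbf{x})$, so ranking candidates by the classifier output is the same as ranking them by the density-ratio (the density values below are made up purely for illustration):&lt;/p&gt;

```python
# Sanity check (illustrative): pi*(x) = g*l / (g*l + (1-g)*g_neg) is a
# monotone transform of the ratio l/g_neg, so both induce the same ranking
# of candidate points -- and hence the same maximizer.
import numpy as np

gamma = 0.25
l = np.array([0.1, 0.5, 2.0, 4.0])      # hypothetical positive-class densities
g = np.array([2.0, 1.0, 0.5, 0.1])      # hypothetical negative-class densities

ratio = l / g
pi_star = gamma * l / (gamma * l + (1 - gamma) * g)
```

Both `ratio` and `pi_star` sort the four hypothetical points into the same order, which is what licenses using the classifier itself as the acquisition function.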
&lt;h3 id="step-by-step-illustration"&gt;Step-by-step Illustration&lt;/h3&gt;
&lt;p&gt;Here is a step-by-step animation of six iterations of this loop in action,
using the &lt;em&gt;Forrester&lt;/em&gt; synthetic function as an example.
The noise-free function is shown as the solid gray curve in the main pane.
This procedure is warm-started with four random initial designs.&lt;/p&gt;
&lt;p&gt;The right pane shows the empirical CDF (ECDF) of the observed $y$ values.
The vertical dashed black line in this pane is located at $\Phi(y) = \gamma$,
where $\gamma = 0.25$.
The horizontal dashed black line is located at $\tau$, the value of $y$ such
that $\Phi(y) = 0.25$, i.e. $\tau = \Phi^{-1}(0.25)$.&lt;/p&gt;
&lt;p&gt;The instances below this horizontal line are assigned binary label $z=1$, while
those above are assigned $z=0$. This is visualized in the bottom pane,
alongside the probabilistic classifier $\pi_{\boldsymbol{\theta}}(\mathbf{x})$
represented by the solid gray curve, which is trained to discriminate between
these instances.&lt;/p&gt;
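&lt;p&gt;The thresholding and labeling shown in these panes can be reproduced in a few lines (the observed $y$ values below are hypothetical):&lt;/p&gt;

```python
# Reproducing the threshold tau and the binary labels z from a set of
# observed target values (hypothetical numbers, for illustration only).
import numpy as np

gamma = 0.25
y = np.array([3.2, 1.5, 0.7, 2.8, 4.1, 0.9, 2.0, 3.6])

tau = np.quantile(y, q=gamma)        # tau = Phi^{-1}(gamma), empirical quantile
z = np.less(y, tau).astype(int)      # z=1 below the threshold, z=0 above
```

By construction, roughly a $\gamma$-fraction of the instances receive the positive label.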
&lt;p&gt;Finally, the maximizer of the classifier is represented by the vertical solid
green line.
This is the location that the BO procedure suggests be evaluated next.&lt;/p&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="Animation"
srcset="https://tiao.io/publications/bore-2/paper_1500x5562_hu_bf54a19b8bc6fbf5.webp 205w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://tiao.io/publications/bore-2/paper_1500x5562_hu_bf54a19b8bc6fbf5.webp"
width="205"
height="760"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;We see that the procedure converges toward the global minimum of the blackbox
function after half a dozen iterations.&lt;/p&gt;
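&lt;p&gt;For reference, the &lt;em&gt;Forrester&lt;/em&gt; function used in the animation can be written down directly. The definition below is the standard form from the literature (assumed here, not extracted from the animation code); its global minimum lies near $x \approx 0.757$:&lt;/p&gt;

```python
# Standard noise-free Forrester synthetic function on the unit interval.
# (Assumed standard form from the literature, not the exact animation code.)
import numpy as np


def forrester(x):
    """f(x) = (6x - 2)^2 * sin(12x - 4), defined on [0, 1]."""
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)


# Locate the global minimum on a dense grid.
xs = np.linspace(0.0, 1.0, 100001)
x_min = xs[np.argmin(forrester(xs))]
```

Despite being one-dimensional, this function is multi-modal, which is what makes it a useful toy problem for illustrating global optimization procedures.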
&lt;hr&gt;
&lt;p&gt;To understand how and why this works in more detail, please read our paper!
If you only have 15 minutes to spare, please watch the video recording of our
talk!&lt;/p&gt;
&lt;h2 id="video"&gt;Video&lt;/h2&gt;
&lt;div id="presentation-embed-38942425"&gt;&lt;/div&gt;
&lt;script src='https://slideslive.com/embed_presentation.js'&gt;&lt;/script&gt;
&lt;script&gt;
embed = new SlidesLiveEmbed('presentation-embed-38942425', {
presentationId: '38942425',
autoPlay: false, // change to true to autoplay the embedded presentation
verticalEnabled: true
});
&lt;/script&gt;</description></item><item><title>Simulation-based Scoring for Model-based Asynchronous Hyperparameter and Neural Architecture Search</title><link>https://tiao.io/publications/simulation-based-scoring/</link><pubDate>Sat, 01 May 2021 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/simulation-based-scoring/</guid><description/></item><item><title>Contributed Talk: BORE — Bayesian Optimization by Density-Ratio Estimation</title><link>https://tiao.io/events/neurips2020-meta-learning/</link><pubDate>Fri, 11 Dec 2020 15:00:00 +0000</pubDate><guid>https://tiao.io/events/neurips2020-meta-learning/</guid><description/></item><item><title>Bayesian Optimization by Density Ratio Estimation</title><link>https://tiao.io/publications/bore-1/</link><pubDate>Tue, 01 Dec 2020 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/bore-1/</guid><description/></item><item><title>Model-based Asynchronous Hyperparameter and Neural Architecture Search</title><link>https://tiao.io/publications/async-multi-fidelity-hpo/</link><pubDate>Sun, 01 Mar 2020 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/async-multi-fidelity-hpo/</guid><description/></item><item><title>AutoGluon</title><link>https://tiao.io/projects/autogluon/</link><pubDate>Sun, 01 Sep 2019 00:00:00 +0000</pubDate><guid>https://tiao.io/projects/autogluon/</guid><description>&lt;p&gt;
AutoGluon is an open-source
toolkit from AWS that automates ML for tabular, image, and text data. During
my AWS Berlin internship I was a core developer of the model-based
searcher module — described in
&lt;em&gt;Model-based Asynchronous Hyperparameter and Neural Architecture Search&lt;/em&gt;
and later forming the basis of
Syne Tune.&lt;/p&gt;