The Hidden Geometry of Particle Collisions

We unify many concepts in collider physics, including infrared and collinear safety, observables, jet finding, pileup mitigation and more, using a geometric language based on the Energy Mover’s Distance. Along the way, we develop new techniques grounded in this geometry, including extensions of observables, new jet-finding algorithms, novel pileup mitigation based on Apollonius diagrams, and a concrete notion of “theory space.”

OmniFold: A Method to Simultaneously Unfold All Observables

We develop OmniFold, an ML-based unfolding technique that can incorporate full-phase-space information, works without binning, and can avoid choosing specific observables.

Cutting Multiparticle Correlators Down to Size

We show that a broad class of mathematical objects, multiparticle correlators, can be manipulated by “cutting” the vertices and edges of their graphical representation, leading to many identities, computational speedups, and surprising connections to string theory.

Exploring the Space of Jets with CMS Open Data

We explore the CMS 2011A Jet Primary Dataset using standard jet substructure observables as well as the Energy Mover’s Distance. Our reprocessed datasets and analysis code are made public to facilitate future Open Data studies.

The Machine Learning Landscape of Top Taggers

A community report comparing a variety of machine learning top taggers, to which we contributed Particle Flow Network (PFN), Energy Flow Network (EFN), and Energy Flow Polynomial (EFP) models.

The Metric Space of Collider Events

We develop a metric, the Energy Mover’s Distance (EMD), on the space of events that, intuitively, is the amount of “work” required to rearrange one event into another. Many techniques that require a pairwise distance between objects can now be applied to collider events, including quantifying event distortion, classification based on density estimation, and studying the space of events itself.
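For reference, the "work" picture above is usually written as an optimal transport problem. The block below is a sketch of that form in notation assumed here rather than taken from this summary: $E_i$ and $E'_j$ are the particle energies of the two events, $\theta_{ij}$ is the angular distance between particle $i$ of one event and particle $j$ of the other, $f_{ij}$ is the energy moved from $i$ to $j$, and $R$ is a parameter that sets the relative cost of moving energy versus creating or destroying it.

```latex
\mathrm{EMD}(\mathcal{E}, \mathcal{E}') =
  \min_{\{f_{ij} \ge 0\}} \sum_{i,j} f_{ij}\, \frac{\theta_{ij}}{R}
  + \Big| \sum_i E_i - \sum_j E'_j \Big| ,
\qquad
\sum_j f_{ij} \le E_i , \quad
\sum_i f_{ij} \le E'_j , \quad
\sum_{i,j} f_{ij} = \min\Big( \sum_i E_i ,\, \sum_j E'_j \Big) .
```

The first term is the cost of rearranging energy and the second penalizes any difference in total energy between the two events.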

Energy Flow Networks: Deep Sets for Particle Jets

We adapt and specialize the Deep Sets neural network architecture for use with collider events, since the particles in an event naturally form a variable-length, unordered set of objects. The resulting Energy Flow Networks (EFNs) and Particle Flow Networks (PFNs) are simple yet powerful architectures for collider physics.
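The Deep Sets structure can be illustrated with a minimal NumPy sketch. It assumes the usual form in which a per-particle map Phi is summed over particles (energy-weighted for an EFN, unweighted for a PFN) and then passed to an event-level map F. The layer sizes, the random untrained weights, and the names `phi_efn`, `efn`, and `pfn` are all hypothetical and chosen only for illustration; this is not the published implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and random (untrained) weights, for illustration only.
LATENT_DIM = 8
W_efn = rng.normal(size=(2, LATENT_DIM))  # Phi for EFN: angles (y, phi) -> latent
W_pfn = rng.normal(size=(3, LATENT_DIM))  # Phi for PFN: (z, y, phi) -> latent
W_out = rng.normal(size=(LATENT_DIM, 1))  # F: latent -> scalar output


def phi_efn(angles):
    """Per-particle map Phi acting only on angular positions."""
    return np.tanh(angles @ W_efn)


def efn(z, angles):
    """EFN-style output F(sum_i z_i * Phi(angles_i))."""
    latent = (z[:, None] * phi_efn(angles)).sum(axis=0)
    return float(np.tanh(latent @ W_out)[0])


def pfn(z, angles):
    """PFN-style output F(sum_i Phi(z_i, angles_i)), with no energy weighting."""
    features = np.column_stack([z, angles])
    latent = np.tanh(features @ W_pfn).sum(axis=0)
    return float(np.tanh(latent @ W_out)[0])


# A toy three-particle event: energy fractions and (y, phi) positions.
z = np.array([0.5, 0.3, 0.2])
angles = 0.4 * rng.normal(size=(3, 2))

# Reordering the particles leaves both outputs unchanged.
perm = np.array([2, 0, 1])
print(efn(z, angles), efn(z[perm], angles[perm]))
print(pfn(z, angles), pfn(z[perm], angles[perm]))
```

Because the EFN sum is linear in the energy fractions, splitting one particle into two collinear particles that share its energy also leaves the EFN output unchanged, which is not true of the PFN in general.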

An operational definition of quark and gluon jets

We develop a precise, practical, hadron-level definition of quark and gluon jets based on topic modeling of two mixed samples of jets. This allows for data-driven extractions of separate quark- and gluon-jet cross sections, among other things.
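The topic-extraction step can be sketched on binned distributions. The example assumes the standard demixing construction, in which the largest amount of one mixed distribution that can be subtracted from the other defines each topic. The toy histograms, mixture fractions, and the function name `extract_topics` are hypothetical.

```python
import numpy as np


def extract_topics(p1, p2):
    """Demix two normalized histograms into two topics.

    kappa_12 is the minimum of p1/p2 over bins where p2 > 0, i.e. the largest
    amount of p2 that can be subtracted from p1 without going negative.
    """
    kappa_12 = np.min(p1[p2 > 0] / p2[p2 > 0])
    kappa_21 = np.min(p2[p1 > 0] / p1[p1 > 0])
    topic_1 = (p1 - kappa_12 * p2) / (1.0 - kappa_12)
    topic_2 = (p2 - kappa_21 * p1) / (1.0 - kappa_21)
    return topic_1, topic_2


# Toy "quark" and "gluon" histograms, each with a bin where the other vanishes.
quark = np.array([0.5, 0.3, 0.2, 0.0])
gluon = np.array([0.0, 0.2, 0.3, 0.5])

# Two mixed samples with different (hypothetical) quark fractions.
mixed_1 = 0.8 * quark + 0.2 * gluon
mixed_2 = 0.3 * quark + 0.7 * gluon

topic_1, topic_2 = extract_topics(mixed_1, mixed_2)
print(topic_1)  # compare with the quark histogram
print(topic_2)  # compare with the gluon histogram
```

In this toy the topics coincide with the underlying histograms only because each underlying histogram has a bin where the other is zero; without such a region the extracted topics would still be mixtures.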

Learning to classify from impure samples with high-dimensional data

We study two methods of weakly supervised training in the context of jet classification, extending them to deep neural network architectures. We find that the Classification Without Labels (CWoLa) paradigm outperforms Learning from Label Proportions (LLP).
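The idea behind CWoLa can be seen in a one-dimensional toy: a classifier trained only to tell two mixed samples apart, with no per-event truth labels, ends up separating signal from background. The sketch below is not the deep-network setup of the study. The Gaussian "signal" and "background", the mixture fractions, and the plain logistic regression are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)


def sample_mixture(n, signal_fraction):
    """Draw n events: signal ~ N(+1, 1), background ~ N(-1, 1)."""
    is_signal = rng.random(n) < signal_fraction
    return np.where(is_signal, rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n))


def train_logistic(x, y, steps=2000, lr=0.5):
    """Fit a 1D logistic regression by gradient descent on the cross-entropy."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b


def auc(scores_sig, scores_bkg):
    """Fraction of (signal, background) pairs ranked in the correct order."""
    return float(np.mean(scores_sig[:, None] > scores_bkg[None, :]))


# The training labels only say which mixed sample an event came from.
x1 = sample_mixture(5000, 0.8)  # signal-enriched mixture
x2 = sample_mixture(5000, 0.3)  # signal-depleted mixture
x = np.concatenate([x1, x2])
y = np.concatenate([np.ones(5000), np.zeros(5000)])
w, b = train_logistic(x, y)

# Evaluate on pure samples, which were never used in training.
sig = rng.normal(1.0, 1.0, 2000)
bkg = rng.normal(-1.0, 1.0, 2000)
cwola_auc = auc(w * sig + b, w * bkg + b)
print(cwola_auc)
```

The mixture fractions are never given to the classifier; the only requirement is that the two mixed samples have different signal fractions.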

Energy Flow Polynomials: A complete linear basis for jet substructure

We develop the Energy Flow Polynomials (EFPs), a set of infrared- and collinear-safe (IRC-safe) observables that form an (over)complete linear basis for all IRC-safe observables. This supports the sufficiency of linear methods for tasks such as classifying different jets, and indeed we find that a linear classifier using EFPs performs surprisingly well on a variety of jet discrimination tasks.
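As a concrete illustration, an EFP can be evaluated by brute force. The sketch assumes the usual graph-based form, in which each vertex contributes a sum over particles weighted by the energy fraction and each edge contributes a pairwise angular distance. The function name `efp`, the toy event, and the edge-list encoding of graphs are hypothetical, and the loop is far slower than an optimized implementation would be.

```python
import itertools

import numpy as np


def efp(z, theta, edges):
    """Brute-force EFP for a graph given as a list of (k, l) vertex pairs.

    Sums, over every assignment of particles to vertices, the product of the
    assigned energy fractions times the product of theta along each edge.
    """
    n_vertices = 1 + max(max(edge) for edge in edges)
    total = 0.0
    for assignment in itertools.product(range(len(z)), repeat=n_vertices):
        term = np.prod([z[i] for i in assignment])
        for k, l in edges:
            term *= theta[assignment[k], assignment[l]]
        total += term
    return float(total)


# Toy event: energy fractions and a symmetric matrix of pairwise angles.
z = np.array([0.5, 0.3, 0.2])
theta = np.array([
    [0.0, 0.4, 0.7],
    [0.4, 0.0, 0.5],
    [0.7, 0.5, 0.0],
])

line_graph = [(0, 1)]           # two vertices joined by one edge
wedge_graph = [(0, 1), (1, 2)]  # three vertices in a chain

print(efp(z, theta, line_graph))   # equals z @ theta @ z
print(efp(z, theta, wedge_graph))  # equals sum_j z_j * (theta @ z)_j ** 2
```

Each graph gives one observable, and a linear classifier of the kind mentioned above would take a vector of such values, one per graph, as its input features.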

Pileup Mitigation with Machine Learning (PUMML)

We develop the PUMML framework, which uses machine learning to mitigate contamination from additional proton-proton collisions (pileup) at the LHC. We demonstrate that a convolutional neural network can clean up such contamination at least as well as existing methods, with improved robustness across a wide range of pileup levels.

Deep learning in color: Towards automated quark/gluon jet discrimination

We show for the first time that deep learning can successfully discriminate between quark and gluon jets. We use a convolutional neural network trained on jet images and observe large improvements in classification efficiency, as well as rough insensitivity to the mismodeling of quark and gluon jets by Monte Carlo simulations.