Anomalies are rare objects or events that deviate significantly from the norm, often representing unknown astrophysical phenomena. Traditionally, new phenomena are detected by manual inspection of the data, which is quickly becoming impossible as datasets grow. To address this, I developed Astronomaly in collaboration with Bruce Bassett: a general-purpose anomaly detection framework designed to optimally combine machine learning methods with minimal human labelling to make new discoveries (Lochner & Bassett, Astronomy & Computing, 2020). We also demonstrated its scalability and applicability to large datasets in Etsebeth et al. (MNRAS, 2024), where we used it to analyse over 4 million galaxies (see my Student Papers page for more details).
This framework not only identifies potential anomalies but also learns from expert feedback, refining its performance over time. Astronomaly has already been used to make discoveries in several different types of datasets, and the original paper has nearly 100 citations. Using Astronomaly, my team discovered SAURON, a Steep and Uneven Ring of Non-thermal radiation in MeerKAT data, marking a new class of radio galaxy (Lochner et al., MNRAS, 2023). The story of SAURON and its discovery captured the public’s imagination, resulting in several news articles (including Nature Africa), radio interviews and popular science YouTube videos with hundreds of thousands of views (see the Science Communication page for more details). This coverage demonstrates the impact of the work beyond academia.
Feature extraction is a critical component of anomaly detection. It transforms raw data into numerical representations that algorithms can analyse, allowing subtle patterns to emerge. Deep learning has the potential to automate this process, but is usually applied in a supervised context where large numbers of human labels are available. In Walmsley et al. (MNRAS, 2022), we developed a novel approach using a pretrained neural network to extract morphological features from thousands of images of galaxies, dramatically improving Astronomaly's ability to recover interesting sources.
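As a rough illustration of the idea, and not Astronomaly's actual implementation, the sketch below turns small toy images into numerical features and ranks them by how isolated they are in feature space. Everything here is an assumption made for the example: the hand-crafted features (mean, scatter, central concentration) stand in for features learned by a neural network, the k-nearest-neighbour distance stands in for a real anomaly detection algorithm, and the function names and synthetic images are hypothetical.

```python
import math
import random


def extract_features(image):
    """Summarise a square 2D image (list of rows) as [mean, scatter, concentration]."""
    pixels = [p for row in image for p in row]
    total = sum(pixels)
    mean = total / len(pixels)
    scatter = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    # Fraction of the total flux that falls in the central half of the image.
    n = len(image)
    lo, hi = n // 4, n - n // 4
    central = sum(image[r][c] for r in range(lo, hi) for c in range(lo, hi))
    concentration = central / total if total else 0.0
    return [mean, scatter, concentration]


def standardise(features):
    """Rescale each feature column to zero mean and unit variance."""
    cols = list(zip(*features))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in features]


def knn_anomaly_scores(features, k=3):
    """Score each object by its mean distance to its k nearest neighbours."""
    scores = []
    for i, a in enumerate(features):
        dists = sorted(math.dist(a, b) for j, b in enumerate(features) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def toy_image(rng, size=8, ring=False):
    """Hypothetical test image: a central blob, or a ring if ring=True."""
    centre = (size - 1) / 2
    sigma = rng.uniform(1.4, 1.6)
    image = []
    for r in range(size):
        row = []
        for c in range(size):
            d = math.hypot(r - centre, c - centre)
            if ring:
                signal = math.exp(-((d - 3.0) ** 2) / (2 * 0.5 ** 2))
            else:
                signal = math.exp(-(d ** 2) / (2 * sigma ** 2))
            row.append(signal + rng.uniform(0.0, 0.05))  # add a little noise
        image.append(row)
    return image


# Thirty ordinary blobs plus one ring-shaped object appended at the end.
rng = random.Random(0)
images = [toy_image(rng) for _ in range(30)] + [toy_image(rng, ring=True)]
features = standardise([extract_features(im) for im in images])
scores = knn_anomaly_scores(features, k=3)
ranked = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
print("Most anomalous image index:", ranked[0])
```

The ring sits far from the blobs in feature space because its flux is spread away from the centre, so it rises to the top of the ranked list. In practice the quality of the ranking depends heavily on the features, which is why learned representations can make such a difference.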
To generalise this feature extraction method to datasets where a pretrained network is not available (such as new radio data), my team pioneered the use of self-supervised learning in astronomy, enabling feature extraction from completely unlabelled datasets (Mohale & Lochner, MNRAS, 2024). This powerful approach allows for fully automated sorting of data into classes and anomaly detection. With the growing popularity of foundation models, this framework could dramatically accelerate the development of catalogues and training sets, and the rapid discovery of new classes of sources.
With an expected 10 million alerts per night (LSST Science Book, 2009), automated anomaly detection will be crucial for discovery in the LSST real-time transient data. In Webb et al. (MNRAS, 2020), we demonstrated the use of unsupervised clustering coupled with Astronomaly to identify novel variable stars and flare stars in optical data from DECam. This paper has been cited 45 times, forming a foundation for a growing field. In Muthukrishna et al. (MNRAS, 2022) and Gupta, Muthukrishna & Lochner (RASTI, 2024), we explored real-time anomaly detection, developing new approaches to deep learning-based feature extraction and ensemble anomaly detection methods.
In a more recent paper titled Astronomaly Protege: Discovery Through Human-Machine Collaboration (Lochner & Rudnick, AJ, 2025), we developed a new approach that enhances active learning to rapidly identify areas of interest in feature space. Applied to the MeerKAT Galaxy Cluster Legacy Survey, Protege revealed a wealth of unusual radio sources, including a second candidate SAURON and other unique systems such as X-shaped radio galaxies and diffuse emission. Our work paves the way for making new discoveries in future SKA surveys, cementing South Africa's place on the global stage of machine learning in research.