Yan Scholten

Hello 🙂 I am a PhD student in the Data Analytics and Machine Learning group at the Technical University of Munich (TUM) located in Munich 🇩🇪, supervised by Prof. Stephan Günnemann. My research centers on trustworthy AI, with a focus on developing methods that make machine learning more safe and reliable. I tackle core challenges in machine learning, such as machine unlearning and alignment, adversarial robustness, robustness certification, and conformal prediction. My recent work advances the capabilities and reliability of large language models (LLMs).

News

Jan 2026	Two papers (on LLM unlearning and LLM robustness) accepted at ICLR 2026! 🎉
Nov 2025	I presented our new unlearning approach in Prof. Gidel's research group at Mila.
Oct 2025	I gave a talk on "Recent Frontiers in Trustworthy AI for Large Language Models" in Prof. Kutyniok's research group at LMU Munich.
Oct 2025	I got selected as Top Reviewer at NeurIPS 2025!
May 2025	I am visiting Carnegie Mellon University for a research stay!
Jan 2025	Two papers (one oral and one spotlight) accepted at ICLR 2025! 🎉

Selected Publications (full list)

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Yan Scholten, Sophie Xhonneux, Leo Schwinn*, and Stephan Günnemann*
International Conference on Learning Representations, ICLR 2026.

Abs Web PDF Blogpost Slides Poster Code BibTeX

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method–Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from model outputs. Our central insight is that model collapse can be leveraged for machine unlearning by deliberately triggering it for data we aim to remove. We theoretically analyze that our approach converges to the desired outcome, i.e. the model unlearns the data targeted for removal. We empirically demonstrate that PMC overcomes four key limitations of existing unlearning methods that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs while preserving general model utility. Overall, our contributions represent an important step toward more comprehensive unlearning that better aligns with real-world privacy constraints.
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models

Yan Scholten, Stephan Günnemann, and Leo Schwinn
International Conference on Learning Representations, ICLR 2025 (Oral).

Abs Web PDF Slides Poster Code BibTeX

Comprehensive evaluation of Large Language Models (LLMs) is an open research problem. Existing evaluations rely on deterministic point estimates generated via greedy decoding. However, we find that deterministic evaluations fail to capture the whole output distribution of a model, yielding inaccurate estimations of model capabilities. This is particularly problematic in critical contexts such as unlearning and alignment, where precise model evaluations are crucial. To remedy this, we introduce the first formal probabilistic evaluation framework for LLMs. Namely, we propose novel metrics with high probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment. Our experimental analysis reveals that deterministic evaluations falsely indicate successful unlearning and alignment, whereas our probabilistic evaluations better capture model capabilities. We show how to overcome challenges associated with probabilistic outputs in a case study on unlearning by introducing (1) a novel loss based on entropy optimization, and (2) adaptive temperature scaling. We demonstrate that our approach significantly enhances unlearning in probabilistic settings on recent benchmarks. Overall, our proposed shift from point estimates to probabilistic evaluations of output distributions represents an important step toward comprehensive evaluations of LLMs.
Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning

Yan Scholten, and Stephan Günnemann
International Conference on Learning Representations, ICLR 2025 (Spotlight).

Abs Web PDF Poster Code BibTeX

Conformal prediction provides model-agnostic and distribution-free uncertainty quantification through prediction sets that are guaranteed to include the ground truth with any user-specified probability. Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data, which can significantly alter prediction sets in practice. As a solution, we propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning. To ensure reliability under training poisoning, we introduce smoothed score functions that reliably aggregate predictions of classifiers trained on distinct partitions of the training data. To ensure reliability under calibration poisoning, we construct multiple prediction sets, each calibrated on distinct subsets of the calibration data. We then aggregate them into a majority prediction set, which includes a class only if it appears in a majority of the individual sets. Both proposed aggregations mitigate the influence of datapoints in the training and calibration data on the final prediction set. We experimentally validate our approach on image classification tasks, achieving strong reliability while maintaining utility and preserving coverage on clean data. Overall, our approach represents an important step towards more trustworthy uncertainty quantification in the presence of data poisoning.
Hierarchical Randomized Smoothing

Yan Scholten, Jan Schuchardt, Aleksandar Bojchevski, and Stephan Günnemann
Advances in Neural Information Processing Systems, NeurIPS 2023.

Abs Web PDF Talk Slides Poster Code BibTeX

Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs – by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both – certifiably robust to perturbations and accurate.
Randomized Message-Interception Smoothing: Gray-box Certificates for Graph Neural Networks

Yan Scholten, Jan Schuchardt, Simon Geisler, Aleksandar Bojchevski, and Stephan Günnemann
Advances in Neural Information Processing Systems, NeurIPS 2022.

Abs Web PDF Talk Slides Poster Code BibTeX

Randomized smoothing is one of the most promising frameworks for certifying the adversarial robustness of machine learning models, including Graph Neural Networks (GNNs). Yet, existing randomized smoothing certificates for GNNs are overly pessimistic since they treat the model as a black box, ignoring the underlying architecture. To remedy this, we propose novel gray-box certificates that exploit the message-passing principle of GNNs: We randomly intercept messages and carefully analyze the probability that messages from adversarially controlled nodes reach their target nodes. Compared to existing certificates, we certify robustness to much stronger adversaries that control entire nodes in the graph and can arbitrarily manipulate node features. Our certificates provide stronger guarantees for attacks at larger distances, as messages from farther-away nodes are more likely to get intercepted. We demonstrate the effectiveness of our method on various models and datasets. Since our gray-box certificates consider the underlying graph structure, we can significantly improve certifiable robustness by applying graph sparsification.