Florian Vogt

Hej! I am a Research Engineer at KTH Royal Institute of Technology in Stockholm, where I work on Reinforcement Learning.

I received my Master's degree from the University of Freiburg. My current focus is applying Reinforcement Learning to real-world applications.

I love solving problems that require effort in both research and engineering. My work focuses mostly on sample- and compute-efficient RL and on scaling it effectively to challenging tasks.

This effort resulted in XQC, a state-of-the-art off-policy RL algorithm that achieves its performance through surprisingly simple means.

Email  /  GitHub  /  Scholar

Research
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
Donghu Kim, Youngdoo Lee, Minho Park, Kinam Kim, Aswin Nahrendra, Takuma Seno, Sehee Min, Daniel Palenicek, Florian Vogt, Danica Kragic, Jan Peters, Jaesik Choo, Honglak Lee
Robotics: Science and Systems (RSS), 2026
Project Page / Code / ArXiv

Optimizing SAC for high-speed robotics training. Adopted as a baseline for high-dimensional robotic benchmarks.

XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
Daniel Palenicek, Florian Vogt, Joe Watson, Ingmar Posner, Jan Peters
International Conference on Learning Representations (ICLR), 2026
Project Page / Code / ArXiv

Accelerating training by improving the conditioning of the optimization landscape in deep RL.

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
Daniel Palenicek, Florian Vogt, Joe Watson, Jan Peters
Conference on Neural Information Processing Systems (NeurIPS), 2025
ArXiv

A study on how normalization stabilizes and scales off-policy learning for complex tasks.

Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
Sascha Marton, Tim Grams, Florian Vogt, Stefan Luedtke, Christian Bartelt, Heiner Stuckenschmidt
International Conference on Learning Representations (ICLR), 2025
Code / ArXiv

Mitigates information loss in tree-structured RL by optimizing the trees directly.