I am a PhD candidate in Machine Learning at ETH Zurich, researching representation learning and structural inductive biases. I currently apply these ideas to integrating images and other modalities into large language models. My work includes four first-author publications at NeurIPS (Spotlight), ICML, and EMNLP, each accepted on first submission. My recent industry research at Meta focused on large-scale tokenization and training objectives for multimodal generative pretraining, and earned the highest performance rating.