RainbowZero: Combining Advancements in Search and Reinforcement Learning for Complex Environments

Jonathan Lamontagne-Kratz

doi:10.26443/msurj.v21i2.428

Vol. 21 No. 2 (2026), OSE Undergraduate Science Showcase Proceedings

Published 2026-04-10

Jonathan Lamontagne-Kratz

Department of Computer Science, McGill University, Montreal, Quebec, Canada

Keywords

Artificial intelligence
MuZero
Reinforcement learning
Stochastic optimization
Monte Carlo tree search

How to Cite

Lamontagne-Kratz, J. (2026). RainbowZero: Combining Advancements in Search and Reinforcement Learning for Complex Environments. McGill Science Undergraduate Research Journal, 21(2). https://doi.org/10.26443/msurj.v21i2.428

Abstract

While deep reinforcement learning has achieved superhuman performance in fully observable, zero-sum games, environments with high stochasticity and complex state spaces remain a significant challenge. This ongoing research introduces "RainbowZero," an algorithmic framework that combines the forward-planning capabilities of search-based algorithms (such as MuZero and Muesli) with advanced value-estimation techniques (e.g., Rainbow) and world models (e.g., Dreamer). We use the board game Settlers of Catan as our primary testbed because of its complex mechanics, varying board layouts, and high variance. The framework is currently being evaluated on reduced-complexity, randomized Catan boards, alongside classic control environments and foundational games. By benchmarking against the baselines in the Catanatron library, this project aims to isolate the contribution of individual architectural components, such as search-based planning and distributional reinforcement learning, through planned ablation studies. The goal of this research extends beyond game-playing: Catan's stochastic resource generation and long-term planning demands serve as a proxy for complex, real-world optimization problems, and future applications of the RainbowZero architecture specifically target autonomous decision-making in mining operations and genomic sequence alignment.

This work is licensed under a Creative Commons Attribution 4.0 International License.
