FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent Reinforcement Learning

Abstract

Multi-agent reinforcement learning (MARL) has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory, a common occurrence in real-world environments such as search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating in-domain conditions at OOD inference time. The results show that FlickerFusion achieves higher inference rewards than existing methods and, uniquely among them, reduces uncertainty relative to the backbone. Benchmarks, implementations, and trained models are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
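
To make the dropout idea above concrete, here is a minimal sketch of one possible reading: when an agent observes more entities than some in-domain limit, a random subset is hidden at each time step so the remaining count stays within that limit. The function name `flicker_dropout`, the `max_entities` parameter, and the uniform sampling rule are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np


def flicker_dropout(entity_obs, max_entities, rng):
    """Hypothetical helper: keep at most `max_entities` rows of an
    (n_entities, feat_dim) observation array, chosen uniformly at random.

    Returns the kept rows and a boolean mask over the original entities.
    """
    n = entity_obs.shape[0]
    keep = np.ones(n, dtype=bool)
    if n > max_entities:
        # Assumption: surplus entities are dropped uniformly at random.
        dropped = rng.choice(n, size=n - max_entities, replace=False)
        keep[dropped] = False
    return entity_obs[keep], keep


# Example: 6 observed entities, but only 4 were ever seen during training.
rng = np.random.default_rng(0)
obs = np.arange(12, dtype=float).reshape(6, 2)
kept, mask = flicker_dropout(obs, max_entities=4, rng=rng)
print(kept.shape, int(mask.sum()))
```

Calling the function again at every time step would resample which entities are hidden, so over a trajectory each entity is still seen some of the time.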

FlickerFusion Visualization Demo

FlickerFusion is visualized from the perspective of a single agent (dark blue). Entities that are dropped out are set to 80% transparency.

FlickerFusion-Attention OOD1

Adversary

FlickerFusion-MLP OOD2

Guard

FlickerFusion vs. Baselines Demo

All visual renderings are 20 episodes on loop. The seed corresponding to the selected model is chosen blindly. Videos are recorded only once and are therefore not cherry-picked.
The order of baselines is chosen a priori and never changed post-rendering.

Out-of-Domain 1