FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning

Anonymous Authors

Abstract

Multi-agent reinforcement learning (MARL) has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory, a common occurrence in real-world environments such as search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. Our results show that, compared to existing methods, FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone. For standardized evaluation, we introduce MPEv2, an enhanced version of Multi Particle Environments (MPE), consisting of 12 benchmarks. Benchmarks, implementations, and trained models are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.

FlickerFusion Visualization Demo

FlickerFusion is visualized from the perspective of a single agent (dark blue). Entities that are dropped out are set to 80% transparency.

FlickerFusion-Attention OOD1

Adversary

FlickerFusion-MLP OOD2

Guard


FlickerFusion vs. Baselines Demo

Each visual rendering loops over 20 episodes. The seed corresponding to the selected model is chosen blindly. Videos are recorded only once and are therefore not cherry-picked.
The order of baselines is chosen a priori and never changed post-rendering.

Out-of-Domain 1

FlickerFusion-MLP

Spread ACORM OOD1

ACORM

Spread

FlickerFusion-MLP

CAMA

Repel

FlickerFusion-Attention

UPDeT

Tag

FlickerFusion-MLP

Guard ODIS OOD1

ODIS

Guard

FlickerFusion-Attention

REFIL

Adversary

Hunt FF-MLP OOD1

FlickerFusion-MLP

Hunt ACORM OOD1

ACORM

Hunt

Out-of-Domain 2

FlickerFusion-MLP

Spread ODIS OOD2

ODIS

Spread

FlickerFusion-MLP

REFIL

Repel

FlickerFusion-Attention

Tag ACORM OOD2

ACORM

Tag

FlickerFusion-MLP

Guard CAMA OOD2

CAMA

Guard

FlickerFusion-Attention

UPDeT

Adversary

Hunt FF-MLP OOD2

FlickerFusion-MLP

Hunt ODIS OOD2

ODIS

Hunt


Flicker Fusion Phenomenon Analogy Demo

Four white circles represent the visible entities. As the flicker frequency increases, the display emulates an aggregate view of all 8 entities (circles).
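The analogy can be illustrated with a small simulation. The sketch below is not the FlickerFusion implementation. It assumes that each frame shows a uniformly random subset of 4 out of 8 entities, and it checks that over many frames every entity is seen and each is visible about half the time. The function names `flicker_frames` and `visibility_rate` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def flicker_frames(num_entities=8, num_visible=4, num_frames=200, seed=0):
    """Return one frozenset of visible entity indices per frame.

    Assumption: the visible subset is resampled uniformly at random
    every frame (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    return [
        frozenset(rng.sample(range(num_entities), num_visible))
        for _ in range(num_frames)
    ]


def visibility_rate(frames, num_entities=8):
    """Fraction of frames in which each entity is visible."""
    counts = [0] * num_entities
    for visible in frames:
        for idx in visible:
            counts[idx] += 1
    return [count / len(frames) for count in counts]


frames = flicker_frames()
rates = visibility_rate(frames)

# Union of everything shown across frames: the "aggregate view".
seen = set().union(*frames)

print("entities seen across all frames:", sorted(seen))
print("per-entity visibility rate:", [round(r, 2) for r in rates])
```

Any single frame exposes only half of the entities, yet the union over frames covers all of them. Raising `num_frames` within a fixed time window plays the role of a higher flicker frequency.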


Source Code and Paper are under Apache-2.0 License © Authors