A2R2 Group | Autonomous Agents and Robotics Research Group

Adaptive Multi-Source Fusion for Interactive Reinforcement Learning

This project investigates how reinforcement learning agents can learn effectively from multiple imperfect sources of guidance. Rather than assuming that advice sources are uniformly reliable, the project studies how the structure of their errors affects learning, and develops adaptive fusion methods that decide when and how to use different sources of advice.

Date: 2025 - 2028

Persons participating in the project:

PIs: Dr. Francisco Cruz, Dr. Pamela Carreno-Medrano, Prof. Claude Sammut
Associates: Maher Mesto
Corresponding contact: m.mesto@unsw.edu.au

Research areas:

Interactive reinforcement learning
Human-guided reinforcement learning
Multi-source advice
Adaptive fusion
Ambiguity reduction
Robot learning
Deep reinforcement learning
Soft actor-critic
Continuous control

Description:
Reinforcement learning agents are increasingly expected to learn in settings where guidance may come from multiple sources, such as human feedback, demonstrations, heuristics, scripted advisors, or learned models. These sources can be useful but also ambiguous: they may disagree, make systematic mistakes, or be reliable only in particular regions of the task.

This project studies how agents can reduce ambiguity by adaptively combining heterogeneous advice sources during learning. A central question is whether the structure of an advisor’s errors matters more than its overall accuracy. The work examines cases where apparently less accurate advice may still be useful if its errors are predictable or task-aligned, and where superficially accurate advice may be harmful if its errors mislead the agent at critical moments.
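The contrast between overall accuracy and error structure can be illustrated with a toy calculation. The sketch below is not taken from the project: the corridor task, the critical state, and the error probabilities are all hypothetical, and accuracy is assumed to weight every state equally.

```python
# Hypothetical corridor: states 0..9, the correct action is the same everywhere.
# State 5 is "critical": a wrong action there ends the episode in failure.
# In every other state a wrong action only wastes a step.
N_STATES = 10
CRITICAL = {5}


def overall_accuracy(error_prob):
    """Mean probability of correct advice, weighting all states equally."""
    return sum(1.0 - e for e in error_prob) / len(error_prob)


def success_probability(error_prob, critical):
    """Chance of reaching the goal when the agent always follows the advice.

    Assumes each critical state is visited once and that errors in other
    states only delay the agent (their error probability is below 1).
    """
    p = 1.0
    for s in critical:
        p *= 1.0 - error_prob[s]
    return p


# Advisor A: frequent errors, but never in the critical state.
advisor_a = [0.0 if s in CRITICAL else 0.3 for s in range(N_STATES)]
# Advisor B: almost always right, but usually wrong in the critical state.
advisor_b = [0.9 if s in CRITICAL else 0.0 for s in range(N_STATES)]

for name, adv in (("A", advisor_a), ("B", advisor_b)):
    acc = overall_accuracy(adv)
    ok = success_probability(adv, CRITICAL)
    print(f"advisor {name}: accuracy={acc:.2f}, success probability={ok:.2f}")
```

With these made-up numbers, advisor B has the higher overall accuracy (0.91 against 0.73) yet reaches the goal in only about 10% of episodes, while advisor A always gets there, just more slowly.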

The current stage of the project extends this idea to continuous-control robot learning, using soft actor-critic as the reinforcement learning backbone and simulated Kinova Gen3 reaching tasks as the experimental domain. The aim is to develop adaptive multi-source fusion methods that can identify when each source of advice is useful, reduce harmful ambiguity, and improve the reliability of learning from imperfect guidance.
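One simple way to picture adaptive fusion in a continuous action space is a softmax weighting over running per-source usefulness estimates. The sketch below is an illustration only, not the project's method: the class name `AdaptiveFusion`, the `usefulness` signal, and the step size and temperature values are all assumptions, and the scores here are global rather than state-dependent.

```python
import math


class AdaptiveFusion:
    """Hypothetical fusion rule: softmax weights over running usefulness scores."""

    def __init__(self, n_sources, step_size=0.1, temperature=0.5):
        self.scores = [0.0] * n_sources  # one running estimate per advice source
        self.step_size = step_size
        self.temperature = temperature

    def weights(self):
        """Softmax over scores; subtracting the max keeps exp() numerically safe."""
        m = max(self.scores)
        exps = [math.exp((s - m) / self.temperature) for s in self.scores]
        total = sum(exps)
        return [e / total for e in exps]

    def fuse(self, advice):
        """Weighted average of the advised action vectors, one vector per source."""
        w = self.weights()
        dim = len(advice[0])
        return [sum(w[i] * advice[i][d] for i in range(len(advice)))
                for d in range(dim)]

    def update(self, source, usefulness):
        """Exponential moving average of a placeholder usefulness signal."""
        self.scores[source] += self.step_size * (usefulness - self.scores[source])


fusion = AdaptiveFusion(n_sources=2)
advice = [[0.2, -0.4, 0.0], [1.0, 1.0, 1.0]]  # made-up 3-D action suggestions
print(fusion.fuse(advice))  # both sources start with equal weight

# Pretend source 0 keeps proving useful and source 1 keeps proving harmful.
for _ in range(20):
    fusion.update(0, 1.0)
    fusion.update(1, -1.0)
print(fusion.weights())  # weight has shifted toward source 0
```

How the usefulness signal is defined is left open here; to capture sources that are reliable only in particular regions of the task, the scores would need to depend on the state as well.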

Selected Publications:
Mesto, M., & Cruz, F. (2025, November). The Consensus Paradox: When Low Disagreement Leads to Catastrophic Failure in Multi-teacher Reinforcement Learning. In Australasian Joint Conference on Artificial Intelligence (pp. 426-438). Singapore: Springer Nature Singapore. Best paper award.
Mesto, M., & Cruz, F. (2025). Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA).

A2R2 Research Group
Autonomous Agents and Robotics Research
School of Computer Science and Engineering
UNSW Sydney

Contact:
f.cruz@unsw.edu.au
Room 510J, Ainsworth Building (J17)
Kensington NSW 2052, Australia