Adaptive Multi-Source Fusion for Interactive Reinforcement Learning


This project investigates how reinforcement learning agents can learn effectively from multiple imperfect sources of guidance. Rather than assuming that advice sources are uniformly reliable, the project studies how the structure of their errors affects learning, and develops adaptive fusion methods that decide when and how to use different sources of advice.

Date: 2025 - 2028

Persons participating in the project:

  • PIs: Dr. Francisco Cruz, Dr. Pamela Carreno-Medrano, Prof. Claude Sammut
  • Associates: Maher Mesto
  • Corresponding contact: m.mesto@unsw.edu.au

Research areas:
  • Interactive reinforcement learning
  • Human-guided reinforcement learning
  • Multi-source advice
  • Adaptive fusion
  • Ambiguity reduction
  • Robot learning
  • Deep reinforcement learning
  • Soft actor-critic
  • Continuous control

Description:
Reinforcement learning agents are increasingly expected to learn in settings where guidance may come from multiple sources, such as human feedback, demonstrations, heuristics, scripted advisors, or learned models. These sources can be useful but also ambiguous: they may disagree, make systematic mistakes, or only be reliable in particular regions of the task.

This project studies how agents can reduce ambiguity by adaptively combining heterogeneous advice sources during learning. A central question is whether the structure of an advisor’s errors matters more than its overall accuracy. The work examines cases where apparently less accurate advice may still be useful if its errors are predictable or task-aligned, and where superficially accurate advice may be harmful if its errors mislead the agent at critical moments.
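The contrast between overall accuracy and error structure can be illustrated with a small arithmetic sketch. Everything in it is a hypothetical toy setup rather than the project's experimental design: the ten states, the choice of two "critical" states, and the two advisor functions are assumptions made only for illustration.

```python
# Toy illustration (all values are assumptions, not project results).
# Ten states; the last two are treated as "critical" moments of the task.
STATES = range(10)
CRITICAL = {8, 9}


def uniform_advisor(state):
    """Probability of correct advice: the same 0.8 in every state."""
    return 0.8


def clustered_advisor(state):
    """Perfect advice in ordinary states, coin-flip advice in critical ones."""
    return 0.5 if state in CRITICAL else 1.0


def overall_accuracy(advisor):
    """Mean probability of correct advice across all states."""
    return sum(advisor(s) for s in STATES) / len(STATES)


def critical_accuracy(advisor):
    """Mean probability of correct advice across critical states only."""
    return sum(advisor(s) for s in CRITICAL) / len(CRITICAL)


for name, advisor in [("uniform", uniform_advisor), ("clustered", clustered_advisor)]:
    print(name, round(overall_accuracy(advisor), 3), round(critical_accuracy(advisor), 3))
```

In this toy case the clustered advisor looks better overall (about 0.9 against 0.8) but is worse exactly where mistakes matter most (0.5 against 0.8), which is the kind of situation a single accuracy figure hides.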

The current stage of the project extends this idea to continuous-control robot learning, using soft actor-critic as the reinforcement learning backbone and simulated Kinova Gen3 reaching tasks as the experimental domain. The aim is to develop adaptive multi-source fusion methods that can identify when each source of advice is useful, reduce harmful ambiguity, and improve the reliability of learning from imperfect guidance.
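One simple way to picture adaptive fusion in a continuous-action setting is a reliability-weighted average of the advised actions, with reliability tracked separately for each region of the task. The sketch below is a minimal illustration under stated assumptions, not the project's method: the class name `RegionReliabilityFusion`, the region labels, the pseudo-count prior, and the idea that some external signal marks advice as helpful or unhelpful are all hypothetical.

```python
from collections import defaultdict


class RegionReliabilityFusion:
    """Hypothetical fusion rule: weight each source by its estimated
    reliability in the current region of the task."""

    def __init__(self, n_sources, prior_good=1.0, prior_bad=1.0):
        self.n_sources = n_sources
        # counts[region][source] = [helpful, unhelpful] pseudo-counts
        self.counts = defaultdict(
            lambda: [[prior_good, prior_bad] for _ in range(n_sources)]
        )

    def reliability(self, region, source):
        """Estimated fraction of helpful advice from a source in a region."""
        good, bad = self.counts[region][source]
        return good / (good + bad)

    def update(self, region, source, helpful):
        """Record one outcome; how 'helpful' is judged is left as an assumption."""
        self.counts[region][source][0 if helpful else 1] += 1.0

    def fuse(self, region, advice):
        """Return the reliability-weighted mean of the advised action vectors."""
        weights = [self.reliability(region, s) for s in range(self.n_sources)]
        total = sum(weights)
        dim = len(advice[0])
        return [
            sum(w * a[i] for w, a in zip(weights, advice)) / total
            for i in range(dim)
        ]


fusion = RegionReliabilityFusion(n_sources=2)
for _ in range(8):
    fusion.update("near_target", 0, helpful=True)   # source 0 does well here
    fusion.update("near_target", 1, helpful=False)  # source 1 does poorly here

advice = [[1.0, 0.0], [0.0, 1.0]]  # one 2-D action suggestion per source
print(fusion.fuse("near_target", advice))  # leans toward source 0
print(fusion.fuse("far_from_target", advice))  # no evidence yet: equal weights
```

Because the counts are kept per region, a source that is discounted near the target can still carry full weight elsewhere, which matches the idea that a source may be reliable only in particular regions of the task.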

Selected Publications:
  • Mesto, M., & Cruz, F. (2025, November). The Consensus Paradox: When Low Disagreement Leads to Catastrophic Failure in Multi-teacher Reinforcement Learning. In Australasian Joint Conference on Artificial Intelligence (pp. 426-438). Singapore: Springer Nature Singapore. Best paper award.
  • Mesto, M., & Cruz, F. (2025). Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA).