
Weakly Supervised Reinforcement Learning

Weakly supervised reinforcement learning (WSRL) trains agents in environments where the reward signal is imperfect, sparse, delayed, or only partially informative, in contrast to standard RL settings that assume a dense, reliable reward at every step. The agent must learn an effective policy despite this incomplete feedback, compensating for the weak supervision with auxiliary signals, reward modeling, or preference learning.
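To make the preference-learning idea concrete, the following is a minimal sketch of fitting a reward model from pairwise comparisons of trajectory segments, in the spirit of Christiano et al. (2017). It is an illustration under stated assumptions, not the method of any particular system: the reward is assumed to be linear in hand-given features (the cited paper uses a neural network), the preference labels are synthetic and noise-free, and every name in it (`fit_reward_model`, `make_synthetic_pairs`, `true_weights`, and so on) is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_features(segment):
    """Sum each feature over the steps of a trajectory segment."""
    return [sum(step[i] for step in segment) for i in range(len(segment[0]))]


def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b,
    assuming a reward that is linear in the step features."""
    phi_a, phi_b = segment_features(seg_a), segment_features(seg_b)
    diff = sum(w * (a - b) for w, a, b in zip(weights, phi_a, phi_b))
    return sigmoid(diff)


def fit_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit reward weights by stochastic gradient ascent on the
    log-likelihood of the preference labels.
    pairs: list of (seg_a, seg_b, label), label 1.0 if seg_a is preferred."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in pairs:
            p = preference_prob(weights, seg_a, seg_b)
            phi_a, phi_b = segment_features(seg_a), segment_features(seg_b)
            for i in range(n_features):
                # d/dw log-likelihood = (label - p) * (phi_a - phi_b)
                weights[i] += lr * (label - p) * (phi_a[i] - phi_b[i])
    return weights


def make_synthetic_pairs(true_weights, n_pairs=50, seg_len=3, seed=0):
    """Hypothetical data: a hidden linear reward stands in for the labeler."""
    rng = random.Random(seed)

    def random_segment():
        return [[rng.uniform(-1, 1) for _ in true_weights] for _ in range(seg_len)]

    def true_return(segment):
        return sum(w * f for w, f in zip(true_weights, segment_features(segment)))

    pairs = []
    for _ in range(n_pairs):
        seg_a, seg_b = random_segment(), random_segment()
        label = 1.0 if true_return(seg_a) > true_return(seg_b) else 0.0
        pairs.append((seg_a, seg_b, label))
    return pairs


true_weights = [1.0, -1.0]  # assumed hidden reward, never shown to the learner
pairs = make_synthetic_pairs(true_weights)
learned = fit_reward_model(pairs, n_features=2)

# Fraction of training pairs where the model ranks the segments like the labeler.
agreement = sum(
    (preference_prob(learned, a, b) > 0.5) == (label == 1.0)
    for a, b, label in pairs
) / len(pairs)
print(learned, agreement)
```

The learner never observes a numeric reward, only which of two segments was preferred, which is the sense in which the supervision is weak. In a full pipeline the fitted reward model would then supply the reward for an ordinary RL algorithm; that step is omitted here.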

Sources

  1. Sutton, R. S. & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. ISBN: 978-0-262-03924-6
  2. Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S. & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems (NeurIPS), 30.

ScholarGate. Weakly supervised reinforcement learning. Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/weakly-supervised-reinforcement-learning