Investigating How Non-Expert Humans Design Curricula

In this project, we are interested in an interactive learning setting in which humans design a curriculum of tasks, and we study how they design those curricula. A better understanding of the curriculum-design strategies used by non-expert humans may help us to 1) understand the general principles that make some curriculum-design strategies better than others, and 2) inspire the design of new machine-learning algorithms and interfaces that better accommodate the natural tendencies of human trainers.

Adapting Agent Action Speed to Improve Task Learning from Humans [Video]

In this project, we aim to design a better representation of the learning agent, one that elicits more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy. This differs from most existing work on Interactive Reinforcement Learning, which focuses on interpreting and incorporating non-expert human feedback to speed up learning.

Learning from Discrete Human Feedback

In this project, we consider the problem of a human trainer teaching an agent by providing positive or negative feedback. Most existing work on Interactive Reinforcement Learning has treated human feedback as a numerical value that the agent seeks to maximize, and has assumed that all trainers give feedback in the same way when teaching the same behavior. In contrast, we treat feedback as a discrete communication delivered from trainer to learner, and we recognize that different trainers choose different training strategies. We propose a probabilistic model to classify these training strategies. We also present the SABL and I-SABL algorithms, which consider multiple interpretations of trainer feedback in order to learn behaviors more efficiently. Our online user studies show that human trainers follow a variety of training strategies when teaching virtual agents, and that explicitly modeling trainer strategy allows a learner to draw inferences even from cases where no feedback is given.
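The idea of interpreting feedback (including silence) under a probabilistic model of the trainer can be sketched as a simple Bayesian update over candidate target policies. The specific parameters (`EPS`, `MU_PLUS`, `MU_MINUS`), the two-state toy domain, and all function names below are illustrative assumptions for this sketch, not the actual SABL parameterization:

```python
from itertools import product

EPS = 0.1        # assumed chance the trainer gives the wrong sign by mistake
MU_PLUS = 0.6    # assumed prob. the trainer stays silent after a correct action
MU_MINUS = 0.2   # assumed prob. the trainer stays silent after an incorrect action


def feedback_likelihood(feedback, action_correct):
    """P(feedback | whether the action matched the trainer's target policy)."""
    if action_correct:
        probs = {"+": (1 - MU_PLUS) * (1 - EPS),
                 "-": (1 - MU_PLUS) * EPS,
                 "0": MU_PLUS}          # "0" means no feedback was given
    else:
        probs = {"+": (1 - MU_MINUS) * EPS,
                 "-": (1 - MU_MINUS) * (1 - EPS),
                 "0": MU_MINUS}
    return probs[feedback]


def update_posterior(posterior, state, action, feedback):
    """One Bayesian update over candidate target policies.

    `posterior` maps each candidate policy (a tuple: state index -> action)
    to its current probability."""
    new_post = {}
    for policy, p in posterior.items():
        correct = (policy[state] == action)
        new_post[policy] = p * feedback_likelihood(feedback, correct)
    z = sum(new_post.values())
    return {pol: p / z for pol, p in new_post.items()}


# Toy domain: 2 states, actions "L"/"R", so 4 candidate policies.
candidates = [tuple(p) for p in product("LR", repeat=2)]
posterior = {pol: 1.0 / len(candidates) for pol in candidates}

# The agent takes action "L" in state 0 and the trainer stays silent ("0").
# Because silence is likelier here after a correct action than an incorrect
# one, policies choosing "L" in state 0 gain probability: the learner makes
# an inference from a case where no feedback was given.
posterior = update_posterior(posterior, state=0, action="L", feedback="0")
```

Under these assumed silence rates, the no-feedback observation shifts the posterior toward policies consistent with the agent's action; a trainer with the opposite silence profile would make silence evidence of a mistake instead.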

Training an Agent to Ground Commands with Reward and Punishment

As the need grows for humans without technical expertise to convey complex tasks to robots, natural language provides an intuitive interface. However, it requires the agent to learn a grounding of natural language commands. In this work, we developed a simple simulated home environment in which the robot must complete tasks by learning from positive or negative human feedback.

Agent Corrections to Pac-Man from the Crowd

Reinforcement learning agents often suffer from poor initial performance. Our approach uses crowdsourcing to gather non-expert suggestions that speed up an RL agent's learning. We currently use Ms. Pac-Man as our application domain because of its popularity as a game. Our studies have already shown that crowd workers, although non-experts, are good at identifying mistakes. We are now working on how to integrate the crowd's advice to speed up the RL agent's learning. In the future, we intend to apply this approach to a physical robot.