For example, a human who has learned how to cut an apple with a knife will likely then know how to cut a dragon fruit with a knife. Applying past knowledge to a new task is very easy for humans but still difficult for current end-to-end learning machines.
Although machines may know what a “dragon fruit” looks like, they cannot perform the task “cut the dragon fruit with a knife” unless they have been explicitly trained on a dataset containing this command.
By contrast, our agent demonstrated the ability to combine what it knew about the visual appearance of a dragon fruit with the task pattern “cut X with a knife,” succeeding without ever being explicitly trained to perform “cut the dragon fruit with a knife.”
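This zero-shot setup can be sketched as a compositional train/test split, where the test command is a verb–object combination withheld from training even though each of its parts was seen. The verb and object vocabularies below are hypothetical placeholders, not the agent's actual command set:

```python
from itertools import product

# Hypothetical vocabularies; the real agent's command set is larger.
verbs = ["cut", "wash", "pick up"]
objects = ["apple", "dragon fruit", "banana"]

held_out = ("cut", "dragon fruit")  # combination withheld from training

train = [f"{v} the {o}" for v, o in product(verbs, objects)
         if (v, o) != held_out]
test_command = "cut the dragon fruit"

# Every word of the test command appears during training ("cut the apple",
# "wash the dragon fruit", ...), but never in this combination -- so
# succeeding at test time requires compositional transfer, not memorization.
seen_words = {word for command in train for word in command.split()}
assert set(test_command.split()) <= seen_words
assert test_command not in train
```

Evaluating only on the held-out combination distinguishes an agent that generalizes compositionally from one that merely memorizes full commands.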
In the figures below, our agent successfully executed the commands in the navigation tests.
Our next steps are two-fold: one is to teach the agent more abilities with natural language commands in the current 2D environment, and the other is to migrate it to a virtual 3D environment.
A virtual 3D environment poses more challenges and more closely resembles the real world we live in. Our ultimate goal is to train a physical robot in a realistic environment, taught by a human teacher through natural language.
This article was republished with permission from Baidu Research.