Neural Scene Understanding For Robotics
Introduction
Autonomous robots must be able to identify and recognize objects in a scene. This task requires representing the scene and the objects within it, and exploring the scene fully. Exploration has to be time-efficient and must treat scene reconstruction as a constraint, because an autonomous robot should ideally assist with, for example, household tasks and cannot afford unreasonable runtimes. After exploration and scene reconstruction, we would like to classify the objects so that the robot can, for example, perform tasks with specific objects.
Methods
We employ a path planning algorithm to explore the scene and increase the fraction of the scene that is captured, which makes object recognition easier by simplifying the completion of objects from partial observations. Objects need to be completed because measurements in a cluttered scene are inevitably incomplete: objects occlude one another. To complete a partially observed object, we leverage neural networks. By learning a latent space representation of objects, we complete the perceived point clouds of objects to a plausible shape by estimating their latent representation and predicting the full object's shape. The network also quantifies how certain it is about the reconstruction through a Bayesian likelihood-based value, which we call uncertainty. This value lets us gauge how well the reconstructed shape fits the observed data; a high value indicates that the reconstruction is insufficient. Our exploration algorithm considers this uncertainty as well as the free space in the scene, so that it explores the unseen parts of the scene while also seeking further coverage of insufficiently reconstructed shapes, since observing more of an object makes its reconstruction easier. Both parts of the thesis contribution, especially the neural network, are computationally expensive. We therefore relied on HPC resources to reduce the computation time needed to infer a good solution during execution of the algorithm and to train the neural network, which required cumbersome optimization.
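To illustrate how an exploration step can trade off reconstruction uncertainty against unexplored space, the following is a minimal sketch and not the planner used in the thesis. It assumes that each reconstructed object comes with a scalar uncertainty, that unexplored space is given as a set of voxel centers, and that visibility can be approximated crudely by a fixed sensor range. The function names, the weights `w_uncertainty` and `w_unknown`, and the `sensor_range` parameter are all hypothetical.

```python
import numpy as np


def score_viewpoints(candidates, object_centers, uncertainties, unknown_voxels,
                     sensor_range=1.5, w_uncertainty=1.0, w_unknown=1.0):
    """Score candidate viewpoints; a higher score means more worth visiting.

    All parameters are illustrative assumptions. Visibility is approximated
    by "closer than sensor_range"; occlusion is ignored in this sketch.
    """
    candidates = np.asarray(candidates, dtype=float)          # (C, 3)
    object_centers = np.asarray(object_centers, dtype=float)  # (O, 3)
    uncertainties = np.asarray(uncertainties, dtype=float)    # (O,)
    unknown_voxels = np.asarray(unknown_voxels, dtype=float)  # (V, 3)

    # Pairwise distances from every candidate to every object and voxel.
    d_obj = np.linalg.norm(candidates[:, None, :] - object_centers[None, :, :], axis=-1)
    d_vox = np.linalg.norm(candidates[:, None, :] - unknown_voxels[None, :, :], axis=-1)

    # Summed uncertainty of the objects within range of each candidate.
    uncertainty_gain = ((d_obj <= sensor_range) * uncertainties[None, :]).sum(axis=1)

    # Fraction of the unexplored voxels within range of each candidate.
    unknown_gain = (d_vox <= sensor_range).sum(axis=1) / max(len(unknown_voxels), 1)

    return w_uncertainty * uncertainty_gain + w_unknown * unknown_gain


def next_viewpoint(candidates, object_centers, uncertainties, unknown_voxels, **kwargs):
    """Return the index of the best-scoring candidate viewpoint."""
    scores = score_viewpoints(candidates, object_centers, uncertainties,
                              unknown_voxels, **kwargs)
    return int(np.argmax(scores))


# Toy scene: two candidate poses, two objects, three unexplored voxels.
candidates = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
object_centers = [[0.5, 0.0, 0.0], [5.5, 0.0, 0.0]]
uncertainties = [0.1, 0.9]  # the second object is poorly reconstructed
unknown_voxels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [5.0, 1.0, 0.0]]

best = next_viewpoint(candidates, object_centers, uncertainties, unknown_voxels)
```

In this toy scene the second candidate wins because it faces the poorly reconstructed object, even though the first candidate sees more unexplored voxels. Setting `w_uncertainty=0.0` reduces the score to pure free-space exploration and flips the choice, which shows how strongly the relative weighting shapes the resulting path.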
Results
Our uncertainty-guided exploration algorithm is comparable to more costly alternatives such as furthest viewpoint planning in terms of the resulting accuracy of object completion and scene reconstruction, but unlike furthest viewpoint planning it does not rely on a heuristic. It also travels a shorter distance and is therefore more energy efficient. Given sufficient point cloud coverage, our neural network can precisely reconstruct objects whose shapes it has learned. Depending on how diverse the training set for a given object category is, it can also approximate unseen objects of that category, but the results tend to be worse and can even be entirely insufficient.
Discussion
The results showcase the usefulness of incorporating uncertainty estimation into scene coverage when trying to complete the objects within the scene. Possible improvements include the weighting of the uncertainty guidance during estimation, enlarging the object database, which currently focuses on 10 kinds of household items, and increasing the accuracy of the reconstructions with bigger networks or more sophisticated representations.