Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model

Authors

Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba

Abstract

Recognition of the current state is indispensable for the operation of a robot. There are various states to be recognized, such as whether an elevator door is open or closed, whether an object has been grasped correctly, and whether a TV is turned on or off. Until now, these states have been recognized by programmatically describing the state of a point cloud or raw image, by annotating images and training on them, by using special sensors, and so on. In contrast to these methods, we apply Visual Question Answering (VQA) with a Pre-Trained Vision-Language Model (PTVLM), trained on a large-scale dataset, to such binary state recognition. This idea allows us to intuitively describe state recognition in natural language without any re-training, thereby improving the recognition ability of robots in a simple and general way. We summarize various techniques for phrasing questions and processing images, and clarify their properties through experiments.
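
As a rough illustration of the idea, the sketch below poses a yes/no question about an image to an off-the-shelf VQA model and maps the free-form answer to a binary state. The choice of model (BLIP via Hugging Face transformers), the yes/no mapping, and the image path are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: binary state recognition via VQA with a pre-trained
# vision-language model. Model choice and answer mapping are assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def recognize_binary_state(image: Image.Image, question: str) -> bool:
    """Ask a yes/no question about the image and map the answer to a bool."""
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    answer = processor.decode(output_ids[0], skip_special_tokens=True).lower()
    return answer.startswith("yes")

# Example: check whether an elevator door is open (hypothetical image path).
image = Image.open("elevator.jpg").convert("RGB")
print(recognize_binary_state(image, "Is the elevator door open?"))
```

Because the question is plain language, swapping the recognized state (e.g., "Is the TV turned on?") requires no re-training, which is the generality the abstract highlights.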
