J. J. Gibson suggested that objects in our environment can be represented by an agent in terms of the types of actions that the agent may perform on or with them. This affordance representation allows the agent to connect the perception of key properties of an object with those actions. In this dissertation, I explore the automatic construction of visual representations associated with components of objects that afford certain types of grasping actions. I propose that the type of grasp used on a class of objects should form the basis of these visual representations; the visual categories are thus driven by grasp types. A grasp type is defined as a cluster of grasp samples in the 6D space of hand positions and orientations relative to the object. Specifically, for each grasp type, a set of view-dependent visual operators can be learned that match the appearance of the part of the object that is to be grasped.
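To make the notion of a grasp type concrete, the following sketch clusters 6D hand poses into grasp types with k-means. The use of k-means, the Euler-angle parameterization of orientation, and the position/orientation scaling are illustrative assumptions; the dissertation specifies only that a grasp type is a cluster of grasp samples in the 6D hand pose space relative to the object.

import numpy as np
from sklearn.cluster import KMeans

def cluster_grasp_types(grasp_samples, n_types=4, orientation_weight=0.1):
    """Group 6D hand poses (relative to the object) into grasp types.

    grasp_samples: (N, 6) array with columns x, y, z (meters) and
                   roll, pitch, yaw (radians).
    orientation_weight: crude scaling so angles are comparable to positions;
                        a real metric would also handle angle wraparound.
    """
    samples = np.asarray(grasp_samples, dtype=float).copy()
    samples[:, 3:] *= orientation_weight
    kmeans = KMeans(n_clusters=n_types, n_init=10, random_state=0)
    labels = kmeans.fit_predict(samples)
    return labels, kmeans.cluster_centers_

# Usage on synthetic data: 100 random hand poses grouped into 4 grasp types.
rng = np.random.default_rng(0)
poses = np.hstack([rng.normal(size=(100, 3)),
                   rng.uniform(-np.pi, np.pi, size=(100, 3))])
labels, centers = cluster_grasp_types(poses)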
By focusing on object parts, as opposed to entire objects, the resulting visual operators can generalize across different object types that exhibit some similarities in 3D shape. The training/testing data set is composed of a large collection of example grasps made by a human teacher on a set of fifty unique objects. Each grasp example consists of a stereo image pair of the object alone, a stereo image pair of the object being grasped, and information about the 3D pose of the hand relative to the object. The grasp regions in a training/testing image, i.e., the locations at which certain grasp types could be applied to the object, are automatically estimated.
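A single grasp example in this data set could be organized roughly as follows; the field names and types are hypothetical and serve only to summarize what each example contains.

from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class GraspExample:
    """One grasp demonstrated by the human teacher (field names are illustrative)."""
    object_stereo: Tuple[np.ndarray, np.ndarray]  # stereo image pair of the object alone
    grasp_stereo: Tuple[np.ndarray, np.ndarray]   # stereo image pair of the object being grasped
    hand_pose: np.ndarray                         # pose of the hand relative to the object (position and orientation)
    object_id: str                                # which of the fifty objects was grasped
    grasp_region: Optional[np.ndarray] = None     # automatically estimated image region for this grasp type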
First, I show that classes of objects can be formed on the basis of how the individual objects are grasped. Second, I show that visual models based on Pair of Adjacent Segments (PAS) features can capture view-dependent similarities in object part appearance across different objects of the same class. Third, I show that these visual operators can suggest grasp types, hand locations, and hand orientations for novel objects in novel scenarios.
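A PAS feature pairs two adjacent contour segments and encodes their relative geometry. The simplified descriptor below assumes each segment is summarized by its midpoint, orientation, and length; the normalization and quantization used by the actual PAS-based shape models may differ.

import numpy as np

def pas_descriptor(seg_a, seg_b):
    """Simplified PAS-style descriptor for a pair of adjacent contour segments.

    Each segment is a tuple (midpoint_x, midpoint_y, orientation_rad, length).
    The descriptor encodes relative location, orientations, and lengths,
    normalized by the mean segment length as a rough scale factor.
    """
    (xa, ya, ta, la), (xb, yb, tb, lb) = seg_a, seg_b
    scale = 0.5 * (la + lb) + 1e-9
    return np.array([
        (xb - xa) / scale,        # relative x offset between segment midpoints
        (yb - ya) / scale,        # relative y offset between segment midpoints
        np.cos(ta), np.sin(ta),   # orientation of segment a (wrap-safe encoding)
        np.cos(tb), np.sin(tb),   # orientation of segment b
        la / scale,               # length of segment a relative to the pair's scale
        lb / scale,               # length of segment b relative to the pair's scale
    ])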
Given a novel image of a novel object, the proposed algorithm matches the learned shape models to this image. A match of a shape model in the novel image is interpreted as evidence that the corresponding image region affords a particular grasp action. Experimental results show that the proposed algorithm is capable of identifying the occurrence of learned grasp options in images containing novel objects.
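Finally, a hedged sketch of the inference step: each grasp type's learned model is compared against local descriptors extracted from a novel image, and matches above a threshold are reported as afforded grasp options. The descriptor dictionaries, the cosine-similarity score, and the threshold are placeholders for the dissertation's actual PAS-based shape matching.

import numpy as np

def detect_grasp_affordances(image_descriptors, grasp_models, threshold=0.8):
    """Report which learned grasp types appear to be afforded in a novel image.

    image_descriptors: dict mapping image location (x, y) -> local descriptor vector.
    grasp_models: dict mapping grasp type name -> learned model descriptor vector.
    Returns a list of (grasp_type, location, score) for matches above threshold.
    """
    detections = []
    for grasp_type, model in grasp_models.items():
        for location, descriptor in image_descriptors.items():
            # Cosine similarity stands in for the learned shape-matching score.
            score = float(np.dot(model, descriptor) /
                          (np.linalg.norm(model) * np.linalg.norm(descriptor) + 1e-9))
            if score >= threshold:
                detections.append((grasp_type, location, score))
    return detections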