ROS open-source audio recognizer: ROAR environmental sound detection tools for robot programming

Joseph M. Romano · Jordan P. Brindza · Katherine J. Kuchenbecker

Received: 1 May 2012 / Accepted: 3 January 2013 / Published online: 5 February 2013
© Springer Science+Business Media New York 2013

Abstract  Advances in audio recognition have enabled the real-world success of a wide variety of interactive voice systems over the last two decades. More recently, these same techniques have shown promise in recognizing non-speech audio events. Sounds are ubiquitous in real-world manipulation, such as the click of a button, the crash of an object being knocked over, and the whine of activation from an electric power tool. Surprisingly, very few autonomous robots leverage audio feedback to improve their performance. Modern audio recognition techniques are capable of learning and recognizing real-world sounds, but few implementations are easily incorporated into modern robotic programming frameworks. This paper presents a new software library known as the ROS Open-source Audio Recognizer (ROAR). ROAR provides a complete set of end-to-end tools for online supervised learning of new audio events, feature extraction, automatic one-class Support Vector Machine model tuning, and real-time audio event detection. Through implementation on a Barrett WAM arm, we show that combining the contextual information of the manipulation action with a set of learned audio events yields significant improvements in robotic task-completion rates.

Keywords  Audio event detection · Robot manipulation · Audio recognition

Electronic supplementary material  The online version of this article (doi:10.1007/s10514-013-9323-6) contains supplementary material, which is available to authorized users.

J. M. Romano (B)
Robotics and Controls Group, Rethink Robotics Inc., Boston, MA, USA
e-mail: [email protected]

J. P. Brindza
Department of Computer and Information Science, GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA
e-mail: [email protected]

K. J. Kuchenbecker
Department of Mechanical Engineering and Applied Mechanics, Haptics Group, GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA
e-mail: [email protected]
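The abstract outlines ROAR's pipeline of feature extraction, one-class Support Vector Machine model tuning, and real-time detection. As a rough illustration of that general approach, and not of the ROAR implementation or its API, the sketch below trains a one-class SVM on spectral band-energy features of short audio frames using NumPy and scikit-learn; the feature choice, frame handling, and parameter values are illustrative assumptions only.

# Minimal sketch (not the ROAR API): a one-class SVM detector for a learned
# audio event, trained on spectral band-energy features of short audio frames.
import numpy as np
from sklearn.svm import OneClassSVM

def band_energy_features(frame, n_bands=32):
    """Return log energy in n_bands equal-width frequency bands of one frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(spectrum, n_bands)
    return np.log(np.array([np.sum(b ** 2) for b in bands]) + 1e-12)

def train_event_model(positive_frames, nu=0.1, gamma="scale"):
    """Fit a one-class SVM to frames recorded while the target sound occurs."""
    X = np.vstack([band_energy_features(f) for f in positive_frames])
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)

def frame_contains_event(model, frame):
    """The one-class SVM labels a frame as +1 (learned event) or -1 (other)."""
    return model.predict(band_energy_features(frame).reshape(1, -1))[0] == 1

In practice such a detector would run on a stream of overlapping frames, with the SVM parameters tuned automatically, as the abstract describes for ROAR.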

1 Introduction

Developing robots capable of completing everyday tasks within the home has become a topic of increased focus within the last decade (Borst et al. 2009; Ciocarlie et al. 2010; Graf et al. 2004; Jain and Kemp 2010; Romano et al. 2011; Sakagami et al. 2002; Srinivasa et al. 2009). Interestingly, many common home appliances, such as clothes washers and dryers, dish-washing machines, doorbells, coffee makers, alarm clocks, ovens, microwaves, and telephones, produce artificial sounds from audible buzzers or bells to indicate their status. Additionally, a variety of natural sounds occur during household manipulation tasks, such as the click of a successfully pushed button, the bark of a hungry pet dog, the whir of a vacuum cleaner’s motor (or lack thereof indicating a problem