arXiv daily: Robotics (cs.RO)

1.Learning to Simulate Tree-Branch Dynamics for Manipulation

Authors:Jayadeep Jacob, Tirthankar Bandyopadhyay, Jason Williams, Paulo Borges, Fabio Ramos

Abstract: We propose to use a simulation-driven inverse inference approach to model the joint dynamics of tree branches under manipulation. Learning branch dynamics and gaining the ability to manipulate deformable vegetation can help with occlusion-prone tasks, such as fruit picking in dense foliage, as well as moving overhanging vines and branches for navigation in dense vegetation. The underlying deformable tree geometry is encapsulated as coarse spring abstractions executed on parallel, non-differentiable simulators. The implicit statistical model defined by the simulator, reference trajectories obtained by actively probing the ground truth, and the Bayesian formalism together guide the estimation of the posterior density over the spring parameters. Our non-parametric inference algorithm, based on Stein Variational Gradient Descent, incorporates biologically motivated assumptions into the inference process as learnt joint priors driven by neural networks; moreover, it leverages a finite difference scheme for gradient approximations. Real and simulated experiments confirm that our model can predict deformation trajectories, quantify the estimation uncertainty, and perform better than other inference algorithms used as baselines, particularly those from the Monte Carlo family. The model displays strong robustness properties in the presence of heteroscedastic sensor noise; furthermore, it can generalise to unseen grasp locations.
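
The abstract pairs Stein Variational Gradient Descent (SVGD) with finite-difference gradients because the simulators are non-differentiable. A minimal sketch of that combination, assuming a toy 1D Gaussian log-posterior over a single hypothetical spring stiffness k in place of the simulator-defined likelihood and the learnt priors; the kernel bandwidth, step size, and particle count are illustrative choices, not values from the paper:

```python
import math
import random

def log_post(k):
    # Hypothetical unnormalised log-posterior over one spring stiffness k:
    # a Gaussian with mean 3.0 and standard deviation 0.5 (illustrative only).
    return -0.5 * ((k - 3.0) / 0.5) ** 2

def fd_grad(f, x, h=1e-4):
    # Central finite difference: needs only evaluations of f, so it also
    # works when f wraps a non-differentiable simulator.
    return (f(x + h) - f(x - h)) / (2.0 * h)

def svgd_step(particles, step=0.05, bandwidth=0.5):
    n = len(particles)
    scores = [fd_grad(log_post, x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, sj in zip(particles, scores):
            diff = xj - xi
            kern = math.exp(-diff * diff / (2.0 * bandwidth ** 2))
            # Driving term: kernel-weighted score pulls xi toward high density.
            # Repulsive term: d k(xj, xi) / d xj = -(xj - xi) / h^2 * k.
            phi += kern * sj - diff / bandwidth ** 2 * kern
        updated.append(xi + step * phi / n)
    return updated

random.seed(0)
particles = [random.uniform(0.0, 6.0) for _ in range(50)]
for _ in range(500):
    particles = svgd_step(particles)
mean = sum(particles) / len(particles)
spread = math.sqrt(sum((p - mean) ** 2 for p in particles) / len(particles))
```

The kernel-gradient term keeps the particles apart, so the final set approximates the whole posterior (and hence its uncertainty) instead of collapsing onto the mode.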

2.A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch

Authors:Federico Ceola, Elisa Maiettini, Lorenzo Rosasco, Lorenzo Natale

Abstract: Multi-fingered robotic hands could enable robots to perform sophisticated manipulation tasks. However, teaching a robot to grasp objects with an anthropomorphic hand is an arduous problem due to the high dimensionality of the state and action spaces. Deep Reinforcement Learning (DRL) offers techniques to design control policies for this kind of problem without explicit environment or hand modeling. However, training these policies with state-of-the-art model-free algorithms is very challenging for multi-fingered hands. The main problem is that efficient exploration of the environment is not possible in such high-dimensional problems, which causes issues in the initial phases of policy optimization. One way to address this is to rely on offline task demonstrations. However, collecting them is often very demanding in terms of time and computational resources. In this work, we overcome these requirements and propose the A Grasp Pose is All You Need (G-PAYN) method for the anthropomorphic hand of the iCub humanoid. We develop an approach to automatically collect task demonstrations to initialize the training of the policy. The proposed grasping pipeline starts from a grasp pose generated by an external algorithm, which is used to initiate the movement. Then a control policy (previously trained with the proposed G-PAYN) is used to reach and grab the object. We deploy the iCub in the MuJoCo simulator and use it to test our approach with objects from the YCB-Video dataset. The results show that G-PAYN outperforms current DRL baselines in the considered setting in terms of both success rate and execution time. The code to reproduce the experiments will be released upon acceptance.

3.Online Estimation of Self-Body Deflection With Various Sensor Data Based on Directional Statistics

Authors:Hiroya Sato, Kento Kawaharazuka, Tasuku Makabe, Kei Okada, Masayuki Inaba

Abstract: In this paper, we propose a method for online estimation of a robot's posture. Our method uses the von Mises and Bingham distributions from directional statistics as probability distributions of joint angles and 3D orientation, respectively. We constructed a particle filter using these distributions and configured a system to estimate the robot's posture from various sensor information (e.g., joint encoders, IMU sensors, and cameras). Furthermore, unlike tangent space approximations, these distributions can handle global features and represent sensor characteristics as observation noise. As an application, we show that the yaw drift of a 6-axis IMU sensor can be represented probabilistically to prevent adverse effects on attitude estimation. For the estimation, we used an approximate model that assumes the actual robot posture can be reproduced by correcting the joint angles of a rigid body model. In the experiments, we tested the estimator's effectiveness by checking whether the joint angles generated with the approximate model could be estimated from the link poses of the same model. We then applied the estimator to the actual robot and confirmed that the gripper position could be estimated, thereby verifying the validity of the approximate model in our situation.
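
To illustrate why circular distributions matter, here is a minimal particle filter for a single joint angle that uses von Mises noise for both the motion and the observation model. It is a sketch under stated assumptions: one joint, a static true angle, and hypothetical concentration values; the Bingham orientation part and the multi-sensor fusion described above are not shown.

```python
import math
import random

def wrap(a):
    # Wrap an angle to (-pi, pi].
    return math.atan2(math.sin(a), math.cos(a))

def predict(particles, kappa_proc):
    # Diffuse each particle with von Mises process noise centred on its angle.
    return [wrap(random.vonmisesvariate(p % (2 * math.pi), kappa_proc)) for p in particles]

def update(particles, z, kappa_obs):
    # Unnormalised von Mises likelihood exp(kappa * (cos(z - p) - 1)); the Bessel
    # normaliser and the constant shift cancel once the weights are normalised.
    weights = [math.exp(kappa_obs * (math.cos(z - p) - 1.0)) for p in particles]
    return random.choices(particles, weights=weights, k=len(particles))

def circular_mean(particles):
    # Average unit vectors instead of raw angles, so +pi and -pi agree.
    s = sum(math.sin(p) for p in particles)
    c = sum(math.cos(p) for p in particles)
    return math.atan2(s, c)

random.seed(1)
true_angle = math.pi - 0.05          # deliberately close to the +/-pi wrap-around
particles = [random.uniform(-math.pi, math.pi) for _ in range(500)]
for _ in range(20):
    z = random.vonmisesvariate(true_angle, 50.0)   # simulated noisy encoder reading
    particles = predict(particles, 200.0)
    particles = update(particles, z, 20.0)
estimate = circular_mean(particles)
```

Near the wrap-around point the particle cloud straddles +pi and -pi, so a plain arithmetic mean of the particles would land near zero; the circular mean does not have this problem.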

4.A Data-Efficient Approach for Long-Term Human Motion Prediction Using Maps of Dynamics

Authors:Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Achim J. Lilienthal, Martin Magnusson

Abstract: Human motion prediction is essential for the safe and smooth operation of mobile service robots and intelligent vehicles around people. Commonly used neural network-based approaches often require large amounts of complete trajectories to represent motion dynamics in complex, semantically rich spaces. This requirement may complicate the deployment of physical systems in new environments, especially when the data is being collected online from onboard sensors. In this paper we explore a data-efficient alternative using maps of dynamics (MoD) to represent place-dependent multi-modal spatial motion patterns, learned from prior observations. Our approach can perform efficient human motion prediction over long horizons of up to 60 seconds. We quantitatively evaluate its accuracy with a limited amount of training data in comparison to an LSTM-based baseline, and qualitatively show that the predicted trajectories reflect the natural semantic properties of the environment, e.g., the locations of short- and long-term goals, navigation in narrow passages, and movement around obstacles.
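
As a rough illustration of how a map of dynamics can drive long-horizon prediction, the sketch below stores a few weighted motion modes (heading and speed) per grid cell and rolls out a trajectory by sampling one mode in each visited cell. The dictionary layout, cell size, and time step are hypothetical; the paper's MoD representation and learning procedure are not specified in the abstract.

```python
import math
import random

def rollout(mod, start, steps, dt=0.4, cell=1.0, rng=random):
    # mod maps a grid cell (i, j) to a list of (weight, heading_rad, speed) modes.
    # This layout is a hypothetical stand-in for a learned map of dynamics.
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        modes = mod.get((math.floor(x / cell), math.floor(y / cell)))
        if not modes:
            break  # no learned motion pattern in this cell: stop the rollout
        weights = [w for w, _, _ in modes]
        # Sample one motion mode in proportion to its weight (multi-modal flow).
        _, heading, speed = rng.choices(modes, weights=weights)[0]
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
        path.append((x, y))
    return path

# Hypothetical five-cell corridor in which everyone walks east at 1 m/s.
corridor = {(i, 0): [(1.0, 0.0, 1.0)] for i in range(5)}
path = rollout(corridor, (0.5, 0.5), steps=20)
```

Sampling many rollouts from cells with several modes yields a distribution over long-term paths that follows the place-dependent flow rather than extrapolating the current velocity.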

5.Single-Shot Global Localization via Graph-Theoretic Correspondence Matching

Authors:Shigemichi Matsuzaki, Kenji Koide, Shuji Oishi, Masashi Yokozuka, Atsuhiko Banno

Abstract: This paper describes a method of global localization based on graph-theoretic association of instances between a query and the prior map. The proposed framework employs correspondence matching based on the maximum clique problem (MCP). Thanks to the graph-based abstraction of the problem, the framework is potentially applicable to other map and/or query modalities, whereas many existing global localization methods require the query and the map data to share the same modality. We implement it with a semantically labeled 3D point cloud map and a semantic segmentation image as the query. Leveraging the graph-theoretic framework, the proposed method performs global localization using only the map and the query. The method shows promising results on multiple large-scale simulated maps of urban scenes.
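
A minimal sketch of maximum-clique correspondence matching, assuming both the query and the map provide labeled 2D instance positions so that pairwise distances can be compared directly (the abstract does not say how consistency is checked between a segmentation image and a 3D map). Candidate matches must share a semantic label, two candidates are linked when they preserve pairwise distance, and the largest mutually consistent set is found with the Bron-Kerbosch algorithm with pivoting. The instance names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def consistency_graph(query, cmap, eps=0.5):
    # query / cmap: dict instance_id -> (label, (x, y)).
    cands = [(q, m) for q in query for m in cmap if query[q][0] == cmap[m][0]]
    adj = {c: set() for c in cands}
    for a, b in combinations(cands, 2):
        (q1, m1), (q2, m2) = a, b
        if q1 == q2 or m1 == m2:
            continue  # one-to-one matching: never reuse an instance
        dq = math.dist(query[q1][1], query[q2][1])
        dm = math.dist(cmap[m1][1], cmap[m2][1])
        if abs(dq - dm) < eps:  # rigid motions preserve pairwise distances
            adj[a].add(b)
            adj[b].add(a)
    return adj

def max_clique(adj):
    best = set()

    def bron_kerbosch(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(adj), set())
    return best

# Map instances; the query is the same scene rotated by 90 degrees and shifted.
cmap = {"m0": ("pole", (0, 0)), "m1": ("pole", (4, 0)),
        "m2": ("sign", (0, 3)), "m3": ("tree", (10, 10))}
query = {"q0": ("pole", (1, 1)), "q1": ("pole", (1, 5)), "q2": ("sign", (-2, 1))}
matches = max_clique(consistency_graph(query, cmap))
```

The two poles alone are ambiguous (either assignment preserves their mutual distance), but only one assignment is also consistent with the sign, so the maximum clique resolves the ambiguity without any initial pose guess.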

6.GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model

Authors:Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze

Abstract: Energy consumption of memory accesses dominates the compute energy in energy-constrained robots, which require a compact 3D map of the environment to achieve autonomy. Recent mapping frameworks focus only on reducing the map size, while incurring significant memory usage during map construction due to multi-pass processing of each depth image. In this work, we present a memory-efficient continuous occupancy map, named GMMap, that accurately models the 3D environment using a Gaussian Mixture Model (GMM). Memory-efficient GMMap construction is enabled by the single-pass compression of depth images into local GMMs, which are directly fused together into a globally consistent map. By extending Gaussian Mixture Regression to model unexplored regions, GMMap computes occupancy probability directly from the Gaussians. On a low-power ARM Cortex A57 CPU, GMMap can be constructed in real time at up to 60 images per second. Compared with prior works, GMMap maintains high accuracy while reducing the map size by at least 56%, memory overhead by at least 88%, DRAM access by at least 78%, and energy consumption by at least 69%. Thus, GMMap enables real-time 3D mapping on energy-constrained robots.
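
GMMap's Gaussian Mixture Regression extension is not reproduced here. As a simplified stand-in, assuming the map stores separate hypothetical mixtures of isotropic 2D Gaussians for occupied and free space, a point's occupancy can be read directly off the Gaussians with Bayes' rule, with a small pseudo-density representing unexplored space:

```python
import math

def gauss2d(p, mean, sigma):
    # Isotropic 2D Gaussian density.
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)

def occupancy(p, occupied, free, unexplored_density=1e-3):
    # occupied / free: lists of (weight, mean, sigma) components (hypothetical format).
    d_occ = sum(w * gauss2d(p, m, s) for w, m, s in occupied)
    d_free = sum(w * gauss2d(p, m, s) for w, m, s in free)
    # Bayes-rule ratio; the pseudo-density is split evenly between occupied and
    # free, so points far from every Gaussian fall back to 0.5 (unknown).
    return (d_occ + 0.5 * unexplored_density) / (d_occ + d_free + unexplored_density)

occupied = [(1.0, (0.0, 0.0), 0.2)]   # e.g. a patch of wall
free = [(1.0, (2.0, 0.0), 0.5)]       # e.g. observed free space in front of it
```

Because the map stores only component weights, means, and spreads, queries need no voxel grid, which is the property that keeps such maps compact.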

7.Exploring the effects of robotic design on learning and neural control

Authors:Joshua Paul Powers

Abstract: The ongoing deep learning revolution has allowed computers to outclass humans in various games and perceive features imperceptible to humans during classification tasks. Current machine learning techniques have clearly distinguished themselves in specialized tasks. However, we have yet to see robots capable of performing multiple tasks at an expert level. Most work in this field is focused on the development of more sophisticated learning algorithms for a robot's controller given a largely static and presupposed robotic design. By focusing on the development of robotic bodies, rather than neural controllers, I have discovered that robots can be designed such that they overcome many of the current pitfalls encountered by neural controllers in multitask settings. Through this discovery, I also present novel metrics to explicitly measure the learning ability of a robotic design and its resistance to common problems such as catastrophic interference. Traditionally, physical robot design requires human engineers to plan every aspect of the system, which is expensive and often relies on human intuition. In contrast, within the field of evolutionary robotics, evolutionary algorithms are used to automatically create optimized designs; however, such designs are often still limited in their ability to perform in a multitask setting. The metrics created and presented here provide a novel path to automated design that allows evolved robots to synergize with their controllers to improve the computational efficiency of their learning while overcoming catastrophic interference. Overall, this dissertation suggests that it is possible to automatically design robots that are more general-purpose than current robots and that can perform various tasks while requiring less computation.

8.Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction

Authors:Guillaume Bono, Leonid Antsfeld, Assem Sadek, Gianluca Monaci, Christian Wolf

Abstract: Agents navigating in 3D environments require some form of memory, which should hold a compact and actionable representation of the history of observations useful for decision making and planning. In most end-to-end learning approaches the representation is latent and usually does not have a clearly defined interpretation, whereas classical robotics addresses this with scene reconstruction resulting in some form of map, usually estimated with geometry and sensor models and/or learning. In this work we propose to learn an actionable representation of the scene independently of the targeted downstream task and without explicitly optimizing reconstruction. The learned representation is optimized by a blind auxiliary agent trained to navigate with it on multiple short sub-episodes branching out from a waypoint and, most importantly, without any direct visual observation. We argue and show that the blindness property is important and forces the (trained) latent representation to be the only means for planning. With probing experiments we show that the learned representation optimizes navigability and not reconstruction. On downstream tasks we show that it is robust to changes in distribution, in particular the sim2real gap, which we evaluate with a real physical robot in a real office building, where it significantly improves performance.

9.Simultaneous Position-and-Stiffness Control of Underactuated Antagonistic Tendon-Driven Continuum Robots

Authors:Bowen Yi, Yeman Fan, Dikai Liu, Jose Guadalupe Romero

Abstract: Continuum robots have gained widespread popularity due to their inherent compliance and flexibility, particularly their adjustable levels of stiffness for various application scenarios. Despite efforts in dynamic modeling and control synthesis over the past decade, few studies have incorporated stiffness regulation into their feedback control design, even though this was one of the original motivations for developing continuum robots. This paper aims to address the crucial challenge of controlling both the position and stiffness of a class of highly underactuated continuum robots that are actuated by antagonistic tendons. To this end, we first present a high-dimensional rigid-link dynamical model that can analyze the open-loop stiffening of tendon-driven continuum robots. Based on this model, we propose a novel passivity-based position-and-stiffness controller that adheres to the non-negative tension constraint. To demonstrate the effectiveness of our approach, we tested the theoretical results on our continuum robot, and the experimental results show the efficacy and precision of the proposed methodology.

10.Biological Organisms as End Effectors

Authors:Josephine Galipon, Shoya Shimizu, Kenjiro Tadakuma

Abstract: In robotics, an end effector is a device at the end of a robotic arm designed to interact with the environment. Effectively, it serves as the hand of the robot, carrying out tasks on behalf of humans. But could we turn this concept on its head and consider using living organisms themselves as end effectors? This paper introduces the novel idea of using whole living organisms as end effectors for robotics. We showcase this by demonstrating that pill bugs and chitons, two kinds of small, harmless creatures, can be used as functional grippers. Crucially, this method does not harm these creatures, enabling their release back into nature after use. How this concept may be expanded to other organisms and applications is also discussed.

1.Bridging the Domain Gap between Synthetic and Real-World Data for Autonomous Driving

Authors:Xiangyu Bai, Yedi Luo, Le Jiang, Aniket Gupta, Pushyami Kaveti, Hanumant Singh, Sarah Ostadabbas

Abstract: Modern autonomous systems require extensive testing to ensure reliability and build trust in ground vehicles. However, testing these systems in the real world is challenging due to the lack of large and diverse datasets, especially for edge cases. Therefore, simulations are necessary for their development and evaluation. However, existing open-source simulators often exhibit a significant gap between synthetic and real-world domains, leading to deteriorated mobility performance and reduced platform reliability when using simulation data. To address this issue, our Scoping Autonomous Vehicle Simulation (SAVeS) platform benchmarks the performance of simulated environments for autonomous ground vehicle testing between synthetic and real-world domains. Our platform aims to quantify the domain gap and enable researchers to develop and test autonomous systems in a controlled environment. Additionally, we propose using domain adaptation technologies to address the domain gap between synthetic and real-world data with our SAVeS+ extension. Our results demonstrate that SAVeS+ is effective in helping to close the gap between synthetic and real-world domains, enabling models trained on processed synthetic datasets to perform comparably to those trained on real-world datasets of the same scale. This paper highlights our efforts to quantify and address the domain gap between synthetic and real-world data for autonomy simulation. By enabling researchers to develop and test autonomous systems in a controlled environment, we hope to bring autonomy simulation one step closer to realization.

2.Music Mode: Transforming Robot Movement into Music Increases Likability and Perceived Intelligence

Authors:Catie Cuan, Emre Fisher, Allison Okamura, Tom Engbersen

Abstract: As robots enter everyday spaces like offices, the sounds they create affect how they are perceived. We present "Music Mode", a novel mapping between a robot's joint motions and sounds, programmed by artists and engineers to make the robot generate music as it moves. Two experiments were designed to characterize the effect of this musical augmentation on human users. In the first experiment, a robot performed three tasks while playing three different sound mappings. Results showed that participants observing the robot perceived it as safer and as more animate, intelligent, anthropomorphic, and likable when it played the Music Mode Orchestral software. To test whether the results of the first experiment were due to the Music Mode algorithm rather than to music alone, we conducted a second experiment. Here the robot performed the same three tasks while a participant observed via video, but the Orchestral music was either linked to its movement or random. Participants rated the robots as more intelligent when the music was linked to the movement. Robots using Music Mode logged approximately two hundred hours of operation while navigating, wiping tables, and sorting trash, and bystander comments made during this operating time served as an embedded case study. The contributions are: (1) an interdisciplinary choreographic, musical, and coding design process to develop a real-world robot sound feature, (2) a technical implementation for movement-based sound generation, and (3) two experiments and an embedded case study of robots running this feature during daily work activities that resulted in increased likability and perceived intelligence of the robot.

3.Hybrid Trajectory Optimization for Autonomous Terrain Traversal of Articulated Tracked Robots

Authors:Zhengzhe Xu, Yanbo Chen, Zhuozhu Jian, Xueqian Wang, Bin Liang

Abstract: Autonomous terrain traversal of articulated tracked robots can reduce operator cognitive load to enhance task efficiency and facilitate extensive deployment. We present a novel hybrid trajectory optimization method aimed at generating smooth, stable, and efficient traversal motions. To achieve this, we develop a planar robot-terrain interaction model and partition the robot's motion into hybrid modes of driving and traversing. By using a generalized coordinate description, the configuration space dimension is reduced, which provides real-time planning capability. The hybrid trajectory optimization is transcribed into a nonlinear programming problem and solved in a receding-horizon planning fashion. Mode switching is facilitated by associating optimized motion durations with a predefined traversal sequence. A multi-objective cost function is formulated to further improve the traversal performance. Additionally, map sampling, terrain simplification, and tracking controller modules are integrated into the autonomous terrain traversal system. Our approach is validated in simulation and real-world experiments with the Searcher robotic platform, effectively achieving smooth and stable motion with high time and energy efficiency compared to expert operator control.

4.Social Robots As Companions for Lonely Hearts: The Role of Anthropomorphism and Robot Appearances

Authors:Yoonwon Jung, Sowon Hahn

Abstract: Loneliness is a distressing personal experience and a growing social issue. Social robots could alleviate the pain of loneliness, particularly for those who lack in-person interaction. This paper investigated how the effect of loneliness on anthropomorphizing social robots differs by robot appearance, and how it leads to the intention to purchase social robots. Participants viewed a video of one of three robots (machine-like, animal-like, or human-like) moving and interacting with a human counterpart. The results revealed that when individuals were lonelier, their tendency to anthropomorphize human-like robots increased more than their tendency to anthropomorphize animal-like robots. The moderating effect remained significant after covariates were included. The increase in anthropomorphic tendency predicted heightened purchase intent. The findings imply that human-like robots induce lonely individuals' desire to replenish their sense of connectedness from robots more than animal-like robots do, and that anthropomorphic tendency reveals the potential of social robots as real-life companions for lonely individuals.

5.Situational Adaptive Motion Prediction for Firefighting Squads in Indoor Search and Rescue

Authors:Nils Mandischer, Frederik Schicks, Burkhard Corves

Abstract: Firefighting is a complex, yet minimally automated task. To mitigate ergonomic and safety-related risks to the human operators, robots could be deployed in a collaborative approach. However, important prerequisites for human-robot teams in firefighting are still missing. Among other aspects, the robot must predict human motion, as occlusion is ever-present. In this work, we propose a novel motion prediction pipeline for firefighter squads in indoor search and rescue. The squad paths are generated with an optimal graph-based planning approach representing firefighters' tactics. Paths are generated per room, which allows a path to be adapted locally without global re-planning. The motion of individual agents is simulated using a modification of the headed social force model. We evaluate the feasibility of the pipeline on a novel dataset generated from real footage and show its computational efficiency.
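
The pipeline's agent motion builds on the social force model. A minimal single-step sketch of the classic (not headed) variant, assuming point agents with unit mass, forward-Euler integration, and hypothetical parameter values (desired speed v0, relaxation time tau, repulsion strength A, repulsion range B, summed agent radii); the heading-specific modification used in the paper is not shown.

```python
import math

def social_force_step(pos, vel, goal, others, dt=0.1, v0=1.3, tau=0.5,
                      A=2.0, B=0.3, radius_sum=0.6):
    # Driving term: relax the velocity toward v0 along the direction to the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1e-9
    fx = (v0 * gx / dist - vel[0]) / tau
    fy = (v0 * gy / dist - vel[1]) / tau
    # Repulsion from other agents, decaying exponentially with distance.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy) or 1e-9
        mag = A * math.exp((radius_sum - d) / B)
        fx += mag * dx / d
        fy += mag * dy / d
    # Forward-Euler integration of velocity, then position.
    vel = (vel[0] + dt * fx, vel[1] + dt * fy)
    pos = (pos[0] + dt * vel[0], pos[1] + dt * vel[1])
    return pos, vel

# A lone agent walking toward (10, 0) from rest.
pos, vel = (0.0, 0.0), (0.0, 0.0)
for _ in range(20):
    pos, vel = social_force_step(pos, vel, (10.0, 0.0), [])
```

In a squad setting, each agent's goal would be the next waypoint of the room-wise path, and the other squad members enter through the repulsion term.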

6.Knowledge-Driven Robot Program Synthesis from Human VR Demonstrations

Authors:Benjamin Alt, Franklin Kenghagho Kenfack, Andrei Haidu, Darko Katic, Rainer Jäkel, Michael Beetz

Abstract: Aging societies, labor shortages and increasing wage costs call for assistance robots capable of autonomously performing a wide array of real-world tasks. Such open-ended robotic manipulation requires not only powerful knowledge representations and reasoning (KR&R) algorithms, but also methods for humans to instruct robots what tasks to perform and how to perform them. In this paper, we present a system for automatically generating executable robot control programs from human task demonstrations in virtual reality (VR). We leverage common-sense knowledge and game engine-based physics to semantically interpret human VR demonstrations, as well as an expressive and general task representation and automatic path planning and code generation, embedded into a state-of-the-art cognitive architecture. We demonstrate our approach in the context of force-sensitive fetch-and-place for a robotic shopping assistant. The source code is available at

7.Motion Control based on Disturbance Estimation and Time-Varying Gain for Robotic Manipulators

Authors:Xinyu Jia, Jun Yang, Kaixin Lu, Haoyong Yu

Abstract: To achieve high-accuracy manipulation in the presence of unknown dynamics and external disturbances, we propose an efficient and robust motion controller (named TvUDE) for robotic manipulators. The controller incorporates a disturbance estimation mechanism that uses reformulated robot dynamics and filtering operations to estimate uncertainty and disturbance without requiring acceleration measurements. Furthermore, we design a time-varying control input gain to enhance the control system's robustness. Finally, we analyze the boundedness of the control signal and the stability of the closed-loop system, and conduct a set of experiments on a six-DOF robotic manipulator. The experimental results verify the effectiveness of TvUDE in handling internal uncertainty and external static or transient disturbances.
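
The idea of estimating a disturbance from filtered signals without measuring acceleration can be sketched for a hypothetical one-degree-of-freedom joint with dynamics m * dv/dt = u + d. Filtering both sides with the first-order filter 1/(T s + 1) gives d_hat = m * (v - v_f) / T - u_f, where v_f and u_f are the filtered velocity and input, so only velocity is needed. This is a generic UDE-style estimator under these assumptions, not the paper's TvUDE, and it omits the time-varying gain.

```python
def simulate(d_true=2.0, m=1.5, T=0.05, k=5.0, dt=1e-3, steps=3000):
    v = 0.0      # measured joint velocity
    v_f = 0.0    # low-pass filtered velocity
    u_f = 0.0    # low-pass filtered control input
    d_hat = 0.0
    for _ in range(steps):
        # Disturbance estimate from filtered signals; no acceleration needed.
        d_hat = m * (v - v_f) / T - u_f
        u = -k * v   # simple damping controller; the estimate is only observed here
        # Forward-Euler updates of the two first-order filters 1/(T s + 1).
        v_f += dt * (v - v_f) / T
        u_f += dt * (u - u_f) / T
        # Hypothetical plant: m * dv/dt = u + d_true.
        v += dt * (u + d_true) / m
    return d_hat
```

Because both filters share the same time constant, the control input cancels out of the estimation error, so for a constant disturbance the estimate converges to d regardless of the feedback law; a UDE-based controller would then subtract d_hat from u.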

8.Robot Patrol: Using Crowdsourcing and Robotic Systems to Provide Indoor Navigation Guidance to The Visually Impaired

Authors:Ike Obi, Ruiqi Wang, Prakash Shukla, Byung-Cheol Min

Abstract: Indoor navigation is a challenging activity for persons with disabilities, particularly for those with low vision and visual impairment. Researchers have explored numerous solutions to these challenges; however, several issues remain unsolved, particularly around providing dynamic and contextual information about potential obstacles in indoor environments. In this paper, we develop Robot Patrol, an integrated system that combines crowdsourcing, computer vision, and robotic frameworks to provide contextual information that empowers visually impaired people to navigate indoor spaces safely. In particular, the system is designed to provide visually impaired people with information about 1) potential obstacles on the route to their indoor destination, 2) indoor events on their route that they may wish to avoid or attend, and 3) any other contextual information that might help them navigate to their indoor destinations safely and effectively. Findings from the Wizard of Oz experiment with our demo system provide insights into the benefits and limitations of the system. We provide a concise discussion of the implications of our findings.

9.Long-range UAV Thermal Geo-localization with Satellite Imagery

Authors:Jiuhong Xiao, Daniel Tortei, Eloy Roura, Giuseppe Loianno

Abstract: Onboard sensors, such as cameras and thermal sensors, have emerged as effective alternatives to Global Positioning System (GPS) for geo-localization in Unmanned Aerial Vehicle (UAV) navigation. Since GPS can suffer from signal loss and spoofing problems, researchers have explored camera-based techniques such as Visual Geo-localization (VG) using satellite imagery. Additionally, thermal geo-localization (TG) has become crucial for long-range UAV flights in low-illumination environments. This paper proposes a novel thermal geo-localization framework using satellite imagery, which includes multiple domain adaptation methods to address the limited availability of paired thermal and satellite images. The experimental results demonstrate the effectiveness of the proposed approach in achieving reliable thermal geo-localization performance, even in thermal images with indistinct self-similar features. We evaluate our approach on real data collected onboard a UAV. We also release the code and Boson-nighttime, a dataset of paired satellite-thermal and unpaired satellite images for thermal geo-localization with satellite imagery. To the best of our knowledge, this work is the first to propose a thermal geo-localization method using satellite imagery in long-range flights.

10.MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion

Authors:Chiyu Max Jiang, Andre Cornman, Cheolho Park, Ben Sapp, Yin Zhou, Dragomir Anguelov

Abstract: We present MotionDiffuser, a diffusion-based representation for the joint distribution of future trajectories over multiple agents. Such a representation has several key advantages: first, our model learns a highly multimodal distribution that captures diverse future outcomes. Second, the simple predictor design requires only a single L2 loss training objective, and does not depend on trajectory anchors. Third, our model is capable of learning the joint distribution for the motion of multiple agents in a permutation-invariant manner. Furthermore, we utilize a compressed trajectory representation via PCA, which improves model performance and allows for efficient computation of the exact sample log probability. Subsequently, we propose a general constrained sampling framework that enables controlled trajectory sampling based on differentiable cost functions. This strategy enables a host of applications such as enforcing rules and physical priors, or creating tailored simulation scenarios. MotionDiffuser can be combined with existing backbone architectures to achieve top motion forecasting results. We obtain state-of-the-art results for multi-agent motion prediction on the Waymo Open Motion Dataset.

1.Efficient volumetric mapping of multi-scale environments using wavelet-based compression

Authors:Victor Reijgwart, Cesar Cadena, Roland Siegwart, Lionel Ott

Abstract: Volumetric maps are widely used in robotics due to their desirable properties in applications such as path planning, exploration, and manipulation. Constant advances in mapping technologies are needed to keep up with improvements in sensor technology, which generates increasingly vast amounts of precise measurements. Handling this data in a computationally and memory-efficient manner is paramount to representing the environment at the desired scales and resolutions. In this work, we express the desirable properties of a volumetric mapping framework through the lens of multi-resolution analysis. This shows that wavelets are a natural foundation for hierarchical and multi-resolution volumetric mapping. Based on this insight we design an efficient mapping system that uses wavelet decomposition. The efficiency of the system enables the use of uncertainty-aware sensor models, improving the quality of the maps. Experiments on both synthetic and real-world data provide mapping accuracy and runtime performance comparisons with state-of-the-art methods on both RGB-D and 3D LiDAR data. The framework is open-sourced to allow the robotics community at large to explore this approach.
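
Multi-resolution analysis with wavelets can be illustrated in one dimension. The sketch below applies a multi-level Haar transform (pairwise averages and half-differences) to a hypothetical row of occupancy log-odds; in piecewise-constant regions the detail coefficients are exactly zero, which is what makes hierarchical compression cheap. It is not the paper's 3D wavelet mapping system, and its uncertainty-aware sensor models are not shown.

```python
def haar_forward(signal):
    # Multi-level Haar transform; len(signal) must be a power of two.
    details = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        avg = [(approx[i] + approx[i + 1]) / 2 for i in pairs]
        diff = [(approx[i] - approx[i + 1]) / 2 for i in pairs]
        details.append(diff)    # finest level first
        approx = avg
    return approx[0], details   # coarsest average plus all detail levels

def haar_inverse(scale, details):
    approx = [scale]
    for diff in reversed(details):   # coarsest level first
        nxt = []
        for a, d in zip(approx, diff):
            nxt.extend([a + d, a - d])
        approx = nxt
    return approx

def compress(details, threshold):
    # Lossy compression: drop detail coefficients below the threshold.
    return [[d if abs(d) >= threshold else 0.0 for d in level] for level in details]

# Hypothetical log-odds along one row: free space, then an obstacle.
row = [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0, 3.0, 3.0]
scale, details = haar_forward(row)
nonzero = sum(1 for level in details for d in level if d != 0.0)
```

Only two of the seven detail coefficients are non-zero here, so the row is fully described by three numbers; coarse levels alone also give a valid low-resolution view of the same map.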

2.Nonholonomic Motion Planning as Efficient as Piano Mover's

Authors:David Nister, Jaikrishna Soundararajan, Yizhou Wang, Harshad Sane

Abstract: We present an algorithm for non-holonomic motion planning (or 'parking a car') that is as computationally efficient as a simple approach to solving the famous Piano-mover's problem, where the non-holonomic constraints are ignored. The core of the approach is a graph-discretization of the problem. The graph-discretization is provably accurate in modeling the non-holonomic constraints, and yet is nearly as small as the straightforward regular grid discretization of the Piano-mover's problem into a 3D volume of 2D position plus angular orientation. Whereas the Piano mover's graph has, for each cell, one vertex with edges to six neighbors, we have three vertices with a total of ten edges, increasing the graph size by less than a factor of two; this factor does not depend on spatial or angular resolution. The local edge connections are organized so that they represent globally consistent turn and straight segments. The graph can be used with Dijkstra's algorithm, A*, value iteration or any other graph algorithm. Furthermore, the graph has a structure that lends itself to processing with deterministic massive parallelism. The turn and straight curves divide the configuration space into many parallel groups. We use this to develop a customized 'kernel-style' graph processing method. It results in an N-turn planner that requires no heuristics or load balancing and is as efficient as a simple solution to the Piano mover's problem even in sequential form. In parallel form it is many times faster than the sequential processing of the graph, and can run many times a second on a consumer grade GPU while exploring a configuration space pose grid with very high spatial and angular resolution. We prove approximation quality and computational complexity and demonstrate that it is a flexible, practical, reliable, and efficient component for a production solution.
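
The paper's three-vertex, ten-edge discretization is not reproduced here. For orientation, the sketch below shows the simpler idea it improves on: a forward-only state lattice over (x, y, heading) with eight hypothetical headings, straight and 45-degree turn edges, and a turn penalty, searched with Dijkstra's algorithm. Unlike the Piano mover's grid, the heading is part of the state and cannot change without the vehicle moving.

```python
import heapq
import math

# Eight headings, 45 degrees apart (counter-clockwise), as unit grid steps.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def neighbors(state, blocked, size, turn_penalty=0.5):
    x, y, h = state
    for dh in (-1, 0, 1):   # turn right, go straight, turn left
        nh = (h + dh) % 8
        dx, dy = STEPS[nh]
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in blocked:
            cost = math.hypot(dx, dy) + (turn_penalty if dh else 0.0)
            yield (nx, ny, nh), cost

def plan(start, goal_xy, blocked, size):
    # Dijkstra over (x, y, heading) states; returns the cost to reach goal_xy.
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist.get(s, math.inf):
            continue  # stale queue entry
        if (s[0], s[1]) == goal_xy:
            return d
        for n, c in neighbors(s, blocked, size):
            nd = d + c
            if nd < dist.get(n, math.inf):
                dist[n] = nd
                heapq.heappush(heap, (nd, n))
    return math.inf

straight = plan((0, 0, 0), (5, 0), set(), 10)   # goal straight ahead
behind = plan((5, 5, 0), (4, 5), set(), 10)     # goal one cell behind the vehicle
```

Reaching a cell directly behind the start costs far more than one step, because the planner must drive a turning loop instead of rotating in place.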

3.Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials

Authors:David Millard, Daniel Pastor, Joseph Bowkett, Paul Backes, Gaurav S. Sukhatme

Abstract: Granular materials are of critical interest to many robotic tasks in planetary science, construction, and manufacturing. However, the dynamics of granular materials are complex and often computationally very expensive to simulate. We propose a set of methodologies and a system for the fast simulation of granular materials on Graphics Processing Units (GPUs), and show that this simulation is fast enough for basic training with Reinforcement Learning algorithms, which currently require many dynamics samples to achieve acceptable performance. Our method models granular material dynamics using implicit timestepping methods for multibody rigid contacts, as well as algorithmic techniques for efficient parallel collision detection between pairs of particles and between particle and arbitrarily shaped rigid bodies, and programming techniques for minimizing warp divergence on Single-Instruction, Multiple-Thread (SIMT) chip architectures. We showcase our simulation system on several environments targeted toward robotic tasks, and release our simulator as an open-source tool.
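
Particle-particle collision detection is commonly accelerated with a uniform grid whose cell size equals the contact distance, so only a particle's own and adjacent cells need checking. A sequential 2D sketch of that broad phase, assuming equal-radius particles; the paper's GPU kernels, particle-rigid-body collisions, and warp-divergence optimizations are not shown, and the paper may use a different scheme.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

def contact_pairs(points, radius):
    # Uniform grid with cell size equal to the contact distance 2r.
    cell = 2.0 * radius
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    pairs = set()
    for (cx, cy), members in grid.items():
        # Any pair closer than one cell size lies in the same or an adjacent cell.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(points[i], points[j]) < cell:
                            pairs.add((i, j))
    return pairs

def brute_force_pairs(points, radius):
    # O(n^2) reference used to check the grid version.
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) < 2.0 * radius}

random.seed(1)
pts = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
grid_pairs = contact_pairs(pts, 0.3)
```

On a GPU, each cell (or each particle) would map to a thread, and sorting particles by cell index keeps neighboring particles in nearby memory.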

4.CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities

Authors:Ayush Agrawal, Raghav Arora, Ahana Datta, Snehasis Banerjee, Brojeshwar Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, Madhava Krishna

Abstract: This paper introduces a novel method for determining the best room in which to place an object, for embodied scene rearrangement. While state-of-the-art approaches rely on large language models (LLMs) or reinforcement learning (RL) policies for this task, our approach, CLIPGraphs, efficiently combines commonsense domain knowledge, data-driven methods, and recent advances in multimodal learning. Specifically, it (a) encodes a knowledge graph of prior human preferences about the room location of different objects in home environments, (b) incorporates vision-language features to support multimodal queries based on images or text, and (c) uses a graph network to learn object-room affinities based on embeddings of the prior knowledge and the vision-language features. We demonstrate that our approach provides better estimates of the most appropriate location of objects from a benchmark set of object categories than state-of-the-art baselines.

5.Temporal-controlled Frame Swap for Generating High-Fidelity Stereo Driving Data for Autonomy Analysis

Authors:Yedi Luo, Xiangyu Bai, Le Jiang, Aniket Gupta, Eric Mortin, Hanumant Singh, Sarah Ostadabbas

Abstract: This paper presents a novel approach, TeFS (Temporal-controlled Frame Swap), to generate synthetic stereo driving data for visual simultaneous localization and mapping (vSLAM) tasks. TeFS is designed to overcome the lack of native stereo vision support in commercial driving simulators, and we demonstrate its effectiveness using Grand Theft Auto V (GTA V), a high-budget open-world video game. We introduce GTAV-TeFS, the first large-scale GTA V stereo-driving dataset, containing over 88,000 high-resolution stereo RGB image pairs, along with temporal information, GPS coordinates, camera poses, and full-resolution dense depth maps. GTAV-TeFS offers several advantages over other synthetic stereo datasets and enables the evaluation and enhancement of state-of-the-art stereo vSLAM models in GTA V's environment. We validate the quality of the stereo data collected using TeFS through a comparative analysis against conventional dual-viewport data from an open-source simulator. We also benchmark various vSLAM models using the challenging-case comparison groups included in GTAV-TeFS, revealing the distinct advantages and limitations of each model. The goal of our work is to bring more high-fidelity stereo data from commercial-grade game simulators into the research domain and to push the boundaries of vSLAM models. All code and datasets will be released upon acceptance.

1.Low Voltage Electrohydraulic Actuators for Untethered Robotics

Authors:Stephan-Daniel Gravert, Elia Varini, Amirhossein Kazemipour, Mike Y. Michelis, Thomas Buchner, Ronan Hinchet, Robert K. Katzschmann

Abstract: Rigid robots can be precise in repetitive tasks but struggle in unstructured environments. Nature's versatility in such environments inspires researchers to develop biomimetic robots that incorporate compliant, contracting artificial muscles. Among the recently proposed artificial muscle technologies, electrohydraulic actuators are promising because their speed and power density are comparable to those of mammalian muscles. However, they require high driving voltages and raise safety concerns due to exposed electrodes. These high voltages lead to either bulky or inefficient driving electronics, which make untethered, high-degree-of-freedom bio-inspired robots difficult to realize. Here, we present low voltage electrohydraulic actuators (LEAs) that match mammalian skeletal muscles in average power density (50.5 W/kg) and peak strain rate (971%/s) at a driving voltage of just 1100 V. This driving voltage is approximately 5 to 7 times lower than that of other electrohydraulic actuators using paraelectric dielectrics. Furthermore, LEAs are safe to touch, waterproof, and self-clearing, which makes them easy to implement in wearables and robotics. We characterize, model, and physically validate key performance metrics of the actuator and compare its performance to state-of-the-art electrohydraulic designs. Finally, we demonstrate the utility of our actuators on two muscle-based electrohydraulic robots: an untethered soft robotic swimmer and a robotic gripper. We foresee that LEAs can become a key building block for future highly biomimetic untethered robots and wearables with many independent artificial muscles, such as biomimetic hands, faces, or exoskeletons.

2.Stay on Track: A Frenet Wrapper to Overcome Off-road Trajectories in Vehicle Motion Prediction

Authors:Marcel Hallgarten, Ismail Kisa, Martin Stoll, Andreas Zell

Abstract: Predicting the future motion of observed vehicles is a crucial enabler for safe autonomous driving. The field of motion prediction has seen substantial progress recently, with State-of-the-Art (SotA) models achieving impressive results on large-scale public benchmarks. However, recent work revealed that learning-based methods are prone to predicting off-road trajectories in challenging scenarios. Such scenarios can be created by perturbing existing ones with additional turns in front of the target vehicle while leaving the motion history unchanged. We argue that this indicates that SotA models do not sufficiently consider the map information, and we demonstrate how this can be solved by representing model inputs and outputs in a Frenet frame defined by lane centreline sequences. To this end, we present a general wrapper that leverages a Frenet representation of the scene and can be applied to SotA models without changing their architecture. We demonstrate the effectiveness of this approach in a comprehensive benchmark using two SotA motion prediction models. Our experiments show that this reduces the off-road rate on challenging scenarios by more than 90%, without sacrificing average performance.
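As a rough illustration of the Frenet representation, the snippet below converts a Cartesian point into Frenet coordinates (s, d), that is, arc length along a lane centreline and signed lateral offset, by projecting onto the closest segment of a polyline. The function name and the piecewise-linear centreline are assumptions for this sketch; the wrapper described in the paper may construct the frame differently (for example, with smoothed centrelines).

```python
import math

def to_frenet(centreline, px, py):
    """Project point (px, py) onto a piecewise-linear lane centreline.

    centreline: list of (x, y) vertices with at least two distinct points.
    Returns (s, d): arc length of the closest projection along the
    centreline and signed lateral offset (positive to the left).
    """
    best = None    # (distance, s, d) of the closest projection so far
    s_start = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(centreline, centreline[1:]):
        sx, sy = x1 - x0, y1 - y0
        seg_len = math.hypot(sx, sy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Projection parameter along the segment, clamped to [0, 1].
        u = ((px - x0) * sx + (py - y0) * sy) / seg_len ** 2
        u = max(0.0, min(1.0, u))
        qx, qy = x0 + u * sx, y0 + u * sy
        dist = math.hypot(px - qx, py - qy)
        if best is None or dist < best[0]:
            # The sign of the 2D cross product tells left (+) from right (-).
            cross = sx * (py - y0) - sy * (px - x0)
            best = (dist, s_start + u * seg_len, math.copysign(dist, cross))
        s_start += seg_len
    return best[1], best[2]
```

A model that predicts in (s, d) and maps back along the centreline stays tied to the lane geometry, which is the intuition behind the wrapper's lower off-road rate.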

3.Progressive Learning for Physics-informed Neural Motion Planning

Authors:Ruiqi Ni, Ahmed H. Qureshi

Abstract: Motion planning (MP) is one of the core robotics problems, requiring fast methods for finding a collision-free robot motion path connecting given start and goal states. Neural motion planners (NMPs) find path solutions quickly but require a large number of expert trajectories for learning, which adds a significant training computational load. In contrast, recent advances have also led to a physics-informed NMP approach that directly solves the Eikonal equation for motion planning and does not require expert demonstrations for learning. However, experiments show that the physics-informed NMP approach performs poorly in complex environments and lacks scalability to multiple scenarios and to high-dimensional real robot settings. To overcome these limitations, this paper presents a novel and tractable Eikonal equation formulation and introduces a new progressive learning strategy to train neural networks without expert data in multiple complex, cluttered, high-dimensional robot motion planning scenarios. The results demonstrate that our method outperforms state-of-the-art traditional MP, data-driven NMP, and physics-informed NMP methods by a significant margin in terms of computational planning speed, path quality, and success rate. We also show that our approach scales to multiple complex, cluttered scenarios and to a real robot setup in a narrow passage environment. The proposed method's videos and code implementations are available at
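For context, physics-informed planners of this kind build on the Eikonal equation, which links the gradient of a travel-time field to a speed field. The general form is given below as a hedged reference. The symbols T, S, x_s, and x_g are standard notation assumed here, and the paper's own "novel and tractable" formulation may differ from it.

```latex
\[
  \bigl\lVert \nabla_{x_g} T(x_s, x_g) \bigr\rVert = \frac{1}{S(x_g)},
  \qquad T(x_s, x_s) = 0
\]
```

Here T(x_s, x_g) is the travel time from start x_s to goal x_g, and S(x) is a speed field that is typically chosen to be small near obstacles, so travel time grows there. A path can then be recovered by descending the gradient of T from the goal back toward the start.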

4.Experimental Energy Consumption Analysis of a Flapping-Wing Robot

Authors:Raul Tapia, Alvaro Cesar Satue, Saeed Rafee Nekoo, José Ramiro Martínez-de Dios, Anibal Ollero

Abstract: One of the motivations for exploring flapping-wing aerial robotic systems is to reduce energy consumption while maintaining manoeuvrability, compared to conventional unmanned aerial systems. A Flapping Wing Flying Robot (FWFR) can glide in favourable wind conditions, significantly decreasing energy consumption. It is also necessary to investigate the power consumption of the individual components of the flapping-wing robot. In this work, two sets of FWFR components are analyzed in terms of power consumption: a) motor/electronics components and b) a vision system for monitoring the environment during flight. A measurement device is used to record the power consumption of the motors in the launching and ascending phases of the flight, as well as in cruising flight around the desired height. Additionally, event cameras and stereo vision systems are analyzed in terms of energy consumption. The results provide a first step towards decreasing battery usage and, consequently, extending flight time.
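The energy used in each flight phase is the time integral of the measured electrical power P = V * I. Assuming the measurement device logs timestamped voltage and current samples (an assumption; the abstract does not describe its output format), the trapezoidal rule gives a simple estimate. The function name is hypothetical.

```python
def energy_joules(times, voltages, currents):
    """Trapezoidal estimate of E = integral of V(t) * I(t) dt.

    times: sample timestamps in seconds (increasing).
    voltages, currents: matching samples in volts and amperes.
    Returns the energy in joules.
    """
    energy = 0.0
    for k in range(1, len(times)):
        p0 = voltages[k - 1] * currents[k - 1]  # power at previous sample
        p1 = voltages[k] * currents[k]          # power at current sample
        energy += 0.5 * (p0 + p1) * (times[k] - times[k - 1])
    return energy
```

Splitting the log by phase (launch, ascent, cruise) and integrating each slice separately would give the per-phase comparison the abstract describes.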

5.Learning Sampling Dictionaries for Efficient and Generalizable Robot Motion Planning with Transformers

Authors:Jacob J Johnson, Ahmed H Qureshi, Michael Yip

Abstract: Motion planning is integral to robotics applications such as autonomous driving, surgical robots, and industrial manipulators. Existing planning methods lack scalability to higher-dimensional spaces, while recent learning-based planners have shown promise in accelerating sampling-based motion planners (SMPs) but lack generalizability to out-of-distribution environments. To address this, we present a novel approach, Vector Quantized-Motion Planning Transformers (VQ-MPT), that overcomes the key generalization and scaling drawbacks of previous learning-based methods. VQ-MPT consists of two stages. Stage 1 is a Vector Quantized-Variational AutoEncoder model that learns to represent the planning space using a finite number of sampling distributions, and stage 2 is an Auto-Regressive model that constructs a sampling region for SMPs by selecting from the learned sampling distribution sets. By splitting large planning spaces into discrete sets and selectively choosing the sampling regions, our planner pairs well with out-of-the-box SMPs, generating near-optimal paths faster than the SMPs do without VQ-MPT's aid. It is generalizable in that it can be applied to systems of varying complexity, from 2D planar robots to 14D bi-manual robots, with diverse environment representations, including costmaps and point clouds. Trained VQ-MPT models generalize to environments unseen during training and achieve higher success rates than previous methods.
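The vector-quantization step at the heart of a VQ-VAE maps each encoder output to its nearest entry in a finite codebook. The sketch below shows only that lookup, using NumPy. The codebook size, dimensions, and function name are illustrative assumptions, and VQ-MPT's learned sampling distributions and its auto-regressive stage are not reproduced.

```python
import numpy as np

def quantize(z, codebook):
    """Map each row of z to the index of its nearest codebook vector.

    z: array of shape (n, d) of encoder outputs.
    codebook: array of shape (k, d) of learned code vectors.
    Returns (indices, quantized) where quantized[i] = codebook[indices[i]].
    """
    # Squared Euclidean distance between every z_i and every code e_j,
    # computed by broadcasting to shape (n, k, d) and summing over d.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]
```

In VQ-MPT's setting, the selected codes correspond to learned sampling distributions that the second-stage model then chooses among; that stage is not shown here.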

6.A Probabilistic Relaxation of the Two-Stage Object Pose Estimation Paradigm

Authors:Onur Beker

Abstract: Existing object pose estimation methods commonly require a one-to-one point matching step that forces them to be separated into two consecutive stages: visual correspondence detection (e.g., by matching feature descriptors as part of a perception front-end) followed by geometric alignment (e.g., by optimizing a robust estimation objective for pointcloud registration or perspective-n-point). Instead, we propose a matching-free probabilistic formulation with two main benefits: i) it enables unified and concurrent optimization of both visual correspondence and geometric alignment, and ii) it can represent different plausible modes of the entire distribution of likely poses. This in turn allows for a more graceful treatment of geometric perception scenarios where establishing one-to-one matches between points is conceptually ill-defined, such as textureless, symmetrical and/or occluded objects and scenes where the correct pose is uncertain or there are multiple equally valid solutions.

7.Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

Authors:Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

Abstract: Developing embodied agents in simulation has been a key research topic in recent years. Exciting new tasks, algorithms, and benchmarks have been developed in various simulators. However, most of them assume deaf agents in silent environments, while we humans perceive the world with multiple senses. We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear. Sonicverse models realistic continuous audio rendering in 3D environments in real-time. Together with a new audio-visual VR interface that allows humans to interact with agents with audio, Sonicverse enables a series of embodied AI tasks that need audio-visual perception. For semantic audio-visual navigation in particular, we also propose a new multi-task learning model that achieves state-of-the-art performance. In addition, we demonstrate Sonicverse's realism via sim-to-real transfer, which has not been achieved by other simulators: an agent trained in Sonicverse can successfully perform audio-visual navigation in real-world environments. Sonicverse is available at:

8.Train Offline, Test Online: A Real Robot Learning Benchmark

Authors:Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta

Abstract: Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data. We take on these challenges via a new benchmark: Train Offline, Test Online (TOTO). TOTO provides remote users with access to shared robotic hardware for evaluating methods on common tasks and an open-source dataset of these tasks for offline training. Its manipulation task suite requires challenging generalization to unseen objects, positions, and lighting. We present initial results on TOTO comparing five pretrained visual representations and four offline policy learning baselines, remotely contributed by five institutions. The real promise of TOTO, however, lies in the future: we release the benchmark for additional submissions from any user, enabling easy, direct comparison to several methods without the need to obtain hardware or collect data.

9.LIV: Language-Image Representations and Rewards for Robotic Control

Authors:Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman

Abstract: We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. Exploiting a novel connection between dual reinforcement learning and mutual information contrastive learning, the LIV objective trains a multi-modal representation that implicitly encodes a universal value function for tasks specified as language or image goals. We use LIV to pre-train the first control-centric vision-language representation from large human video datasets such as EpicKitchen. Given only a language or image goal, the pre-trained LIV model can assign dense rewards to each frame in videos of unseen robots or humans attempting that task in unseen environments. Further, when some target domain-specific data is available, the same objective can be used to fine-tune and improve LIV and even other pre-trained representations for robotic control and reward specification in that domain. In our experiments on several simulated and real-world robot environments, LIV models consistently outperform the best prior input state representations for imitation learning, as well as reward specification methods for policy synthesis. Our results validate the advantages of joint vision-language representation and reward learning within the unified, compact LIV framework.

1.A Surrogate Model Framework for Explainable Autonomous Behaviour

Authors:Konstantinos Gavriilidis, Andrea Munafo, Wei Pang, Helen Hastie

Abstract: Adoption and deployment of robotic and autonomous systems in industry are currently hindered by a lack of transparency, which is required for safety and accountability. Explanation methods are needed that are agnostic to the underlying autonomous system and easy to update. Furthermore, different stakeholders with varying levels of expertise will require different levels of information. In this work, we use surrogate models to provide transparency into the underlying policies for behaviour activation. We show that these surrogate models can effectively break down autonomous agents' behaviour into explainable components for use in natural language explanations.

2.Adaptive and Explainable Deployment of Navigation Skills via Hierarchical Deep Reinforcement Learning

Authors:Kyowoon Lee, Seongun Kim, Jaesik Choi

Abstract: For robotic vehicles to navigate robustly and safely in unseen environments, it is crucial to select the most suitable navigation policy. However, most existing deep reinforcement learning based navigation policies are trained with a hand-engineered curriculum and reward function, which makes them difficult to deploy in a wide range of real-world scenarios. In this paper, we propose a framework to learn a family of low-level navigation policies and a high-level policy for deploying them. The main idea is that, instead of learning a single navigation policy with a fixed reward function, we simultaneously learn a family of policies that exhibit different behaviors under a wide range of reward functions. We then train the high-level policy, which adaptively deploys the most suitable navigation skill. We evaluate our approach in simulation and the real world and demonstrate that our method can learn diverse navigation skills and adaptively deploy them. We also illustrate that our proposed hierarchical learning framework provides explainability by supplying semantics for the behavior of an autonomous agent.

3.Biography-based Robot Games for Older Adults

Authors:Benedetta Catricalà, Miriam Ledda, Marco Manca, Fabio Paternò, Carmen Santoro, Eleonora Zedda

Abstract: One issue in aging is how to stimulate the cognitive skills of older adults. One way to address it is the use of serious games delivered through humanoid robots, which provide engaging ways to perform exercises that train memory, attention, processing, and planning. We present an approach in which a humanoid robot, using various modalities, proposes games personalised to individuals' experiences, drawing on their personal memories of facts and events from their lives. This personalisation can increase their interest and engagement, and thus potentially reduce drop-out from cognitive training.

4.Efficient Learning of Urban Driving Policies Using Bird's-Eye-View State Representations

Authors:Raphael Trumpp, Martin Büchner, Abhinav Valada, Marco Caccamo

Abstract: Autonomous driving involves complex decision-making in highly interactive environments, requiring thoughtful negotiation with other traffic participants. While reinforcement learning provides a way to learn such interaction behavior, efficient learning critically depends on scalable state representations. Unlike in imitation learning, high-dimensional state representations still constitute a major bottleneck for deep reinforcement learning methods in autonomous driving. In this paper, we study the challenges of constructing bird's-eye-view representations for autonomous driving and propose a recurrent learning architecture for long-horizon driving. Our PPO-based approach, called RecurrDriveNet, is demonstrated on a simulated autonomous driving task in CARLA, where it outperforms traditional frame-stacking methods while requiring only one million experiences for training. RecurrDriveNet causes fewer than one infraction per driven kilometer by interacting safely with other road users.

5.Regulated Pure Pursuit for Robot Path Tracking

Authors:Steve Macenski, Shrijit Singh, Francisco Martin, Jonatan Gines

Abstract: The accelerated deployment of service robots has spawned a number of algorithm variations to better handle real-world conditions. Many local trajectory planning techniques have been deployed successfully on practical robot systems. While most formulations of the Dynamic Window Approach and Model Predictive Control can progress along paths and optimize for additional criteria, pure path tracking algorithms are still commonplace. Decades after its introduction, Pure Pursuit and its variants continue to be among the most commonly used classes of local trajectory planners. However, few Pure Pursuit variants have been proposed with schemes for variable linear velocities: they either assume a constant velocity or fail to address the point at all. This paper presents a variant of Pure Pursuit designed with additional heuristics to regulate linear velocities, built atop the existing Adaptive variant. The Regulated Pure Pursuit algorithm makes incremental improvements on the state of the art by adjusting linear velocities, with particular focus on safety in the constrained and partially observable spaces commonly negotiated by deployed robots. We present experiments with the Regulated Pure Pursuit algorithm on industrial-grade service robots. We also provide a high-quality reference implementation, freely included in the ROS 2 Nav2 framework at for fast evaluation.
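The core geometric step of Pure Pursuit, plus one curvature-based way of regulating linear velocity, can be sketched as follows. The curvature formula kappa = 2 * y / L^2 (lateral offset y of the lookahead point in the robot frame, lookahead distance L) is the classic Pure Pursuit law. The specific scaling rule, the function name, and the parameter names `v_desired` and `min_radius` are assumptions for illustration and are not taken from the Nav2 reference implementation.

```python
def pure_pursuit_command(lookahead_x, lookahead_y, v_desired, min_radius):
    """Return (linear_velocity, angular_velocity) toward a lookahead point.

    lookahead_x, lookahead_y: lookahead point in the robot frame
        (x forward, y to the left).
    v_desired: nominal linear velocity.
    min_radius: hypothetical turning-radius threshold below which the
        linear velocity is scaled down.
    """
    dist_sq = lookahead_x ** 2 + lookahead_y ** 2
    if dist_sq == 0.0:
        return 0.0, 0.0  # already at the lookahead point
    # Classic Pure Pursuit: curvature of the arc through the lookahead point.
    curvature = 2.0 * lookahead_y / dist_sq
    v = v_desired
    # Regulation heuristic (assumed form): on tight turns, scale speed by
    # the ratio of the turning radius to min_radius.
    if curvature != 0.0:
        radius = abs(1.0 / curvature)
        if radius < min_radius:
            v = v_desired * radius / min_radius
    # For an arc, omega = v * curvature.
    return v, v * curvature
```

Scaling the linear velocity, rather than only the steering, keeps the commanded path unchanged while slowing the robot on sharp turns, which matches the safety motivation stated in the abstract.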

6.Probabilistic Uncertainty Quantification of Prediction Models with Application to Visual Localization

Authors:Junan Chen, Josephine Monica, Wei-Lun Chao, Mark Campbell

Abstract: Uncertainty quantification for prediction models (e.g., neural networks) is crucial for their adoption in many robotics applications. This is arguably as important as making accurate predictions, especially for safety-critical applications such as self-driving cars. This paper presents our approach to uncertainty quantification in the context of visual localization for autonomous driving, where we predict locations from images. Our proposed framework estimates probabilistic uncertainty by creating a sensor error model that maps an internal output of the prediction model to the uncertainty. The sensor error model is created using multiple image databases for visual localization, each with ground-truth locations. We demonstrate the accuracy of our uncertainty prediction framework using the Ithaca365 dataset, which includes variations in lighting and weather (sunny, snowy, night), as well as alignment errors between databases. We analyze both the predicted uncertainty and its incorporation into a Kalman-based localization filter. Our results show that prediction error variations increase with poor weather and lighting conditions, leading to greater uncertainty and more outliers, which can be predicted by our proposed uncertainty model. Additionally, our probabilistic error model enables the filter to remove ad hoc sensor gating, as the uncertainty automatically adjusts the model to the input data.
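Feeding a predicted measurement uncertainty into a Kalman filter, instead of gating outliers ad hoc, can be illustrated with a one-dimensional measurement update in which the measurement variance `r` is supplied per input, for example by a learned error model. The scalar setting and the function name are simplifying assumptions; the paper's filter and sensor error model are not reproduced here.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update.

    x, p: prior state estimate and its variance.
    z: measurement (e.g. a location predicted from an image).
    r: measurement variance, assumed here to come from an error model that
       predicts larger values for less reliable inputs.
    Returns the posterior (x, p).
    """
    k = p / (p + r)  # Kalman gain: close to 0 when r is large
    x_post = x + k * (z - x)
    p_post = (1.0 - k) * p
    return x_post, p_post
```

When the error model assigns a large `r` to, say, a night-time image, the gain shrinks and an outlying measurement barely moves the estimate, so a calibrated uncertainty can stand in for a hard gate.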

7.Latent Exploration for Reinforcement Learning

Authors:Alberto Silvio Chiappa, Alessandro Marin Vargas, Ann Zixiang Huang, Alexander Mathis

Abstract: In Reinforcement Learning, agents learn policies by exploring and interacting with the environment. Due to the curse of dimensionality, learning policies that map high-dimensional sensory input to motor output is particularly challenging. During training, state-of-the-art methods (SAC, PPO, etc.) explore the environment by perturbing the actuation with independent Gaussian noise. While this unstructured exploration has proven successful in numerous tasks, it is likely to be suboptimal for overactuated systems. When multiple actuators, such as motors or muscles, drive behavior, uncorrelated perturbations risk diminishing each other's effect or modifying the behavior in a task-irrelevant way. While solutions that introduce time correlation across action perturbations exist, introducing correlation across actuators has been largely ignored. Here, we propose LATent TIme-Correlated Exploration (Lattice), a method that injects temporally correlated noise into the latent state of the policy network and can be seamlessly integrated with on- and off-policy algorithms. We demonstrate that the noisy actions generated by perturbing the network's activations can be modeled as a multivariate Gaussian distribution with a full covariance matrix. In the PyBullet locomotion tasks, Lattice-SAC achieves state-of-the-art results and reaches 18% higher reward than unstructured exploration in the Humanoid environment. In the musculoskeletal control environments of MyoSuite, Lattice-PPO achieves higher reward in most reaching and object manipulation tasks, while also finding more energy-efficient policies, with energy reductions of 20-60%. Overall, we demonstrate the effectiveness of structured action noise in time and actuator space for complex motor control tasks.
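Why perturbing the latent state yields action noise with a full covariance matrix can be checked with a linear output layer: if the action is a = W (z + eps) + b and eps has independent Gaussian entries with variances sigma^2, the action noise has covariance W diag(sigma^2) W^T, which is generally not diagonal. The sketch below assumes a linear last layer and omits Lattice's temporal correlation and any additional independent action noise; both function names are hypothetical.

```python
import numpy as np

def action_noise_covariance(W, latent_std):
    """Covariance of a = W (z + eps) + b with eps ~ N(0, diag(latent_std**2)).

    W: (n_actions, n_latent) weights of a linear output layer.
    latent_std: (n_latent,) standard deviations of the latent perturbation.
    """
    return W @ np.diag(latent_std ** 2) @ W.T

def sample_actions(W, b, z, latent_std, n, rng):
    """Draw n noisy actions by perturbing the latent state z."""
    eps = rng.normal(size=(n, z.shape[0])) * latent_std
    return (z + eps) @ W.T + b
```

With W = [[1, 0], [1, 1]] and unit latent noise, the two actuators' perturbations share a latent component and have covariance [[1, 1], [1, 2]], whereas independent per-action noise would leave the off-diagonal terms at zero.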

8.TOFG: A Unified and Fine-Grained Environment Representation in Autonomous Driving

Authors:Zihao Wen, Yifan Zhang, Xinhong Chen, Jianping Wang

Abstract: In autonomous driving, an accurate understanding of the environment, e.g., vehicle-to-vehicle and vehicle-to-lane interactions, plays a critical role in many driving tasks such as trajectory prediction and motion planning. Environment information comes from high-definition (HD) maps and the historical trajectories of vehicles. Due to the heterogeneity of map data and trajectory data, many data-driven models for trajectory prediction and motion planning extract vehicle-to-vehicle and vehicle-to-lane interactions separately and sequentially. However, this approach may yield a biased interpretation of interactions, lowering prediction and planning accuracy. Moreover, separate extraction leads to a complicated model structure, which sacrifices overall efficiency and scalability. To address these issues, we propose an environment representation, the Temporal Occupancy Flow Graph (TOFG). Specifically, the occupancy flow-based representation unifies map information and vehicle trajectories into a homogeneous data format and enables consistent prediction. The temporal dependencies among vehicles help capture changes in occupancy flow in a timely manner, further improving model performance. To demonstrate that TOFG can simplify the model architecture, we combine TOFG with a simple graph attention (GAT) based neural network and propose TOFG-GAT, which can be used for both trajectory prediction and motion planning. Experimental results show that TOFG-GAT achieves performance better than or comparable to all SOTA baselines with less training time.

1.Distributed Hierarchical Distribution Control for Very-Large-Scale Clustered Multi-Agent Systems

Authors:Augustinos D. Saravanos, Yihui Li, Evangelos A. Theodorou

Abstract: As multi-agent robotic systems continue to grow in scale and complexity, this paper considers a class of systems labeled Very-Large-Scale Multi-Agent Systems (VLMAS), whose dimensionality can scale up to the order of millions of agents. In particular, we consider the problem of steering the state distributions of all agents of a VLMAS to prescribed target distributions while satisfying probabilistic safety guarantees. Based on the key assumption that such systems often admit a multi-level hierarchical clustered structure, where the agents are organized into cliques of different levels, we associate the control of such cliques with the control of distributions and introduce the Distributed Hierarchical Distribution Control (DHDC) framework. The proposed approach consists of two sub-frameworks. The first, Distributed Hierarchical Distribution Estimation (DHDE), is a bottom-up hierarchical decentralized algorithm that links the initial and target configurations of the cliques of all levels with suitable Gaussian distributions. The second, Distributed Hierarchical Distribution Steering (DHDS), is a top-down hierarchical distributed method that steers the distributions of all cliques and agents from the initial distributions to the target ones assigned by DHDE. Simulation results that scale up to two million agents demonstrate the effectiveness and scalability of the proposed framework. The improved computational efficiency and safety performance of DHDC compared with related methods are also illustrated. The results of this work indicate the importance of hierarchical distribution control approaches for achieving safe and scalable control of VLMAS. A video with all results is available in .

2.Multi-objective Anti-swing Trajectory Planning of Double-pendulum Tower Crane Operations using Opposition-based Evolutionary Algorithm

Authors:Souravik Dutta, Yiyu Cai, Jianmin Zheng

Abstract: Underactuated tower crane lifting requires time-energy optimal trajectories for the trolley/slew operations and reduction of the unactuated swings resulting from the trolley/jib motion. In scenarios involving a non-negligible hook mass or a long rigging cable, the hook-payload unit exhibits double-pendulum behaviour, making the problem highly challenging. This article introduces an offline multi-objective anti-swing trajectory planning module for a Computer-Aided Lift Planning (CALP) system of autonomous double-pendulum tower cranes, addressing all the transient state constraints. A set of auxiliary outputs is selected by methodically analyzing the payload swing dynamics and is used to prove the differential flatness property of the crane operations. The flat outputs are parameterized via suitable Bézier curves to formulate the multi-objective trajectory optimization problems in the flat output space. A novel multi-objective evolutionary algorithm called Collective Oppositional Generalized Differential Evolution 3 (CO-GDE3) is employed as the optimizer. To obtain faster convergence and more consistency in finding a wide range of good solutions, a new population initialization strategy is integrated into the conventional GDE3. This computationally efficient initialization method incorporates various concepts of computational opposition. Statistical comparisons based on trolley and slew operations verify the superior convergence and reliability of CO-GDE3 over the standard GDE3. Trolley and slew operations of a collision-free lifting path computed via the path planner of the CALP system are selected for a simulation study. The simulated trajectories demonstrate that the proposed planner can produce time-energy optimal solutions, keeping all the state variables within their respective limits and restricting the hook and payload swings.
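Parameterizing a flat output with a Bézier curve means writing it as a Bernstein-weighted combination of control points, so the optimizer only has to search over those points. The minimal evaluator below illustrates that parameterization. The scalar output, the control-point values, and the function name are assumptions; the crane-specific flat outputs and the CO-GDE3 optimizer are not shown.

```python
from math import comb

def bezier(control_points, t):
    """Evaluate a scalar Bezier curve of degree n = len(control_points) - 1.

    B(t) = sum_i C(n, i) * (1 - t)**(n - i) * t**i * P_i, for t in [0, 1].
    """
    n = len(control_points) - 1
    return sum(
        comb(n, i) * (1 - t) ** (n - i) * t ** i * p
        for i, p in enumerate(control_points)
    )
```

For a hypothetical rest-to-rest profile from 0 to 10, control points [0, 0, 0, 10, 10, 10] repeat the first and last three points, which makes the first and second derivatives vanish at both ends; this is one reason Bézier curves are convenient for enforcing boundary conditions on flat outputs.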

1.SEIP: Simulation-based Design and Evaluation of Infrastructure-based Collective Perception

Authors:Ao Qu, Xuhuan Huang, Dajiang Suo

Abstract: Infrastructure-based collective perception, which entails the real-time sharing and merging of sensing data from different roadside sensors for object detection, has shown promise in mitigating occlusions for traffic safety and efficiency. However, its adoption has been hindered by the lack of guidance on roadside sensor placement and the high cost of ex-post evaluation. For infrastructure projects with limited budgets, ex-ante evaluation for optimizing the configurations and placements of infrastructure sensors is crucial to minimize occlusion risks at low cost. This paper presents algorithms and simulation tools to support the ex-ante evaluation of the cost-performance tradeoff in infrastructure sensor deployment for collective perception. More specifically, the deployment of infrastructure sensors is framed as an integer programming problem that can be solved efficiently in polynomial time, achieving near-optimal results with the use of certain heuristic algorithms. The solutions provide guidance on deciding sensor locations, installation heights, and configurations to balance procurement cost, physical installation constraints, and sensing coverage. Additionally, we implement the proposed algorithms in a simulation engine, which allows us to evaluate the effectiveness of each sensor deployment solution through the lens of object detection. The application of the proposed methods is illustrated through a case study on traffic monitoring using infrastructure LiDARs. Preliminary findings indicate that, with a tight sensing budget, the incremental benefit of integrating additional low-resolution LiDARs could surpass that of incorporating more high-resolution ones. The results reinforce the necessity of investigating the cost-performance tradeoff.
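The abstract does not specify which heuristics are used. As a stand-in, the sketch below shows a common greedy heuristic for budgeted coverage: repeatedly buy the affordable candidate sensor with the best ratio of newly covered cells to cost until nothing affordable adds coverage. The candidate format, cell sets, costs, and function name are hypothetical, this plain greedy rule carries no optimality guarantee, and it is not claimed to be the paper's algorithm.

```python
def greedy_sensor_placement(candidates, budget):
    """Greedy cost-effectiveness heuristic for budgeted coverage.

    candidates: dict name -> (cost, set_of_covered_cells), with cost > 0.
    budget: total procurement budget.
    Returns (chosen_names, covered_cells).
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(candidates)
    while remaining:
        # Pick the affordable candidate with the most new cells per unit cost.
        best_name, best_ratio = None, 0.0
        for name, (cost, cells) in remaining.items():
            if spent + cost > budget:
                continue  # would exceed the budget
            ratio = len(cells - covered) / cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # nothing affordable adds coverage
        cost, cells = remaining.pop(best_name)
        chosen.append(best_name)
        covered |= cells
        spent += cost
    return chosen, covered
```

In a toy instance where one expensive sensor covers the same cells as two cheaper ones whose combined cost fits the budget, the cost-per-cell ratio steers this heuristic toward the cheaper pair.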

2.Safety of autonomous vehicles: A survey on Model-based vs. AI-based approaches

Authors:Dimia Iberraken, Lounis Adouane

Abstract: The growing advancements in Autonomous Vehicles (AVs) have emphasized the critical need to prioritize the absolute safety of AV maneuvers, especially in dynamic and unpredictable environments or situations. This objective becomes even more challenging due to the uniqueness of every traffic situation/condition. To cope with all these highly constrained and complex configurations, AVs must have appropriate control architectures with reliable and real-time Risk Assessment and Management Strategies (RAMS). These RAMS must drastically reduce navigation risks. However, the lack of safety proofs, one of the key challenges to be addressed, drastically limits the ambition to introduce AVs more broadly on our roads and restricts their use to very limited use cases. Therefore, the focus and ambition of this paper is to survey research on autonomous vehicles, concentrating on the important topic of AV safety guarantees. To this end, we review relevant methods and concepts defining an overall control architecture for AVs, with an emphasis on the safety assessment and decision-making systems composing these architectures. This review also highlights research that uses either model-based methods or AI-based approaches, discussing the strengths and weaknesses of each methodology and investigating research that proposes a comprehensive multi-modal design combining model-based and AI approaches. The paper ends with a discussion of the methods used to guarantee the safety of AVs, namely safety verification techniques and the standardization/generalization of safety frameworks.

3.Improving the Generalizability of Trajectory Prediction Models with Frenet-Based Domain Normalization

Authors:Luyao Ye, Zikang Zhou, Jianping Wang

Abstract: Predicting the future trajectories of nearby objects plays a pivotal role in robotics and automation applications such as autonomous driving. While learning-based trajectory prediction methods have achieved remarkable performance on public benchmarks, the generalization ability of these approaches remains questionable. Poor generalizability to unseen domains, a well-recognized defect of data-driven approaches, can potentially harm the real-world performance of trajectory prediction models. We are thus motivated to improve the generalization ability of models instead of merely pursuing high average accuracy. Owing to the lack of benchmarks for quantifying the generalization ability of trajectory predictors, we first construct a new benchmark called argoverse-shift, in which the data distributions of the domains differ significantly. Using this benchmark for evaluation, we find that domain shift seriously hinders the generalization of trajectory predictors, as state-of-the-art approaches suffer severe performance degradation in these out-of-distribution scenes. To enhance the robustness of models against domain shift, we propose a plug-and-play strategy for domain normalization in trajectory prediction. Our strategy uses the Frenet coordinate frame for modeling and can effectively narrow the domain gap between scenes caused by the variety of road geometry and topology. Experiments show that our strategy noticeably boosts the prediction performance of state-of-the-art models in previously unseen domains, thereby improving the generalization ability of data-driven trajectory prediction methods.
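The core operation behind Frenet-frame modeling is expressing a Cartesian position as (s, d): the arc length s along a reference path (for example a lane centerline) and the signed lateral offset d from it. The minimal sketch below does this for a polyline path by projecting onto the closest segment. It is a generic illustration under the assumption of a piecewise-linear reference path, not the paper's normalization pipeline, and the example path is hypothetical.

```python
import math


def cartesian_to_frenet(path, point):
    """Project a 2D point onto a polyline reference path.

    Returns (s, d): s is the arc length along the path to the closest
    projection, d the signed lateral offset (positive to the left of the
    direction of travel). Consecutive path points must be distinct.
    """
    px, py = point
    best = None          # (distance, s, d) of the closest projection so far
    s_start = 0.0        # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        # Parameter of the orthogonal projection, clamped to the segment.
        u = ((px - x0) * dx + (py - y0) * dy) / seg_len ** 2
        u = max(0.0, min(1.0, u))
        qx, qy = x0 + u * dx, y0 + u * dy
        dist = math.hypot(px - qx, py - qy)
        if best is None or dist < best[0]:
            # The 2D cross product of the segment direction and the offset
            # tells on which side of the path the point lies.
            cross = dx * (py - y0) - dy * (px - x0)
            d = math.copysign(dist, cross) if dist > 0 else 0.0
            best = (dist, s_start + u * seg_len, d)
        s_start += seg_len
    return best[1], best[2]


# Hypothetical L-shaped centerline: 10 m east, then 10 m north.
centerline = [(0, 0), (10, 0), (10, 10)]
```

Because (s, d) is measured relative to the road rather than to a global map frame, scenes with different road geometry map to similar coordinates, which is the intuition behind using the Frenet frame for domain normalization.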

4.FORFIS: A forest fire firefighting simulation tool for education and research

Authors:Marvin Bredlau, Alexander Weber, Alexander Knoll

Abstract: We present a forest fire firefighting simulation tool named FORFIS that is implemented in Python. Unlike other existing software, we focus on a user-friendly software interface with an easy-to-modify software engine. Our tool is published under the GNU GPLv3 license and comes with a GUI as well as additional output functionality. The wildfire model follows the well-established cellular automata approach in two variants: a rectangular and a hexagonal cell decomposition of the wildfire area. The model takes wind into account. In addition, our tool allows the user to easily include a customized firefighting strategy for the firefighting agents.
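To make the cellular-automaton idea concrete, the sketch below implements one synchronous update of a generic rectangular wildfire CA with a simple wind bias. It is not FORFIS code: the cell states, the von Neumann neighbourhood, and the ignition rule (a base probability plus a bonus when the fire spreads downwind) are hypothetical choices for illustration only.

```python
import random

EMPTY, TREE, BURNING, BURNT = 0, 1, 2, 3


def step(grid, wind=(0, 1), p_base=0.3, p_wind=0.5, rng=random):
    """One synchronous update of a rectangular (von Neumann) wildfire CA.

    A burning cell becomes burnt. A tree cell ignites with probability
    p_base, or p_base + p_wind when the burning neighbour lies upwind of it.
    wind = (dr, dc) is the direction the wind blows towards, in grid units.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == BURNING:
                new[r][c] = BURNT
            elif grid[r][c] == TREE:
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == BURNING:
                        # Fire spreading from (nr, nc) to (r, c) moves along (-dr, -dc).
                        downwind = (-dr, -dc) == wind
                        p = p_base + (p_wind if downwind else 0.0)
                        if rng.random() < p:
                            new[r][c] = BURNING
                            break
    return new


# Hypothetical 3x3 forest with a fire in the centre and an eastward wind.
forest = [[TREE, TREE, TREE], [TREE, BURNING, TREE], [TREE, TREE, TREE]]
```

Updating into a copy of the grid keeps the step synchronous, so cells ignited in this step cannot spread fire until the next one. A hexagonal variant would change only the neighbour offsets.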

5.Emergent Incident Response for Unmanned Warehouses with Multi-agent Systems

Authors:Yibo Guo, Mingxin Li, Jingting Zong, Mingliang Xu

Abstract: Unmanned warehouses are an important part of logistics, and improving their operational efficiency can effectively enhance service efficiency. However, due to the complexity of unmanned warehouse systems and their susceptibility to errors, incidents may occur during their operation, most often in inbound and outbound operations, which can decrease operational efficiency. Hence, it is crucial to improve the response to such incidents. This paper proposes a collaborative optimization algorithm for emergent incident response based on Safe-MADDPG. To meet safety requirements during emergent incident response, we investigated the intrinsic hidden relationships between various factors. By obtaining constraint information on the agents during the emergent incident response process, as well as the constraints that the dynamic environment of the unmanned warehouse imposes on them, the algorithm reduces safety risks and avoids chain accidents. This enables an unmanned system to complete emergent incident response tasks and achieve its optimization objectives: (1) minimizing the losses caused by emergent incidents; and (2) maximizing the operational efficiency of inbound and outbound operations during the response process. A series of experiments conducted in a simulated unmanned warehouse scenario demonstrates the effectiveness of the proposed method.

6.Active Collaborative Localization in Heterogeneous Robot Teams

Authors:Igor Spasojevic, Xu Liu, Alejandro Ribeiro, George J. Pappas, Vijay Kumar

Abstract: Accurate and robust state estimation is critical for autonomous navigation of robot teams. This task is especially challenging for large groups of size, weight, and power (SWAP) constrained aerial robots operating in perceptually degraded, GPS-denied environments. We can, however, actively increase the amount of perceptual information available to such robots by augmenting them with a small number of more expensive, but less resource-constrained, agents. Specifically, the latter can serve as sources of perceptual information themselves. In this paper, we study the problem of optimally positioning (and potentially navigating) a small number of more capable agents to enhance the perceptual environment for their lightweight, inexpensive teammates that only need to rely on cameras and IMUs. We propose a numerically robust, computationally efficient approach to solve this problem via nonlinear optimization. Our method outperforms the standard approach based on the greedy algorithm, while matching the accuracy of a heuristic evolutionary scheme for global optimization at a fraction of its running time. Finally, we validate our solution in both photorealistic simulations and real-world experiments. In these experiments, we use lidar-based autonomous ground vehicles as the more capable agents, and vision-based aerial robots as their SWAP-constrained teammates. Our method is able to reduce drift in visual-inertial odometry by as much as 90%, and it outperforms random positioning of lidar-equipped agents by a significant margin. Furthermore, our method can be generalized to different types of robot teams with heterogeneous perception capabilities. It has a wide range of applications, such as surveying and mapping challenging dynamic environments, and enabling resilience to large-scale perturbations that can be caused by earthquakes or storms.

7.Development of a ROS-based Architecture for Intelligent Autonomous on Demand Last Mile Delivery

Authors:Georg Novtony, Walter Morales-Alvarez, Nikita Smirnov, Cristina Olaverri-Monreal

Abstract: This paper presents the development of the JKU-ITS Last Mile Delivery Robot. The proposed approach combines a 3D LIDAR, an RGB-D camera, an IMU, and a GPS sensor mounted on top of a mobile robotic slope mower. An embedded computer running ROS1 processes the sensor data streams to enable 2D and 3D Simultaneous Localization and Mapping, 2D localization, and object detection using a convolutional neural network.

8.Experience Filter: Using Past Experiences on Unseen Tasks or Environments

Authors:Anil Yildiz, Esen Yel, Anthony L. Corso, Kyle H. Wray, Stefan J. Witwicki, Mykel J. Kochenderfer

Abstract: One of the bottlenecks of training autonomous vehicle (AV) agents is the variability of training environments. Since learning optimal policies for unseen environments is often very costly and requires substantial data collection, it becomes computationally intractable to train the agent on every possible environment or task the AV may encounter. This paper introduces a zero-shot filtering approach that interpolates policies learned from past experiences to generalize to unseen environments. We use an experience kernel to correlate environments. These correlations are then exploited to produce policies for new tasks or environments from learned policies. We demonstrate our methods on an autonomous vehicle driving through T-intersections with different characteristics, where its behavior is modeled as a partially observable Markov decision process (POMDP). We first construct compact representations of learned policies for POMDPs with unknown transition functions given a dataset of sequential actions and observations. Then, we filter parameterized policies of previously visited environments to generate policies for new, unseen environments. We evaluate our approaches on both an actual AV and a high-fidelity simulator. Results indicate that our experience filter offers a fast, low-effort, and near-optimal solution to create policies for tasks or environments never seen before. Furthermore, the generated new policies outperform the policy learned using the entire data collected from past environments, suggesting that the correlation among different environments can be exploited and irrelevant ones can be filtered out.
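The abstract does not specify the experience kernel or the filter equations. As a hedged illustration of the general idea only, one simple kernel-based way to produce a policy for an unseen environment is a Nadaraya-Watson weighted average of past policy parameter vectors, with a Gaussian (RBF) kernel over environment descriptors. The descriptors, parameters, and length scale below are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Gaussian similarity between two environment descriptors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def interpolate_policy(past, new_env, length_scale=1.0):
    """Kernel-weighted average of past policy parameters.

    past: list of (environment descriptor, policy parameter vector) pairs.
    Environments similar to new_env dominate the result, while dissimilar
    ones get weights close to zero and are effectively filtered out.
    """
    weights = [rbf(env, new_env, length_scale) for env, _ in past]
    total = sum(weights)
    dim = len(past[0][1])
    return [sum(w * params[i] for w, (_, params) in zip(weights, past)) / total
            for i in range(dim)]


# Hypothetical 1D descriptor (e.g. a normalized traffic density) and
# two-parameter policies learned in two past environments.
past = [((0.0,), [1.0, 0.0]), ((2.0,), [0.0, 1.0])]
```

The length scale controls how aggressively irrelevant environments are suppressed: smaller values make the result rely almost entirely on the closest past experience.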

9.HySST: A Stable Sparse Rapidly-Exploring Random Trees Optimal Motion Planning Algorithm for Hybrid Dynamical Systems

Authors:Nan Wang, Ricardo G. Sanfelice

Abstract: This paper proposes a stable sparse rapidly-exploring random trees (SST) algorithm to solve the optimal motion planning problem for hybrid systems. At each iteration, the proposed algorithm, called HySST, selects a vertex with the lowest cost among all the vertices within the neighborhood of a randomly selected sample and then extends the search tree by flow or jump, which is also chosen randomly when both regimes are possible. In addition, HySST maintains a static set of witness points such that all the vertices within the neighborhood of each witness are pruned except the vertex with the lowest cost. Using a definition of the concatenation of functions defined on hybrid time domains, we show that HySST is asymptotically near optimal, namely, the probability of failing to find a motion plan whose cost is close to the optimal cost approaches zero as the number of iterations of the algorithm increases to infinity. This property is guaranteed under mild conditions on the data defining the motion plan, which include a relaxation of the usual positive clearance assumption imposed in the literature of classical systems. The proposed algorithm is applied to an actuated bouncing ball system and a collision-resilient tensegrity multicopter system so as to highlight its generality and computational features.
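The two SST ingredients named in the abstract, best-first vertex selection near a random sample and witness-based pruning, can be sketched in a few lines for a Euclidean state space. This is a simplified illustration, not HySST itself: the hybrid flow/jump extension is omitted, the radii are hypothetical, and removing pruned vertices from the tree is not modeled (the sketch only tracks each witness's representative).

```python
import math


class WitnessSet:
    """SST-style pruning: each witness point keeps only its cheapest vertex.

    Vertices are (state, cost) pairs with states as tuples of floats.
    Witnesses are only ever added, never moved or deleted.
    """

    def __init__(self, delta_s):
        self.delta_s = delta_s  # witness radius
        self.witnesses = []     # witness states
        self.reps = []          # reps[i]: cheapest (state, cost) near witness i

    def try_add(self, state, cost):
        """Return True if the new vertex survives pruning, False otherwise."""
        if self.witnesses:
            dists = [math.dist(state, w) for w in self.witnesses]
            i = min(range(len(dists)), key=dists.__getitem__)
            if dists[i] <= self.delta_s:
                if self.reps[i][1] <= cost:
                    return False              # an existing vertex is at least as cheap
                self.reps[i] = (state, cost)  # new vertex replaces (prunes) the old one
                return True
        # No witness within delta_s: the new state becomes a witness itself.
        self.witnesses.append(state)
        self.reps.append((state, cost))
        return True


def best_near(vertices, sample, delta_bn):
    """Select the cheapest vertex within delta_bn of a random sample,
    falling back to the nearest vertex when none is that close."""
    near = [v for v in vertices if math.dist(v[0], sample) <= delta_bn]
    if near:
        return min(near, key=lambda v: v[1])
    return min(vertices, key=lambda v: math.dist(v[0], sample))
```

Keeping a single cheapest representative per witness region is what makes the tree sparse: its size is bounded by the number of witnesses rather than by the number of iterations.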

1.Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation

Authors:Haodong He, Hao Fu, Qiang Wang, Shuai Zhou, Wei Liu

Abstract: Social robot navigation is an open and challenging problem. In existing work, separate modules are used to capture spatial and temporal features, respectively. However, such methods make it harder to fully utilize spatio-temporal features and to reduce the conservativeness of the navigation policy. In light of this, we present a spatio-temporal transformer-based policy optimization algorithm to enhance the utilization of spatio-temporal features, thereby facilitating the capture of human-robot interactions. Specifically, this paper introduces a gated embedding mechanism that effectively aligns the spatial and temporal representations by integrating both modalities at the feature level. A Transformer is then leveraged to encode the spatio-temporal semantic information, with the aim of finding the optimal navigation policy. Finally, the combination of the spatio-temporal Transformer and self-adjusting policy entropy significantly reduces the conservatism of navigation policies. Experimental results demonstrate the effectiveness of the proposed framework, with our method showing superior performance.
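The abstract mentions a gated embedding that integrates spatial and temporal representations at the feature level but does not give its equations. A common form of gated fusion, assumed here, computes an element-wise gate g = sigmoid(W [s; t] + b) and mixes h = g * s + (1 - g) * t. The pure-Python sketch below shows only that generic pattern. The weights would be learned in practice; the values used in the example are hypothetical.

```python
import math


def gated_fusion(spatial, temporal, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    g = sigmoid(W [s; t] + b),  h = g * s + (1 - g) * t
    weights: d rows of length 2d; bias: length d.
    """
    concat = spatial + temporal  # list concatenation [s; t]
    fused = []
    for i, (s_i, t_i) in enumerate(zip(spatial, temporal)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = 1.0 / (1.0 + math.exp(-z))  # gate in (0, 1) for feature i
        fused.append(g * s_i + (1.0 - g) * t_i)
    return fused


# Hypothetical 2D spatial and temporal features with an untrained gate.
spatial_feat, temporal_feat = [1.0, 2.0], [3.0, 4.0]
zero_weights = [[0.0] * 4, [0.0] * 4]
```

With all-zero parameters the gate is 0.5 and the output is the plain average; training lets each feature dimension decide how much to trust the spatial versus the temporal stream.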

2.Pedestrian Trajectory Forecasting Using Deep Ensembles Under Sensing Uncertainty

Authors:Anshul Nayak, Azim Eskandarian, Zachary Doerzaph, Prasenjit Ghorai

Abstract: One of the fundamental challenges in the prediction of dynamic agents is robustness. Most predictions are deterministic estimates of future states, which are over-confident and prone to error. Recently, a few works have addressed capturing uncertainty during forecasting of future states. However, these probabilistic estimation methods fail to account for the upstream noise in perception data during tracking. Sensors always have noise, and state estimation becomes even more difficult under adverse weather conditions and occlusion. Traditionally, Bayes filters have been used to fuse information from noisy sensors to update states with associated beliefs. However, they fail to address non-linearities and long-term predictions. Therefore, we propose an end-to-end estimator that can take noisy sensor measurements and make robust future state predictions with uncertainty bounds while simultaneously taking into consideration the upstream perceptual uncertainty. In this work, we consider an encoder-decoder based deep ensemble network for capturing both perception and predictive uncertainty simultaneously. We compared the current model to other approximate Bayesian inference methods. Overall, deep ensembles provided more robust predictions, and the consideration of upstream uncertainty further increased the estimation accuracy of the model.
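In the standard deep-ensemble recipe, each of M networks predicts a Gaussian N(mu_m, var_m), and the ensemble prediction is the moment-matched Gaussian of the uniform mixture: mean mu = average of mu_m, and variance equal to the average of var_m (aleatoric part) plus the spread of the mu_m around mu (epistemic part). Whether the paper combines its members exactly this way is an assumption. The sketch handles one scalar output. For 2D positions it can be applied per coordinate if a diagonal covariance is assumed. The member predictions in the example are hypothetical.

```python
def combine_ensemble(means, variances):
    """Moment-match a uniformly weighted mixture of M Gaussians.

    Returns (mean, total variance, aleatoric part, epistemic part).
    """
    m = len(means)
    mu = sum(means) / m
    aleatoric = sum(variances) / m                      # average predicted noise
    epistemic = sum((x - mu) ** 2 for x in means) / m   # disagreement between members
    return mu, aleatoric + epistemic, aleatoric, epistemic


# Hypothetical predictions of a pedestrian's x-position (m) from three members.
mu, var, alea, epi = combine_ensemble([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Separating the two parts is useful in practice: aleatoric variance reflects sensor and behavior noise, while epistemic variance grows when the members disagree, for example on out-of-distribution inputs.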

1.Accelerated K-Serial Stable Coalition for Dynamic Capture and Resource Defense

Authors:Junfeng Chen, Zili Tang, Meng Guo

Abstract: Coalition is an important means for multi-robot systems to collaborate on common tasks. An effective and adaptive coalition strategy is essential for online performance in dynamic and unknown environments. In this work, the problem of territory defense by large-scale heterogeneous robotic teams is considered. The tasks include surveillance, capture of dynamic targets, and perimeter defense over valuable resources. Since each robot can choose among many tasks, it remains a challenging problem to jointly coordinate these robots such that the overall utility is maximized. This work proposes a generic coalition strategy called the K-serial stable coalition algorithm (KS-COAL). Unlike centralized approaches, it is distributed and anytime, meaning that only local communication is required and a K-serial Nash-stable solution is ensured. Furthermore, to accelerate adaptation to dynamic targets and resource distributions that are only perceived online, a heterogeneous graph attention network (HGAN)-based heuristic is learned to select more appropriate parameters and promising initial solutions during local optimization. Compared with manual heuristics or end-to-end predictors, it is shown to both improve online adaptability and retain the quality guarantee. The proposed methods are rigorously validated via large-scale simulations with hundreds of robots, against several strong baselines including GreedyNE and FastMaxSum.

2.PRIMP: PRobabilistically-Informed Motion Primitives for Efficient Affordance Learning from Demonstration

Authors:Sipu Ruan, Weixiao Liu, Xiaoli Wang, Xin Meng, Gregory S. Chirikjian

Abstract: This paper proposes a learning-from-demonstration method using probability densities on the workspaces of robot manipulators. The method, named "PRobabilistically-Informed Motion Primitives (PRIMP)", learns the probability distribution of the end effector trajectories in the 6D workspace that includes both positions and orientations. It is able to adapt to new situations such as novel via poses with uncertainty and a change of viewing frame. The method itself is robot-agnostic: the learned distribution can be transferred to another robot by adapting it to that robot's workspace density. The learned trajectory distribution is then used to guide an optimization-based motion planning algorithm to further help the robot avoid novel obstacles that are unseen during the demonstration process. The proposed methods are evaluated through several sets of benchmark experiments. PRIMP runs more than 5 times faster while generalizing trajectories more than twice as close to both the demonstrations and novel desired poses. It is then combined with our robot imagination method that learns object affordances, illustrating the applicability of PRIMP to learning tool use through physical experiments.

3.Residual Dynamics Learning for Trajectory Tracking for Multi-rotor Aerial Vehicles

Authors:Geesara Kulathunga, Hany Hamed, Alexandr Klimchik

Abstract: This paper presents a technique to cope with the gap between high-level planning, e.g., reference trajectory tracking, and low-level control using a learning-based method in the plan-based control paradigm. The technique improves the smoothness of maneuvering through cluttered environments, especially targeting low-speed velocity profiles. In such profiles, external aerodynamic effects acting on the quadrotor can be neglected. Hence, we used a simplified motion model to represent the motion of the quadrotor when formulating the Nonlinear Model Predictive Control (NMPC)-based local planner. However, the simplified motion model causes residual dynamics between the high-level planner and the low-level controller. A Sparse Gaussian Process Regression-based technique is proposed to reduce these residual dynamics. The proposed technique is compared with Data-Driven MPC. The comparison results show that a planner augmented with a residual dynamics model reduces the nominal model error by a factor of 2 on average. Further, we compared the proposed complete framework with four other approaches. The proposed approach outperformed the others in tracking the reference trajectory without colliding with obstacles, with shorter flight time and without losing computational efficiency.
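The abstract does not state which sparse approximation is used. As one well-known option, the subset-of-regressors (SoR) sparse GP expresses the predictive mean through m inducing inputs Z: alpha = (noise_var * K_mm + K_mn K_nm)^-1 K_mn y and mu(x) = sum over j of k(x, Z_j) alpha_j. The pure-Python sketch below fits a residual that depends on a single scalar input, for brevity. The residual data are synthetic and hypothetical, and the kernel hyperparameters are fixed rather than learned.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalar inputs (unit signal variance)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve the small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_sor(X, y, Z, noise_var=1e-2):
    """Subset-of-regressors sparse GP with inducing inputs Z.

    Returns alpha = (noise_var * K_mm + K_mn K_nm)^-1 K_mn y. Building the
    system costs O(n m^2) instead of the O(n^3) of a full GP.
    """
    Kmn = [[rbf(z, x) for x in X] for z in Z]
    A = [[noise_var * rbf(zi, zj) + sum(p * q for p, q in zip(Kmn[i], Kmn[j]))
          for j, zj in enumerate(Z)]
         for i, zi in enumerate(Z)]
    rhs = [sum(k * t for k, t in zip(row, y)) for row in Kmn]
    return solve(A, rhs)


def predict(x, Z, alpha):
    """Predictive mean of the residual at input x."""
    return sum(rbf(x, z) * a for z, a in zip(Z, alpha))


# Synthetic residual samples: 50 measurements, 6 inducing points.
X = [0.1 * i for i in range(50)]
y = [0.3 * math.sin(x) for x in X]
Z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
alpha = fit_sor(X, y, Z)
```

Here predict(2.5, Z, alpha) should be close to the true residual 0.3 * sin(2.5). In an NMPC setting, a prediction of this kind would be added to the nominal model's output so that the planner accounts for the residual dynamics.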

4.Enhanced 6D Pose Estimation for Robotic Fruit Picking

Authors:Marco Costanzo, Marco De Simone, Sara Federico, Ciro Natale, Salvatore Pirozzi

Abstract: This paper proposes a novel method to refine the 6D pose estimate inferred by an instance-level deep neural network that processes a single RGB image and has been trained on synthetic images only. The proposed optimization algorithm exploits the depth measurement of a standard RGB-D camera to estimate the dimensions of the considered object, even though the network is trained on a single CAD model of the same object with given dimensions. The improved accuracy of the pose estimate allows a robot to successfully grasp apples of various types and significantly different dimensions; this was not possible using the standard pose estimation algorithm, except for fruits with dimensions very close to those of the CAD model used in training. Grasping fresh fruit without damaging it also requires suitable grasp force control. A parallel gripper equipped with special force/tactile sensors is thus adopted to achieve safe grasps with the minimum force necessary to lift the fruits without slippage or deformation, with no knowledge of their weight.

5.Vision-based Safe Autonomous UAV Docking with Panoramic Sensors

Authors:Phuoc Nguyen Thuan, Jorge Peña Queralta, Tomi Westerlund

Abstract: The remarkable growth of unmanned aerial vehicles (UAVs) has also sparked concerns about safety measures during their missions. To advance towards safer autonomous aerial robots, this work presents a vision-based solution for ensuring safe autonomous UAV landings with minimal infrastructure. During docking maneuvers, UAVs pose a hazard to people in the vicinity. In this paper, we propose the use of a single omnidirectional panoramic camera pointing upwards from a landing pad to detect and estimate the position of people around the landing area. The images are processed in real time on an embedded computer, which communicates with the onboard computer of approaching UAVs to transition between landing, hovering, or emergency landing states. While landing, the ground camera also aids in finding an optimal position, which can be required in case of low battery or when hovering is no longer possible. We use a YOLOv7-based object detection model and an XGBoost model for localizing nearby people, and the open-source ROS and PX4 frameworks for communication, interfacing, and control of the UAV. We present both simulation and real-world indoor experimental results to show the efficiency of our methods.

6.Individuality in Swarm Robots with the Case Study of Kilobots: Noise, Bug, or Feature?

Authors:Mohsen Raoufi, Pawel Romanczuk, Heiko Hamann

Abstract: Inter-individual differences are studied in natural systems, such as fish, bees, and humans, as they contribute to the complexity of both individual and collective behaviors. However, individuality in artificial systems, such as robotic swarms, is undervalued or even overlooked. Agent-specific deviations from the norm in swarm robotics are usually understood as mere noise that can be minimized, for example, by calibration. We observe that robots have consistent deviations and argue that awareness and knowledge of these can be exploited to serve a task. We measure heterogeneity in robot swarms caused by individual differences in how robots act, sense, and oscillate. Our use case is Kilobots, and we provide example behaviors where the performance of robots varies depending on individual differences. We show a non-intuitive example of phototaxis with Kilobots where the non-calibrated Kilobots show better performance than the calibrated, supposedly "ideal" one. We also measure the inter-individual variations for heterogeneity in sensing and oscillation. We briefly discuss how these variations can enhance the complexity of collective behaviors. We suggest that by recognizing and exploring this new perspective on individuality, and hence diversity, in robotic swarms, we can gain a deeper understanding of these systems and potentially unlock new possibilities for their design and implementation of applications.

7.Failure Detection and Fault Tolerant Control of a Jet-Powered Flying Humanoid Robot

Authors:Gabriele Nava, Daniele Pucci

Abstract: Failure detection and fault tolerant control are fundamental safety features of any aerial vehicle. With the emergence of complex, multi-body flying systems such as jet-powered humanoid robots, it becomes crucially important to design fault detection and control strategies for these systems, too. In this paper we propose a fault detection and control framework for the flying humanoid robot iRonCub in case of loss of one turbine. The framework is composed of a failure detector based on the turbines' rotational speed, a momentum-based flight control for fault response, and an offline reference generator that produces configurations far from singularities and accounts for avoiding both self-collisions and collisions with the jet exhausts. Simulation results with Gazebo and MATLAB prove the effectiveness of the proposed control strategy.
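The abstract says only that the failure detector is based on the turbines' rotational speed. A minimal sketch of one such detector, under stated assumptions: compare each measured speed with an expected value (for instance from a hypothetical turbine model driven by the throttle command), and declare a failure only after the deficit persists for several consecutive samples, so that a single noisy reading does not trigger the fault response. The speeds, tolerance, and sample count are hypothetical.

```python
def detect_turbine_failure(speeds, expected, tol, min_samples):
    """Return the sample index at which a failure is declared, or None.

    A failure is declared when the measured rotational speed stays more than
    tol below the expected speed for min_samples consecutive samples.
    """
    count = 0
    for k, (w, w_ref) in enumerate(zip(speeds, expected)):
        count = count + 1 if w_ref - w > tol else 0  # reset on any healthy sample
        if count >= min_samples:
            return k
    return None


# Hypothetical normalized speed traces (percent of nominal).
expected = [100.0] * 6
spin_down = [100.0, 100.0, 99.0, 60.0, 40.0, 20.0]
single_glitch = [100.0, 70.0, 100.0, 100.0, 100.0, 100.0]
```

The persistence count trades detection delay for robustness to noise. Larger values of min_samples delay the switch to the fault-tolerant controller.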

8.L1 Adaptive Resonance Ratio Control for Series Elastic Actuator with Guaranteed Transient Performance

Authors:Feiyan Min, Gao Wang, Xueqin Chen

Abstract: To eliminate the static error, overshoot, and vibration of series elastic actuator (SEA) position control, the resonance ratio control (RRC) algorithm is improved based on the L1 adaptive control (L1AC) method. Based on an analysis of the factors affecting the control performance of the SEA, the algorithm schema is proposed, its stability is proved, and the main control parameters are analyzed. The algorithm schema is further improved with gravity compensation, and the predicted error and reference error are reduced to guarantee transient performance. Finally, the effectiveness of the algorithm is validated by simulation and platform experiments. The simulation and experimental results show that the algorithm has good adaptability, can improve transient control performance, and effectively handles the static error, overshoot, and vibration. In addition, owing to the low-pass filtering characteristic of L1AC with respect to disturbances, when a link-side collision occurs the algorithm automatically reduces the link speed and limits the motor current, thus protecting both humans and the SEA itself.

9.Automatic off-line design of robot swarms: exploring the transferability of control software and design methods across different platforms

Authors:Miquel Kegeleirs, David Garzón Ramos, Lorenzo Garattoni, Gianpiero Francesca, Mauro Birattari

Abstract: Automatic off-line design is an attractive approach to implementing robot swarms. In this approach, a designer specifies a mission for the swarm, and an optimization process generates suitable control software for the individual robots through computer-based simulations. Most relevant literature has focused on effectively transferring control software from simulation to physical robots. For the first time, we investigate (i) whether control software generated via automatic design is transferable across robot platforms and (ii) whether the design methods that generate such control software are themselves transferable. We experiment with two ground mobile platforms with equivalent capabilities. Our measure of transferability is based on the performance drop observed when control software and/or design methods are ported from one platform to another. Results indicate that while the control software generated via automatic design is transferable in some cases, better performance can be achieved when a transferable method is directly applied to the new platform.

10.Modeling and Control of a novel Variable Stiffness three DoF Wrist

Authors:Giuseppe Milazzo (Soft Robotics for Human Cooperation and Rehabilitation, Istituto Italiano di Tecnologia, Genova, Italy), Manuel Giuseppe Catalano (Soft Robotics for Human Cooperation and Rehabilitation, Istituto Italiano di Tecnologia, Genova, Italy), Antonio Bicchi (Soft Robotics for Human Cooperation and Rehabilitation, Istituto Italiano di Tecnologia, Genova, Italy; Centro di Ricerca Enrico Piaggio, Università di Pisa, Pisa, Italy), Giorgio Grioli (Soft Robotics for Human Cooperation and Rehabilitation, Istituto Italiano di Tecnologia, Genova, Italy; Centro di Ricerca Enrico Piaggio, Università di Pisa, Pisa, Italy)

Abstract: This paper presents a novel design for a Variable Stiffness 3 DoF actuated wrist to improve task adaptability and safety during interactions with people and objects. The proposed design employs a hybrid serial-parallel configuration to achieve a 3 DoF wrist joint that can actively and continuously vary its overall stiffness thanks to a redundant elastic actuation system, using only four motors. Its stiffness control principle is similar to human muscular impedance regulation: the shape of the stiffness ellipsoid depends mostly on posture, while elastic co-contraction modulates its overall size. The employed mechanical configuration yields a compact and lightweight device that, thanks to its anthropomorphic characteristics, could be suitable for prostheses and humanoid robots. After introducing the design concept of the device, this work provides methods to estimate the posture of the wrist from joint angle measurements and to modulate its stiffness. Thereafter, this paper describes the first physical implementation of the presented design, detailing the mechanical prototype and electronic hardware, the control architecture, and the associated firmware. The reported experimental results show the potential of the proposed device while highlighting some limitations. To conclude, we show the motion and stiffness behavior of the device with some qualitative experiments.

11.Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark

Authors:Juncheng Li, David J. Cappelleri

Abstract: This paper presents Sim-Suction, a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments. Suction grasp policies typically employ data-driven approaches, necessitating large-scale, accurately-annotated suction grasp datasets. However, the generation of suction grasp datasets in cluttered environments remains underexplored, leaving uncertainties about the relationship between the object of interest and its surroundings. To address this, we propose a benchmark synthetic dataset, Sim-Suction-Dataset, comprising 500 cluttered environments with 3.2 million annotated suction grasp poses. The efficient Sim-Suction-Dataset generation process provides novel insights by combining analytical models with dynamic physical simulations to create fast and accurate suction grasp pose annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset, leveraging the synergy of zero-shot text-to-segmentation. Real-world experiments for picking up all objects demonstrate that Sim-Suction-Pointnet achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1 objects (prismatic shape), cluttered level 2 objects (more complex geometry), and cluttered mixed objects, respectively. The Sim-Suction policies outperform the tested state-of-the-art benchmarks by approximately 21% in cluttered mixed scenes.

12.Imitating Task and Motion Planning with Visuomotor Transformers

Authors:Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan Salakhutdinov, Dieter Fox

Abstract: Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant transformer-based policies. In this paper, we present a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks, to shelf and articulated object manipulation, achieving 70 to 80% success rates. Video results at

13.Automatic Extraction of Time-windowed ROS Computation Graphs from ROS Bag Files

Authors:Zhuojun Chen, Michel Albonico, Ivano Malavolta

Abstract: Robotic systems react to different environmental stimuli, potentially resulting in the dynamic reconfiguration of the software controlling such systems. One effect of such dynamism is the reconfiguration of the system's software architecture at runtime. Such reconfigurations might severely impact the runtime properties of robotic systems, e.g., in terms of performance and energy efficiency. The ROS rosbag package enables developers to record and store timestamped data related to the execution of robotic missions, implicitly containing relevant information about the architecture of the monitored system during its execution. In this study, we discuss our approach for statically extracting (time-windowed) architectural information from ROS bag files. The proposed approach can support the robotics community in better discussing and reasoning about the software architecture (and its runtime reconfigurations) of ROS-based systems. We evaluate our approach against hundreds of ROS bag files systematically mined from 4,434 public GitHub repositories.
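
As a rough illustration of time-windowed extraction, the sketch below assumes each recorded message has already been reduced to a (timestamp, publishing node, topic) tuple; in ROS 1 bags the publisher is typically recoverable from a connection header's `callerid` field, but that parsing step is omitted here. The functions `windowed_graphs` and `reconfigurations` are hypothetical helpers, not the authors' tool, and because a bag records only published messages, the sketch captures publish edges only.

```python
from collections import defaultdict


def windowed_graphs(records, window):
    """Group (timestamp, node, topic) records into fixed-length time windows.

    Returns a dict mapping window index -> set of (node, topic) publish edges.
    """
    graphs = defaultdict(set)
    for stamp, node, topic in records:
        graphs[int(stamp // window)].add((node, topic))
    return dict(graphs)


def reconfigurations(graphs):
    """Compare consecutive windows and report edges that appeared or vanished."""
    keys = sorted(graphs)
    changes = []
    for prev, cur in zip(keys, keys[1:]):
        added = graphs[cur] - graphs[prev]
        removed = graphs[prev] - graphs[cur]
        if added or removed:
            changes.append((prev, cur, added, removed))
    return changes


# Hypothetical records; stamps are seconds since the start of the bag.
records = [
    (0.1, "/camera_driver", "/image_raw"),
    (0.4, "/detector", "/detections"),
    (1.2, "/camera_driver", "/image_raw"),
    (1.5, "/planner", "/cmd_vel"),
]
graphs = windowed_graphs(records, window=1.0)
changes = reconfigurations(graphs)
```

The window length trades temporal resolution against spurious "reconfigurations" caused by topics that publish only sporadically.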

14.Hierarchical Whole-body Control of the Cable-Suspended Aerial Manipulator endowed with Winch-based Actuation

Authors:Yuri Sarkisov, Andre Coelho, Maihara Santos, Min Jun Kim, Dzmitry Tsetserukou, Christian Ott, Konstantin Kondak

Abstract: During operation, aerial manipulation systems are affected by various disturbances. Among them is a gravitational torque caused by the weight of the robotic arm. Common propeller-based actuation is ineffective against such disturbances because of possible overheating and high power consumption. To overcome this issue, in this paper we propose a winch-based actuation for the crane-stationed cable-suspended aerial manipulator. Three winch-controlled suspension rigging cables produce a desired cable tension distribution to generate a wrench that reduces the effect of gravitational torque. In order to coordinate the robotic arm and the winch-based actuation, a model-based hierarchical whole-body controller is adapted. It resolves two tasks: keeping the robotic arm end-effector at the desired pose and shifting the system center of mass to the location with zero gravitational torque. The performance of the introduced actuation system as well as the control strategy is validated through experimental studies.

15.Metaheuristic planner for cooperative multi-agent wall construction with UAVs

Authors:Basel Elkhapery, Robert Pěnička, Michal Němec, Mohsin Siddiqui

Abstract: This paper introduces a wall construction planner for Unmanned Aerial Vehicles (UAVs), which uses a Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristic to generate near-time-optimal building plans within seconds, even for large walls. This approach addresses one of the most time-consuming and labor-intensive construction tasks, while also minimizing workers' safety risks. To achieve this, the wall-building problem is modeled as a variant of the Team Orienteering Problem and is formulated as a Mixed-Integer Linear Program (MILP), with added precedence and concurrence constraints that ensure bricks are placed in the correct order and without collision between cooperating agents. The GRASP planner is validated in a realistic simulation and is shown to find solutions of similar quality to the optimal MILP, but much faster. Moreover, it outperforms all other state-of-the-art planning approaches in the majority of test cases. This paper presents a significant advancement in the field of automated wall construction, demonstrating the potential of UAVs and optimization algorithms in improving the efficiency and safety of construction projects.
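
To show what the GRASP loop itself looks like, here is a minimal single-UAV sketch, assuming a simplified problem: visit every brick position once, starting from the origin, minimizing straight-line travel, with "brick X rests on brick Y" precedence constraints. It omits the paper's multiple agents, concurrence constraints, and time model; all names and the toy wall are hypothetical. Each iteration builds a plan greedily from a restricted candidate list (RCL) controlled by `alpha`, then improves it with a precedence-respecting local search.

```python
import math
import random


def tour_length(order, pos, start=(0.0, 0.0)):
    """Total straight-line travel distance when visiting bricks in the given order."""
    length, cur = 0.0, start
    for b in order:
        length += math.dist(cur, pos[b])
        cur = pos[b]
    return length


def feasible(order, preds):
    """True if every brick appears after all bricks it rests on."""
    placed = set()
    for b in order:
        if not preds.get(b, set()) <= placed:
            return False
        placed.add(b)
    return True


def construct(pos, preds, alpha, rng, start=(0.0, 0.0)):
    """Greedy randomized construction using a restricted candidate list."""
    placed, order, cur = set(), [], start
    while len(order) < len(pos):
        # Only bricks whose supporting bricks are already placed are candidates.
        cands = [b for b in pos if b not in placed and preds.get(b, set()) <= placed]
        costs = {b: math.dist(cur, pos[b]) for b in cands}
        lo, hi = min(costs.values()), max(costs.values())
        rcl = [b for b in cands if costs[b] <= lo + alpha * (hi - lo)]
        b = rng.choice(rcl)
        order.append(b)
        placed.add(b)
        cur = pos[b]
    return order


def local_search(order, pos, preds):
    """Improve by swapping adjacent bricks while precedence stays satisfied."""
    best = tour_length(order, pos)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            if feasible(cand, preds):
                length = tour_length(cand, pos)
                if length < best - 1e-12:
                    order, best, improved = cand, length, True
    return order, best


def grasp(pos, preds, iters=50, alpha=0.3, seed=0):
    """Repeat construction + local search and keep the best plan found."""
    rng = random.Random(seed)
    best_order, best_len = None, math.inf
    for _ in range(iters):
        order, length = local_search(construct(pos, preds, alpha, rng), pos, preds)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


# Hypothetical 2x2 wall: brick c rests on a, brick d rests on b.
pos = {"a": (1.0, 0.0), "b": (2.0, 0.0), "c": (1.0, 1.0), "d": (2.0, 1.0)}
preds = {"c": {"a"}, "d": {"b"}}
plan, length = grasp(pos, preds)
```

Setting `alpha=0` makes construction purely greedy and `alpha=1` purely random; intermediate values give the diversified starting points that the local search then refines.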

16.Learning When to Ask for Help: Transferring Human Knowledge through Part-Time Demonstration

Authors:Ifueko Igbinedion, Sertac Karaman

Abstract: Robots operating alongside humans often encounter unfamiliar environments that make autonomous task completion challenging. Though improving models and increasing dataset size can enhance a robot's performance in unseen environments, dataset generation and model refinement may be impractical in every unfamiliar environment. Approaches that utilize human demonstration through manual operation can aid in generalizing to these unfamiliar environments, but often require significant human effort and expertise to achieve satisfactory task performance. To address these challenges, we propose leveraging part-time human interaction for redirection of robots during failed task execution. We train a lightweight help policy that allows robots to learn when to proceed autonomously or request human assistance at times of uncertainty. By incorporating part-time human intervention, robots recover quickly from their mistakes. Our best performing policy yields a 20 percent increase in path-length weighted success with only a 21 percent human interaction ratio. This approach provides a practical means for robots to interact and learn from humans in real-world settings, facilitating effective task completion without the need for significant human intervention.
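
The core decision, "proceed autonomously or request help", can be illustrated with a much simpler stand-in than the paper's trained help policy: ask for help whenever the Shannon entropy of the robot's action distribution exceeds a threshold. The threshold and the step distributions below are hypothetical; the sketch also computes the human interaction ratio (fraction of steps with a help request) mentioned in the abstract.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_ask_for_help(probs, threshold):
    """Request human help when the policy is too uncertain about its next action."""
    return entropy(probs) > threshold


def interaction_ratio(step_probs, threshold):
    """Fraction of steps at which help would be requested."""
    asks = sum(should_ask_for_help(p, threshold) for p in step_probs)
    return asks / len(step_probs)


# Hypothetical action distributions over three actions at four time steps.
steps = [
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.30, 0.30],   # uncertain
    [0.97, 0.02, 0.01],   # confident
    [0.34, 0.33, 0.33],   # nearly uniform
]
ratio = interaction_ratio(steps, threshold=0.8)
```

A learned help policy replaces the fixed threshold with a model that can use richer context, but the trade-off it tunes is the same: fewer help requests versus fewer unrecovered failures.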

17.Aerial Gym -- Isaac Gym Simulator for Aerial Robots

Authors:Mihir Kulkarni, Theodor J. L. Forgaard, Kostas Alexis

Abstract: Developing learning-based methods for navigation of aerial robots is an intensive data-driven process that requires highly parallelized simulation. The full utilization of such simulators is hindered by the lack of parallelized high-level control methods that imitate the real-world robot interface. Responding to this need, we develop the Aerial Gym simulator, which can simulate millions of multirotor vehicles in parallel with nonlinear geometric controllers on the Special Euclidean Group SE(3) for attitude, velocity and position tracking. We also develop functionalities for managing a large number of obstacles in the environment, enabling rapid randomization for learning of navigation tasks. In addition, we provide sample environments having robots with simulated cameras capable of capturing RGB, depth, segmentation and optical flow data in obstacle-rich environments. This simulator is a step towards a currently missing capability, highly parallelized aerial robot simulation with geometric controllers at a large scale, while also providing a customizable obstacle randomization functionality for navigation tasks. We provide training scripts with compatible reinforcement learning frameworks to navigate the robot to a goal setpoint based on attitude and velocity command interfaces. Finally, we open source the simulator and aim to develop it further to speed up rendering using alternate kernel-based frameworks in order to parallelize ray-casting for depth images, thus supporting a larger number of robots.
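
For readers unfamiliar with geometric controllers, the following is a minimal batched NumPy sketch of the position loop of a geometric tracking controller on SE(3) in the style of Lee et al., evaluated for many vehicles at once. It is not Aerial Gym's own code: the z-up world frame, the gains `kx`/`kv`, and the function name are assumptions, and the attitude loop that tracks `R_d` is omitted.

```python
import numpy as np


def geometric_position_control(x, v, R, x_d, v_d, a_d, yaw_d, mass, kx=6.0, kv=4.0, g=9.81):
    """Batched SE(3)-style position loop for N vehicles (z-up world frame).

    x, v, x_d, v_d, a_d: (N, 3); R: (N, 3, 3) body-to-world rotations; yaw_d: (N,).
    Returns collective thrust (N,) and desired attitude R_d (N, 3, 3).
    """
    e3 = np.array([0.0, 0.0, 1.0])
    # Desired force: PD on position/velocity error plus gravity compensation and feedforward.
    F = -kx * (x - x_d) - kv * (v - v_d) + mass * (g * e3 + a_d)
    # Thrust is F projected on the current body z-axis (third column of R).
    thrust = np.einsum("ni,ni->n", F, R[:, :, 2])
    # Desired body z-axis points along F; the desired yaw fixes the remaining rotation.
    b3_d = F / np.linalg.norm(F, axis=1, keepdims=True)
    b1_c = np.stack([np.cos(yaw_d), np.sin(yaw_d), np.zeros_like(yaw_d)], axis=1)
    b2_d = np.cross(b3_d, b1_c)
    b2_d /= np.linalg.norm(b2_d, axis=1, keepdims=True)
    b1_d = np.cross(b2_d, b3_d)
    R_d = np.stack([b1_d, b2_d, b3_d], axis=2)  # columns are b1_d, b2_d, b3_d
    return thrust, R_d


# Example: four vehicles hovering exactly at their setpoints with zero yaw.
N, mass = 4, 1.2
zeros = np.zeros((N, 3))
R = np.tile(np.eye(3), (N, 1, 1))
thrust, R_d = geometric_position_control(zeros, zeros, R, zeros, zeros, zeros, np.zeros(N), mass)
```

Because every operation is vectorized over the leading axis, the same code evaluates one vehicle or thousands; a GPU tensor library would follow the same pattern.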

1.Robust Imaging Sonar-based Place Recognition and Localization in Underwater Environments

Authors:Hogyun Kim, Gilhwan Kang, Seokhwan Jeong, Seungjun Ma, Younggun Cho

Abstract: Place recognition using SOund Navigation and Ranging (SONAR) images is an important task for simultaneous localization and mapping (SLAM) in underwater environments. This paper proposes SONAR context, a robust and efficient place recognition method based on imaging SONAR, together with a loop closure method. Unlike previous methods, our approach encodes geometric information based on the characteristics of raw SONAR measurements without prior knowledge or training. We also design a hierarchical searching procedure for fast retrieval of candidate SONAR frames and apply adaptive shifting and padding to achieve matching that is robust to rotation and translation changes. In addition, we can derive the initial pose through adaptive shifting and apply it to the iterative closest point (ICP) based loop closure factor. We evaluate the performance of SONAR context on various underwater sequences, including simulated open water, a real water tank, and real underwater environments. The proposed approach shows robust and improved place recognition across various datasets and evaluation metrics. Supplementary materials are available at
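
The idea of "shifting and padding" for matching can be sketched with a Scan Context-style comparison: treat each frame's descriptor as a range-by-bearing grid, shift the candidate's bearing columns with zero padding (no wrap-around, since an imaging SONAR has a limited field of view), and keep the shift with the smallest column-wise cosine distance. This is an assumption-laden illustration, not the paper's descriptor or distance; the grid size, `max_shift`, and function names are hypothetical.

```python
import numpy as np


def shift_columns(desc, s):
    """Shift bearing columns by s bins, zero-padding vacated columns (no wrap-around)."""
    out = np.zeros_like(desc)
    if s > 0:
        out[:, s:] = desc[:, :-s]
    elif s < 0:
        out[:, :s] = desc[:, -s:]
    else:
        out[:] = desc
    return out


def column_cosine_distance(a, b):
    """1 - mean cosine similarity over columns that are non-empty in both descriptors."""
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    valid = den > 0
    if not valid.any():
        return 1.0
    return 1.0 - float(np.mean(num[valid] / den[valid]))


def best_shift(query, candidate, max_shift):
    """Search shifts in [-max_shift, max_shift]; return (distance, shift)."""
    best = (np.inf, 0)
    for s in range(-max_shift, max_shift + 1):
        d = column_cosine_distance(query, shift_columns(candidate, s))
        if d < best[0]:
            best = (d, s)
    return best


# Hypothetical 8 range bins x 20 bearing bins; the query is the candidate shifted by 3 bins.
rng = np.random.default_rng(0)
candidate = rng.random((8, 20))
query = shift_columns(candidate, 3)
dist, shift = best_shift(query, candidate, max_shift=5)
```

Multiplying the recovered shift by the bearing bin width gives a heading offset, which is the kind of initial guess the abstract describes feeding into ICP (the sign convention depends on how bearings are indexed).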

2.ACE: Adversarial Correspondence Embedding for Cross Morphology Motion Retargeting from Human to Nonhuman Characters

Authors:Tianyu Li, Jungdam Won, Alexander Clegg, Jeonghwan Kim, Akshara Rai, Sehoon Ha

Abstract: Motion retargeting is a promising approach for generating natural and compelling animations for nonhuman characters. However, it is challenging to translate human movements into semantically equivalent motions for target characters with different morphologies due to the ambiguous nature of the problem. This work presents a novel learning-based motion retargeting framework, Adversarial Correspondence Embedding (ACE), to retarget human motions onto target characters with different body dimensions and structures. Our framework is designed to produce natural and feasible robot motions by leveraging generative-adversarial networks (GANs) while preserving high-level motion semantics by introducing an additional feature loss. In addition, we pretrain a robot motion prior that can be controlled in a latent embedding space and seek to establish a compact correspondence. We demonstrate that the proposed framework can produce retargeted motions for three different characters -- a quadrupedal robot with a manipulator, a crab character, and a wheeled manipulator. We further validate the design choices of our framework by conducting baseline comparisons and a user study. We also showcase sim-to-real transfer of the retargeted motions by transferring them to a real Spot robot.

3.Multi-Abstractive Neural Controller: An Efficient Hierarchical Control Architecture for Interactive Driving

Authors:Xiao Li, Igor Gilitschenski, Guy Rosman, Sertac Karaman, Daniela Rus

Abstract: As learning-based methods make their way from perception systems to planning/control stacks, robot control systems have started to enjoy the benefits that data-driven methods provide. Because control systems directly affect the motion of the robot, data-driven methods, especially black box approaches, need to be used with caution considering aspects such as stability and interpretability. In this paper, we describe a differentiable and hierarchical control architecture. The proposed representation, called \textit{multi-abstractive neural controller}, uses the input image to control the transitions within a novel discrete behavior planner (referred to as the visual automaton generative network, or \textit{vAGN}). The output of a vAGN controls the parameters of a set of dynamic movement primitives, which provide the system controls. We train this neural controller with real-world driving data via behavior cloning and show improved explainability, sample efficiency, and similarity to human driving.
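
Since the vAGN output parameterizes dynamic movement primitives, it may help to see what a DMP rollout is. Below is a minimal one-dimensional discrete DMP in the common Ijspeert-style form (a critically damped spring toward the goal plus a phase-gated forcing term built from Gaussian basis functions). The gains, basis-width heuristic, and weights are illustrative assumptions, not values from the paper; in the paper's architecture, parameters such as the goal and weights would come from the network.

```python
import numpy as np


def rollout_dmp(y0, goal, weights, tau=1.0, duration=2.0, dt=0.001, alpha_z=25.0, alpha_x=3.0):
    """Euler-integrate a 1-D discrete DMP and return the sampled positions."""
    weights = np.asarray(weights, dtype=float)
    beta_z = alpha_z / 4.0                                  # critically damped spring
    n = len(weights)
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))   # basis centers along the phase x
    widths = n ** 1.5 / centers / alpha_x                   # a common width heuristic
    y, z, x = float(y0), 0.0, 1.0
    traj = [y]
    for _ in range(int(round(duration / dt))):
        psi = np.exp(-widths * (x - centers) ** 2)
        # Forcing term: normalized basis mix, gated by the phase and scaled by (goal - y0).
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        traj.append(y)
    return np.array(traj)


plain = rollout_dmp(0.0, 1.0, np.zeros(5))                      # pure spring to the goal
shaped = rollout_dmp(0.0, 1.0, [0.0, 50.0, -50.0, 20.0, 0.0])   # hypothetical shaping weights
```

Because the forcing term decays with the phase variable, any choice of weights changes the transient shape while the trajectory still converges to the goal, which is one reason DMPs are attractive as a stable output layer for a learned controller.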

4.EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

Authors:Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

Abstract: Embodied AI is a crucial frontier in robotics, capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments. In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. To achieve this, we have made the following efforts: (i) We craft a large-scale embodied planning dataset, termed EgoCOT. The dataset consists of carefully selected videos from the Ego4D dataset, along with corresponding high-quality language instructions. Specifically, we generate a sequence of sub-goals with the "Chain of Thoughts" mode for effective embodied planning. (ii) We introduce an efficient training approach to EmbodiedGPT for high-quality plan generation, by adapting a 7B large language model (LLM) to the EgoCOT dataset via prefix tuning. (iii) We introduce a paradigm for extracting task-related features from LLM-generated planning queries to form a closed loop between high-level planning and low-level control. Extensive experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering. Notably, EmbodiedGPT significantly enhances the success rate of the embodied control task by extracting more effective features. It has achieved a remarkable 1.6 times increase in success rate on the Franka Kitchen benchmark and a 1.3 times increase on the Meta-World benchmark, compared to the BLIP-2 baseline fine-tuned with the Ego4D dataset.

5.Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts

Authors:Jan Achterhold, Philip Tobuschat, Hao Ma, Dieter Buechler, Michael Muehlebach, Joerg Stueckler

Abstract: In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approaches, which are not supplied with physical prior knowledge. We demonstrate that initializing the spin from parameters of the ball launcher using a neural network drastically improves long-term prediction performance over estimating the spin purely from measured ball positions. An accurate prediction of the ball trajectory is crucial for successful returns. We therefore evaluate the return performance with a pneumatic artificial muscular robot and achieve a return rate of 29/30 (96.7%).
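
A gray-box model of this kind starts from a physical flight model whose coefficients are then fit to data. The sketch below assumes a common textbook form (gravity, quadratic air drag, and a Magnus force proportional to spin × velocity, with spin held constant) and integrates it with semi-implicit Euler. It ignores table and racket impacts, and `k_drag`/`k_magnus` are placeholders for the parameters a gray-box method would learn, not values from the paper.

```python
import numpy as np


def simulate_ball(p0, v0, omega, k_drag, k_magnus, duration, dt=1e-4, g=9.81):
    """Semi-implicit Euler rollout under gravity, quadratic drag and Magnus force.

    omega is held constant (spin decay ignored). Returns final position and velocity.
    """
    p = np.array(p0, dtype=float)
    v = np.array(v0, dtype=float)
    w = np.array(omega, dtype=float)
    gravity = np.array([0.0, 0.0, -g])
    for _ in range(int(round(duration / dt))):
        a = gravity - k_drag * np.linalg.norm(v) * v + k_magnus * np.cross(w, v)
        v = v + a * dt  # update velocity first ...
        p = p + v * dt  # ... then position with the new velocity
    return p, v


# Same launch with no spin and with spin about +y (a downward Magnus force for motion along +x).
p_none, _ = simulate_ball([0.0, 0.0, 0.3], [3.0, 0.0, 2.0], [0.0, 0.0, 0.0], 0.0, 0.0, 0.5)
p_top, _ = simulate_ball([0.0, 0.0, 0.3], [3.0, 0.0, 2.0], [0.0, 100.0, 0.0], 0.0, 0.005, 0.5)
```

With both coefficients at zero the rollout reduces to the ballistic parabola, a convenient sanity check before fitting parameters; the spin-dependent term is also why a good initial spin estimate matters so much for long-horizon prediction.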

6.Localizing Multiple Radiation Sources Actively with a Particle Filter

Authors:Tomas Lazna, Ludek Zalud

Abstract: The article discusses the localization of radiation sources whose number and other relevant parameters are not known in advance. The data collection is ensured by an autonomous mobile robot that performs a survey in a defined region of interest populated with static obstacles. The measurement trajectory is information-driven rather than pre-planned. The localization exploits a regularized particle filter estimating the sources' parameters continuously. The dynamic robot control switches between two modes, one attempting to minimize the Shannon entropy and the other aiming to reduce the variance of expected measurements in unexplored parts of the target area; both of the modes maintain safe clearance from the obstacles. The performance of the algorithms was tested in a simulation study based on real-world data acquired previously from three radiation sources exhibiting various activities. Our approach reduces the time necessary to explore the region and to find the sources by approximately 40 %; at present, however, the method is unable to reliably localize sources that have a relatively low intensity. In this context, additional research has been planned to increase the credibility and robustness of the procedure and to improve the robotic platform autonomy.
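
To make the regularized particle filter concrete, here is a minimal single-source sketch: each particle is a hypothesis (x, y, activity), the expected count rate follows an inverse-square law plus background, particles are reweighted with a Poisson likelihood, resampled, and then jittered with Gaussian noise (the "regularization"). The paper handles an unknown number of sources, obstacles, and information-driven robot motion; here the source model, the +1 m² softening term, the jitter scales, and the fixed lawnmower survey are all simplifying assumptions.

```python
import numpy as np


def expected_rate(particles, robot_xy, background=1.0):
    """Expected count rate for each hypothesis (x, y, activity) at the robot position."""
    d2 = np.sum((particles[:, :2] - robot_xy) ** 2, axis=1) + 1.0  # +1 avoids the singularity
    return particles[:, 2] / d2 + background


def pf_update(particles, robot_xy, counts, rng, jitter=(0.05, 0.05, 1.0)):
    """One regularized-PF step: Poisson reweighting, resampling, Gaussian jitter."""
    lam = expected_rate(particles, robot_xy)
    log_w = counts * np.log(lam) - lam       # Poisson log-likelihood, dropping log(counts!)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    new = particles[idx] + rng.normal(0.0, jitter, size=particles.shape)
    new[:, 2] = np.abs(new[:, 2])            # keep activities non-negative
    return new


rng = np.random.default_rng(0)
true_source = np.array([[3.0, 4.0, 200.0]])  # hypothetical ground truth for the simulation
n = 5000
particles = np.column_stack([rng.uniform(0, 10, n), rng.uniform(0, 10, n), rng.uniform(50, 500, n)])
for rx in [1.0, 3.0, 5.0, 7.0, 9.0]:         # fixed lawnmower survey (not information-driven)
    for ry in [1.0, 3.0, 5.0, 7.0, 9.0]:
        robot = np.array([rx, ry])
        counts = rng.poisson(expected_rate(true_source, robot)[0])
        particles = pf_update(particles, robot, counts, rng)
estimate = particles.mean(axis=0)
```

The jitter step is what distinguishes a regularized filter from plain resampling: without it, repeated resampling would collapse the particle set onto a handful of duplicated hypotheses.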

7.Neural Lyapunov and Optimal Control

Authors:Daniel Layeghi, Steve Tonneau, Michael Mistry

Abstract: Optimal control (OC) is an effective approach to controlling complex dynamical systems. However, traditional approaches to parameterising and learning controllers in optimal control have been ad hoc, collecting data and fitting it to neural networks. This can lead to learnt controllers ignoring constraints like optimality and time variability. We introduce a unified framework that simultaneously solves control problems while learning corresponding Lyapunov or value functions. Our method formulates OC-like mathematical programs based on the Hamilton-Jacobi-Bellman (HJB) equation. We leverage the HJB optimality constraint and its relaxation to learn time-varying value and Lyapunov functions, implicitly ensuring the inclusion of constraints. We show the effectiveness of our approach on linear and nonlinear control-affine problems. Additionally, we demonstrate significant reductions in planning horizons (up to a factor of 25) when incorporating the learnt functions into Model Predictive Controllers.

8.Concurrent Constrained Optimization of Unknown Rewards for Multi-Robot Task Allocation

Authors:Sukriti Singh, Anusha Srikanthan, Vivek Mallampati, Harish Ravichandar

Abstract: Task allocation can enable effective coordination of multi-robot teams to accomplish tasks that are intractable for individual robots. However, existing approaches to task allocation often assume that task requirements or reward functions are known and explicitly specified by the user. In this work, we consider the challenge of forming effective coalitions for a given heterogeneous multi-robot team when task reward functions are unknown. To this end, we first formulate a new class of problems, dubbed COncurrent Constrained Online optimization of Allocation (COCOA). The COCOA problem requires online optimization of coalitions such that the unknown rewards of all the tasks are simultaneously maximized using a given multi-robot team with constrained resources. To address the COCOA problem, we introduce an online optimization algorithm, named Concurrent Multi-Task Adaptive Bandits (CMTAB), that leverages and builds upon continuum-armed bandit algorithms. Experiments involving detailed numerical simulations and a simulated emergency response task reveal that CMTAB can effectively trade off exploration and exploitation to simultaneously and efficiently optimize the unknown task rewards while respecting the team's resource constraints.

9.Towards Biomechanics-Aware Design of a Steerable Drilling Robot for Spinal Fixation Procedures with Flexible Pedicle Screws

Authors:Susheela Sharma, Yuewan Sun, Sarah Go, Jordan P. Amadio, Mohsen Khadem, Amir Hossein Eskandari, Farshid Alambeigi

Abstract: Towards reducing the failure rate of spinal fixation surgical procedures in osteoporotic patients, we propose a unique biomechanically-aware framework for the design of a novel concentric tube steerable drilling robot (CT-SDR). The proposed framework leverages a patient-specific finite element (FE) biomechanics model developed based on Quantitative Computed Tomography (QCT) scans of the patient's vertebra to calculate a biomechanically-optimal and feasible drilling and implantation trajectory. The FE output is then used as a design requirement for the design and evaluation of the CT-SDR. By balancing the flexibility necessary to create the curved optimal trajectories obtained by the FE module against the strength required to avoid buckling while drilling through a hard simulated bone material, we showed that the CT-SDR can reliably recreate this drilling trajectory with errors between 1.7% and 2.2%.

10.Comparison of Data-Driven Approaches to Configuration Space Approximation

Authors:Gabriel Guo, Hod Lipson

Abstract: Configuration spaces (C-spaces) are an essential component of many robot path-planning algorithms, yet calculating them is a time-consuming task, especially in spaces involving a large number of degrees of freedom (DoF). Here we explore a two-step data-driven approach to C-space approximation: (1) sample (i.e., explicitly calculate) a few configurations; (2) train a machine learning (ML) model on these configurations to predict the collision status of other points in the C-space. We studied multiple factors that impact this approximation process, including model representation, number of DoF (up to 42), collision density, sample size, training set distribution, and desired confidence of predictions. We conclude that XGBoost offers a significant time improvement over other methods, while maintaining low error rates, even in C-spaces with over 14 DoF.
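
The two-step recipe can be demonstrated on a toy problem: a planar two-link arm with one circular obstacle, where step (1) labels random configurations with an explicit (sampled-point) collision check and step (2) fits a classifier that predicts collision status elsewhere. To keep the sketch dependency-free, a k-nearest-neighbour classifier stands in for XGBoost; the link lengths, obstacle, sample sizes and `k` are all hypothetical choices.

```python
import numpy as np

L1, L2 = 1.0, 1.0
OBSTACLE, RADIUS = np.array([1.2, 0.5]), 0.4


def in_collision(q, n_pts=20):
    """Label configurations q (M, 2) of a planar 2-link arm against one circular obstacle."""
    s = np.linspace(0.0, 1.0, n_pts)[None, :, None]            # interpolation along each link
    elbow = L1 * np.stack([np.cos(q[:, 0]), np.sin(q[:, 0])], axis=1)
    tip = elbow + L2 * np.stack([np.cos(q[:, 0] + q[:, 1]), np.sin(q[:, 0] + q[:, 1])], axis=1)
    link1 = s * elbow[:, None, :]                                # base -> elbow
    link2 = elbow[:, None, :] + s * (tip - elbow)[:, None, :]    # elbow -> tip
    pts = np.concatenate([link1, link2], axis=1)                 # (M, 2 * n_pts, 2)
    return (np.linalg.norm(pts - OBSTACLE, axis=2) < RADIUS).any(axis=1)


def knn_predict(q_train, y_train, q_query, k=5):
    """Majority vote of the k nearest samples, with joint-angle differences wrapped to [-pi, pi)."""
    diff = (q_query[:, None, :] - q_train[None, :, :] + np.pi) % (2 * np.pi) - np.pi
    dist = np.linalg.norm(diff, axis=2)
    nearest = np.argpartition(dist, k, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1) > 0.5


rng = np.random.default_rng(0)
q_train = rng.uniform(-np.pi, np.pi, size=(2000, 2))   # step 1: explicitly check a few samples
y_train = in_collision(q_train)
q_test = rng.uniform(-np.pi, np.pi, size=(500, 2))     # step 2: predict unseen configurations
accuracy = np.mean(knn_predict(q_train, y_train, q_test) == in_collision(q_test))
```

Nearest-neighbour prediction degrades quickly as DoF grows because samples become sparse, which is consistent with the abstract's motivation for comparing model families such as XGBoost at higher dimensions.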

1.Deep Reinforcement Learning-based Multi-objective Path Planning on the Off-road Terrain Environment for Ground Vehicles

Authors:Guoming Huang, Xiaofang Yuan, Zhixian Liu, Weihua Tan, Xiru Wu, Yaonan Wang

Abstract: Because energy-consumption efficiency differs greatly between up-slope and down-slope driving, the shortest path on a complex off-road terrain environment (2.5D map) is not always the path with the least energy consumption. For energy-sensitive vehicles, achieving a good trade-off between distance and energy consumption in 2.5D path planning is highly important. In this paper, a deep reinforcement learning-based 2.5D multi-objective path planning method (DMOP) is proposed. The DMOP can efficiently find the desired path in three steps: (1) transform the high-resolution 2.5D map into a small-size map; (2) use a trained deep Q network (DQN) to find the desired path on the small-size map; (3) map the planned path back to the original high-resolution map using a path-enhancement method. In addition, imitation learning and reward shaping theory are applied to train the DQN. The reward function is constructed with information about terrain, distance, and borders. Simulation shows that the proposed method can accomplish the multi-objective 2.5D path planning task. Simulation also shows that the method has strong reasoning capability that enables it to perform arbitrary untrained planning tasks on the same map.

2.Autonomous Control for Orographic Soaring of Fixed-Wing UAVs

Authors:Tom Suys, Sunyou Hwang, Guido C. H. E. de Croon, Bart D. W. Remes

Abstract: We present a novel controller for fixed-wing UAVs that enables autonomous soaring in an orographic wind field, extending flight endurance. Our method identifies soaring regions and addresses position control challenges by introducing a target gradient line (TGL) on which the UAV achieves an equilibrium soaring position, where sink rate and updraft are balanced. Experimental testing validates the controller's effectiveness in maintaining autonomous soaring flight without using any thrust in a non-static wind field. We also demonstrate a single degree of control freedom in a soaring position through manipulation of the TGL.

3.Failure-Sentient Composition For Swarm-Based Drone Services

Authors:Balsam Alkouz, Athman Bouguettaya, Abdallah Lakhdari

Abstract: We propose a novel failure-sentient framework for swarm-based drone delivery services. The framework ensures that those drones that experience a noticeable degradation in their performance (called soft failure) and which are part of a swarm, do not disrupt the successful delivery of packages to a consumer. The framework composes a weighted continual federated learning prediction module to accurately predict the time of failures of individual drones and uptime after failures. These predictions are used to determine the severity of failures at both the drone and swarm levels. We propose a speed-based heuristic algorithm with lookahead optimization to generate an optimal set of services considering failures. Experimental results on real datasets prove the efficiency of our proposed approach in terms of prediction accuracy, delivery times, and execution times.

4.Design and Operation of Autonomous Wheelchair Towing Robot

Authors:Hyunwoo Kang, Jaeho Shin, Jaewook Shin, Youngseok Jang, Seung Jae Lee

Abstract: In this study, a new concept of a wheelchair-towing robot for the facile electrification of manual wheelchairs is introduced. The development of this concept includes the design of towing robot hardware and an autonomous driving algorithm to ensure the safe transportation of patients to their intended destinations inside the hospital. We developed a novel docking mechanism, which connects to the front caster wheel of the manual wheelchair, to facilitate easy docking and separation between the towing robot and the wheelchair. The towing robot has a mecanum wheel drive, enabling the robot to move with a high degree of freedom in the standalone driving mode while adhering to kinematic constraints in the docking mode. Our novel towing robot features a camera sensor that observes the ground ahead, which allows the robot to autonomously follow color-coded wayfinding lanes installed in hospital corridors. This study introduces dedicated image processing techniques for capturing the lanes and control algorithms for effectively tracing a path to achieve autonomous path following. The autonomous towing performance of our proposed platform was validated by a real-world experiment in which a hospital environment with colored lanes was created.

5.CTopPRM: Clustering Topological PRM for Planning Multiple Distinct Paths in 3D Environments

Authors:Matej Novosad, Robert Penicka, Vojtech Vonasek

Abstract: In this paper, we propose a new method called Clustering Topological PRM (CTopPRM) for finding multiple homotopically distinct paths in 3D cluttered environments. Finding such distinct paths, e.g., going around an obstacle from a different side, is useful in many applications. Among others, using multiple distinct paths is necessary for optimization-based trajectory planners where found trajectories are restricted to only a single homotopy class of a given path. Distinct paths can also be used to guide sampling-based motion planning and thus increase the effectiveness of planning in environments with narrow passages. A graph-based representation called a roadmap is common both for path planning and for finding multiple distinct paths. However, challenging environments with multiple narrow passages require a densely sampled roadmap to capture the connectivity of the environment. Searching such a dense roadmap for multiple paths is computationally too expensive. Therefore, the majority of existing methods construct only a sparse roadmap, which, however, struggles to find all distinct paths in challenging environments. To this end, we propose CTopPRM, which creates a sparse graph by clustering an initially sampled dense roadmap. Such a reduced roadmap allows fast identification of homotopically distinct paths captured in the dense roadmap. We show that, compared to the existing methods, CTopPRM improves the probability of finding all distinct paths by almost 20% in the tested environments within the same run-time. The source code of our method is released as an open-source package.

6.Large Language Models as Commonsense Knowledge for Large-Scale Task Planning

Authors:Zirui Zhao, Wee Sun Lee, David Hsu

Abstract: Natural language provides a natural interface for human communication, yet it is challenging for robots to comprehend due to its abstract nature and inherent ambiguity. Large language models (LLMs) contain commonsense knowledge that can help resolve language ambiguity and generate possible solutions to abstract specifications. While LLMs have shown promise as few-shot planning policies, their potential for planning complex tasks is not fully tapped. This paper shows that LLMs can be used as both the commonsense model of the world and the heuristic policy in search algorithms such as Monte Carlo Tree Search (MCTS). MCTS explores likely world states sampled from LLMs to facilitate better-reasoned decision-making. The commonsense policy from LLMs guides the search to relevant parts of the tree, substantially reducing the search complexity. We demonstrate the effectiveness of our method in daily task-planning experiments and highlight its advantages over using LLMs solely as policies.
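
One common way to let a commonsense policy "guide the search to relevant parts of the tree" is a PUCT-style selection rule, where a prior probability for each action biases exploration toward actions the prior considers plausible. The sketch below assumes that rule (with `sqrt(N + 1)` so the prior matters even before any visits); whether the paper uses exactly this variant is not stated in the abstract, and the action names and prior values standing in for LLM output are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    prior: float            # e.g. an LLM-derived probability for this action (hypothetical here)
    visits: int = 0
    value_sum: float = 0.0

    @property
    def q(self):
        """Mean backed-up value; 0 for unvisited actions."""
        return self.value_sum / self.visits if self.visits else 0.0

    def update(self, value):
        """Back up one simulation result through this child."""
        self.visits += 1
        self.value_sum += value


def puct_select(children, c_puct=1.0):
    """Pick the action maximizing Q + c * P * sqrt(N + 1) / (1 + n)."""
    total = sum(ch.visits for ch in children.values())

    def score(ch):
        return ch.q + c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)

    return max(children, key=lambda a: score(children[a]))


children = {
    "open the fridge": ChildStats(prior=0.7),
    "go to the bathroom": ChildStats(prior=0.1),
    "pick up a cup": ChildStats(prior=0.2),
}
first = puct_select(children)          # the highest-prior action is tried first
for _ in range(10):                    # suppose it keeps returning zero value
    children[first].update(0.0)
second = puct_select(children)         # exploration shifts to the next most plausible action
```

The prior only reorders exploration; as visit counts grow, the backed-up values `q` dominate, so a misleading prior can be overridden by the search.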

7.Solving Stabilize-Avoid Optimal Control via Epigraph Form and Deep Reinforcement Learning

Authors:Oswin So, Chuchu Fan

Abstract: Tasks for autonomous robotic systems commonly require stabilization to a desired region while maintaining safety specifications. However, solving this multi-objective problem is challenging when the dynamics are nonlinear and high-dimensional, as traditional methods do not scale well and are often limited to specific problem structures. To address this issue, we propose a novel approach to solve the stabilize-avoid problem via the solution of an infinite-horizon constrained optimal control problem (OCP). We transform the constrained OCP into epigraph form and obtain a two-stage optimization problem that optimizes over the policy in the inner problem and over an auxiliary variable in the outer problem. We then propose a new method for this formulation that combines an on-policy deep reinforcement learning algorithm with neural network regression. Our method yields better stability during training, avoids instabilities caused by saddle-point finding, and is not restricted to specific requirements on the problem structure compared to more traditional methods. We validate our approach on different benchmark tasks, ranging from low-dimensional toy examples to an F16 fighter jet with a 17-dimensional state space. Simulation results show that our approach consistently yields controllers that match or exceed the safety of existing methods while providing ten-fold increases in stability performance from larger regions of attraction.
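
The two-stage structure mentioned above follows from the standard epigraph transformation. In the sketch below, the notation is ours rather than necessarily the paper's: $J(\pi)$ is the stabilization cost of policy $\pi$, $h(\pi)$ is a constraint-violation measure (for example, the worst value of the avoid constraint along the trajectory, with $h \le 0$ meaning safe), and the inner minimum is assumed to be attained.

```latex
\begin{aligned}
&\min_{\pi}\; J(\pi) \quad \text{s.t.}\quad h(\pi) \le 0 \\
\iff\; &\min_{\pi,\, z}\; z \quad \text{s.t.}\quad J(\pi) \le z,\;\; h(\pi) \le 0 \\
\iff\; &\min_{z}\; z \quad \text{s.t.}\quad \min_{\pi}\, \max\{\, J(\pi) - z,\; h(\pi) \,\} \le 0
\end{aligned}
```

The inner problem is an unconstrained optimization over the policy for a fixed auxiliary variable $z$, and the outer problem is a one-dimensional search over $z$, so no Lagrange multiplier or saddle point has to be found.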

8.MultiSCOPE: Disambiguating In-Hand Object Poses with Proprioception and Tactile Feedback

Authors:Andrea Sipos, Nima Fazeli

Abstract: In this paper, we propose a method for estimating in-hand object poses using proprioception and tactile feedback from a bimanual robotic system. Our method addresses the problem of reducing pose uncertainty through a sequence of frictional contact interactions between the grasped objects. As part of our method, we propose 1) a tool segmentation routine that facilitates contact location and object pose estimation, 2) a loss that allows reasoning over solution consistency between interactions, and 3) a loss to promote converging to object poses and contact locations that explain the external force-torque experienced by each arm. We demonstrate the efficacy of our method in a task-based demonstration both in simulation and on a real-world bimanual platform and show significant improvement in object pose estimation over single interactions. Visit for code and videos.

9.Precise Object Sliding with Top Contact via Asymmetric Dual Limit Surfaces

Authors:Xili Yi, Nima Fazeli

Abstract: In this paper, we discuss the mechanics and planning algorithms to slide an object on a horizontal planar surface via frictional patch contact made with its top surface. Here, we propose an asymmetric dual limit surface model to determine slip boundary conditions for both the top and bottom contact. With this model, we obtain a range of twists that can keep the object in sticking contact with the robot end-effector while slipping on the supporting plane. Based on these constraints, we derive a planning algorithm to slide objects with only top contact to arbitrary goal poses without slippage between end effector and the object. We validate the proposed model empirically and demonstrate its predictive accuracy on a variety of object geometries and motions. We also evaluate the planning algorithm over a variety of objects and goals and demonstrate an orientation error improvement of 90% when compared to methods naive to linear path planners.
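
The stick/slip test at the heart of a dual limit surface model can be sketched with the common ellipsoidal limit-surface approximation, quasi-statically and ignoring inertia. For a commanded planar twist, the bottom contact slides, so its friction wrench lies on the bottom limit surface at the point whose normal is parallel to the twist; the top contact sticks if transmitting that wrench stays strictly inside the top limit surface. The asymmetry here comes only from different friction coefficients and normal forces (top: N, bottom: N plus the object's weight); the moment factors `c_top`/`c_bottom` and all numbers are hypothetical, and this is not the paper's exact model.

```python
import numpy as np


def ellipsoid_A(mu, normal_force, c):
    """Quadratic form of an ellipsoidal limit surface: w^T A w = 1 on the boundary."""
    f_max = mu * normal_force
    m_max = c * mu * normal_force          # c: hypothetical patch-size factor for the max moment
    return np.diag([1.0 / f_max**2, 1.0 / f_max**2, 1.0 / m_max**2])


def bottom_wrench(twist, A_bottom):
    """Wrench on the bottom limit surface whose surface normal is parallel to the twist."""
    A_inv = np.linalg.inv(A_bottom)
    w = A_inv @ twist
    return w / np.sqrt(twist @ A_inv @ twist)


def top_sticks(twist, mu_top, mu_bottom, n_top, weight, c_top, c_bottom):
    """True if the top patch can transmit the bottom sliding wrench without slipping."""
    A_b = ellipsoid_A(mu_bottom, n_top + weight, c_bottom)
    A_t = ellipsoid_A(mu_top, n_top, c_top)
    w = bottom_wrench(np.asarray(twist, dtype=float), A_b)
    return float(w @ A_t @ w) < 1.0


# Pure translation along x with a grippy top contact versus a slippery one.
grippy = top_sticks([1.0, 0.0, 0.0], mu_top=0.8, mu_bottom=0.2, n_top=10.0, weight=5.0, c_top=0.1, c_bottom=0.1)
slippery = top_sticks([1.0, 0.0, 0.0], mu_top=0.2, mu_bottom=0.5, n_top=10.0, weight=5.0, c_top=0.1, c_bottom=0.1)
```

Sweeping the twist direction with a check like this traces out the set of twists that keep the object stuck to the end-effector, which is the kind of constraint a planner can then respect.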

1.FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation

Authors:Minho Heo, Youngwoon Lee, Doohyun Lee, Joseph J. Lim

Abstract: Reinforcement learning (RL), imitation learning (IL), and task and motion planning (TAMP) have demonstrated impressive performance across various robotic manipulation tasks. However, these approaches have been limited to learning simple behaviors in current real-world manipulation benchmarks, such as pushing or pick-and-place. To enable more complex, long-horizon behaviors of an autonomous robot, we propose to focus on real-world furniture assembly, a complex, long-horizon robot manipulation task that requires addressing many current robotic manipulation challenges to solve. We present FurnitureBench, a reproducible real-world furniture assembly benchmark aimed at providing a low barrier for entry and being easily reproducible, so that researchers across the world can reliably test their algorithms and compare them against prior work. For ease of use, we provide 200+ hours of pre-collected data (5000+ demonstrations), 3D printable furniture models, a robotic environment setup guide, and systematic task initialization. Furthermore, we provide FurnitureSim, a fast and realistic simulator of FurnitureBench. We benchmark the performance of offline RL and IL algorithms on our assembly tasks and demonstrate the need to improve such algorithms to be able to solve our tasks in the real world, providing ample opportunities for future research.

2.Flying Adversarial Patches: Manipulating the Behavior of Deep Learning-based Autonomous Multirotors

Authors:Pia Hanfeld, Marina M. -C. Höhne, Michael Bussmann, Wolfgang Hönig

Abstract: Autonomous flying robots, e.g., multirotors, often rely on a neural network that makes predictions based on a camera image. These deep learning (DL) models can produce unexpected results when applied to input images outside the training domain. Adversarial attacks exploit this weakness, for example, by computing small images, so-called adversarial patches, that can be placed in the environment to manipulate the neural network's prediction. We introduce flying adversarial patches, where an image is mounted on another flying robot and can therefore be placed anywhere in the field of view of a victim multirotor. To achieve an effective attack, we compare three methods that simultaneously optimize the adversarial patch and its position in the input image. We perform an empirical validation on a publicly available DL model and dataset for autonomous multirotors. Ultimately, our attacking multirotor would be able to gain full control over the motions of the victim multirotor.

3.Combinatorial-hybrid Optimization for Multi-agent Systems under Collaborative Tasks

Authors:Zili Tang, Junfeng Chen, Meng Guo

Abstract: Multi-agent systems can be extremely efficient when working concurrently and collaboratively, e.g., for transportation, maintenance, and search and rescue. Coordination of such teams often involves two aspects: (i) selecting appropriate sub-teams for different tasks; (ii) designing collaborative control strategies to execute these tasks. The former aspect can be combinatorial w.r.t. the team size, while the latter requires optimization over joint state spaces under geometric and dynamic constraints. Existing work often tackles one aspect by assuming the other is given, ignoring their close dependency. This work formulates such problems as combinatorial-hybrid optimizations (CHO), where both the discrete modes of collaboration and the continuous control parameters are optimized simultaneously and iteratively. The proposed framework consists of two interleaved layers: the dynamic formation of task coalitions and the hybrid optimization of collaborative behaviors. The overall feasibility and costs of different coalitions performing various tasks are approximated at different granularities to improve computational efficiency. Finally, a Nash-stable strategy for both task assignment and execution is derived with provable guarantees on feasibility and quality. Two non-trivial applications, collaborative transportation and dynamic capture, are studied and compared against several baselines.

4.End-to-End Stable Imitation Learning via Autonomous Neural Dynamic Policies

Authors:Dionis Totsila, Konstantinos Chatzilygeroudis, Denis Hadjivelichkov, Valerio Modugno, Ioannis Hatzilygeroudis, Dimitrios Kanoulas

Abstract: State-of-the-art sensorimotor learning algorithms offer policies that can often produce unstable behaviors, damaging the robot and/or the environment. Traditional robot learning, by contrast, relies on dynamical system-based policies that can be analyzed for stability/safety. Such policies, however, are neither flexible nor generic and usually work only with proprioceptive sensor states. In this work, we bridge the gap between generic neural network policies and dynamical system-based policies, and we introduce Autonomous Neural Dynamic Policies (ANDPs) that: (a) are based on autonomous dynamical systems, (b) always produce asymptotically stable behaviors, and (c) are more flexible than traditional stable dynamical system-based policies. ANDPs are fully differentiable, flexible, generic policies that can be used in imitation learning setups while ensuring asymptotic stability. In this paper, we explore the flexibility and capacity of ANDPs in several imitation learning tasks, including experiments with image observations. The results show that ANDPs combine the benefits of both neural network-based and dynamical system-based methods.
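
The stability property that ANDPs guarantee can be illustrated with the simplest autonomous dynamical system, x_dot = -K (x - x_goal) with a positive diagonal gain matrix K. This is only a minimal sketch of what "asymptotically stable" means in this setting, not the ANDP architecture; the function names, the diagonal gains, and the explicit Euler integration are assumptions.

```python
def rollout_stable_ds(x0, x_goal, gains, dt=0.01, steps=2000):
    """Euler-integrate the autonomous system x_dot = -K (x - x_goal),
    with K = diag(gains). Returns the list of visited states."""
    # Continuous-time stability needs every gain > 0; the explicit Euler
    # discretization additionally needs dt * gain < 2.
    for k in gains:
        if not 0.0 < dt * k < 2.0:
            raise ValueError("each gain must satisfy 0 < dt * gain < 2")
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        x = [xi - dt * k * (xi - gi) for xi, gi, k in zip(x, x_goal, gains)]
        trajectory.append(x)
    return trajectory


def lyapunov(x, x_goal):
    """Quadratic Lyapunov candidate V(x) = ||x - x_goal||^2."""
    return sum((xi - gi) ** 2 for xi, gi in zip(x, x_goal))
```

Along any rollout, V never increases and the state converges to x_goal from every initial condition, which is the guarantee a learned stable policy must preserve while being far more expressive than this linear example.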

5.Bio-inspired spike-based Hippocampus and Posterior Parietal Cortex models for robot navigation and environment pseudo-mapping

Authors:Daniel Casanueva-Morato, Alvaro Ayuso-Martinez, Juan P. Dominguez-Morales, Angel Jimenez-Fernandez, Gabriel Jimenez-Moreno, Fernando Perez-Pena

Abstract: The brain has a great capacity for computation and efficient resolution of complex problems, far surpassing modern computers. Neuromorphic engineering seeks to mimic the basic principles of the brain to develop systems capable of achieving such capabilities. In the neuromorphic field, navigation systems are of great interest due to their potential applicability to robotics, although they remain an open challenge. This work proposes a spike-based robotic navigation and environment pseudo-mapping system formed by a bio-inspired hippocampal memory model connected to a Posterior Parietal Cortex (PPC) model. The hippocampus is in charge of maintaining a representation of an environment state map, and the PPC is in charge of local decision-making. This system was implemented on the SpiNNaker hardware platform using Spiking Neural Networks. A set of real-time experiments was conducted to demonstrate the correct functioning of the system in virtual and physical environments on a robotic platform. The system is able to navigate through the environment from an initial position to a goal position while avoiding obstacles and mapping the environment. To the best of the authors' knowledge, this is the first implementation of an environment pseudo-mapping system with dynamic learning based on a bio-inspired hippocampal memory.

6.Geometric Facts Underlying Algorithms of Robot Navigation for Tight Circumnavigation of Group Objects through Singular Inter-Object Gaps

Authors:Valerii Chernov, Alexey Matveev

Abstract: An underactuated nonholonomic Dubins-vehicle-like robot with a lower-limited turning radius travels at a constant speed in a plane that hosts unknown complex objects. The robot has to approach and then circumnavigate all objects while maintaining a given distance to the currently nearest of them. Thus, the ideal targeted path is the equidistant curve of the entire set of objects. The focus is on the case where this curve cannot be perfectly traced due to excessive contortions and singularities. The objective therefore becomes automatically finding, approaching, and repeatedly tracing an approximation of the equidistant curve that is the best among those trackable by the robot. The paper presents some geometric facts that are needed in research on reactive tight circumnavigation of group objects in this situation.

7.Robots in the Garden: Artificial Intelligence and Adaptive Landscapes

Authors:Zihao Zhang, Susan L. Epstein, Casey Breen, Sophia Xia, Zhigang Zhu, Christian Volkmann

Abstract: This paper introduces ELUA, the Ecological Laboratory for Urban Agriculture, a collaboration among landscape architects, architects and computer scientists who specialize in artificial intelligence, robotics and computer vision. ELUA has two gantry robots, one indoors and the other outside on the rooftop of a 6-story campus building. Each robot can seed, water, weed, and prune in its garden. To support responsive landscape research, ELUA also includes sensor arrays, an AI-powered camera, and an extensive network infrastructure. This project demonstrates a way to integrate artificial intelligence into an evolving urban ecosystem, and encourages landscape architects to develop an adaptive design framework where design becomes a long-term engagement with the environment.

8.Can we hear physical and social space together through prosody?

Authors:Ambre Davat (GIPSA-PCMD, LIG), Véronique Aubergé (LIG), Gang Feng (GIPSA-lab)

Abstract: When human listeners try to guess the spatial position of a speech source, they are influenced by the speaker's production level, regardless of the intensity level reaching their ears. Because the perception of distance is a very difficult task, they rely on their own experience, which tells them that a whispering talker is close to them and that a shouting talker is far away. This study aims to test whether similar results can be obtained for prosodic variations produced by a human speaker in an everyday-life environment. It consists of a localization task, during which blindfolded subjects had to estimate the incoming voice direction, speaker orientation, and distance of a trained female speaker who uttered single words, following instructions on the intensity and social affect to perform. This protocol was implemented in two experiments. In the first, a complex pretext task was used to distract the subjects from the speaker's strange behavior. By contrast, in the second experiment, the subjects were fully aware of the prosodic variations, which allowed them to adapt their perception. Results show the importance of the pretext task and suggest that the perception of the speaker's orientation can be influenced by voice intensity.

9.Learning Pedestrian Actions to Ensure Safe Autonomous Driving

Authors:Jia Huang, Alvika Gautam, Srikanth Saripalli

Abstract: To ensure safe autonomous driving in urban environments with complex vehicle-pedestrian interactions, it is critical for Autonomous Vehicles (AVs) to be able to predict pedestrians' short-term and immediate actions in real time. In recent years, various methods have been developed to estimate pedestrian behaviors in autonomous driving scenarios, but clear definitions of pedestrian behaviors are lacking. In this work, the gaps in the literature are investigated and a taxonomy for pedestrian behavior characterization is presented. Further, a novel multi-task sequence-to-sequence Transformer encoder-decoder (TF-ed) architecture is proposed for pedestrian action and trajectory prediction using only ego-vehicle camera observations as inputs. The proposed approach is compared against an existing LSTM encoder-decoder (LSTM-ed) architecture for action and trajectory prediction. The performance of both models is evaluated on the publicly available Joint Attention Autonomous Driving (JAAD) dataset, CARLA simulation data, and real-time self-driving shuttle data collected on a university campus. Evaluation results show that the proposed method reaches an accuracy of 81% on the action prediction task on the JAAD test data and outperforms the LSTM-ed by 7.4%, while the LSTM counterpart performs much better on the trajectory prediction task for a prediction sequence length of 25 frames.

10.Optimality Principles in Spacecraft Neural Guidance and Control

Authors:Dario Izzo, Emmanuel Blazquez, Robin Ferede, Sebastien Origer, Christophe De Wagter, Guido C. H. E. de Croon

Abstract: Spacecraft and drones aimed at exploring our solar system are designed to operate in conditions where the smart use of onboard resources is vital to the success or failure of the mission. Sensorimotor actions are thus often derived from high-level, quantifiable optimality principles assigned to each task, using consolidated tools from optimal control theory. The planned actions are derived on the ground and transferred onboard, where controllers have the task of tracking the uploaded guidance profile. Here we argue that end-to-end neural guidance and control architectures (here called G&CNets) allow the burden of acting upon these optimality principles to be transferred onboard. In this way, sensor information is transformed in real time into optimal plans, thus increasing mission autonomy and robustness. We discuss the main results obtained in training such neural architectures in simulation for interplanetary transfers, landings, and close-proximity operations, highlighting the successful learning of optimality principles by the neural model. We then suggest drone racing as an ideal gym environment to test these architectures on real robotic platforms, thus increasing confidence in their utilization on future space exploration missions. Drone racing shares with spacecraft missions both limited onboard computational capabilities and similar control structures induced from the optimality principle sought, but it also entails different levels of uncertainty and unmodelled effects. Furthermore, the success of G&CNets on extremely resource-restricted drones illustrates their potential to bring real-time optimal control within reach of a wider variety of robotic systems, both in space and on Earth.

11.PALoc: Robust Prior-assisted Trajectory Generation for Benchmarking

Authors:Xiangcheng Hu, Jin Wu, Jianhao Jiao, Ruoyu Geng, Ming Liu

Abstract: Evaluating simultaneous localization and mapping (SLAM) algorithms requires high-precision, dense ground truth (GT) trajectories. However, obtaining suitable GT trajectories can be challenging without GT tracking sensors. As an alternative, in this paper we propose a novel prior-assisted SLAM system, built on a factor-graph framework, that generates a full six-degree-of-freedom (6-DOF) trajectory at around 10 Hz for benchmarking. Our degeneracy-aware map factor uses a prior point cloud map and LiDAR frames for point-to-plane optimization, while simultaneously detecting degenerate cases to reduce drift and enhance the consistency of pose estimation. Our system is seamlessly integrated with cutting-edge odometry via a loosely coupled scheme to generate high-rate and precise trajectories. Moreover, we propose a norm-constrained gravity factor for stationary cases, optimizing pose and gravity to boost performance. Extensive evaluations demonstrate our algorithm's superiority over existing SLAM or map-based methods in diverse scenarios in terms of precision, smoothness, and robustness. Our approach substantially advances reliable and accurate SLAM evaluation methods, fostering progress in robotics research.
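
The point-to-plane term can be sketched as the signed distance of a transformed LiDAR point from a local plane of the prior map. This is a generic formulation, assuming a known correspondence (scan point p, map point q) and a unit plane normal n; it is not the paper's degeneracy-aware map factor, and the function names are hypothetical.

```python
def transform(R, t, p):
    """Apply the rigid transform (R, t) to a 3-D point p (R is a 3x3 nested list)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def point_to_plane_residual(R, t, p, q, n):
    """Signed distance of the transformed scan point R p + t from the
    map plane through point q with unit normal n: n . (R p + t - q)."""
    pw = transform(R, t, p)
    return sum(n[i] * (pw[i] - q[i]) for i in range(3))
```

Translating the pose along the plane leaves the residual unchanged, which illustrates why a scan dominated by a few planes constrains some pose directions poorly; that is the kind of degenerate case the map factor is designed to detect.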

12.DeRi-Bot: Learning to Collaboratively Manipulate Rigid Objects via Deformable Objects

Authors:Zixing Wang, Ahmed H. Qureshi

Abstract: Recent research efforts have yielded significant advancements in manipulating objects under homogeneous settings where the robot is required to manipulate either rigid or deformable (soft) objects. However, manipulation in heterogeneous setups that involve both deformable and rigid objects remains an unexplored area of research. Such setups are common in various scenarios that involve the transportation of heavy objects via ropes, e.g., on factory floors, at disaster sites, and in forestry. To address this challenge, we introduce DeRi-Bot, the first framework that enables the collaborative manipulation of rigid objects with deformable objects. Our framework comprises an Action Prediction Network (APN) and a Configuration Prediction Network (CPN) to model the complex pattern and stochasticity of soft-rigid body systems. We demonstrate the effectiveness of DeRi-Bot in moving rigid objects to a target position with ropes connected to robotic arms. Furthermore, DeRi-Bot is a distributed method that can accommodate an arbitrary number of robots or human partners without reconfiguration or retraining. We evaluate our framework in both simulated and real-world environments and show that it achieves promising results with strong generalization across different types of objects and multi-agent settings, including human-robot collaboration.

13.Real-life Implementation of Internet of Robotic Things Using 5 DoF Heterogeneous Robotic Arm

Authors:Sayed Erfan Arefin, Tasnia Ashrafi Heya, Jia Uddin

Abstract: Establishing a communication bridge between an enormous number of distinctively addressable objects or "things", by transferring data from different embedded sensors via the internet or compatible network protocols, is known as the Internet of Things (IoT). IoT can be integrated with a multitude of objects, such as thermostats, cars, lights, refrigerators, and many other appliances, which can then connect via the internet. Just as everyday objects can establish a network connection and become smarter with IoT, robotics can also benefit from being brought under the IoT concept, adding a new perspective to robotics with "Mechanical Smart Intelligence", generally called the "Internet of Robotic Things" (IoRT). A robotic arm is usually a programmable mechanical arm with functionalities similar to those of a human arm. In this paper, IoRT is represented by a 5-DoF (degrees of freedom) robotic arm that can communicate as an IoRT device and can be controlled by heterogeneous devices using IoT and "Cloud Robotics".

1.Risk-Sensitive Extended Kalman Filter

Authors:Armand Jordana, Avadesh Meduri, Etienne Arlaud, Justin Carpentier, Ludovic Righetti

Abstract: In robotics, designing algorithms that are robust to estimation uncertainty is a challenging task. Indeed, controllers often ignore the estimation uncertainty and rely only on the most likely estimated state. Consequently, sudden changes in the environment or the robot's dynamics can lead to catastrophic behaviors. In this work, we present a risk-sensitive Extended Kalman Filter that enables safe output-feedback Model Predictive Control (MPC). This filter adapts its estimate to the control objective. By taking a pessimistic estimate with respect to the value function of the MPC controller, the filter makes the controller more robust in phases of uncertainty than a standard Extended Kalman Filter (EKF) does. Moreover, the filter has the same complexity as an EKF, so it can be used for real-time model-predictive control. The paper evaluates the risk-sensitive behavior of the proposed filter when used in a nonlinear model-predictive control loop on a planar drone and an industrial manipulator in simulation, as well as on an external force estimation task on a real quadruped robot. These experiments demonstrate that the approach significantly improves performance in the face of uncertainty.
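
For reference, one predict/update cycle of the standard EKF that serves as the baseline can be sketched for a scalar state. The risk-sensitive modification, which biases the estimate using the MPC value function, is not reproduced here; the scalar setting and the function and argument names are simplifying assumptions.

```python
def ekf_step(x, P, u, z, f, F, h, H, Q, R):
    """One predict/update cycle of a scalar extended Kalman filter.

    f(x, u): dynamics model, F(x, u): its derivative with respect to x,
    h(x): measurement model, H(x): its derivative with respect to x,
    Q, R: process and measurement noise variances.
    """
    # Predict: mean through the nonlinear model, variance through its linearization.
    x_pred = f(x, u)
    F_k = F(x, u)
    P_pred = F_k * P * F_k + Q
    # Update: linearize the measurement model at the predicted mean.
    H_k = H(x_pred)
    S = H_k * P_pred * H_k + R   # innovation variance
    K = P_pred * H_k / S         # Kalman gain
    x_new = x_pred + K * (z - h(x_pred))
    P_new = (1.0 - K * H_k) * P_pred
    return x_new, P_new
```

With linear models the step reduces to the ordinary Kalman filter: starting from x = 0 with variance 1, a unit-variance measurement z = 1 moves the estimate to 0.5 and halves the variance. The EKF's cost per step is what the abstract refers to when it says the risk-sensitive filter has the same complexity.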

2.Vision-based DRL Autonomous Driving Agent with Sim2Real Transfer

Authors:Dianzhao Li, Ostap Okhrin

Abstract: To achieve fully autonomous driving, vehicles must be capable of continuously performing various driving tasks, including lane keeping and car following, both of which are fundamental and well-studied. However, previous studies have mainly focused on individual tasks, and car-following approaches have typically relied on complete leader-follower information to attain optimal performance. To address this limitation, we propose a vision-based deep reinforcement learning (DRL) agent that can simultaneously perform lane keeping and car following maneuvers. To evaluate the performance of our DRL agent, we compare it with a baseline controller and use various performance metrics for quantitative analysis. Furthermore, we conduct a real-world evaluation to demonstrate the Sim2Real transfer capability of the trained DRL agent. To the best of our knowledge, our vision-based car following and lane keeping agent with Sim2Real transfer capability is the first of its kind.

3.Time Optimal Ergodic Search

Authors:Dayi Dong, Henry Berger, Ian Abraham

Abstract: Robots with the ability to balance time against the thoroughness of search have the potential to provide time-critical assistance in applications such as search and rescue. Current advances in ergodic coverage-based search methods have enabled robots to completely explore and search an area in a fixed amount of time. However, optimizing time against the quality of autonomous ergodic search has yet to be demonstrated. In this paper, we investigate solutions to the time-optimal ergodic search problem for fast and adaptive robotic search and exploration. We pose the problem as a minimum time problem with an ergodic inequality constraint whose upper bound regulates and balances the granularity of search against time. Solutions to the problem are presented analytically using Pontryagin's conditions of optimality and demonstrated numerically through a direct transcription optimization approach. We show the efficacy of the approach in generating time-optimal ergodic search trajectories in simulation and with drone experiments in a cluttered environment. Obstacle avoidance is shown to be readily integrated into our formulation, and we perform ablation studies that investigate parameter dependence on optimized time and trajectory sensitivity for search.
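
The quantity bounded by the ergodic inequality constraint is commonly the spectral ergodic metric: a weighted distance between the Fourier coefficients of the trajectory's time-averaged statistics and those of the target distribution. The sketch below computes it on a 1-D domain [0, L] with a cosine basis; the domain, the number of coefficients K, and the midpoint quadrature are assumptions, and the time-optimal direct transcription itself is not shown.

```python
import math


def basis(k, x, L=1.0):
    """Normalized cosine basis function f_k on the interval [0, L]."""
    h = math.sqrt(L) if k == 0 else math.sqrt(L / 2.0)
    return math.cos(k * math.pi * x / L) / h


def target_coeffs(density, K, L=1.0, n=1000):
    """phi_k = integral of density(x) * f_k(x) over [0, L] (midpoint rule)."""
    dx = L / n
    xs = [(i + 0.5) * dx for i in range(n)]
    return [sum(density(x) * basis(k, x, L) for x in xs) * dx for k in range(K)]


def trajectory_coeffs(traj, K, L=1.0):
    """c_k = time average of f_k along a uniformly time-sampled trajectory."""
    return [sum(basis(k, x, L) for x in traj) / len(traj) for k in range(K)]


def ergodic_metric(traj, density, K=20, L=1.0):
    """Sum over k of Lambda_k (c_k - phi_k)^2, Lambda_k = (1 + k^2)^(-s), s = (d + 1) / 2."""
    s = 1.0  # d = 1 spatial dimension
    c = trajectory_coeffs(traj, K, L)
    phi = target_coeffs(density, K, L)
    return sum((1.0 + k * k) ** (-s) * (ck - pk) ** 2
               for k, (ck, pk) in enumerate(zip(c, phi)))
```

A trajectory that sweeps [0, 1] evenly scores near zero against a uniform target, while one that lingers at a single point scores high. In a time-optimal formulation, an upper bound on this metric trades search thoroughness against time: a looser bound permits a shorter, coarser search.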

4.Vehicle Teleoperation: Performance Assessment of SRPT Approach Under State Estimation Errors

Authors:Jai Prakash, Michele Vignati, Edoardo Sabbioni

Abstract: Vehicle teleoperation has numerous potential applications, including serving as a backup solution for autonomous vehicles, facilitating remote delivery services, and enabling hazardous remote operations. However, complex urban scenarios, limited situational awareness, and network delay increase the cognitive workload of human operators and degrade teleoperation performance. To address this, earlier work introduced the successive reference pose tracking (SRPT) approach, which transmits successive reference poses to the remote vehicle instead of steering commands. The operator generates reference poses online using a steering joystick and an augmented display, potentially mitigating the detrimental effects of delays. However, it is not clear which minimal set of sensors is essential for the SRPT vehicle teleoperation control loop. This paper tests the robustness of the SRPT approach in the presence of state estimation inaccuracies, environmental disturbances, and measurement noise. The simulation environment, implemented in Simulink, features a 14-DOF vehicle model and incorporates difficult maneuvers such as tight corners, double lane changes, and slalom. Environmental disturbances include low-adhesion track regions and strong crosswind gusts. The results demonstrate that the SRPT approach, using either estimated or actual states, performs similarly under various worst-case scenarios, even without a position sensor. Additionally, the designed state estimator ensures sufficient performance with just an inertial measurement unit, a wheel speed encoder, and a steering encoder, which constitute a minimal set of essential sensors for the SRPT vehicle teleoperation control loop.

5.Real-time and Robust Feature Detection of Continuous Marker Pattern for Dense 3-D Deformation Measurement

Authors:Mingxuan Li, Yen Hang Zhou, Liemin Li, Yao Jiang

Abstract: Visuotactile sensing technology has received much attention in recent years. This article proposes a feature detection method for visuotactile sensors based on continuous marker patterns (CMP) to measure 3-D deformation. First, we construct a feature model of checkerboard-like corners under contact deformation and design a novel double-layer circular sampler. Then, we propose judging criteria and a response function for corner features by analyzing the amplitude-frequency characteristics and circular cross-correlation behavior of the sampled signals. The proposed feature detection algorithm fully considers the boundary characteristics retained by corners under geometric distortion, thus enabling reliable detection at a low computational cost. The experimental results show that the proposed method has significant advantages in terms of real-time performance and robustness. Finally, we achieve high-density 3-D contact deformation visualization based on this detection method. This technique can clearly record the process of contact deformation, thus enabling inverse sensing of dynamic contact processes.

6.Contact Optimization with Learning from Demonstration: Application in Long-term Non-prehensile Planar Manipulation

Authors:Teng Xue, Sylvain Calinon

Abstract: Long-term non-prehensile planar manipulation is a challenging task for planning and control, requiring the determination of both continuous and discrete contact configurations, such as contact points and modes. This makes contact optimization non-convex and hybrid. To overcome these difficulties, we propose a novel approach that incorporates human demonstrations into trajectory optimization. We show that our approach effectively handles the hybrid combinatorial nature of the problem, mitigates the issues with local minima present in current state-of-the-art solvers, and requires only a small number of demonstrations while delivering robust generalization performance. We validate our results in simulation and demonstrate the approach's applicability on a pusher-slider system with a real Franka Emika robot.

1.Latent Space Planning for Multi-Object Manipulation with Environment-Aware Relational Classifiers

Authors:Yixuan Huang, Nichols Crawford Taylor, Adam Conkey, Weiyu Liu, Tucker Hermans

Abstract: Objects rarely sit in isolation in everyday human environments. If we want robots to operate and perform tasks in our human environments, they must understand how the objects they manipulate will interact with structural elements of the environment for all but the simplest of tasks. As such, we'd like our robots to reason about how multiple objects and environmental elements relate to one another and how those relations may change as the robot interacts with the world. We examine the problem of predicting inter-object and object-environment relations between previously unseen objects and novel environments purely from partial-view point clouds. Our approach enables robots to plan and execute sequences to complete multi-object manipulation tasks defined from logical relations. This removes the burden of providing explicit, continuous object states as goals to the robot. We explore several different neural network architectures for this task. We find the best performing model to be a novel transformer-based neural network that both predicts object-environment relations and learns a latent-space dynamics function. We achieve reliable sim-to-real transfer without any fine-tuning. Our experiments show that our model understands how changes in observed environmental geometry relate to semantic relations between objects. We show more videos on our website:

2.Online Non-linear Centroidal MPC for Humanoid Robots Payload Carrying with Contact-Stable Force Parametrization

Authors:Mohamed Elobaid, Giulio Romualdi, Gabriele Nava, Lorenzo Rapetti, Hosameldin Awadalla Omer Mohamed, Daniele Pucci

Abstract: In this paper, we consider the problem of enabling a humanoid robot that is subject to a persistent disturbance, in the form of a payload-carrying task, to follow given planned footsteps. To solve this problem, we combine an online nonlinear centroidal Model Predictive Controller (MPC) with a contact-stable force parametrization. The cost function of the MPC is augmented with terms that handle the disturbance and regularize the parameter. The performance of the resulting controller is validated both in simulations and on the humanoid robot iCub. Finally, the effect of using the parametrization on the computational time of the controller is briefly studied.

3.Evaluating the validity of a German translation of an uncanniness questionnaire

Authors:Sarah Wingert, Christian Becker-Asano

Abstract: When researching the acceptance of robots in Human-Robot Interaction, the Uncanny Valley needs to be considered, and reusable, standardized measures for it are essential. In this paper, one such questionnaire was translated into German. The reliability of the translated indices was evaluated (n=140) with Cronbach's alpha. Additionally, the items were tested with exploratory and confirmatory factor analyses for problematic correlations. The results indicate good reliability for the translated indices and reveal some items that need further checking.
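
Cronbach's alpha, the reliability statistic used here, is alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score) for k items. A minimal sketch using only the standard library follows; the data layout (one row per respondent, one column per item) is an assumption.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

Perfectly consistent items give alpha = 1, while items that vary against each other drive alpha down, and it can even become negative; values are typically judged against conventional thresholds when calling reliability "good".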

4.An Android Robot Head as Embodied Conversational Agent

Authors:Marcel Heisler, Christian Becker-Asano

Abstract: This paper describes how current Machine Learning (ML) techniques, combined with simple rule-based animation routines, turn an android robot head into an embodied conversational agent with ChatGPT as its core component. The android robot head is described, technical details of how lip-sync animation is achieved are given, and general software design decisions are presented. A public presentation of the system revealed opportunities for improvement, which are reported and which guide our iterative implementation approach.

5.A Bioinspired Synthetic Nervous System Controller for Pick-and-Place Manipulation

Authors:Yanjun Li, Ravesh Sukhnandan, Jeffrey P. Gill, Hillel J. Chiel, Victoria Webster-Wood, Roger D. Quinn

Abstract: The Synthetic Nervous System (SNS) is a biologically inspired neural network (NN). Due to its capability of capturing complex mechanisms underlying neural computation, an SNS model is a candidate for building compact and interpretable NN controllers for robots. Previous work on SNSs has focused on applying the model to the control of legged robots and the design of functional subnetworks (FSNs) to realize dynamical systems. However, the FSN approach has previously relied on analytical solutions of the governing equations, which makes it difficult to design more complex NN controllers. Incorporating plasticity into SNSs and using learning algorithms to tune the parameters offers a promising solution for systematic design in this situation. In this paper, we theoretically analyze the computational advantages of SNSs compared with other classical artificial neural networks. We then use learning algorithms to develop compact subnetworks for implementing addition, subtraction, division, and multiplication. We also combine the learning-based methodology with a bioinspired architecture to design an interpretable SNS for the pick-and-place control of a simulated gantry system. Finally, we show that the SNS controller is successfully transferred to a real-world robotic platform without further tuning of the parameters, verifying the effectiveness of our approach.

6.Deep Reinforcement Learning-Based Control for Stomach Coverage Scanning of Wireless Capsule Endoscopy

Authors:Yameng Zhang, Long Bai, Li Liu, Hongliang Ren, Max Q. -H. Meng

Abstract: Due to its non-invasive and painless characteristics, wireless capsule endoscopy has become the new gold standard for assessing gastrointestinal disorders. However, omissions can occur during the examination because controlling the capsule endoscope is challenging. In this work, we use reinforcement learning to control a magnetic capsule endoscope for the coverage scanning task in the stomach, so that the capsule can comprehensively scan every corner of the stomach. We use the virtual platform VR-Caps to simulate stomach coverage scanning with a capsule endoscope model. We use and compare two deep reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), to train the permanent magnetic agent, which actuates the capsule endoscope directly via magnetic fields and then optimizes the scanning efficiency of stomach coverage. We analyze the pros and cons of the two algorithms with different hyperparameters and achieve a coverage rate of 98.04% of the stomach area within 150.37 seconds.

7.A Virtual Reality Teleoperation Interface for Industrial Robot Manipulators

Authors:Eric Rosen, Devesh K. Jha

Abstract: We address the problem of teleoperating an industrial robot manipulator via a commercially available Virtual Reality (VR) interface. Previous works on VR teleoperation for robot manipulators focus primarily on collaborative or research robot platforms (whose dynamics and constraints differ from those of industrial robot arms), or only address tasks where the robot's dynamics are less important (e.g., pick-and-place tasks). We investigate the use of commercially available VR interfaces for effectively teleoperating industrial robot manipulators in a variety of contact-rich manipulation tasks. We find that applying standard practices for VR control of robot arms is challenging for industrial platforms because torque and velocity control are not exposed, and position control is mediated through a black-box controller. To mitigate these problems, we propose a simplified filtering approach that processes command signals so that operators can effectively teleoperate industrial robot arms with VR interfaces in dexterous manipulation tasks. We hope our findings will help robot practitioners implement and set up effective VR teleoperation interfaces for robot manipulators. The proposed method is demonstrated on a variety of contact-rich manipulation tasks, which can also involve very precise movement of the robot during execution (videos can be found at
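
The abstract does not specify the filter, so the following is only an illustrative assumption: exponential smoothing of VR position targets followed by a per-cycle rate limit, one common way to keep commands sent to a black-box position controller small and smooth. The function name, parameters, and values are hypothetical.

```python
def filter_commands(targets, alpha=0.2, max_step=0.005):
    """Smooth a stream of 1-D position targets (m) from a VR controller.

    alpha: exponential smoothing factor in (0, 1] (hypothetical value).
    max_step: maximum commanded change per control cycle in m (hypothetical value).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    y = targets[0]
    for x in targets:
        smoothed = y + alpha * (x - y)                       # exponential smoothing
        step = max(-max_step, min(max_step, smoothed - y))   # per-cycle rate limit
        y = y + step
        out.append(y)
    return out
```

Applied to a sudden 10 cm jump in the operator's hand position, the commanded position ramps up by at most 5 mm per cycle and then settles exponentially, instead of passing the jump straight to the controller.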

8.Reinforcement Learning for Legged Robots: Motion Imitation from Model-Based Optimal Control

Authors:AJ Miller, Shamel Fahmi, Matthew Chignoli, Sangbae Kim

Abstract: We propose MIMOC: Motion Imitation from Model-Based Optimal Control. MIMOC is a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories from model-based optimal control. MIMOC mitigates challenges faced by other motion imitation RL approaches because the references are dynamically consistent, require no motion retargeting, and include torque references. Hence, MIMOC does not require fine-tuning. MIMOC is also less sensitive to modeling and state estimation inaccuracies than model-based controllers. We validate MIMOC on the Mini-Cheetah in outdoor environments over a wide variety of challenging terrain, and on the MIT Humanoid in simulation. We show cases where MIMOC outperforms model-based optimal controllers, and show that imitating torque references improves the policy's performance.
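Motion-imitation RL controllers commonly reward tracking with exponential kernels on reference errors. The following is a minimal sketch, assuming joint-position and joint-torque terms with hypothetical weights (`w_q`, `w_tau`) and scales (`k_q`, `k_tau`). The abstract does not give MIMOC's reward, so this only illustrates how torque references can enter the objective.

```python
import numpy as np


def imitation_reward(q, q_ref, tau, tau_ref, w_q=0.7, w_tau=0.3, k_q=5.0, k_tau=0.01):
    """Weighted sum of exp(-k * ||error||^2) tracking terms (hypothetical weights).

    Each kernel lies in (0, 1] and equals 1 for perfect tracking, so with
    w_q + w_tau = 1 the total reward is at most 1.
    """
    r_q = np.exp(-k_q * np.sum((np.asarray(q) - np.asarray(q_ref)) ** 2))
    r_tau = np.exp(-k_tau * np.sum((np.asarray(tau) - np.asarray(tau_ref)) ** 2))
    return w_q * r_q + w_tau * r_tau
```

Setting `w_tau = 0` and `w_q = 1` gives a position-only variant, the kind of baseline against which including torque references could be compared.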

9.The Dilemma of Choice: Addressing Constraint Selection for Autonomous Robotic Agents

Authors:Hardik Parwana, Ruiyang Wang, Dimitra Panagou

Abstract: The tasks that an autonomous agent is expected to perform are often optional or are incompatible with each other owing to the agent's limited actuation capabilities, specifically the dynamics and control input bounds. We encode tasks as time-dependent state constraints and leverage advances in multi-objective optimization to formulate the problem of choosing tasks as the selection of a feasible subset of constraints that can be satisfied for all time and that maximizes a performance metric. We show that this problem, although amenable to reachability or mixed-integer model predictive control-based analysis in the offline phase, is NP-hard in general and therefore requires heuristics to be solved efficiently. When incompatibility in constraints is observed under a given policy that imposes task constraints at each time step in an optimization problem, we assign a Lagrange score to each of these constraints based on the variation in the corresponding Lagrange multipliers over the compatible time horizon. These scores are then used to decide the order in which constraints are dropped in a greedy strategy. We further employ a genetic algorithm to improve upon the greedy strategy. We evaluate our method on a robot waypoint following task when the low-level controllers that impose state constraints are described by Control Barrier Function-based Quadratic Programs and provide a comparison with waypoint selection based on knowledge of backward reachable sets.
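One way to read the Lagrange-score heuristic is sketched below. The sketch assumes that the score is the total variation of each constraint's multiplier over the compatible horizon and that the highest-scoring constraint is dropped first. Both choices are assumptions, as are the names `lagrange_scores` and `greedy_drop` and the `is_feasible` oracle (which in the paper's setting would be a CBF-QP feasibility check). The genetic-algorithm refinement is omitted.

```python
from typing import Callable, FrozenSet, List

import numpy as np


def lagrange_scores(multipliers: np.ndarray) -> np.ndarray:
    """Assumed score: total variation of each constraint's multiplier over the horizon.

    `multipliers` has shape (T, m): T time steps, m constraints.
    """
    return np.abs(np.diff(multipliers, axis=0)).sum(axis=0)


def greedy_drop(multipliers: np.ndarray,
                is_feasible: Callable[[FrozenSet[int]], bool]) -> List[int]:
    """Drop constraints in descending score order until the kept set is feasible.

    `is_feasible` is a hypothetical oracle on the set of kept constraint indices.
    Returns the dropped indices in the order they were dropped.
    """
    scores = lagrange_scores(multipliers)
    kept = set(range(multipliers.shape[1]))
    dropped: List[int] = []
    for i in np.argsort(-scores, kind="stable"):
        if is_feasible(frozenset(kept)):
            break
        kept.remove(int(i))
        dropped.append(int(i))
    return dropped
```

The greedy pass is cheap but can drop more constraints than necessary, which is the gap a search method such as a genetic algorithm can close.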

10.Robust Single-Point Pushing with Force Feedback

Authors:Adam Heins, Angela P. Schoellig

Abstract: We present the first controller for quasistatic robotic planar pushing with single-point contact using only force feedback. We consider a mobile robot equipped with a force-torque sensor to measure the force at the contact point with the pushed object (the "slider"). The parameters of the slider are not known to the controller, nor is feedback on the slider's pose. We assume that the global position of the contact point is always known and that the approximate initial position of the slider is provided. We focus specifically on the case when it is desired to push the slider along a straight line. Simulations and real-world experiments show that our controller yields stable pushes that are robust to a wide range of slider parameters and state perturbations.

11.Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

Authors:Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li

Abstract: Foundation models have made significant strides in various applications, including text-to-image generation, panoptic segmentation, and natural language processing. This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks. Specifically, Instruct2Act employs the LLM to generate Python programs that constitute a comprehensive perception, planning, and action loop for robotic tasks. In the perception stage, pre-defined APIs are used to access multiple foundation models, where the Segment Anything Model (SAM) accurately locates candidate objects and CLIP classifies them. In this way, the framework leverages the expertise of foundation models and robotic abilities to convert complex high-level instructions into precise policy code. Our approach is adjustable and flexible in accommodating various instruction modalities and input types and catering to specific task demands. We validated the practicality and efficiency of our approach by assessing it on robotic tasks in different scenarios within tabletop manipulation domains. Furthermore, our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks. The code for our proposed approach is available at, serving as a robust benchmark for high-level robotic instruction tasks with assorted modality inputs.

1.Automatic Traffic Scenario Conversion from OpenSCENARIO to CommonRoad

Authors:Yuanfei Lin, Michael Ratzel, Matthias Althoff

Abstract: Scenarios are a crucial element for developing, testing, and verifying autonomous driving systems. However, open-source scenarios are often formulated using different terminologies. This limits their usage across different applications as many scenario representation formats are not directly compatible with each other. To address this problem, we present the first open-source converter from the OpenSCENARIO format to the CommonRoad format, which are two of the most popular scenario formats used in autonomous driving. Our converter employs a simulation tool to execute the dynamic elements defined by OpenSCENARIO. The converter is available at and we demonstrate its usefulness by converting publicly available scenarios in the OpenSCENARIO format and evaluating them using CommonRoad tools.

2.Large-Scale Package Manipulation via Learned Metrics of Pick Success

Authors:Shuai Li, Azarakhsh Keipour, Kevin Jamieson, Nicolas Hudson, Charles Swan, Kostas Bekris

Abstract: Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to workforce fluctuations. The past few years have seen increased interest in automating such repeated tasks but mostly in controlled settings. Tasks such as picking objects from unstructured, cluttered piles have only recently become robust enough for large-scale deployment with minimal human intervention. This paper demonstrates large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data. Specifically, the system was trained on over 394K picks. It is used for singulating up to 5 million packages per day and has manipulated over 200 million packages during this paper's evaluation period. The developed learned pick quality measure ranks various pick alternatives in real time and prioritizes the most promising ones for execution. The pick success predictor aims to estimate from prior experience the success probability of a desired pick by the deployed industrial robotic arms in cluttered scenes containing deformable and rigid objects with partially known properties. It is a shallow machine learning model, which allows us to evaluate which features are most important for the prediction. An online pick ranker leverages the learned success predictor to prioritize the most promising picks for the robotic arm, which are then assessed for collision avoidance. This learned ranking process is demonstrated to overcome the limitations of, and outperform, manually engineered and heuristic alternatives. To the best of the authors' knowledge, this paper presents the first large-scale deployment of learned pick quality estimation methods in a real production system.
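The rank-then-check loop can be sketched as follows, assuming a logistic-regression model as the shallow predictor. The feature names, weights, and the `collision_free` callback are hypothetical placeholders and do not describe the production Robin system.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Pick:
    name: str
    features: Dict[str, float]


# Hypothetical weights of a shallow (logistic-regression) success model.
WEIGHTS = {"suction_seal": 2.0, "clutter": -1.5, "height": 0.5}
BIAS = -0.2


def success_probability(pick: Pick) -> float:
    """Sigmoid of a linear score; missing features default to 0."""
    z = BIAS + sum(w * pick.features.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def select_pick(picks: List[Pick], collision_free: Callable[[Pick], bool]) -> Optional[Pick]:
    """Rank by predicted success, then return the best pick that passes the collision check."""
    for pick in sorted(picks, key=success_probability, reverse=True):
        if collision_free(pick):
            return pick
    return None
```

With a linear model, the weight magnitudes on standardized features give a direct, if rough, view of feature importance, which is one reason a shallow model is easy to inspect.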

3.Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation

Authors:Maoran Zhu, Yuanxin Wu

Abstract: Inertial-based navigation refers to the navigation methods or systems that have inertial information or sensors as the core part and integrate a spectrum of other kinds of sensors for enhanced performance. Through a series of papers, the authors attempt to explore information blending of inertial-based navigation by a polynomial optimization method. The basic idea is to model rigid motions as finite-order polynomials and then attack the involved navigation problems by optimally solving for their coefficients, taking into consideration the constraints posed by inertial sensors and others. In the current paper, a continuous-time attitude estimation approach is proposed, which transforms attitude estimation into a constant parameter determination problem by polynomial optimization. Specifically, the continuous attitude is first approximated by a Chebyshev polynomial, whose unknown coefficients are determined by minimizing the weighted residuals of initial conditions, dynamics, and measurements. We apply the derived estimator to attitude estimation with magnetic and inertial sensors. Simulation and field tests show that the estimator has much better stability and faster convergence than the traditional extended Kalman filter, especially in challenging scenarios with large initial state errors.
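The core idea (approximating a continuous-time signal by a Chebyshev series and solving for its coefficients) can be illustrated on a single rotation angle. This sketch assumes NumPy and keeps only the weighted measurement residuals. It omits the initial-condition and dynamics residuals and the full 3D attitude parameterization used in the paper. Coefficients are fitted with `numpy.polynomial.chebyshev.chebfit`; `fit_angle` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fit_angle(t, angle_meas, sigma, degree=5):
    """Weighted least-squares Chebyshev fit of a single-axis angle history.

    Maps time onto [-1, 1] (the Chebyshev domain) and weights each sample by
    1/sigma. Only measurement residuals are used here; the paper additionally
    penalizes initial-condition and dynamics residuals.
    Returns the coefficients and a continuous-time evaluator.
    """
    t = np.asarray(t, dtype=float)
    t0, t1 = t[0], t[-1]
    tau = 2.0 * (t - t0) / (t1 - t0) - 1.0
    coef = C.chebfit(tau, angle_meas, degree, w=1.0 / np.asarray(sigma, dtype=float))

    def angle(tq):
        tq = np.asarray(tq, dtype=float)
        return C.chebval(2.0 * (tq - t0) / (t1 - t0) - 1.0, coef)

    return coef, angle
```

Because the result is a polynomial, the angular rate is available analytically via `chebder` (scaled by the time mapping), which is what makes a continuous-time formulation convenient for fusing rate-gyro data.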

4.Motion Planning (In)feasibility Detection using a Prior Roadmap via Path and Cut Search

Authors:Yoonchang Sung, Peter Stone

Abstract: Motion planning seeks a collision-free path in a configuration space (C-space), representing all possible robot configurations in the environment. As it is challenging to construct a C-space explicitly for a high-dimensional robot, we generally build a graph structure called a roadmap, a discrete approximation of a complex continuous C-space, to reason about connectivity. Checking collision-free connectivity in the roadmap requires expensive edge-evaluation computations, and thus, reducing the number of evaluations has become a significant research objective. However, in practice, we often face infeasible problems: those in which there is no collision-free path in the roadmap between the start and the goal locations. Existing studies often overlook the possibility of infeasibility, becoming highly inefficient by performing many edge evaluations. In this work, we address this oversight in scenarios where a prior roadmap is available; that is, the edges of the roadmap contain the probability of being a collision-free edge learned from past experience. To this end, we propose an algorithm called iterative path and cut finding (IPC) that iteratively searches for a path and a cut in a prior roadmap to detect infeasibility while reducing expensive edge evaluations as much as possible. We further improve the efficiency of IPC by introducing a second algorithm, iterative decomposition and path and cut finding (IDPC), that leverages the fact that cut-finding algorithms partition the roadmap into smaller subgraphs. We analyze the theoretical properties of IPC and IDPC, such as completeness and computational complexity, and evaluate their performance in terms of completion time and the number of edge evaluations in large-scale simulations.
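The sketch below shows only the path half of the idea. It is a lazy search that repeatedly picks the most probable path under the prior (Dijkstra on edge cost -log(p)), evaluates its unchecked edges, and declares infeasibility when no path avoids known-colliding edges. IPC additionally searches for cuts to certify infeasibility with fewer evaluations, which is not implemented here. The roadmap, priors, and `evaluate` collision checker are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def most_likely_path(prior: Dict[Edge, float], invalid: Set[Edge],
                     start: int, goal: int) -> Optional[List[Edge]]:
    """Dijkstra with cost -log(p) over undirected edges not yet known to collide."""
    adj: Dict[int, List[Tuple[int, float, Edge]]] = {}
    for (u, v), p in prior.items():
        if (u, v) in invalid or p <= 0.0:
            continue
        c = -math.log(p)
        adj.setdefault(u, []).append((v, c, (u, v)))
        adj.setdefault(v, []).append((u, c, (u, v)))
    dist = {start: 0.0}
    back: Dict[int, Tuple[int, Edge]] = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:  # reconstruct the edge sequence from start to goal
            path = []
            while u != start:
                u, e = back[u]
                path.append(e)
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, c, e in adj.get(u, []):
            if d + c < dist.get(v, math.inf):
                dist[v] = d + c
                back[v] = (u, e)
                heapq.heappush(heap, (d + c, v))
    return None


def lazy_search(prior: Dict[Edge, float], start: int, goal: int,
                evaluate: Callable[[Edge], bool]):
    """Return (path or None, number of expensive edge evaluations)."""
    invalid: Set[Edge] = set()
    valid: Set[Edge] = set()
    evals = 0
    while True:
        path = most_likely_path(prior, invalid, start, goal)
        if path is None:
            return None, evals  # infeasible: every path uses a known-colliding edge
        for e in path:
            if e in valid:
                continue
            evals += 1
            if evaluate(e):
                valid.add(e)
            else:
                invalid.add(e)
                break
        else:
            return path, evals
```

In an infeasible roadmap, a path-only search like this must probe path after path before giving up. A cut search targets exactly that case by looking for a set of likely-colliding edges that separates start from goal.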

1.Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data

Authors:Keyu Li, Xinyu Mao, Chengwei Ye, Ang Li, Yangxin Xu, Max Q. -H. Meng

Abstract: Robotic ultrasound (US) systems have shown great potential to make US examinations easier and more accurate. Recently, various machine learning techniques have been proposed to realize automatic US image interpretation for robotic US acquisition tasks. However, obtaining large amounts of real US imaging data for training is usually expensive or even unfeasible in some clinical applications. An alternative is to build a simulator to generate synthetic US data for training, but the differences between simulated and real US images may result in poor model performance. This work presents a Sim2Real framework to efficiently learn robotic US image analysis tasks based only on simulated data for real-world deployment. A style transfer module is proposed based on unsupervised contrastive learning and used as a preprocessing step to convert the real US images into the simulation style. Thereafter, a task-relevant model is designed to combine CNNs with vision transformers to generate the task-dependent prediction with improved generalization ability. We demonstrate the effectiveness of our method in an image regression task to predict the probe position based on US images in robotic transesophageal echocardiography (TEE). Our results show that using only simulated US data and a small amount of unlabelled real data for training, our method can achieve comparable performance to semi-supervised and fully supervised learning methods. Moreover, the effectiveness of our previously proposed CT-based US image simulation method is also indirectly confirmed.

2.Graph-based Global Robot Simultaneous Localization and Mapping using Architectural Plans

Authors:Muhammad Shaheer, Jose Andres Millan-Romera, Hriday Bavle, Jose Luis Sanchez-Lopez, Javier Civera, Holger Voos

Abstract: In this paper, we propose a solution for graph-based global robot simultaneous localization and mapping (SLAM) using architectural plans. Before the start of the robot operation, the previously available architectural plan of the building is converted into our proposed architectural graph (A-Graph). When the robot starts its operation, it uses its onboard LIDAR and odometry to carry out online SLAM relying on our situational graph (S-Graph), which includes both a representation of the environment with multiple levels of abstraction, such as walls or rooms, and their relationships, as well as the robot poses with their associated keyframes. Our novel graph-to-graph matching method is used to relate the aforementioned S-Graph and A-Graph, which are aligned and merged, resulting in our novel informed Situational Graph (iS-Graph). Our iS-Graph not only provides graph-based global robot localization, but also extends the graph-based SLAM capabilities of the S-Graph by incorporating into it the prior knowledge of the environment existing in the architectural plan.

3.Towards Automatic Identification of Globally Valid Geometric Flat Outputs via Numerical Optimization

Authors:Jake Welde, Vijay Kumar

Abstract: Differential flatness enables efficient planning and control for underactuated robotic systems, but we lack a systematic and practical means of identifying a flat output (or determining whether one exists) for an arbitrary robotic system. In this work, we leverage recent results elucidating the role of symmetry in constructing flat outputs for free-flying robotic systems. Using the tools of Riemannian geometry, Lie group theory, and differential forms, we cast the search for a globally valid, equivariant flat output as an optimization problem. An approximate transcription of this continuum formulation to a quadratic program is performed, and its solutions for two example systems achieve precise agreement with the known closed-form flat outputs. Our results point towards a systematic, automated approach to numerically identify geometric flat outputs directly from the system model, particularly useful when complexity renders pen and paper analysis intractable.

4.Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp Prediction

Authors:Shubham Agrawal, Nikhil Chavan-Dafle, Isaac Kasahara, Selim Engin, Jinwook Huh, Volkan Isler

Abstract: Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene along with other semantic information such as object labels. This information is then used for choosing feasible grasps on relevant objects. In this paper, we present a novel method to provide this geometric and semantic information of all objects in the scene as well as feasible grasps on those objects simultaneously. The main advantage of our method is its speed, as it avoids sequential perception and grasp planning steps. With detailed quantitative analysis, we show that our method delivers competitive performance compared to the state-of-the-art dedicated methods for object shape, pose, and grasp predictions while providing fast inference at 30 frames per second.

5.Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations

Authors:Rainer Kartmann, Tamim Asfour

Abstract: Humans use semantic concepts such as spatial relations between objects to describe scenes and communicate tasks such as "Put the tea to the right of the cup" or "Move the plate between the fork and the spoon." Like children, assistive robots must be able to learn the sub-symbolic meaning of such concepts from human demonstrations and instructions. We address the problem of incrementally learning geometric models of spatial relations from few demonstrations collected online during interaction with a human. Such models enable a robot to manipulate objects in order to fulfill desired spatial relations specified by verbal instructions. At the start, we assume the robot has no geometric model of spatial relations. Given a task such as those above, the robot requests the user to demonstrate the task once in order to create a model from a single demonstration, leveraging a cylindrical probability distribution as a generative representation of spatial relations. We show how this model can be updated incrementally with each new demonstration without access to past examples in a sample-efficient way using incremental maximum likelihood estimation, and demonstrate the approach on a real humanoid robot.
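An incremental update without stored examples is possible because maximum-likelihood estimates of these distributions depend only on running sums. The following is a minimal sketch, assuming a factorized cylindrical model (Gaussian radius and height, von Mises angle about the reference object). The factorization and the class `CylindricalRelationModel` are illustrative assumptions, not the authors' exact representation.

```python
import math
from dataclasses import dataclass


@dataclass
class CylindricalRelationModel:
    """Running sufficient statistics for a hypothetical factorized cylindrical model:
    radius r and height h as independent Gaussians, angle phi as von Mises.
    Each update is O(1) and needs no access to past demonstrations.
    """
    n: int = 0
    sum_r: float = 0.0
    sum_r2: float = 0.0
    sum_h: float = 0.0
    sum_h2: float = 0.0
    sum_cos: float = 0.0
    sum_sin: float = 0.0

    def update(self, dx: float, dy: float, dz: float) -> None:
        """Add one demonstration: target position relative to the reference object."""
        r = math.hypot(dx, dy)
        phi = math.atan2(dy, dx)
        self.n += 1
        self.sum_r += r
        self.sum_r2 += r * r
        self.sum_h += dz
        self.sum_h2 += dz * dz
        self.sum_cos += math.cos(phi)
        self.sum_sin += math.sin(phi)

    def estimate(self):
        """ML means and variances, von Mises mean direction, and mean resultant length."""
        n = self.n
        mean_r = self.sum_r / n
        var_r = max(self.sum_r2 / n - mean_r ** 2, 0.0)
        mean_h = self.sum_h / n
        var_h = max(self.sum_h2 / n - mean_h ** 2, 0.0)
        mean_phi = math.atan2(self.sum_sin, self.sum_cos)
        resultant = math.hypot(self.sum_cos, self.sum_sin) / n  # in [0, 1]; near 1 = concentrated
        return mean_r, var_r, mean_h, var_h, mean_phi, resultant
```

With a single demonstration, the ML variances are zero, so a usable model would add a prior or variance floor. The von Mises concentration can be obtained from `resultant` by numerically inverting the Bessel-function ratio I1(k)/I0(k).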

6.InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning

Authors:Lintong Zhang, Tejaswi Digumarti, Georgi Tinchev, Maurice Fallon

Abstract: Localization for autonomous robots in prior maps is crucial for their functionality. This paper offers a solution to this problem for indoor environments called InstaLoc, which operates on an individual lidar scan to localize it within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking the human approach, InstaLoc identifies and matches object instances in the scene with those from a prior map. As far as we know, this is the first method to apply panoptic segmentation directly to 3D lidar scans for indoor localization. InstaLoc operates through two networks based on spatially sparse tensors that perform inference directly on dense 3D lidar point clouds. The first network is a panoptic segmentation network that produces object instances and their semantic classes. The second, smaller network produces a descriptor for each object instance. A consensus-based matching algorithm then matches the instances to the prior map and estimates a six degrees of freedom (DoF) pose for the input cloud in the prior map. A key feature of InstaLoc is that both networks are efficient: it requires only one to two hours of training on a mobile GPU and runs in real time at 1 Hz. Our method achieves between two and four times more detections when localizing, as compared to baseline methods, and achieves higher precision on these detections.

7.Revisiting Proprioceptive Sensing for Articulated Object Manipulation

Authors:Thomas Lips, Francis wyffels

Abstract: Robots that assist humans will need to interact with articulated objects such as cabinets or microwaves. Early work on creating systems for doing so used proprioceptive sensing to estimate joint mechanisms during contact. However, nowadays, almost all systems use only vision and no longer consider proprioceptive information during contact. We believe that proprioceptive information during contact is a valuable source of information and did not find clear motivation for not using it in the literature. Therefore, in this paper, we create a system that, starting from a given grasp, uses proprioceptive sensing to open cabinets with a position-controlled robot and a parallel gripper. We perform a qualitative evaluation of this system, where we find that slip between the gripper and handle limits the performance. Nonetheless, we find that the system already performs quite well. This poses the question: should we make more use of proprioceptive information during contact in articulated object manipulation systems, or is it not worth the added complexity, and can we manage with vision alone? We do not have an answer to this question, but we hope to spark some discussion on the matter. The codebase and videos of the system are available at
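One classic way to estimate a revolute joint from proprioception is to record gripper positions from forward kinematics while pulling and then fit a circle to them. The centre approximates the hinge, and the radius approximates the handle offset. The following is a minimal sketch using the Kåsa algebraic fit, assuming a vertical hinge so the motion can be projected onto the horizontal plane. It is not necessarily the estimator used in the paper, and `fit_hinge` is a hypothetical name.

```python
import numpy as np


def fit_hinge(xy):
    """Kasa algebraic circle fit.

    Rewrites (x - a)^2 + (y - b)^2 = R^2 as x^2 + y^2 = 2*a*x + 2*b*y + c
    with c = R^2 - a^2 - b^2, which is linear in (a, b, c).
    xy: (N, 2) gripper positions projected onto the horizontal plane.
    Returns the hinge centre (a, b) and the radius R.
    """
    xy = np.asarray(xy, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.array([a, b]), float(np.sqrt(c + a ** 2 + b ** 2))
```

A position-controlled robot could then command the next pull along the circle's tangent at the latest point. Slip between gripper and handle corrupts the recorded positions, so this kind of fit degrades when the gripper does not stay rigidly attached.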

8.Reward Learning with Intractable Normalizing Functions

Authors:Joshua Hoegerman, Dylan P. Losey

Abstract: Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly-intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work uses existing robotic tools to approximate this normalizer. In this paper, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of each. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent settings (where each human correction is viewed as completely separate) and conditionally dependent environments (where the human's current correction may build on previous inputs). Across simulations and user studies, our proposed approach infers the human's reward parameters more accurately than the alternative approximations when learning from either demonstrations or corrections. See videos here:

9.RAMP: A Benchmark for Evaluating Robotic Assembly Manipulation and Planning

Authors:Jack Collins, Mark Robson, Jun Yamada, Mohan Sridharan, Karol Janik, Ingmar Posner

Abstract: We introduce RAMP, an open-source robotics benchmark inspired by real-world industrial assembly tasks. RAMP consists of beams that a robot must assemble into specified goal configurations using pegs as fasteners. As such, it assesses planning and execution capabilities, and poses challenges in perception, reasoning, manipulation, diagnostics, fault recovery, and goal parsing. RAMP has been designed to be accessible and extensible. Parts are either 3D printed or otherwise constructed from materials that are readily obtainable. The part design and detailed instructions are publicly available. In order to broaden community engagement, RAMP incorporates fixtures such as AprilTags, which enable researchers to focus on individual sub-tasks of the assembly challenge if desired. We provide a full digital twin as well as rudimentary baselines to enable rapid progress. Our vision is for RAMP to form the substrate for a community-driven endeavour that evolves as capability matures.

1.Enabling Failure Recovery for On-The-Move Mobile Manipulation

Authors:Ben Burgess-Limerick, Chris Lehnert, Jurgen Leitner, Peter Corke

Abstract: We present a robot base placement and control method that enables a mobile manipulator to gracefully recover from manipulation failures while performing tasks on-the-move. A mobile manipulator in motion has a limited window to complete a task, unlike when stationary where it can make repeated attempts until successful. Existing approaches to manipulation on-the-move are typically based on open-loop execution of planned trajectories which does not allow the base controller to react to manipulation failures, slowing down or stopping as required. To overcome this limitation, we present a reactive base control method that repeatedly evaluates the best base placement given the robot's current state, the immediate manipulation task, as well as the next part of a multi-step task. The result is a system that retains the reliability of traditional mobile manipulation approaches where the base comes to a stop, but leverages the performance gains available by performing manipulation on-the-move. The controller keeps the base in range of the target for as long as required to recover from manipulation failures while making as much progress as possible toward the next objective. See for videos of experiments.

2.AcroMonk: A Minimalist Underactuated Brachiating Robot

Authors:Mahdi Javadi, Daniel Harnack, Paula Stocco, Shivesh Kumar, Shubham Vyas, Daniel Pizzutilo, Frank Kirchner

Abstract: Brachiation is a dynamic, coordinated swinging maneuver of body and arms used by monkeys and apes to move between branches. As a unique underactuated mode of locomotion, it is interesting to study from a robotics perspective since it can broaden the deployment scenarios for humanoids and animaloids. While several brachiating robots of varying complexity have been proposed in the past, this paper presents the simplest possible prototype of a brachiation robot, using only a single actuator and unactuated grippers. The novel passive gripper design allows it to snap on and release from monkey bars, while guaranteeing well defined start and end poses of the swing. The brachiation behavior is realized in three different ways, using trajectory optimization via direct collocation and stabilization by a model-based time-varying linear quadratic regulator (TVLQR) or model-free proportional derivative (PD) control, as well as by a reinforcement learning (RL) based control policy. The three control schemes are compared in terms of robustness to disturbances, mass uncertainty, and energy consumption. The system design and controllers have been open-sourced. Due to its minimal and open design, the system can serve as a canonical underactuated platform for education and research.
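The TVLQR stabilization step can be sketched as a finite-horizon, discrete-time LQR. Its gains come from a backward Riccati recursion over the linearized dynamics (A_t, B_t) along the optimized trajectory. This is a minimal sketch, assuming NumPy and already-linearized matrices; `tvlqr_gains` is a hypothetical helper, not the released AcroMonk code.

```python
import numpy as np


def tvlqr_gains(A_seq, B_seq, Q, R, Qf):
    """Backward Riccati recursion for deviations dx_{t+1} = A_t dx_t + B_t du_t.

    Returns gains K_0..K_{T-1} so that u_t = u_ref_t - K_t (x_t - x_ref_t)
    minimizes sum_t (dx^T Q dx + du^T R du) + dx_T^T Qf dx_T.
    """
    P = Qf
    gains = []
    for A, B in zip(reversed(A_seq), reversed(B_seq)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)  # equals Q + A'PA - A'PB (R + B'PB)^-1 B'PA
        gains.append(K)
    return gains[::-1]  # reorder so gains[t] applies at time step t
```

For the scalar system A = B = Q = R = 1, long horizons drive the first gain toward the steady-state value (sqrt(5) - 1)/2, about 0.618. The model-free alternative mentioned in the abstract replaces K_t with fixed PD gains on joint position and velocity errors.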

3.A Control Approach for Human-Robot Ergonomic Payload Lifting

Authors:Lorenzo Rapetti, Carlotta Sartore, Mohamed Elobaid, Yeshasvi Tirupachuri, Francesco Draicchio, Tomohiro Kawakami, Takahide Yoshiike, Daniele Pucci

Abstract: Collaborative robots can relieve human operators of excessive effort during payload lifting activities. Modelling the human partner allows the design of safe and efficient collaborative strategies. In this paper, we present a control approach for human-robot collaboration based on human monitoring through whole-body wearable sensors, and interaction modelling through coupled rigid-body dynamics. Moreover, a trajectory advancement strategy is proposed, allowing for online adaptation of the robot trajectory depending on the human motion. The resulting framework allows us to perform payload lifting tasks, taking into account the ergonomic requirements of the agents. Validation has been performed in an experimental scenario using the iCub3 humanoid robot and a human subject sensorized with the iFeel wearable system.

4.Fast Traversability Estimation for Wild Visual Navigation

Authors:Jonas Frey, Matias Mattamala, Nived Chebrolu, Cesar Cadena, Maurice Fallon, Marco Hutter

Abstract: Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes. In this work, we propose Wild Visual Navigation (WVN), an online self-supervised learning system for traversability estimation that uses only vision. The system is able to continuously adapt from a short human demonstration in the field. It leverages high-dimensional features from self-supervised visual transformer models, with an online scheme for supervision generation that runs in real time on the robot. We demonstrate the advantages of our approach with experiments and ablation studies in challenging environments in forests, parks, and grasslands. Our system is able to bootstrap the traversable terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate in complex outdoor terrains, negotiating obstacles in high grass as well as following a 1.4 km footpath. While our experiments were executed with a quadruped robot, ANYmal, the approach presented can generalize to any ground robot.

5.NICOL: A Neuro-inspired Collaborative Semi-humanoid Robot that Bridges Social Interaction and Reliable Manipulation

Authors:Matthias Kerzel, Philipp Allgeuer, Erik Strahl, Nicolas Frick, Jan-Gerrit Habekost, Manfred Eppe, Stefan Wermter

Abstract: Robotic platforms that can efficiently collaborate with humans in physical tasks constitute a major goal in robotics. However, many existing robotic platforms are designed either for social interaction or for industrial object manipulation tasks. The design of collaborative robots seldom emphasizes both their social interaction and physical collaboration abilities. To bridge this gap, we present the novel semi-humanoid NICOL, the Neuro-Inspired COLlaborator. NICOL is a large, newly designed, scaled-up version of its well-evaluated predecessor, the Neuro-Inspired COmpanion (NICO). While we adopt NICO's head and facial expression display, we extend its manipulation abilities in terms of precision, object size, and workspace size. To introduce and evaluate NICOL, we first develop and extend different neural and hybrid neuro-genetic visuomotor approaches initially developed for the NICO to the larger NICOL and its more complex kinematics. Furthermore, we present a novel neuro-genetic approach that improves the grasp accuracy of NICOL to over 99%, outperforming the state-of-the-art IK solvers KDL, TRAC-IK, and BIO-IK. In addition, we introduce the social interaction capabilities of NICOL, including its auditory and visual capabilities as well as its face and emotion generation capabilities. Overall, this article presents the humanoid robot NICOL for the first time and, through the neuro-genetic approaches, contributes to the integration of social robotics and neural visuomotor learning for humanoid robots.

6.Benchmarking UWB-Based Infrastructure-Free Positioning and Multi-Robot Relative Localization: Dataset and Characterization

Authors:Paola Torrico Morón, Sahar Salimpour, Lei Fu, Xianjia Yu, Jorge Peña Queralta, Tomi Westerlund

Abstract: Ultra-wideband (UWB) positioning has emerged as a low-cost and dependable localization solution for multiple use cases, from mobile robots to asset tracking within the Industrial IoT. The technology is mature and the scientific literature contains multiple datasets and methods for localization based on fixed UWB nodes. At the same time, research in UWB-based relative localization and infrastructure-free localization is gaining traction in further domains. However, tools and datasets in this domain are scarce. Therefore, we introduce in this paper a novel dataset for benchmarking infrastructure-free relative localization targeting the domain of multi-robot systems. Compared to previous datasets, we analyze the performance of different relative localization approaches for a much wider variety of scenarios with varying numbers of fixed and mobile nodes. A motion capture system provides ground truth data, and the data are multi-modal and include inertial or odometry measurements for benchmarking sensor fusion methods. Additionally, the dataset contains measurements of ranging accuracy based on the relative orientation of antennas and a comprehensive set of measurements for ranging between a single pair of nodes. Our experimental analysis shows that high accuracy can be achieved in localization, but the variability of the ranging error is significant across different settings and setups.
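Relative localization ultimately rests on turning UWB range measurements into positions. As a minimal illustration, assuming 2D geometry, noise-free ranges, and nodes whose positions are known (for example, other robots' current estimates), the linearized least-squares multilateration below subtracts the first range equation from the others to obtain a linear system. `multilaterate` is a hypothetical helper and is not claimed to be one of the approaches benchmarked in the paper.

```python
import numpy as np


def multilaterate(anchors, ranges):
    """Linear least-squares 2D position from ranges to >= 3 nodes with known positions.

    From ||p - a_i||^2 = d_i^2, subtracting the i = 0 equation cancels ||p||^2:
    2 (a_i - a_0) . p = ||a_i||^2 - ||a_0||^2 + d_0^2 - d_i^2.
    """
    a = np.asarray(anchors, dtype=float)
    d = np.asarray(ranges, dtype=float)
    A = 2.0 * (a[1:] - a[0])
    b = (d[0] ** 2 - d[1:] ** 2) + np.sum(a[1:] ** 2, axis=1) - np.sum(a[0] ** 2)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p
```

With real ranges, orientation-dependent ranging errors like those the dataset characterizes would enter as biased residuals, which is where weighting or robust losses become necessary.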

7.Quadratic Programming-based Reference Spreading Control for Dual-Arm Robotic Manipulation with Planned Simultaneous Impacts

Authors:Jari van Steen, Gijs van den Brandt, Nathan van de Wouw, Jens Kober, Alessandro Saccon

Abstract: With the aim of further enabling the exploitation of intentional impacts in robotic manipulation, a control framework is presented that directly tackles the challenges posed by tracking control of robotic manipulators that are tasked to perform nominally simultaneous impacts. This framework is an extension of the reference spreading control framework, in which overlapping ante- and post-impact references that are consistent with impact dynamics are defined. In this work, such a reference is constructed starting from a teleoperation-based approach. By using the corresponding ante- and post-impact control modes in the scope of a quadratic programming control approach, peaking of the velocity error and control inputs due to impacts is avoided while maintaining high tracking performance. With the inclusion of a novel interim mode, we aim to also avoid input peaks and steps when uncertainty in the environment causes a series of unplanned single impacts to occur rather than the planned simultaneous impact. This work in particular presents for the first time an experimental evaluation of reference spreading control on a robotic setup, showcasing its robustness against uncertainty in the environment compared to two baseline control approaches.

1.An Object SLAM Framework for Association, Mapping, and High-Level Tasks

Authors:Yanmin Wu, Yunzhou Zhang, Delong Zhu, Zhiqiang Deng, Wenkai Sun, Xin Chen, Jian Zhang

Abstract: Object SLAM is considered increasingly significant for robot high-level perception and decision-making. Existing studies fall short in terms of data association, object representation, and semantic mapping, and frequently rely on additional assumptions that limit their performance. In this paper, we present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks. First, we propose an ensemble data association approach for associating objects in complicated conditions by incorporating parametric and nonparametric statistical testing. In addition, we propose an outlier-robust centroid and scale estimation algorithm for modeling objects based on the iForest and line alignment. Then a lightweight, object-oriented map is represented by estimated general object models. Taking into consideration the semantic invariance of objects, we convert the object map to a topological map that provides semantic descriptors to enable multi-map matching. Finally, we propose an object-driven active exploration strategy to achieve autonomous mapping in the grasping scenario. The proposed object SLAM framework is evaluated on a range of public datasets and real-world experiments in mapping, augmented reality, scene matching, relocalization, and robotic manipulation, demonstrating its efficient performance.

2.Learning Quadruped Locomotion using Bio-Inspired Neural Networks with Intrinsic Rhythmicity

Authors:Chuanyu Yang, Can Pu, Tianqi Wei, Cong Wang, Zhibin Li

Abstract: Biological studies reveal that neural circuits in the spinal cord, called central pattern generators (CPGs), oscillate and generate rhythmic signals, which are the underlying mechanism responsible for the rhythmic locomotion behaviors of animals. Inspired by the capability of CPGs to naturally generate rhythmic patterns, researchers have attempted to create mathematical models of CPGs and use them for the locomotion of legged robots. In this paper, we propose a network architecture that incorporates CPGs for rhythmic pattern generation and a multi-layer perceptron (MLP) network for sensory feedback. We also propose a method that reformulates CPGs into a fully differentiable stateless network, allowing the CPGs and the MLP to be jointly trained with gradient-based learning. The results show that our proposed method learns agile and dynamic locomotion policies that are capable of blindly traversing uneven terrain and resisting external pushes. Simulation results also show that the learned policies are capable of self-modulating step frequency and step length to adapt to the locomotion velocity.
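A minimal sketch of what a stateless rhythmic generator can look like, assuming the simplest possible form: each leg's signal is a sinusoid evaluated directly from time, frequency, amplitude and a fixed phase offset. This is not the authors' CPG network (no oscillator coupling and no MLP feedback are modeled); the function name `cpg_outputs` and the `TROT` offsets are hypothetical. Because the output is a closed-form function with no recurrent state, it is differentiable in all of its parameters, which is the property that makes joint gradient-based training possible.

```python
import math


def cpg_outputs(t, freq, amplitude, phase_offsets):
    """Evaluate one rhythmic signal per leg directly at time t.

    No internal state is carried between calls: the phase is computed in closed
    form as 2*pi*freq*t, so the output is a smooth function of every parameter.
    """
    base_phase = 2.0 * math.pi * freq * t
    return [amplitude * math.sin(base_phase + offset) for offset in phase_offsets]


# Hypothetical trot pattern (FL, FR, HL, HR): diagonal legs move in phase,
# and the two diagonal pairs are half a cycle apart.
TROT = [0.0, math.pi, math.pi, 0.0]

if __name__ == "__main__":
    print(cpg_outputs(0.25, freq=1.0, amplitude=1.0, phase_offsets=TROT))
```

In a learned controller, `freq` and `amplitude` would typically be produced by the feedback network rather than fixed by hand.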

3.Motion comfort and driver feel: An explorative study about their relation in remote driving

Authors:Georgios Papaioannou, Lin Zhao, Mikael Nybacka, Jenny Jerrelind, Riender Happee, Lars Drugge

Abstract: Teleoperation is considered a viable option to control fully automated vehicles (AVs) of Level 4 and 5 in special conditions. However, once remote drivers are brought into the loop, their driving experience should be realistic to ensure safe and comfortable remote control. Therefore, the remote control tower should be designed such that remote drivers receive high-quality cues regarding the vehicle state and the driving environment. In this direction, the steering feedback could be manipulated to inform remote drivers of how the vehicle reacts to their commands. However, it is still unclear how the remote drivers' steering feel could impact occupants' motion comfort. This paper explores how driver feel in remote driving (RD) and normal driving (ND) relates to motion comfort. More specifically, different types of steering feedback controllers are applied in (a) the steering system of a Research Concept Vehicle-model E (RCV-E) and (b) the steering system of a remote control tower. An experiment was performed to assess driver feel when the RCV-E is normally and remotely driven. Subjective assessment and objective metrics are employed to assess drivers' feel and occupants' motion comfort in both remote and normal driving scenarios. The results illustrate that motion sickness and ride comfort are only affected by the steering velocity in remote driving, while throttle input variations affect them in normal driving. The results also show that motion sickness and steering velocity both increase by around 25% from normal to remote driving.

4.A Virtual Reality Framework for Human-Robot Collaboration in Cloth Folding

Authors:Marco Moletta, Maciej K. Wozniak, Michael C. Welle, Danica Kragic

Abstract: We present a virtual reality (VR) framework to automate the data collection process in cloth folding tasks. The framework uses skeleton representations to help the user define the folding plans for different classes of garments, allowing for replicating the folding on unseen items of the same class. We evaluate the framework in the context of automating garment folding tasks. A quantitative analysis is performed on 3 classes of garments, demonstrating that the framework reduces the need for intervention by the user. We also compare skeleton representations with RGB and binary images in a classification task on a large dataset of clothing items, motivating the use of the framework for other classes of garments.

5.Dynamically Conservative Self-Driving Planner for Long-Tail Cases

Authors:Weitao Zhou, Zhong Cao, Nanshan Deng, Xiaoyu Liu, Kun Jiang, Diange Yang

Abstract: Self-driving vehicles (SDVs) are becoming reality but still suffer from "long-tail" challenges during natural driving: SDVs will continually encounter rare, safety-critical cases that may not be included in the dataset they were trained on. Some safety-assurance planners solve this problem by being conservative in all possible cases, which may significantly affect driving mobility. To this end, this work proposes a method, named the dynamically conservative planner (DCP), that automatically adjusts the conservative level according to each case's "long-tail" rate. We first define the "long-tail" rate as an SDV's confidence to pass a driving case. The rate indicates the probability of safety-critical events and is estimated using the statistical bootstrap method with historical data. Then, a reinforcement learning-based planner is designed to contain candidate policies with different conservative levels. The final policy is optimized based on the estimated "long-tail" rate. In this way, the DCP automatically becomes more conservative in low-confidence "long-tail" cases while remaining efficient otherwise. The DCP is evaluated in the CARLA simulator using driving cases with "long-tail" distributed training data. The results show that the DCP can accurately estimate the "long-tail" rate to identify potential risks. Based on the rate, the DCP automatically avoids potential collisions in "long-tail" cases using conservative decisions while not affecting the average velocity in other typical cases. Thus, the DCP is safer and more efficient than baselines with fixed conservative levels, e.g., an always-conservative planner. This work provides a technique to guarantee SDV performance in unexpected driving cases without resorting to a global conservative setting, which contributes to solving the "long-tail" problem practically.
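The bootstrapped "long-tail" rate can be illustrated with a percentile bootstrap over historical pass/fail outcomes. The sketch below assumes outcomes are recorded as 1 (case passed safely) or 0 (safety-critical event) and maps a lower confidence bound to one of three hypothetical conservative levels using hand-picked thresholds; the DCP itself optimizes the final policy with reinforcement learning, which is not shown. All function names, thresholds and data are illustrative.

```python
import random


def bootstrap_lower_bound(outcomes, n_resamples=2000, quantile=0.05, seed=0):
    """Percentile-bootstrap lower bound on the probability of passing a case.

    outcomes: historical results, 1 = passed safely, 0 = safety-critical event.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    return means[int(quantile * (n_resamples - 1))]


def choose_conservative_level(confidence, high=0.95, low=0.8):
    """Hypothetical rule: the lower the confidence, the more conservative the policy."""
    if confidence >= high:
        return "efficient"
    if confidence >= low:
        return "moderate"
    return "conservative"


history = [1] * 48 + [0] * 2  # hypothetical record for one driving case
print(choose_conservative_level(bootstrap_lower_bound(history)))
```

Using a lower bound rather than the raw success rate makes rarely observed cases (few samples, wide bootstrap spread) look less certain, which pushes them toward the conservative end.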

6.Periscope: A Robotic Camera System to Support Remote Physical Collaboration

Authors:Pragathi Praveena, Yeping Wang, Emmanuel Senft, Michael Gleicher, Bilge Mutlu

Abstract: We investigate how robotic camera systems can offer new capabilities to computer-supported cooperative work through the design, development, and evaluation of a prototype system called Periscope. With Periscope, a local worker completes manipulation tasks with guidance from a remote helper who observes the workspace through a camera mounted on a semi-autonomous robotic arm that is co-located with the worker. Our key insight is that the helper, the worker, and the robot should all share responsibility for the camera view, an approach we call shared camera control. Using this approach, we present a set of modes that distribute the control of the camera between the human collaborators and the autonomous robot depending on task needs. We demonstrate the system's utility and the promise of shared camera control through a preliminary study in which 12 dyads collaboratively worked on assembly tasks, and we discuss design and research implications of our work for future robotic camera systems that facilitate remote collaboration.

7.VBOC: Learning the Viability Boundary of a Robot Manipulator using Optimal Control

Authors:Asia La Rocca, Matteo Saveriano, Andrea Del Prete

Abstract: Safety is often the most important requirement in robotics applications. Nonetheless, control techniques that can provide safety guarantees are still extremely rare for nonlinear systems, such as robot manipulators. A well-known tool to ensure safety is the viability kernel, which is the largest set of states from which safety can be ensured. Unfortunately, computing such a set for a nonlinear system is extremely challenging in general. Several numerical algorithms for approximating it have been proposed in the literature, but they suffer from the curse of dimensionality. This paper presents a new approach for numerically approximating the viability kernel of robot manipulators. Our approach solves optimal control problems to compute states that are guaranteed to be on the boundary of the set. This allows us to learn the set boundary directly, and therefore to learn in a lower-dimensional space. Compared to the state of the art on systems up to dimension 6, our algorithm proved to be more than 2 times as accurate for the same computation time, or 6 times as fast to reach the same accuracy.
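For intuition about what a viability boundary is, consider the one case where it has a closed form: a single joint modeled as a double integrator with position limits [q_min, q_max] and acceleration bounded by a_max (no velocity or torque limits assumed). A state is viable exactly when braking at full deceleration stops the joint before the limit it is moving toward, so the boundary is v = ±sqrt(2 a_max d), with d the distance to that limit. VBOC targets multi-joint nonlinear manipulators, where no such formula exists and the boundary must be approximated numerically; the functions below are only an illustrative special case with hypothetical names.

```python
import math


def is_viable(q, v, q_min, q_max, a_max):
    """Viability test for a 1-DoF double integrator with |acceleration| <= a_max.

    The state (q, v) is viable iff maximal braking stops the joint inside
    [q_min, q_max]; the braking distance is v**2 / (2 * a_max).
    """
    if not (q_min <= q <= q_max):
        return False
    braking = v * v / (2.0 * a_max)
    if v > 0:
        return q + braking <= q_max
    return q - braking >= q_min


def boundary_velocity(q, q_max, a_max):
    """Largest positive velocity that is still viable at position q."""
    return math.sqrt(2.0 * a_max * (q_max - q))
```

States on the curve returned by `boundary_velocity` are exactly the kind of boundary points that VBOC obtains numerically by solving optimal control problems.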

8.Vision and Control for Grasping Clear Plastic Bags

Authors:Joohwan Seo, Jackson Wagner, Anuj Raicura, Jake Kim

Abstract: We develop two novel vision methods for planning effective grasps for clear plastic bags, as well as a control method to enable a Sawyer arm with a parallel gripper to execute the grasps. The first vision method is based on classical image processing and heuristics (e.g., Canny edge detection) to select a grasp target and angle. The second uses a deep-learning model trained on a human-labeled data set to mimic human grasp decisions. A clustering algorithm is used to de-noise the outputs of each vision method. Subsequently, a workspace PD control method is used to execute each grasp. Of the two vision methods, we find the deep-learning based method to be more effective.

9.Design, Development, and Evaluation of an Interactive Personalized Social Robot to Monitor and Coach Post-Stroke Rehabilitation Exercises

Authors:Min Hun Lee, Daniel P. Siewiorek, Asim Smailagic, Alexandre Bernardino, Sergi Bermúdez i Badia

Abstract: Socially assistive robots are increasingly being explored to improve the engagement of older adults and people with disabilities in health- and well-being-related exercises. However, although people have various physical conditions, most prior work on social robot exercise coaching systems has used generic, predefined feedback. Deploying these systems remains a challenge. In this paper, we present our work of iteratively engaging therapists and post-stroke survivors to design, develop, and evaluate a social robot exercise coaching system for personalized rehabilitation. Through interviews with therapists, we designed how this system interacts with the user and then developed an interactive social robot exercise coaching system. This system integrates a neural network model with a rule-based model to automatically monitor and assess patients' rehabilitation exercises, and it can be tuned with an individual patient's data to generate real-time, personalized corrective feedback for improvement. With the dataset of rehabilitation exercises from 15 post-stroke survivors, we demonstrated that our system significantly improves its performance in assessing patients' exercises when tuned with held-out patients' data. In addition, our real-world evaluation study showed that our system can adapt to new participants and achieved an average performance of 0.81 in assessing their exercises, which is comparable to the experts' agreement level. We further discuss the potential benefits and limitations of our system in practice.

1.Learning-Free Grasping of Unknown Objects Using Hidden Superquadrics

Authors:Yuwei Wu, Weixiao Liu, Zhiyang Liu, Gregory S. Chirikjian

Abstract: Robotic grasping is an essential and fundamental task and has been studied extensively over the past several decades. Traditional work analyzes physical models of the objects and computes force-closure grasps. Such methods require prior knowledge of the complete 3D model of an object, which can be hard to obtain. Recently, with significant progress in machine learning, data-driven methods have dominated the area. Although impressive improvements have been achieved, those methods require a vast amount of training data and suffer from limited generalizability. In this paper, we propose a novel two-stage approach to predicting and synthesizing grasping poses directly from the point cloud of an object without database knowledge or learning. First, multiple superquadrics are recovered at different positions within the object, representing the local geometric features of the object surface. Subsequently, our algorithm exploits the tri-symmetry feature of superquadrics and synthesizes a list of antipodal grasps from each recovered superquadric. An evaluation model is designed to assess and quantify the quality of each grasp candidate. The grasp candidate with the highest score is then selected as the final grasping pose. We conduct experiments on isolated and packed scenes to corroborate the effectiveness of our method. The results indicate that our method achieves competitive performance compared with the state of the art without the need for either a full model or prior training.
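Superquadric recovery typically relies on the standard inside-outside function, which is below 1 inside the shape, equal to 1 on its surface and above 1 outside. The sketch below evaluates it in the superquadric's own canonical frame (the pose transformation is omitted). As a simplified stand-in for antipodal grasp synthesis, it also lists the principal axes whose opposite surface points (±a_i along axis i) fit between the gripper fingers. The function names and the `max_width` parameter are hypothetical, and the paper's evaluation model for ranking candidates is not reproduced.

```python
def inside_outside(point, scale, shape):
    """Superquadric inside-outside function F in the canonical frame.

    scale = (a1, a2, a3) are the semi-axis lengths; shape = (e1, e2) are the
    shape exponents (e1 = e2 = 1 gives an ellipsoid).
    """
    x, y, z = point
    a1, a2, a3 = scale
    e1, e2 = shape
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)


def axis_antipodal_pairs(scale, max_width):
    """Antipodal contact pairs (+a_i, -a_i) along each principal axis that fit the gripper.

    The points +/- a_i e_i always lie on the surface (F = 1), whatever the shape exponents.
    """
    pairs = []
    for i, a in enumerate(scale):
        if 2.0 * a <= max_width:
            p = [0.0, 0.0, 0.0]
            p[i] = a
            q = [0.0, 0.0, 0.0]
            q[i] = -a
            pairs.append((tuple(p), tuple(q)))
    return pairs
```

Because each superquadric is symmetric about its three principal planes, the axis-aligned pairs are natural antipodal candidates; a full method would also sample off-axis contacts and score them.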

2.Realistic Safety-critical Scenarios Search for Autonomous Driving System via Behavior Tree

Authors:Ping Zhang, Lingfeng Ming, Tingyi Yuan, Cong Qiu, Yang Li, Xinhua Hui, Zhiquan Zhang, Chao Huang

Abstract: The simulation-based testing of Autonomous Driving Systems (ADSs) has gained significant attention. However, current approaches often fall short of accurately assessing ADSs for two reasons: over-reliance on expert knowledge and the use of simplistic evaluation metrics. This leads to discrepancies between simulated scenarios and naturalistic driving environments. To address this, we propose the Matrix-Fuzzer, a behavior tree-based testing framework, to automatically generate realistic safety-critical test scenarios. Our approach involves the $log2BT$ method, which abstracts logged road-users' trajectories to behavior sequences. Furthermore, we vary the properties of behaviors from real-world driving distributions and then use an adaptive algorithm to explore the input space. Meanwhile, we design a general evaluation engine that guides the algorithm toward critical areas, thus reducing the generation of invalid scenarios. Our approach is demonstrated in our Matrix Simulator. The experimental results show that: (1) our $log2BT$ achieves satisfactory trajectory reconstructions; (2) our approach is able to find the most types of safety-critical scenarios while generating only around 30% of the total scenarios compared with the baseline algorithm. Specifically, it improves the ratio of critical violations to total scenarios and the ratio of types to total scenarios by at least 10x and 5x, respectively, while reducing the ratio of invalid scenarios to total scenarios by at least 58% in two case studies.

3.Semantic and Topological Mapping using Intersection Identification

Authors:Scott Fredriksson, Akshit Saradagi, George Nikolakopoulos

Abstract: This article presents a novel approach to identifying and classifying intersections for semantic and topological mapping. More specifically, the proposed approach generates a semantically meaningful map containing intersections, pathways, dead ends, and pathways leading to unexplored frontiers. Furthermore, the resulting semantic map can be used to generate a sparse topological map representation, which can be utilized by robots for global navigation. The proposed solution also introduces built-in filtering to handle noise in the environment, to remove openings in the map that the robot cannot pass, and to remove small objects to optimize and simplify the overall mapping results. The efficacy of the proposed semantic and topological mapping method is demonstrated on a map of an indoor structured environment built from experimental data. Compared with similar state-of-the-art topological mapping solutions, the proposed framework produces a map with up to 89% fewer nodes than the next best solution.

4.Control of a Back-Support Exoskeleton to Assist Carrying Activities

Authors:Maria Lazzaroni, Giorgia Chini, Francesco Draicchio, Christian Di Natali, Darwin G. Caldwell, Jesús Ortiz

Abstract: Back-support exoskeletons are commonly used in the workplace to reduce the risk of low back pain for workers performing demanding activities. However, the potential of back-support exoskeletons for assisting tasks other than lifting has not been exploited extensively. This work focuses on the use of an active back-support exoskeleton to assist carrying. Two control strategies are designed that modulate the exoskeleton torques to comply with the task assistance requirements. In particular, two gait phase detection frameworks are exploited to adapt the assistance according to the legs' motion. The two strategies are assessed through an experimental analysis on ten subjects. The carrying task is performed both without and with exoskeleton assistance. Results demonstrate the potential of the presented controllers to assist the task without hindering gait and while improving the usability experienced by users. Moreover, the exoskeleton assistance significantly reduces the lumbar load associated with the task, demonstrating its promise for risk mitigation in the workplace.

5.Adaptive Graduated Nonconvexity Loss

Authors:Kyungmin Jung, Thomas Hitchcox, James Richard Forbes

Abstract: Many problems in robotics, such as estimating the state from noisy sensor data or aligning two LiDAR point clouds, can be posed and solved as least-squares problems. Unfortunately, vanilla nonminimal solvers for least-squares problems are notoriously sensitive to outliers. As such, various robust loss functions have been proposed to reduce this sensitivity. Examples include the pseudo-Huber, Cauchy, and Geman-McClure losses. Recently, these loss functions have been generalized into a single loss function that enables the best loss function to be found adaptively based on the distribution of the residuals. However, even with the generalized robust loss function, most nonminimal solvers can only find a local solution given a prior state estimate, due to the nonconvexity of the problem. The first contribution of this paper is to combine graduated nonconvexity (GNC) with the generalized robust loss function to solve least-squares problems without a prior state estimate and without the need to specify a loss function. Moreover, existing loss functions, including the generalized loss function, are based on Gaussian-like distributions. However, residuals are often defined as the squared norm of a multivariate error and distributed in a Chi-like fashion. The second contribution of this paper is to apply a norm-aware adaptive robust loss function within a GNC framework. This leads to additional robustness when compared with state-of-the-art methods. Simulations and experiments demonstrate that the proposed approach is more robust and converges faster than other GNC formulations.
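The "single loss function" that generalizes pseudo-Huber, Cauchy and Geman-McClure most likely refers to Barron's general and adaptive robust loss, in which a shape parameter alpha selects the member of the family (alpha = 1 gives pseudo-Huber, alpha = 0 Cauchy, alpha = -2 Geman-McClure). The sketch below implements that loss with a scale c and pairs it with one hypothetical graduated-nonconvexity schedule that starts from the convex quadratic (alpha = 2) and moves alpha linearly toward a target. It does not implement the paper's norm-aware, Chi-like variant or its adaptive choice of alpha; the schedule and function names are assumptions for illustration.

```python
import math


def general_robust_loss(x, alpha, c):
    """Barron-style general robust loss rho(x, alpha, c).

    alpha = 2: quadratic; alpha = 1: pseudo-Huber; alpha = 0: Cauchy;
    alpha = -2: Geman-McClure; alpha = -inf: Welsch.
    """
    z = (x / c) ** 2
    if alpha == 2.0:
        return 0.5 * z
    if alpha == 0.0:
        return math.log(0.5 * z + 1.0)
    if alpha == -math.inf:
        return 1.0 - math.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)


def gnc_alpha_schedule(alpha_target, steps):
    """Hypothetical GNC schedule: begin convex (alpha = 2), end at alpha_target."""
    return [2.0 + (alpha_target - 2.0) * k / (steps - 1) for k in range(steps)]
```

In a GNC loop, each alpha in the schedule would define one weighted least-squares stage, warm-started from the previous stage's solution, so no initial state estimate is needed for the first (convex) stage.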

6.Using a Bayesian-Inference Approach to Calibrating Models for Simulation in Robotics

Authors:Huzaifa Mustafa Unjhawala, Ruochun Zhang, Wei Hu, Jinlong Wu, Radu Serban, Dan Negrut

Abstract: In robotics, simulation has the potential to reduce design time and costs, and to lead to a more robust engineered solution and a safer development process. However, the use of simulators is predicated on the availability of good models. This contribution is concerned with improving the quality of these models via calibration, which is cast herein in a Bayesian framework. First, we discuss the Bayesian machinery involved in model calibration. Then, we demonstrate it in one example: calibration of a vehicle dynamics model that has a low degree-of-freedom count and can be used for state estimation, model predictive control, or path planning. A high-fidelity simulator is used to emulate the "experiments" and generate the data for the calibration. The merit of this work is not tied to a new Bayesian methodology for calibration, but to the demonstration of how the Bayesian machinery can establish connections among models in computational dynamics, even when the data in use is noisy. The software used to generate the results reported herein is available in a public repository for unfettered use and distribution.
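To make the Bayesian calibration machinery concrete, here is a minimal random-walk Metropolis-Hastings sketch for a single parameter. It assumes a toy linear model y = theta * x with Gaussian measurement noise of known standard deviation and a Gaussian prior on theta; the model, the function names and all numbers are hypothetical stand-ins for the paper's vehicle dynamics model and its high-fidelity simulator data, and the paper may use a different sampler.

```python
import math
import random


def log_posterior(theta, xs, ys, sigma, prior_mean, prior_std):
    """Unnormalized log posterior: Gaussian prior plus Gaussian likelihood."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_std) ** 2
    log_lik = -0.5 * sum(((y - theta * x) / sigma) ** 2 for x, y in zip(xs, ys))
    return log_prior + log_lik


def metropolis_hastings(xs, ys, sigma, prior_mean, prior_std,
                        n_samples=5000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings samples of theta."""
    rng = random.Random(seed)
    theta = prior_mean
    lp = log_posterior(theta, xs, ys, sigma, prior_mean, prior_std)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, xs, ys, sigma, prior_mean, prior_std)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        samples.append(theta)
    return samples


# Hypothetical "experiment": data generated with theta = 2.0 and no noise.
xs = [0.1 * i for i in range(1, 21)]
ys = [2.0 * x for x in xs]
chain = metropolis_hastings(xs, ys, sigma=0.1, prior_mean=0.0, prior_std=10.0)
burned = chain[1000:]  # discard burn-in
print(sum(burned) / len(burned))
```

The spread of `burned`, not just its mean, is the calibration output: it quantifies how well the data constrain the parameter.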

7.Path-Based Sensors: Will the Knowledge of Correlation in Random Variables Accelerate Information Gathering?

Authors:Alkesh K. Srivastava, George P. Kontoudis, Donald Sofge, Michael Otte

Abstract: Effective communication is crucial for deploying robots in mission-specific tasks, but inadequate or unreliable communication can greatly reduce mission efficacy, for example in search and rescue missions where communication-denied conditions may occur. In such missions, robots are deployed to locate targets, such as human survivors, but they might get trapped at hazardous locations, such as in a trapping pit or by debris. Thus, the information the robot collected is lost owing to the lack of communication. In our prior work, we developed the notion of a path-based sensor. A path-based sensor detects whether or not an event has occurred along a particular path, but it does not provide the exact location of the event. Such path-based sensor observations are well-suited to communication-denied environments, and various studies have explored methods to improve information gathering in such settings. In some missions it is typical for target elements to be in close proximity to hazardous factors that hinder the information-gathering process. In this study, we examine a similar scenario and conduct experiments to determine if additional knowledge about the correlation between hazards and targets improves the efficiency of information gathering. To incorporate this knowledge, we utilize a Bayesian network representation of domain knowledge and develop an algorithm based on this representation. Our empirical investigation reveals that such additional information on correlation is beneficial only in environments with moderate hazard lethality, suggesting that while knowledge of correlation helps, further research and development is necessary for optimal outcomes.

8.Rhino: An Autonomous Robot for Mapping Underground Mine Environments

Authors:Christopher Tatsch, Jonas Amoama Bredu Jnr, Dylan Covell, Ihsan Berk Tulu, Yu Gu

Abstract: There are many benefits to exploring and exploiting underground mines, but there are also significant risks and challenges. One such risk is the potential for accidents caused by the collapse of pillars and roofs, which can be mitigated through inspections. However, these inspections can be costly and may put the safety of the inspectors at risk. To address this issue, this work presents Rhino, an autonomous robot that can navigate underground mine environments and generate 3D maps. These generated maps will allow mine workers to proactively respond to potential hazards and prevent accidents. The system being developed is a skid-steer, four-wheeled unmanned ground vehicle (UGV) that uses a LiDAR and an IMU to perform long-duration autonomous navigation and map generation through a LIO-SAM framework. The system has been tested in different environments and terrains to ensure its robustness and its ability to operate for extended periods of time while generating 3D maps.

9.Real-Time Joint Simulation of LiDAR Perception and Motion Planning for Automated Driving

Authors:Zhanhong Huang, Xiao Zhang, Xinming Huang

Abstract: Real-time perception and motion planning are two crucial tasks for autonomous driving. While many research works focus on improving the performance of perception and motion planning individually, it is still not clear how a perception error may adversely impact the motion planning results. In this work, we propose a joint simulation framework with LiDAR-based perception and motion planning for real-time automated driving. Taking the sensor input from the CARLA simulator with additive noise, a LiDAR perception system is designed to detect and track all surrounding vehicles and to provide precise orientation and velocity information. Next, we introduce a new collision bound representation that reduces the communication cost between the perception module and the motion planner. A novel collision checking algorithm is implemented using line intersection checking, which is more efficient at long range than the traditional occupancy grid method. We evaluate the joint simulation framework in CARLA for urban driving scenarios. Experiments show that our proposed automated driving system can execute at 25 Hz, which meets the real-time requirement. The LiDAR perception system has high accuracy within 20 meters when evaluated against the ground truth. The motion planner consistently keeps a safe distance when tested in CARLA urban driving scenarios.
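Line-intersection collision checking usually reduces to the classic orientation (cross-product) test for two 2-D segments. The sketch below is a hedged illustration rather than the authors' algorithm: it checks a planned path (a polyline) against the edges of an obstacle's bounding polygon, and the function names and polygon representation are assumptions. An edge-crossing test alone does not detect a path lying entirely inside the polygon, which a real planner would also have to rule out.

```python
def _orient(p, q, r):
    """Twice the signed area of triangle (p, q, r); the sign gives the turn direction."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within_box(p, q, r):
    """For r collinear with segment pq: is r inside pq's bounding box?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 crosses or touches segment q1-q2."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Proper crossing: each segment's endpoints lie strictly on opposite sides of the other.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Touching or collinear-overlap cases.
    return ((d1 == 0 and _within_box(q1, q2, p1))
            or (d2 == 0 and _within_box(q1, q2, p2))
            or (d3 == 0 and _within_box(p1, p2, q1))
            or (d4 == 0 and _within_box(p1, p2, q2)))


def path_hits_polygon(path, polygon):
    """True if any segment of the path crosses or touches a polygon edge."""
    edges = list(zip(polygon, polygon[1:] + polygon[:1]))
    for a, b in zip(path, path[1:]):
        for c, d in edges:
            if segments_intersect(a, b, c, d):
                return True
    return False
```

The cost grows with the number of path segments times polygon edges, independent of distance, whereas an occupancy grid's cost grows with the area covered; this is one plausible reason such a test pays off at long range.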

1.Fast Event-based Double Integral for Real-time Robotics

Authors:Shijie Lin, Yingqiang Zhang, Dongyue Huang, Bin Zhou, Xiaowei Luo, Jia Pan

Abstract: Motion deblurring is a critical ill-posed problem that is important in many vision-based robotics applications. The recently proposed event-based double integral (EDI) provides a theoretical framework for solving the deblurring problem with the event camera and generating clear images at a high frame rate. However, the original EDI is mainly designed for offline computation and does not meet the real-time requirements of many robotics applications. In this paper, we propose the fast EDI, an efficient implementation of EDI that can achieve real-time online computation on single-core CPU devices, which are common on physical robotic platforms used in practice. In experiments, our method can handle event rates as high as 13 million events per second in a wide variety of challenging lighting conditions. We demonstrate the benefit on multiple downstream real-time applications, including localization, visual tag detection, and feature matching.
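The EDI model relates a blurry frame to a sharp latent frame through the event stream. Per pixel, the blurry intensity B is the average of the latent intensity L(t) over the exposure of length T, and L(t) = L(f) * exp(c * E(t)), where E(t) sums the event polarities between the reference time f and t and c is the contrast threshold. Solving for the sharp intensity gives L(f) = B * T / (integral of exp(c * E(t)) over the exposure). The per-pixel sketch below evaluates that integral exactly, using the fact that E(t) is piecewise constant between events. It is a naive reference version, not the paper's fast EDI implementation, and the function names are hypothetical.

```python
import math


def event_sum(events, t_ref, t):
    """E(t): signed sum of event polarities between t_ref and t."""
    if t >= t_ref:
        return sum(p for ts, p in events if t_ref < ts <= t)
    return -sum(p for ts, p in events if t < ts <= t_ref)


def edi_sharp_intensity(blurred, events, t_ref, t_start, t_end, c):
    """Recover the latent intensity L(t_ref) of one pixel from its blurry value.

    events: list of (timestamp, polarity) with polarity +1 or -1.
    c: contrast threshold (log-intensity step per event).
    t_ref is assumed to lie inside the exposure [t_start, t_end].
    """
    # E(t) only jumps at event timestamps, so those are the integration breakpoints.
    cuts = sorted({t_start, t_end} | {ts for ts, _ in events if t_start < ts < t_end})
    integral = 0.0
    for a, b in zip(cuts, cuts[1:]):
        # E(t) is constant on (a, b), so evaluate it at the midpoint.
        integral += (b - a) * math.exp(c * event_sum(events, t_ref, 0.5 * (a + b)))
    return blurred * (t_end - t_start) / integral
```

This version rescans the event list for every interval, so it is quadratic in the number of events per pixel; avoiding such repeated work is exactly what an online, real-time implementation has to do.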

2.Safe motion planning with environment uncertainty

Authors:Antony Thomas, Fulvio Mastrogiovanni, Marco Baglietto

Abstract: We present an approach for safe motion planning under robot state and environment (obstacle and landmark location) uncertainties. To this end, we first develop an approach that accounts for the landmark uncertainties during robot localization. Existing planning approaches assume that the landmark locations are well known or are known with little uncertainty. However, this might not be true in practice. Noisy sensors and imperfect motions compound the errors originating from the estimate of environment features. Moreover, possible occlusions and dynamic objects in the environment lead to imperfect landmark estimation. Consequently, not considering this uncertainty can wrongly localize the robot, leading to inefficient plans. Our approach thus incorporates the landmark uncertainty within the Bayes filter estimation framework. We also analyze the effect of considering this uncertainty and delineate the conditions under which it can be ignored. Second, we extend the state of the art by computing an exact expression for the collision probability under Gaussian-distributed robot motion, perception and obstacle location uncertainties. We formulate the collision probability process as a quadratic form in random variables. Under Gaussian distribution assumptions, an exact expression for collision probability is thus obtained, which is computable in real time. In contrast, existing approaches approximate the collision probability using upper bounds that can lead to overly conservative estimates and thereby suboptimal plans. We demonstrate and evaluate our approach using a theoretical example and simulations. We also present a comparison of our approach to different state-of-the-art methods.

3.Shape Formation and Locomotion with Joint Movements in the Amoebot Model

Authors:Andreas Padalkin, Manish Kumar, Christian Scheideler

Abstract: We consider the geometric amoebot model, in which a set of $n$ amoebots is placed on the triangular grid. An amoebot is able to send information to its neighbors, and to move via expansions and contractions. Since amoebots and information can only travel node by node, most problems have a natural lower bound of $\Omega(D)$ where $D$ denotes the diameter of the structure. Inspired by the nervous and muscular system, Feldmann et al. have proposed the reconfigurable circuit extension and the joint movement extension of the amoebot model with the goal of breaking this lower bound. In the joint movement extension, the way amoebots move is altered. Amoebots become able to push and pull other amoebots. Feldmann et al. demonstrated the power of joint movements by transforming a line of amoebots into a rhombus within $O(\log n)$ rounds. However, they left the details of the extension open. The goal of this paper is therefore to formalize and extend the joint movement extension. In order to provide a proof of concept for the extension, we consider two fundamental problems of modular robot systems: shape formation and locomotion. We approach these problems by defining meta-modules of rhombical and hexagonal shape, respectively. The meta-modules are capable of movement primitives like sliding, rotating, and tunneling. This allows us to simulate shape formation algorithms of various modular robot systems. Finally, we construct three amoebot structures capable of locomotion by rolling, crawling, and walking, respectively.

4.Sequence-Agnostic Multi-Object Navigation

Authors:Nandiraju Gireesh, Ayush Agrawal, Ahana Datta, Snehasis Banerjee, Mohan Sridharan, Brojeshwar Bhowmick, Madhava Krishna

Abstract: The Multi-Object Navigation (MultiON) task requires a robot to localize an instance of each of multiple object classes. It is a fundamental task for an assistive robot in a home or a factory. Existing methods for MultiON have viewed this as a direct extension of Object Navigation (ON), the task of localizing an instance of one object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. This paper describes a deep reinforcement learning framework for sequence-agnostic MultiON based on an actor-critic architecture and a suitable reward specification. Our framework leverages past experiences and seeks to reward progress toward individual as well as multiple target object classes. We use photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment to experimentally show that our method performs better than a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.
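The abstract does not give the reward itself. Purely as a hypothetical illustration of what "sequence-agnostic" means, the sketch below rewards progress toward whichever remaining target is currently nearest, so no visiting order is imposed, and adds a bonus for each target found. The function name, detection radius, bonus, and step penalty are all assumptions, not the paper's reward specification.

```python
import math


def progress_reward(prev_pos, cur_pos, remaining, found_radius=1.0,
                    found_bonus=10.0, step_penalty=0.01):
    """Hypothetical sequence-agnostic MultiON reward.

    Rewards the decrease in distance to the nearest remaining target (no fixed
    order), adds a bonus for every target reached, and returns the targets still
    left. `remaining` must be non-empty.
    """
    def nearest(p):
        return min(math.dist(p, t) for t in remaining)

    reward = nearest(prev_pos) - nearest(cur_pos) - step_penalty
    found = [t for t in remaining if math.dist(cur_pos, t) <= found_radius]
    reward += found_bonus * len(found)
    still_remaining = [t for t in remaining if t not in found]
    return reward, still_remaining


print(progress_reward((0, 0), (1, 0), [(5, 0), (0, 10)]))
```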

5.Concentric Tube Robot Redundancy Resolution via Velocity/Compliance Manipulability Optimization

Authors:Jia Shen, Yifan Wang, Milad Azizkhani, Deqiang Qiu, Yue Chen

Abstract: Concentric Tube Robots (CTR) have the potential to enable effective minimally invasive surgeries. While extensive modeling and control schemes have been proposed in the past decade, limited efforts have been made to improve trajectory tracking performance from the perspective of manipulability, which can be critical for generating safe motion and feasible actuator commands. In this paper, we propose a gradient-based redundancy resolution framework that optimizes velocity/compliance manipulability-based performance indices during trajectory tracking for a kinematically redundant CTR. We efficiently calculate the gradients of manipulabilities by propagating the first- and second-order derivatives of state variables of the Cosserat rod model along the CTR arc length, reducing the gradient computation time by 68% compared to the finite difference method. Task-specific performance indices are optimized by projecting the gradient into the null-space of trajectory tracking. The proposed method is validated in three exemplary scenarios that involve trajectory tracking, obstacle avoidance, and external load compensation, respectively. Simulation results show that the proposed method is able to accomplish the required tasks while commonly used redundancy resolution approaches underperform or even fail.
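To illustrate the structure of gradient-projection redundancy resolution, the sketch below uses the classic update qdot = J^+ xdot + (I - J^+ J) k grad(w), where the null-space term climbs a performance index w without disturbing the tracked task velocity. It substitutes a hypothetical planar three-joint arm for the Cosserat-rod CTR model and computes grad(w) by central finite differences, the baseline that the paper's derivative propagation replaces. The index here is Yoshikawa's velocity manipulability, not the paper's task-specific indices.

```python
import numpy as np

# Hypothetical stand-in for CTR kinematics: a planar 3R arm (3 joints, 2D task) is redundant.
LINKS = np.array([1.0, 0.8, 0.6])


def jacobian(q):
    """2x3 position Jacobian of the planar 3R arm at joint angles q."""
    angles = np.cumsum(q)
    J = np.zeros((2, q.size))
    for i in range(q.size):
        # Joint i moves every link from i onward.
        J[0, i] = -np.sum(LINKS[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(LINKS[i:] * np.cos(angles[i:]))
    return J


def manipulability(q):
    """Yoshikawa velocity manipulability sqrt(det(J J^T))."""
    J = jacobian(q)
    return np.sqrt(np.linalg.det(J @ J.T))


def finite_difference_grad(f, q, h=1e-6):
    """Central-difference gradient of a scalar function f at q."""
    g = np.zeros_like(q)
    for i in range(q.size):
        e = np.zeros_like(q)
        e[i] = h
        g[i] = (f(q + e) - f(q - e)) / (2 * h)
    return g


def redundancy_resolution(q, xdot, k=0.5):
    """qdot = J^+ xdot + (I - J^+ J) k grad w: track xdot, climb w in the null space."""
    J = jacobian(q)
    J_pinv = np.linalg.pinv(J)
    null_proj = np.eye(q.size) - J_pinv @ J
    return J_pinv @ xdot + null_proj @ (k * finite_difference_grad(manipulability, q))


q = np.array([0.3, 0.5, -0.4])
print(redundancy_resolution(q, np.array([0.1, -0.05])))
```

Because the projector maps into the null space of J, the secondary objective never changes the end-effector velocity; only the task-space term J^+ xdot does.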

6.Waterberry Farms: A Novel Benchmark For Informative Path Planning

Authors:Samuel Matloob, Partha P. Datta, O. Patrick Kreidl, Ayan Dutta, Swapnoneel Roy, Ladislau Bölöni

Abstract: Recent developments in robotic and sensor hardware make data collection with mobile robots (ground or aerial) feasible and affordable for a wide population of users. Newly emerging applications, such as precision agriculture, weather damage assessment, or personal home security, often do not satisfy the simplifying assumptions made by previous research: the explored areas have complex shapes and obstacles, multiple phenomena need to be sensed and estimated simultaneously, and the measured quantities might change during observations. Future progress in path planning and estimation algorithms requires a new generation of benchmarks that provide representative environments and scoring methods that capture the demands of these applications. This paper describes the Waterberry Farms benchmark (WBF), which models a precision agriculture application at a Florida farm growing multiple crop types. The benchmark captures the dynamic nature of the spread of plant diseases and variations in soil humidity, while the scoring system measures the performance of a given combination of a movement policy and an information model estimator. By benchmarking several representative path planning and estimator algorithms, we demonstrate WBF's ability to provide insight into their properties and to quantify future progress.

7.Learning Video-Conditioned Policies for Unseen Manipulation Tasks

Authors:Elliot Chane-Sane, Cordelia Schmid, Ivan Laptev

Abstract: The ability of non-expert users to specify robot commands is critical for building generalist agents capable of solving a large variety of tasks. One convenient way to specify the intended robot goal is a video of a person demonstrating the target task. While prior work typically aims to imitate human demonstrations performed in robot environments, here we focus on a more realistic and challenging setup with demonstrations recorded in natural and diverse human environments. We propose Video-conditioned Policy learning (ViP), a data-driven approach that maps human demonstrations of previously unseen tasks to robot manipulation skills. To this end, we train our policy to generate appropriate actions given current scene observations and a video of the target task. To encourage generalization to new tasks, we avoid training on particular tasks and instead learn our policy from unlabelled robot trajectories and the corresponding robot videos. Both robot and human videos in our framework are represented by video embeddings pre-trained for human action recognition. At test time, we first translate human videos to robot videos in the common video embedding space and then use the resulting embeddings to condition our policies. Notably, our approach enables robot control by human demonstrations in a zero-shot manner, i.e., without using robot trajectories paired with human instructions during training. We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art. Our method also demonstrates excellent performance in a new challenging zero-shot setup where no paired data is used during training.

8.Joint Metrics Matter: A Better Standard for Trajectory Forecasting

Authors:Erica Weng, Hana Hoshino, Deva Ramanan, Kris Kitani

Abstract: Multi-modal trajectory forecasting methods are commonly evaluated using single-agent metrics (marginal metrics), such as minimum Average Displacement Error (ADE) and Final Displacement Error (FDE), which fail to capture the joint performance of multiple interacting agents. Focusing only on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group. Consequently, methods optimized for marginal metrics lead to overly optimistic estimates of performance, which is detrimental to progress in trajectory forecasting research. In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate. We demonstrate the importance of joint metrics as opposed to marginal metrics with quantitative evidence and qualitative examples drawn from the ETH / UCY and Stanford Drone datasets. We introduce a new loss function incorporating joint metrics that, when applied to a SOTA trajectory forecasting method, achieves a 7% improvement in JADE / JFDE on the ETH / UCY datasets with respect to the previous SOTA. Our results also indicate that optimizing for joint metrics naturally leads to an improvement in interaction modeling, as evidenced by a 16% decrease in mean collision rate on the ETH / UCY datasets with respect to the previous SOTA.
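The difference between marginal and joint metrics comes down to where the minimum over the K predicted samples is taken. The sketch below follows the common convention: minADE/minFDE pick the best sample separately for each agent, while JADE/JFDE pick one sample index for the whole scene. Array shapes and the function name are assumptions; the paper gives its exact definitions.

```python
import numpy as np


def marginal_vs_joint(pred, gt):
    """pred: (K, A, T, 2) K joint samples for A agents; gt: (A, T, 2)."""
    err = np.linalg.norm(pred - gt[None], axis=-1)  # (K, A, T) displacement errors
    ade = err.mean(axis=2)                           # (K, A) per-sample, per-agent ADE
    fde = err[:, :, -1]                              # (K, A) per-sample, per-agent FDE
    # Marginal: each agent independently picks its own best sample.
    min_ade = ade.min(axis=0).mean()
    min_fde = fde.min(axis=0).mean()
    # Joint: one sample index must be chosen for the whole scene.
    min_jade = ade.mean(axis=1).min()
    min_jfde = fde.mean(axis=1).min()
    return min_ade, min_fde, min_jade, min_jfde
```

For instance, if sample 1 is perfect for agent A but 2 units off for agent B, and sample 2 is the reverse, minADE is 0 even though no single sample is correct for both agents, while JADE is 1.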

9.Non-Euclidean Motion Planning with Graphs of Geodesically-Convex Sets

Authors:Thomas Cohn, Mark Petersen, Max Simchowitz, Russ Tedrake

Abstract: Computing optimal, collision-free trajectories for high-dimensional systems is a challenging problem. Sampling-based planners struggle with the dimensionality, whereas trajectory optimizers may get stuck in local minima due to inherent nonconvexities in the optimization landscape. The use of mixed-integer programming to encapsulate these nonconvexities and find globally optimal trajectories has recently shown great promise, thanks in part to tight convex relaxations and efficient approximation strategies that greatly reduce runtimes. These approaches were previously limited to Euclidean configuration spaces, precluding their use with mobile bases or continuous revolute joints. In this paper, we handle such scenarios by modeling configuration spaces as Riemannian manifolds, and we describe a reduction procedure for the zero-curvature case to a mixed-integer convex optimization problem. We demonstrate our results on various robot platforms, including producing efficient collision-free trajectories for a PR2 bimanual mobile manipulator.
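For a continuous revolute joint or a mobile-base heading, the configuration coordinate lives on a circle, and the shortest (geodesic) path between two angles may wrap around. The sketch below shows the flat (zero-curvature) metric on a product of lines and circles: per-coordinate differences of revolute joints are wrapped into [-pi, pi) before combining. It only illustrates the geometry; the mixed-integer convex program from the paper is not shown, and the function names are hypothetical.

```python
import math


def wrap_angle(a):
    """Map an angle difference into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def geodesic_distance(q1, q2, is_revolute):
    """Flat-metric geodesic distance on a product of lines and circles (revolute joints)."""
    total = 0.0
    for a, b, rev in zip(q1, q2, is_revolute):
        d = wrap_angle(b - a) if rev else b - a
        total += d * d
    return math.sqrt(total)


def geodesic_interp(q1, q2, is_revolute, t):
    """Point at fraction t in [0, 1] along a shortest path (angles left unwrapped)."""
    return [a + t * (wrap_angle(b - a) if rev else b - a)
            for a, b, rev in zip(q1, q2, is_revolute)]


# A continuous joint at 3.0 rad and one at -3.0 rad are about 0.28 rad apart, not 6 rad.
print(geodesic_distance([3.0], [-3.0], [True]), geodesic_distance([3.0], [-3.0], [False]))
```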

1.Understanding why SLAM algorithms fail in modern indoor environments

Authors:Nwankwo Linus, Elmar Rueckert

Abstract: Simultaneous localization and mapping (SLAM) algorithms are essential for the autonomous navigation of mobile robots. With the increasing demand for autonomous systems, it is crucial to evaluate and compare the performance of these algorithms in real-world environments. In this paper, we provide an evaluation strategy and real-world datasets to test and evaluate SLAM algorithms in complex and challenging indoor environments. Further, we analysed state-of-the-art (SOTA) SLAM algorithms based on various metrics, such as absolute trajectory error, scale drift, and map accuracy and consistency. Our results demonstrate that SOTA SLAM algorithms often fail in challenging environments with dynamic objects and transparent or reflective surfaces. We also found that successful loop closures had a significant impact on the algorithms' performance. These findings highlight the need for further research to improve the robustness of the algorithms in real-world scenarios.
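Absolute trajectory error is usually reported as the RMSE between estimated and ground-truth positions after a least-squares rigid alignment. A minimal sketch, assuming time-associated position pairs and rotation-plus-translation (Kabsch/Horn) alignment without scale; evaluations of monocular systems with scale drift typically also estimate a scale factor, which is omitted here. The function name is illustrative and not taken from the paper's evaluation code.

```python
import numpy as np


def ate_rmse(est, gt):
    """RMSE of position errors after rigid (rotation + translation) alignment of est onto gt."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    # Kabsch/Horn: the SVD of the cross-covariance gives the optimal rotation.
    U, _, Vt = np.linalg.svd(E.T @ G)
    D = np.eye(est.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # forbid reflections
    R = Vt.T @ D @ U.T
    aligned = E @ R.T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```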

2.Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement

Authors:Yunke Ao, Hooman Esfandiari, Fabio Carrillo, Yarden As, Mazda Farshad, Benjamin F. Grewe, Andreas Krause, Philipp Fuernstahl

Abstract: Spinal fusion surgery requires highly accurate implantation of pedicle screw implants, which must be conducted in critical proximity to vital structures with a limited view of anatomy. Robotic surgery systems have been proposed to improve placement accuracy; however, state-of-the-art systems suffer from the limitations of open-loop approaches, as they follow traditional concepts of preoperative planning and intraoperative registration without real-time recalculation of the surgical plan. In this paper, we propose an intraoperative planning approach for robotic spine surgery that leverages real-time observation for drill path planning based on Safe Deep Reinforcement Learning (DRL). The main contributions of our method are (1) the capability to guarantee safe actions by introducing an uncertainty-aware distance-based safety filter; and (2) the ability to compensate for incomplete intraoperative anatomical information by encoding a priori knowledge about anatomical structures with a network pre-trained on high-fidelity anatomical models. Planning quality was assessed by quantitative comparison with the gold standard (GS) drill planning. In experiments with 5 models derived from real magnetic resonance imaging (MRI) data, our approach achieved 90% bone penetration with respect to the GS while satisfying safety requirements, even under observation and motion uncertainty. To the best of our knowledge, our approach is the first safe DRL approach focusing on orthopedic surgeries.
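The abstract does not specify the filter's form. As a hypothetical sketch of an uncertainty-aware, distance-based safety filter, each candidate drill action below is kept only if the predicted tip position stays at least `margin` away from every critical structure after subtracting `k * sigma` for observation or motion uncertainty, and the highest-valued safe action is chosen. Point-sampled critical structures, straight-line tip displacement, and all parameter names are assumptions.

```python
import math


def safe_action(tip, actions, q_values, critical_points, sigma, margin=2.0, k=2.0):
    """Index of the highest-valued action whose predicted tip stays safe, or None.

    An action (a displacement of the drill tip) is safe if the distance from the
    predicted tip to every critical point, minus k * sigma, is at least `margin`.
    """
    best = None
    for i, (delta, q) in enumerate(zip(actions, q_values)):
        nxt = [t + d for t, d in zip(tip, delta)]
        clearance = min(math.dist(nxt, c) for c in critical_points) - k * sigma
        if clearance >= margin and (best is None or q > q_values[best]):
            best = i
    return best


print(safe_action((0, 0, 0), [(1, 0, 0), (0, 1, 0), (0, 0, 0)], [5.0, 3.0, 1.0],
                  [(3, 0, 0)], sigma=0.5))
```

If no action is safe the function returns None, so a caller could fall back to halting the drill; the highest-valued action is rejected here because it moves too close to the critical point.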

3.Physics-informed Neural Networks to Model and Control Robots: a Theoretical and Experimental Investigation

Authors:Jingyue Liu, Pablo Borja, Cosimo Della Santina

Abstract: Physics-inspired neural networks have proven to be an effective modeling method, giving more physically plausible results with less data dependency. However, their application in robotics is limited by the non-conservative nature of robot dynamics and the difficulty of friction modeling. Moreover, these physics-inspired neural networks do not account for complex input matrices, such as those found in underactuated soft robots. This paper solves these problems by extending Lagrangian and Hamiltonian neural networks to include dissipation and a simplified input matrix. Additionally, the loss function is computed using the Runge-Kutta algorithm, circumventing the inaccuracies and environmental susceptibility inherent in direct acceleration measurements. First, the effectiveness of the proposed method is validated via simulations of soft and rigid robots. Then, the proposed approach is validated experimentally on a tendon-driven soft robot and a Panda robot. The simulation and experimental results show that the modified neural networks can model different robots, while the learned model enables decent anticipatory control.
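The advantage of an integration-based loss is that training only needs measured states, not differentiated accelerations. The sketch below shows the loss structure: a model f(x, u) is propagated one step with classical fourth-order Runge-Kutta and compared with the next measured state. Here `f` is any plain-Python callable standing in for the paper's Lagrangian or Hamiltonian network with dissipation; actual training would implement the same step in an autodiff framework so that gradients flow through it. Function names are illustrative.

```python
import math


def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta (RK4) step of dx/dt = f(x, u) with u held constant."""
    def add(a, b, s):
        return [ai + s * bi for ai, bi in zip(a, b)]

    k1 = f(x, u)
    k2 = f(add(x, k1, dt / 2), u)
    k3 = f(add(x, k2, dt / 2), u)
    k4 = f(add(x, k3, dt), u)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def rk4_state_loss(f, states, inputs, dt):
    """Mean squared one-step prediction error on measured states (no accelerations needed)."""
    total = 0.0
    for x, u, x_next in zip(states[:-1], inputs, states[1:]):
        pred = rk4_step(f, x, u, dt)
        total += sum((p - m) ** 2 for p, m in zip(pred, x_next))
    return total / (len(states) - 1)


# Toy check: measured states of dx/dt = -x; the correct model gives a near-zero loss.
states = [[math.exp(-0.1 * i)] for i in range(5)]
print(rk4_state_loss(lambda x, u: [-x[0]], states, [0.0] * 4, 0.1))
```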

4.Resilient Temporal Logic Planning in the Presence of Robot Failures

Authors:Samarth Kalluraya, George J. Pappas, Yiannis Kantaros

Abstract: Several task and motion planning algorithms have been proposed recently to design paths for mobile robot teams with collaborative high-level missions specified using formal languages, such as Linear Temporal Logic (LTL). However, the designed paths often lack reactivity to failures of robot capabilities (e.g., sensing, mobility, or manipulation) that can occur due to unanticipated events (e.g., human intervention or system malfunctions), which in turn may compromise mission performance. To address this novel challenge, in this paper, we propose a new resilient mission planning algorithm for teams of heterogeneous robots with collaborative LTL missions. The robots are heterogeneous with respect to their capabilities, while the mission requires applications of these skills at certain areas in the environment in a temporal/logical order. The proposed method designs paths that can adapt to unexpected failures of robot capabilities. This is accomplished by re-allocating sub-tasks to the robots based on their currently functioning skills while minimally disrupting the existing team motion plans. We provide experiments and theoretical guarantees demonstrating the efficiency and resiliency of the proposed algorithm.

5.ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation

Authors:Vishnu Dutt Sharma, Jingxi Chen, Pratap Tokekar

Abstract: In a typical path planning pipeline for a ground robot, we build a map (e.g., an occupancy grid) of the environment as the robot moves around. While navigating indoors, a ground robot's knowledge about the environment may be limited due to occlusions. Therefore, the map will have many as-yet-unknown regions that may need to be avoided by a conservative planner. Instead, if a robot is able to correctly predict what its surroundings and occluded regions look like, the robot may navigate more efficiently. In this work, we focus on predicting occupancy within the reachable distance of the robot to enable faster navigation and present a self-supervised proximity occupancy map prediction method, named ProxMaP. We show that ProxMaP generalizes well across realistic and real domains, and improves the robot navigation efficiency in simulation by 12.40% against the traditional navigation method. We share our findings on our project webpage.

6.Buoyancy enabled autonomous underwater construction with cement blocks

Authors:Samuel Lensgraf, Devin Balkcom, Alberto Quattrini Li

Abstract: We present the first free-floating autonomous underwater construction system capable of using active ballasting to transport cement building blocks efficiently. It is the first free-floating autonomous construction robot to use a paired set of resources: compressed air for buoyancy and a battery for thrusters. In construction trials, our system built structures of up to 12 components and weighing up to 100 kg (75 kg in water). Our system achieves this performance by combining a novel one-degree-of-freedom manipulator, a novel two-component cement block construction system that corrects errors in placement, and a simple active ballasting system combined with compliant placement and grasp behaviors. The passive error-correcting components of the system minimize the required complexity in sensing and control. We also explore the problem of buoyancy allocation for building structures at scale by defining a convex program which allocates buoyancy to minimize the predicted energy cost for transporting blocks.

7.A Robotic Medical Clown (RMC): Forming a Design Space Model

Authors:Ela Liberman-Pincu, Tal Oron-Gilad

Abstract: Medical clowns help hospitalized children reduce pain and anxiety symptoms and increase the level of satisfaction in children's wards. Unfortunately, there is a shortage of medical clowns around the world. Furthermore, isolated children cannot enjoy this service. This study explored the concept of a Robotic Medical Clown (RMC) and its role. We used mixed elicitation methods to create a design space model for future robotic medical clowns. We investigated the needs, perceptions, and preferences of children and teenagers using four methods: interviewing medical clowns to learn how they perceive their role and the potential role of an RMC, conducting focus groups with teenagers, one-on-one sessions of children with a robot, and an online questionnaire. The concept of RMCs was acceptable to children, teenagers, and medical clowns. We found that the RMC's appearance affects the perception of its character and role. Future work should investigate the interaction in hospitals.

8.TidyBot: Personalized Robot Assistance with Large Language Models

Authors:Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.

1.Deadlock-Free Collision Avoidance for Nonholonomic Robots

Authors:Ruochen Zheng, Siyu Li

Abstract: We present a method for deadlock-free and collision-free navigation in a multi-robot system with nonholonomic robots. The problem is solved by quadratic programming and is applicable to most wheeled mobile robots with linear kinematic constraints. We introduce the masked velocity and the Masked Cooperative Collision Avoidance (MCCA) algorithm to encourage fully decentralized deadlock avoidance behavior. To verify the method, we provide a detailed implementation and introduce heading oscillation avoidance for differential-drive robots. To the best of our knowledge, it is the first method to give very promising and stable results for deadlock avoidance even in situations with a large number of robots and narrow passages.

2.An Enhanced Sampling-Based Method With Modified Next-Best View Strategy For 2D Autonomous Robot Exploration

Authors:Dong Huu Quoc Tran, Hoang-Anh Phan, Hieu Dang Van, Tan Van Duong, Tung Thanh Bui, Van Nguyen Thi Thanh

Abstract: Autonomous exploration is an emerging robotics capability that has found widespread application because it helps robots independently localize, build maps, and navigate terrain without human control. To date, sampling-based exploration strategies have been the most effective for aerial and ground vehicles equipped with depth sensors that produce three-dimensional point clouds. These methods sample random points or grow Rapidly-exploring Random Trees (RRT), and then select frontiers or Next Best Views (NBV) that offer useful volumetric information. However, most state-of-the-art sampling-based methods are difficult to apply to robots operating in two dimensions because of the limited environmental knowledge, which results in poor volumetric gain estimates for evaluating random destinations. This study proposes an enhanced sampling-based solution for indoor robot exploration that decides the Next Best View (NBV) in 2D environments. Our method grows an RRT until its endpoints reach frontiers and evaluates them with an enhanced utility function. The volumetric information obtained from the environment is estimated using a non-uniform distribution to identify cells that are occupied or whose occupancy probability is uncertain. Compared with the sampling-based Frontier Detection and Receding Horizon NBV approaches, the proposed method performed better in simulated Gazebo environments, achieving a significantly larger explored area while reducing the average distance and time traveled. Moreover, experiments with the proposed method on an author-built 2D robot exploring an entire real environment confirm that the method is effective and applicable in real-world scenarios.
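As a hedged illustration of how an NBV utility can score RRT endpoints on a 2D occupancy grid, the sketch below uses a form common in the receding-horizon NBV literature: information gain, here the summed binary entropy of cells inside the sensor radius, discounted by exp(-lambda * distance). This is an assumption for illustration, not the enhanced utility function or the non-uniform distribution described in the abstract, and the function names are hypothetical.

```python
import math


def cell_entropy(p):
    """Binary entropy (bits) of an occupancy probability; 0 for certain cells."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def nbv_utility(grid, candidate, robot, sensor_radius, lam=0.5):
    """Entropy gain inside the sensor disc around `candidate`, discounted by travel distance.

    grid[y][x] holds occupancy probabilities; 0.5 marks unknown cells.
    """
    cx, cy = candidate
    gain = 0.0
    for y, row in enumerate(grid):
        for x, p in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= sensor_radius ** 2:
                gain += cell_entropy(p)
    return gain * math.exp(-lam * math.dist(candidate, robot))


def best_frontier(grid, frontiers, robot, sensor_radius, lam=0.5):
    """Pick the frontier (e.g. an RRT endpoint) with the highest utility."""
    return max(frontiers, key=lambda f: nbv_utility(grid, f, robot, sensor_radius, lam))


grid = [[0.0, 0.0, 0.0, 0.5, 0.5]]  # one row: known free cells, then two unknown cells
print(best_frontier(grid, [(1, 0), (4, 0)], (0, 0), 1.0, lam=0.1))
```

The exponential discount trades information against travel: a larger lambda favors nearby frontiers even when they reveal less.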

3.A sensor fusion approach for improving implementation speed and accuracy of RTAB-Map algorithm based indoor 3D mapping

Authors:Hoang-Anh Phan, Phuc Vinh Nguyen, Thu Hang Thi Khuat, Hieu Dang Van, Dong Huu Quoc Tran, Bao Lam Dang, Tung Thanh Bui, Van Nguyen Thi Thanh, Trinh Chu Duc

Abstract: In recent years, 3D mapping of indoor environments has undergone considerable research and improvement because of its effective applications in various fields, including robotics, autonomous navigation, and virtual reality. Building an accurate 3D map of an indoor environment is challenging due to the complex nature of indoor spaces, the problem of real-time embedding, and positioning errors of the robot system. This study proposes a method to improve the accuracy, speed, and quality of 3D indoor mapping by fusing data from the Inertial Measurement Unit (IMU) of the Intel RealSense D435i camera, the Ultrasonic-based Indoor Positioning System (IPS), and the encoder of the robot's wheel using the extended Kalman filter (EKF) algorithm. The merged data is processed using the Real-Time Appearance-Based Mapping algorithm (RTAB-Map), with the processing frequency updated in sync with the position frequency of the IPS device. The results suggest that fusing IMU and IPS data significantly improves the accuracy, mapping time, and quality of 3D maps. Our study highlights the proposed method's potential to improve indoor mapping in various fields, indicating that the fusion of multiple data sources can be a valuable tool in creating high-quality 3D indoor maps.
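The abstract uses an EKF over IMU, IPS, and wheel-encoder data. A one-dimensional linear Kalman filter is enough to show the predict/update pattern: encoder displacements drive the prediction, and each IPS position fix corrects it. The scalar state, the noise variances `Q` and `R`, and one IPS fix per encoder step are simplifying assumptions; the IMU channel and the nonlinear EKF models are omitted, and the function names are illustrative.

```python
def kf_predict(x, P, encoder_delta, Q):
    """Propagate a 1D position with a wheel-encoder displacement; Q is the odometry noise variance."""
    return x + encoder_delta, P + Q


def kf_update(x, P, z, R):
    """Correct with an absolute IPS position z of variance R."""
    K = P / (P + R)  # Kalman gain: how much to trust the fix relative to the prediction
    return x + K * (z - x), (1 - K) * P


def fuse(x0, P0, encoder_deltas, ips_positions, Q, R):
    """Alternate predict/update, one IPS fix per encoder step (synchronised rates assumed)."""
    x, P = x0, P0
    for d, z in zip(encoder_deltas, ips_positions):
        x, P = kf_predict(x, P, d, Q)
        x, P = kf_update(x, P, z, R)
    return x, P


print(kf_update(*kf_predict(0.0, 0.5, 1.0, 0.5), 1.2, 1.0))
```

After each update the variance is smaller than both the predicted variance and the IPS variance, which is the benefit the fusion is meant to deliver.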

4.Reducing Onboard Processing Time for Path Planning in Dynamically Evolving Polygonal Maps

Authors:Aditya Shirwatkar, Aman Singh, Jana Ravi Kiran

Abstract: Autonomous agents face the challenge of coordinating multiple tasks (perception, motion planning, control) that are computationally expensive on a single onboard computer. To utilize the onboard processing capacity optimally, computationally efficient algorithms for global path planning are essential. In this work, we attempt to reduce the processing time for global path planning in dynamically evolving polygonal maps. In dynamic environments, maps may not remain valid for long, so it is important to obtain the shortest path quickly in an ever-changing environment. To address this, we use an existing rapid path-finding algorithm, Minimal Construct. This algorithm discovers only the necessary portion of the Visibility Graph around obstacles and performs collision tests only for lines that seem heuristically promising. Simulations show that this algorithm finds shortest paths faster than traditional grid-based A* searches in most cases, resulting in smoother and shorter paths even in dynamic environments.
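Minimal Construct itself is described in its original publication and is not reproduced here. The sketch below illustrates the related idea the abstract highlights, deferring collision tests until an edge looks promising: A* over the complete graph of obstacle vertices runs the (potentially expensive) `edge_free` test only when an edge is popped from the open list. Duplicate open-list entries are allowed so that a node can still be reached through another parent after a blocked edge. The `wall_free` check is a hypothetical toy obstacle.

```python
import heapq
import itertools
import math


def lazy_astar(nodes, start, goal, edge_free):
    """A* over the complete graph on `nodes`, testing edges only when popped.

    Returns (path as node indices, cost, number of collision tests).
    """
    tie = itertools.count()
    h = lambda n: math.dist(nodes[n], nodes[goal])  # straight-line heuristic (consistent)
    open_list = [(h(start), next(tie), 0.0, start, None)]
    parent = {}  # closed set: node -> predecessor
    checks = 0
    while open_list:
        _, _, g, n, par = heapq.heappop(open_list)
        if n in parent:
            continue
        if par is not None:
            checks += 1
            if not edge_free(nodes[par], nodes[n]):
                continue  # discard only this entry; n may be reached via another parent
        parent[n] = par
        if n == goal:
            path = [n]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1], g, checks
        for m in range(len(nodes)):
            if m not in parent:
                g2 = g + math.dist(nodes[n], nodes[m])
                heapq.heappush(open_list, (g2 + h(m), next(tie), g2, m, n))
    return None, math.inf, checks


def wall_free(a, b):
    """Toy check against a wall on x = 2, -1 <= y <= 0.5 (ignores segments lying on x = 2)."""
    if (a[0] - 2) * (b[0] - 2) >= 0:
        return True
    t = (2 - a[0]) / (b[0] - a[0])
    y = a[1] + t * (b[1] - a[1])
    return not (-1 <= y <= 0.5)


print(lazy_astar([(0, 0), (4, 0), (2, 1), (2, -2)], 0, 1, wall_free))
```

In this example the direct edge is tested first and rejected, and the search then succeeds via the upper vertex after three collision tests instead of testing all six edges up front.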

5.Multimodal Detection and Identification of Robot Manipulation Failures

Authors:Arda Inceoglu, Eren Erdal Aksoy, Sanem Sariel

Abstract: An autonomous service robot should be able to interact with its environment safely and robustly without requiring human assistance. Unstructured environments are challenging for robots since the exact prediction of outcomes is not always possible. Even when the robot behaviors are well-designed, the unpredictable nature of physical robot-object interaction may prevent success in object manipulation. Therefore, execution of a manipulation action may result in an undesirable outcome involving accidents or damage to the objects or environment. Situation awareness becomes important in such cases to enable the robot to (i) maintain the integrity of both itself and the environment, (ii) recover from failed tasks in the short term, and (iii) learn to avoid failures in the long term. For this purpose, robot executions should be continuously monitored, and failures should be detected and classified appropriately. In this work, we focus on detecting and classifying both manipulation and post-manipulation phase failures using the same exteroception setup. We cover a diverse set of failure types for primary tabletop manipulation actions. In order to detect these failures, we propose FINO-Net [1], a deep multimodal sensor fusion based classifier network. The proposed network accurately detects and classifies failures from raw sensory data without any prior knowledge. In this work, we use our extended FAILURE dataset [1] with 99 new multimodal manipulation recordings and annotate them with their corresponding failure types. FINO-Net achieves 0.87 failure detection and 0.80 failure classification F1 scores. Experimental results show that the proposed architecture is also appropriate for real-time use.

6.Synthesize Dexterous Nonprehensile Pregrasp for Ungraspable Objects

Authors:Sirui Chen, Albert Wu, C. Karen Liu

Abstract: Daily objects embedded in a contextual environment are often ungraspable initially. Whether it is a book sandwiched by other books on a fully packed bookshelf or a piece of paper lying flat on the desk, a series of nonprehensile pregrasp maneuvers is required to manipulate the object into a graspable state. Humans are proficient at utilizing environmental contacts to achieve manipulation tasks that are otherwise impossible, but synthesizing such nonprehensile pregrasp behaviors is challenging for existing methods. We present a novel method that combines graph search, optimal control, and a learning-based objective function to synthesize physically realistic and diverse nonprehensile pre-grasp motions that leverage the external contacts. Since the "graspability" of an object in the context of its surroundings is difficult to define, we utilize a dataset of dexterous grasps to learn a metric that implicitly takes into account the exposed surface of the object and the fingertip locations. Our method can efficiently discover hand and object trajectories that are certified to be physically feasible by the simulation and kinematically achievable by the dexterous hand. We evaluate our method on eight challenging scenarios where nonprehensile pre-grasps are required to succeed. We also show that our method can be applied to unseen objects different from those in the training dataset. Finally, we report quantitative analyses on generalization and robustness of our method, as well as an ablation study.

7.Rotational Slippage Prediction from Segmentation of Tactile Images

Authors:Julio Castaño-Amoros, Pablo Gil

Abstract: Adding tactile sensors to a robotic system is becoming common practice for achieving more complex manipulation skills than those of robotic systems that only use external cameras to manipulate objects. The key advantage of tactile sensors is that they provide extra information about the physical properties of the grasp. In this paper, we implement a system to predict and quantify the rotational slippage of objects in hand using the vision-based tactile sensor known as Digit. Our system comprises a neural network that obtains the segmented contact region (object-sensor), from which the slippage rotation angle is then calculated using a thinning algorithm. In addition, we created our own tactile segmentation dataset, which to the best of our knowledge is the first one in the literature, to train and evaluate our neural network, obtaining 95% Dice and 91% IoU. In real-scenario experiments, our system is able to predict rotational slippage with a maximum mean rotational error of 3 degrees with previously unseen objects. Thus, our system can be used to prevent an object from falling due to slippage.
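Dice and IoU are the two segmentation scores reported. A minimal sketch on flattened binary masks; for a single mask the two are linked by Dice = 2 * IoU / (1 + IoU), so an IoU of 0.91 corresponds to a Dice of about 0.95, roughly consistent with the reported pair (dataset averages need not satisfy the identity exactly). Function names are illustrative.

```python
def dice_iou(pred, target):
    """Dice and IoU of two binary masks given as flat sequences of 0/1 (or bool)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    n_pred, n_target = sum(map(bool, pred)), sum(map(bool, target))
    union = n_pred + n_target - inter
    if union == 0:  # both masks empty: count as perfect agreement
        return 1.0, 1.0
    return 2 * inter / (n_pred + n_target), inter / union


def dice_from_iou(iou):
    """Per-mask identity Dice = 2 IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


print(dice_iou([1, 1, 1, 0], [1, 1, 0, 0]), round(dice_from_iou(0.91), 3))
```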

8.ARDIE: AR, Dialogue, and Eye Gaze Policies for Human-Robot Collaboration

Authors:Chelsea Zou, Kishan Chandan, Yan Ding, Shiqi Zhang

Abstract: Human-robot collaboration (HRC) has become increasingly relevant in industrial, household, and commercial settings. However, the effectiveness of such collaborations is highly dependent on the situational awareness of both the human and the robots. Improving this awareness includes not only aligning perceptions in a shared workspace, but also bidirectionally communicating intent and visualizing different states of the environment to enhance scene understanding. In this paper, we propose ARDIE (Augmented Reality with Dialogue and Eye Gaze), a novel intelligent agent that leverages multi-modal feedback cues to enhance HRC. Our system utilizes a decision theoretic framework to formulate a joint policy that incorporates interactive augmented reality (AR), natural language, and eye gaze to portray current and future states of the environment. Through object-specific AR renders, the human can visualize future object interactions to make adjustments as needed, ultimately providing an interactive and efficient collaboration between humans and robots.

9.Anticipatory Planning: Improving Long-Lived Planning by Estimating Expected Cost of Future Tasks

Authors:Roshan Dhakal, Md Ridwan Hossain Talukder, Gregory J. Stein

Abstract: We consider a service robot in a household environment given a sequence of high-level tasks one at a time. Most existing task planners, lacking knowledge of what they may be asked to do next, solve each task in isolation and so may unwittingly introduce side effects that make subsequent tasks more costly. In order to reduce the overall cost of completing all tasks, we argue that the robot must anticipate the impact its actions could have on future tasks. Thus, we propose anticipatory planning: an approach in which estimates of the expected future cost, from a graph neural network, augment model-based task planning. Our approach guides the robot towards behaviors that encourage preparation and organization, reducing overall costs in long-lived planning scenarios. We evaluate our method on blockworld environments and show that our approach reduces the overall planning costs by 5% compared with planning without anticipation. Additionally, if given an opportunity to prepare the environment in advance (a special case of anticipatory planning), our planner improves overall cost by 11%.
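A toy sketch of the decision rule: a myopic planner picks the plan with the lowest immediate cost, while an anticipatory planner adds an estimate of the expected future cost of the state each plan leaves behind. In the paper that estimate comes from a graph neural network; here `blocks_out_of_place` is a hypothetical hand-written stand-in, and plans are just labelled candidates rather than outputs of a model-based task planner.

```python
def choose_plan(candidates, future_cost=None):
    """candidates: list of (name, immediate_cost, resulting_state).

    Myopic planning (future_cost=None) minimises the immediate cost only;
    anticipatory planning adds an estimate of the expected cost of future tasks
    evaluated on the state each plan leaves behind.
    """
    if future_cost is None:
        return min(candidates, key=lambda c: c[1])[0]
    return min(candidates, key=lambda c: c[1] + future_cost(c[2]))[0]


def blocks_out_of_place(state):
    """Hypothetical stand-in for the learned estimator: 2.5 per block not in storage."""
    return 2.5 * sum(1 for loc in state.values() if loc != "storage")


candidates = [
    ("leave_on_table", 3.0, {"a": "table", "b": "table"}),
    ("put_away", 4.0, {"a": "storage", "b": "storage"}),
]
print(choose_plan(candidates), choose_plan(candidates, blocks_out_of_place))
```

The anticipatory choice accepts a slightly more expensive plan now (4.0 instead of 3.0) because it leaves a state whose estimated future cost is lower (0 instead of 5).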

10.The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation

Authors:Jan Ole von Hartz, Eugenio Chisari, Tim Welschehold, Wolfram Burgard, Joschka Boedecker, Abhinav Valada

Abstract: In policy learning for robotic manipulation, sample efficiency is of paramount importance. Thus, learning and extracting more compact representations from camera observations is a promising avenue. However, current methods often assume full observability of the scene and struggle with scale invariance. In many tasks and settings, this assumption does not hold as objects in the scene are often occluded or lie outside the field of view of the camera, rendering the camera observation ambiguous with regard to their location. To tackle this problem, we present BASK, a Bayesian approach to tracking scale-invariant keypoints over time. Our approach successfully resolves inherent ambiguities in images, enabling keypoint tracking on symmetrical objects and occluded and out-of-view objects. We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques. Furthermore, we show outstanding robustness towards disturbances such as clutter, occlusions, and noisy depth measurements, as well as generalization to unseen objects both in simulation and real-world robotic experiments.

11.Sense, Imagine, Act: Multimodal Perception Improves Model-Based Reinforcement Learning for Head-to-Head Autonomous Racing

Authors:Elena Shrestha, Chetan Reddy, Hanxi Wan, Yulun Zhuang, Ram Vasudevan

Abstract: Model-based reinforcement learning (MBRL) techniques have recently yielded promising results for real-world autonomous racing using high-dimensional observations. MBRL agents, such as Dreamer, solve long-horizon tasks by building a world model and planning actions by latent imagination. This approach involves explicitly learning a model of the system dynamics and using it to learn the optimal policy for continuous control over multiple timesteps. As a result, MBRL agents may converge to sub-optimal policies if the world model is inaccurate. To improve state estimation for autonomous racing, this paper proposes a self-supervised sensor fusion technique that combines egocentric LiDAR and RGB camera observations collected from the F1TENTH Gym. The zero-shot performance of MBRL agents is empirically evaluated on unseen tracks and against a dynamic obstacle. This paper illustrates that multimodal perception improves robustness of the world model without requiring additional training data. The resulting multimodal Dreamer agent safely avoided collisions and won the most races compared to other tested baselines in zero-shot head-to-head autonomous racing.

12.Multi-legged matter transport: a framework for locomotion on noisy landscapes

Authors:Baxi Chong, Juntao He, Daniel Soto, Tianyu Wang, Daniel Irvine, Grigoriy Blekherman, Daniel I. Goldman

Abstract: While the transport of matter by wheeled vehicles or legged robots can be guaranteed in engineered landscapes like roads or rails, locomotion prediction in complex environments like collapsed buildings or crop fields remains challenging. Inspired by principles of information transmission which allow signals to be reliably transmitted over noisy channels, we develop a "matter transport" framework demonstrating that non-inertial locomotion can be provably generated over "noisy" rugose landscapes (heterogeneities on the scale of locomotor dimensions). Experiments confirm that sufficient spatial redundancy in the form of serially-connected legged robots leads to reliable transport on such terrain without requiring sensing and control. Further analogies from communication theory coupled to advances in gaits (coding) and sensor-based feedback control (error detection/correction) can lead to agile locomotion in complex terradynamic regimes.

13.SwipeBot: DNN-based Autonomous Robot Navigation among Movable Obstacles in Cluttered Environments

Authors:Nikolay Zherdev, Mikhail Kurenkov, Kristina Belikova, Dzmitry Tsetserukou

Abstract: In this paper, we propose a novel approach to wheeled robot navigation through an environment with movable obstacles. The robot exploits knowledge about different obstacle classes and selects the minimally invasive action needed to clear the path. We trained a convolutional neural network (CNN) so that the robot can classify an RGB-D image and decide whether to push a blocking object and how much force to apply. After known objects are segmented, they are projected onto a cost map, and the robot calculates an optimal path to the goal. If the blocking objects are allowed to be moved, the robot drives through them while pushing them away. We implemented our algorithm in ROS, and an extensive set of simulations showed that the robot successfully overcomes the blocked regions. Our approach allows a robot to build a path through regions where it would get stuck with traditional path-planning techniques.
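The cost-map idea can be sketched on a small grid: free cells are cheap, cells occupied by pushable objects are traversable at an extra cost, and fixed obstacles are impassable. This is a minimal sketch assuming a 4-connected grid, Dijkstra's algorithm, and a hypothetical `push_penalty`; in the paper, the decision whether an object may be pushed comes from the CNN, and planning runs inside ROS.

```python
import heapq

FREE, MOVABLE, FIXED = 0, 1, 2


def plan_path(grid, start, goal, push_penalty=5.0):
    """Dijkstra on a 4-connected grid; returns (path, cost) or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            kind = grid[nr][nc]
            if kind == FIXED:
                continue  # impassable
            # Entering a cell with a pushable object costs extra.
            step = 1.0 + (push_penalty if kind == MOVABLE else 0.0)
            if d + step < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = d + step
                prev[(nr, nc)] = cell
                heapq.heappush(heap, (d + step, (nr, nc)))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# A corridor blocked by a pushable box between two fixed walls.
grid = [
    [FREE, FIXED, FREE],
    [FREE, MOVABLE, FREE],
    [FREE, FIXED, FREE],
]
print(plan_path(grid, (1, 0), (1, 2)))  # ([(1, 0), (1, 1), (1, 2)], 7.0)
```

A planner that treated the box as a fixed obstacle (set `grid[1][1] = FIXED`) returns `None` on this grid, which mirrors the "stuck" case the abstract contrasts against.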

14.Hierarchical Visual Localization Based on Sparse Feature Pyramid for Adaptive Reduction of Keypoint Map Size

Authors:Andrei Potapov, Mikhail Kurenkov, Pavel Karpyshev, Evgeny Yudin, Alena Savinykh, Evgeny Kruzhkov, Dzmitry Tsetserukou

Abstract: Visual localization is a fundamental task for a wide range of applications in the field of robotics. Yet it is still a complex problem with no universal solution, and the existing approaches are difficult to scale: most state-of-the-art solutions are unable to provide accurate localization without a significant amount of storage space. We propose a hierarchical, low-memory approach to localization based on keypoints with different descriptor lengths. This is made possible by an unsupervised neural network we developed, which predicts a feature pyramid with different descriptor lengths for images. This structure allows applying coarse-to-fine paradigms for localization based on a keypoint map, and varying the accuracy of localization by changing the type of descriptors used in the pipeline. Our approach achieves localization accuracy comparable to state-of-the-art methods with a significant reduction in memory consumption (up to 16 times).
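Coarse-to-fine matching with two descriptor lengths can be sketched as a two-stage nearest-neighbor search: a cheap shortlist with short descriptors, then re-ranking of the shortlist with long ones. This is a minimal sketch under the assumption that every map keypoint stores both a short and a long descriptor; the function name, the shortlist size `k`, and the random data are hypothetical, and the paper's actual pyramid and retrieval pipeline may differ.

```python
import numpy as np


def coarse_to_fine_match(q_short, q_long, db_short, db_long, k=5):
    """Return the index of the best map keypoint for one query keypoint."""
    # Stage 1: shortlist the k nearest candidates using short descriptors.
    coarse = np.linalg.norm(db_short - q_short, axis=1)
    shortlist = np.argsort(coarse)[:k]
    # Stage 2: re-rank only the shortlist with the long descriptors.
    fine = np.linalg.norm(db_long[shortlist] - q_long, axis=1)
    return int(shortlist[np.argmin(fine)])


rng = np.random.default_rng(0)
db_short = rng.normal(size=(100, 8))    # hypothetical 8-D coarse descriptors
db_long = rng.normal(size=(100, 128))   # hypothetical 128-D fine descriptors
q_short = db_short[17] + 0.01 * rng.normal(size=8)
q_long = db_long[17] + 0.01 * rng.normal(size=128)
print(coarse_to_fine_match(q_short, q_long, db_short, db_long))  # 17
```

In a design like this, the fine stage compares only `k` long descriptors per query, which is one way a hierarchy can trade accuracy against storage and compute.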

1.Occupancy Prediction-Guided Neural Planner for Autonomous Driving

Authors:Haochen Liu, Zhiyu Huang, Chen Lv

Abstract: Forecasting the scalable future states of surrounding traffic participants in complex traffic scenarios is a critical capability for autonomous vehicles, as it enables safe and feasible decision-making. Recent successes in learning-based prediction and planning have introduced two primary challenges: generating accurate joint predictions for the environment and integrating prediction guidance for planning purposes. To address these challenges, we propose a two-stage integrated neural planning framework, termed OPGP, that incorporates joint prediction guidance from occupancy forecasting. The preliminary planning phase simultaneously outputs the predicted occupancy for various types of traffic actors based on imitation learning objectives, taking into account shared interactions, scene context, and actor dynamics within a unified Transformer structure. Subsequently, the transformed occupancy prediction guides optimization to further inform safe and smooth planning under Frenet coordinates. We train our planner using a large-scale, real-world driving dataset and validate it in open-loop configurations. Our proposed planner outperforms strong learning-based methods, exhibiting improved performance due to occupancy prediction guidance.

2.Experimental Validation of Safe MPC for Autonomous Driving in Uncertain Environments

Authors:Ivo Batkovic, Ankit Gupta, Mario Zanon, Paolo Falcone

Abstract: The full deployment of autonomous driving systems on a worldwide scale requires that the self-driving vehicle be operated in a provably safe manner, i.e., the vehicle must be able to avoid collisions in any possible traffic situation. In this paper, we propose a framework based on Model Predictive Control (MPC) that endows the self-driving vehicle with the necessary safety guarantees. In particular, our framework ensures constraint satisfaction at all times, while tracking the reference trajectory as closely as obstacles allow, resulting in safe and comfortable driving behavior. To discuss the performance and real-time capability of our framework, we first provide an illustrative simulation example, and then demonstrate the effectiveness of our framework in experiments with a real test vehicle.

3.Multi S-graphs: A Collaborative Semantic SLAM architecture

Authors:Miguel Fernandez-Cortizas, Hriday Bavle, Jose Luis Sanchez-Lopez, Pascual Campoy, Holger Voos

Abstract: Collaborative Simultaneous Localization and Mapping (CSLAM) is a critical capability for enabling multiple robots to operate in complex environments. Most CSLAM techniques rely on the transmission of low-level features for visual and LiDAR-based approaches, which are used for pose graph optimization. However, these low-level features can lead to incorrect loop closures, negatively impacting map generation. Recent approaches have proposed the use of high-level semantic information in the form of Hierarchical Semantic Graphs to improve the loop closure procedures and overall precision of SLAM algorithms. In this work, we present Multi S-Graphs, a distributed CSLAM algorithm based on S-graphs [1] that utilizes high-level semantic information for cooperative map generation while minimizing the amount of information exchanged between robots. Experimental results demonstrate the promising performance of the proposed algorithm in map generation tasks.

4.Local Gaussian Modifiers (LGMs): UAV dynamic trajectory generation for onboard computation

Authors:Miguel Fernandez-Cortizas, David Perez-Saura, Javier Rodriguez-Vazquez, Pascual Campoy

Abstract: Agile autonomous drones are becoming increasingly popular in research due to the challenges they pose in fields like control, state estimation, or perception at high speeds. When all algorithms are computed onboard the UAV, the computational limitations make agile and robust flight even more difficult. One of the most computationally expensive tasks in agile flight is the generation of optimal trajectories, which tackles the problem of planning a minimum-time trajectory for a quadrotor over a sequence of specified waypoints. When these trajectories must be updated online due to changes in the environment or uncertainties, this high computational cost can cause the vehicle to miss the desired waypoints or even crash in cluttered environments. In this paper, a fast, lightweight dynamic trajectory modification approach is presented that uses Local Gaussian Modifiers (LGMs) to modify computationally heavy trajectories when recalculating a trajectory is not possible within the available computation time. Our approach was validated in simulation, where it passed through a race circuit with dynamic gates at top speeds of up to 16.0 m/s, and was also validated in real flight, reaching speeds of up to 4.0 m/s with fully autonomous onboard computation.
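One plausible reading of a local Gaussian modification is sketched below: an offset is blended into a precomputed trajectory with a Gaussian weight centered on the time at which the trajectory passes a moved gate, so the change stays local and no full re-optimization is needed. This is an assumption for illustration only; the function name, the kernel width `sigma`, and the sample trajectory are hypothetical and not taken from the paper.

```python
import numpy as np


def apply_local_gaussian_offset(times, positions, t_center, offset, sigma):
    """Shift a sampled trajectory locally around t_center (assumed LGM form).

    times: (N,) sample times; positions: (N, 3); offset: (3,) displacement
    applied in full at t_center and fading out with a Gaussian in time.
    """
    weights = np.exp(-0.5 * ((times - t_center) / sigma) ** 2)  # (N,)
    return positions + weights[:, None] * np.asarray(offset, dtype=float)


# Hypothetical straight-line reference at 4 m/s along x, sampled every 0.1 s.
times = np.linspace(0.0, 4.0, 41)
positions = np.stack(
    [4.0 * times, np.zeros_like(times), np.full_like(times, 1.5)], axis=1)

# A gate passed at t = 2 s has moved 0.5 m to the left (+y).
modified = apply_local_gaussian_offset(times, positions, 2.0, [0.0, 0.5, 0.0], sigma=0.3)
print(modified[20].tolist())                              # [8.0, 0.5, 1.5]
print(np.abs(modified[0] - positions[0]).max() < 1e-6)   # True: start is untouched
```

The kernel width controls how sharply the trajectory bends toward the new gate; a real implementation would also need to respect the quadrotor's dynamic limits, which this sketch ignores.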

1.Learning Generalizable Pivoting Skills

Authors:Xiang Zhang, Siddarth Jain, Baichuan Huang, Masayoshi Tomizuka, Diego Romeres

Abstract: Pivoting an object with a robotic system is a challenging skill because of the external forces that act on the system, mainly arising from contact interaction. The complexity increases when the same skill must generalize across different objects. This paper proposes a framework for learning robust and generalizable pivoting skills, which consists of three steps. First, we learn a pivoting policy on a "unitary" object using Reinforcement Learning (RL). Then, we obtain the object's feature space by supervised learning to encode the kinematic properties of arbitrary objects. Finally, to adapt the unitary policy to multiple objects, we learn data-driven projections based on the object features to adjust the state and action space of the new pivoting task. The proposed approach is trained entirely in simulation. It requires only one depth image of the object and can transfer zero-shot to real-world objects. We demonstrate robustness to sim-to-real transfer and generalization to multiple objects.

2.Real-Time Spatial Trajectory Planning for Urban Environments Using Dynamic Optimization

Authors:Jona Ruof, Max Bastian Mertens, Michael Buchholz, Klaus Dietmayer

Abstract: Planning trajectories for automated vehicles in urban environments requires methods with high generality, long planning horizons, and fast update rates. Using a path-velocity decomposition, we contribute a novel planning framework, which generates foresighted trajectories and can handle a wide variety of state and control constraints effectively. In contrast to related work, the proposed optimal control problems are formulated over space rather than time. This spatial formulation decouples environmental constraints from the optimization variables, which allows the application of simple, yet efficient shooting methods. To this end, we present a tailored solution strategy based on ILQR, in the Augmented Lagrangian framework, to rapidly minimize the trajectory objective costs, even under infeasible initial solutions. Evaluations in simulation and on a full-sized automated vehicle in real-world urban traffic show the real-time capability and versatility of the proposed approach.
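The switch from time to space as the independent variable rests on the identity dt/ds = 1/v: time-domain dynamics dx/dt = f(x, u) become dx/ds = f(x, u)/v for a vehicle moving forward with speed v > 0 along the path. The sketch below shows only this change of variable, recovering elapsed time from a speed profile defined over arc length with the trapezoidal rule; it is not the authors' iLQR/Augmented Lagrangian solver, and the function name and example profile are hypothetical.

```python
import numpy as np


def elapsed_time(s, v):
    """Time to reach each arc length s[i] given speeds v[i] > 0 along the path.

    Integrates dt/ds = 1/v(s) with the trapezoidal rule.
    """
    s = np.asarray(s, dtype=float)
    inv_v = 1.0 / np.asarray(v, dtype=float)
    segment_times = 0.5 * (inv_v[:-1] + inv_v[1:]) * np.diff(s)
    return np.concatenate(([0.0], np.cumsum(segment_times)))


s = np.linspace(0.0, 100.0, 101)               # 100 m of path, 1 m spacing
t_const = elapsed_time(s, np.full_like(s, 10.0))
print(round(t_const[-1], 6))                    # 10.0 s at a constant 10 m/s
```

The mapping is singular at v = 0, so standstill needs special handling in any spatial formulation; how the paper treats stopping is not stated in the abstract.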

3.CCIL: Context-conditioned imitation learning for urban driving

Authors:Ke Guo, Wei Jing, Junbo Chen, Jia Pan

Abstract: Imitation learning holds great promise for addressing the complex task of autonomous urban driving, as experienced human drivers can navigate highly challenging scenarios with ease. While behavior cloning is a widely used imitation learning approach in autonomous driving due to its exemption from risky online interactions, it suffers from the covariate shift issue. To address this limitation, we propose a context-conditioned imitation learning approach that employs a policy to map the context state into the ego vehicle's future trajectory, rather than relying on the traditional formulation of both ego and context states to predict the ego action. Additionally, to reduce the implicit ego information in the coordinate system, we design an ego-perturbed goal-oriented coordinate system. The origin of this coordinate system is the ego vehicle's position plus a zero mean Gaussian perturbation, and the x-axis direction points towards its goal position. Our experiments on the real-world large-scale Lyft and nuPlan datasets show that our method significantly outperforms state-of-the-art approaches.
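The ego-perturbed goal-oriented coordinate system can be written down directly from its description: the origin is the ego position plus zero-mean Gaussian noise, and the x-axis points toward the goal. This is a minimal 2-D sketch; the noise scale `sigma` and the function names are hypothetical, and we assume the x-axis points from the perturbed origin to the goal, which the abstract does not pin down.

```python
import numpy as np


def ego_perturbed_goal_frame(ego_xy, goal_xy, sigma, rng):
    """Return (origin, heading) of the goal-oriented frame."""
    origin = np.asarray(ego_xy, dtype=float) + rng.normal(0.0, sigma, size=2)
    dx, dy = np.asarray(goal_xy, dtype=float) - origin
    return origin, np.arctan2(dy, dx)  # x-axis points toward the goal


def to_frame(points_xy, origin, heading):
    """Express world-frame points (N, 2) in the goal-oriented frame."""
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s], [s, c]])  # frame-to-world rotation R
    # Row-vector form of R^T (p - origin).
    return (np.asarray(points_xy, dtype=float) - origin) @ rot


rng = np.random.default_rng(0)
origin, heading = ego_perturbed_goal_frame([2.0, 1.0], [12.0, 6.0], sigma=0.5, rng=rng)
goal_local = to_frame([[12.0, 6.0]], origin, heading)
print(goal_local)  # the goal lies on the positive x-axis (y is ~0)
```

Resampling the perturbation for each training sample means the policy cannot read the exact ego position off the coordinate origin, which matches the stated aim of reducing implicit ego information.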

4.Guidance & Control Networks for Time-Optimal Quadcopter Flight

Authors:Sebastien Origer, Christophe De Wagter, Robin Ferede, Guido C. H. E. de Croon, Dario Izzo

Abstract: Reaching fast and autonomous flight requires computationally efficient and robust algorithms. To this end, we train Guidance & Control Networks to approximate optimal control policies ranging from energy-optimal to time-optimal flight. We show that the policies become more difficult to learn the closer we get to the time-optimal 'bang-bang' control profile. We also assess the importance of knowing the maximum angular rotor velocity of the quadcopter and show that over- or underestimating this limit leads to less robust flight. We propose an algorithm to identify the current maximum angular rotor velocity onboard and a network that adapts its policy based on the identified limit. Finally, we extend previous work on Guidance & Control Networks by learning to take consecutive waypoints into account. We fly a 4x3 m track with lap times similar to those of the differential-flatness-based minimum snap benchmark controller, while benefiting from the flexibility that Guidance & Control Networks offer.

5.Efficient and Robust Time-Optimal Trajectory Planning and Control for Agile Quadrotor Flight

Authors:Ziyu Zhou, Gang Wang, Jian Sun, Jikai Wang, Jie Chen

Abstract: Agile quadrotor flight relies on rapidly planning and accurately tracking time-optimal trajectories, a technology critical to their application in the wild. However, the computational burden of computing time-optimal trajectories based on the full quadrotor dynamics (typically on the order of minutes or even hours) can hinder its ability to respond quickly to changing scenarios. Additionally, modeling errors and external disturbances can lead to deviations from the desired trajectory during tracking in real time. This letter proposes a novel approach to computing time-optimal trajectories, by fixing the nodes with waypoint constraints and adopting separate sampling intervals for trajectories between waypoints, which significantly accelerates trajectory planning. Furthermore, the planned paths are tracked via a time-adaptive model predictive control scheme whose allocated tracking time can be adaptively adjusted on-the-fly, therefore enhancing the tracking accuracy and robustness. We evaluate our approach through simulations and experimentally validate its performance in dynamic waypoint scenarios for time-optimal trajectory replanning and trajectory tracking.

6.Learning Failure Prevention Skills for Safe Robot Manipulation

Authors:Abdullah Cihan Ak, Eren Erdal Aksoy, Sanem Sariel

Abstract: Robots are more capable of achieving manipulation tasks for everyday activities than before. But the safety of the manipulation skills that robots employ is still an open problem. Considering all possible failures during skill learning increases the complexity of the process and hinders learning an optimal policy. Beyond that, in unstructured environments, it is not easy to enumerate all possible failures beforehand. In the context of safe manipulation, we reformulate skills as base skills, which aim at completing tasks, and failure prevention skills, which focus on reducing the risk of failures. Then, we propose a modular and hierarchical method for safe robot manipulation that augments base skills with failure prevention skills learned through reinforcement learning, forming a skill library that addresses different safety risks. Furthermore, a skill selection policy that considers estimated risks is used by the robot to select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that with the proposed method, skill learning is feasible, the method adapts easily to novel failures, and our safe manipulation tools can be transferred to the real environment.

7.Social Robot Navigation through Constrained Optimization: a Comparative Study of Uncertainty-based Objectives and Constraints

Authors:Timur Akhtyamov, Aleksandr Kashirin, Aleksey Postnikov, Gonzalo Ferrer

Abstract: This work studies how uncertainty estimates from human motion prediction can be embedded into constrained optimization techniques, such as Model Predictive Control (MPC), for social robot navigation. We propose several cost objectives and constraint functions, derived from the uncertainty in predicted pedestrian positions and related to the probability of collision, that can be applied to the MPC, and all the different variants are compared in challenging scenes with multiple agents. The main question this paper tries to answer is: what are the most important uncertainty-based criteria for social MPC? To that end, we evaluate the proposed approaches with several social navigation metrics in an extensive set of scenarios of different complexity in reproducible synthetic environments. The main outcome of our study is a foundation for a practical guide on when and how to use uncertainty-aware approaches for social robot navigation, and which criteria are most effective.
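One uncertainty-based quantity of the kind this study compares is the probability that a pedestrian with an uncertain predicted position comes within a safety radius of the robot. The sketch below estimates it by Monte Carlo sampling, assuming a Gaussian prediction; the Gaussian model, the radius, and the threshold `epsilon` are illustrative assumptions, and an MPC solver would normally need a smooth surrogate (for example, a Mahalanobis-distance bound) rather than a sampled estimate.

```python
import numpy as np


def collision_probability(robot_xy, ped_mean, ped_cov, radius,
                          n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(||pedestrian - robot|| < radius)."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(ped_mean, ped_cov, size=n_samples)
    dists = np.linalg.norm(samples - np.asarray(robot_xy, dtype=float), axis=1)
    return float(np.mean(dists < radius))


ped_mean = [0.0, 0.0]
ped_cov = [[0.25, 0.0], [0.0, 0.25]]   # 0.5 m standard deviation per axis
epsilon = 0.05                          # hypothetical chance-constraint threshold

for robot in ([0.5, 0.0], [2.5, 0.0]):
    p = collision_probability(robot, ped_mean, ped_cov, radius=0.6)
    print(robot, p, "ok" if p <= epsilon else "violates constraint")
```

Used as a chance constraint, the candidate robot position near the pedestrian's predicted mean is rejected, while the one 2.5 m away is accepted.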

8.Off-Road Navigation of Legged Robots Using Linear Transfer Operators

Authors:Joseph Moyalan, Andrew Zheng, Sriram S. K. S Narayanan, Umesh Vaidya

Abstract: This paper presents the implementation of off-road navigation on legged robots using convex optimization through linear transfer operators. Given a traversability measure that captures the off-road environment, we lift the navigation problem into the density space using the Perron-Frobenius (P-F) operator. This allows the problem to be formulated as a convex optimization. Because the operator acts on an infinite-dimensional density space, we use data collected from the terrain to obtain a finite-dimensional approximation of the convex optimization. The optimal trajectories for off-road navigation are compared with those of a standard iterative planner, showing that our convex optimization generates a more traversable path for the legged robot than the suboptimal iterative planner.
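A common data-driven way to obtain a finite-dimensional approximation of the Perron-Frobenius operator is Ulam's method: partition the domain into boxes and estimate, from sampled transitions, the fraction of mass in each box that moves to every other box. The abstract does not say which approximation the authors use, so the sketch below (1-D, with hypothetical toy dynamics and box count) illustrates Ulam's method only as a representative technique.

```python
import numpy as np


def ulam_matrix(x, y, edges):
    """Row-stochastic transition matrix from samples x -> y over 1-D boxes."""
    n = len(edges) - 1
    i = np.clip(np.digitize(x, edges) - 1, 0, n - 1)  # source box of each sample
    j = np.clip(np.digitize(y, edges) - 1, 0, n - 1)  # destination box
    counts = np.zeros((n, n))
    np.add.at(counts, (i, j), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row; boxes with no samples keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Toy dynamics on [0, 1): shift by a quarter and wrap around.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=2000)
y = (x + 0.25) % 1.0
P = ulam_matrix(x, y, np.linspace(0.0, 1.0, 5))

rho = np.array([1.0, 0.0, 0.0, 0.0])  # all mass in the first box
print(rho @ P)                         # the mass moves to the second box
```

Densities are propagated as row vectors (`rho @ P`), so optimizing over densities becomes a finite-dimensional problem in the box weights.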

9.Preliminary results of a therapeutic lab for promoting autonomies in autistic children

Authors:Cristina Gena, Rossana Damiano, Claudio Mattutino, Alessandro Mazzei, Andrea Meirone, Loredana Mazzotta, Matteo Nazzario, Valeria Ricci, Stefania Brighenti, Federica Liscio, Francesco Petriglia

Abstract: This extended abstract describes the preliminary qualitative results of a therapeutic laboratory focused on the use of the Pepper robot to promote autonomy and functional acquisitions in high-functioning (Asperger) children with autism. The field lab, conceived and led by a multidisciplinary team, involved 4 children, aged 11-13, who attended the laboratory sessions once a week for four months.

1.Design and Control of a Micro Overactuated Aerial Robot with an Origami Delta Manipulator

Authors:Eugenio Cuniato, Christian Geckeler, Maximilian Brunner, Dario Strübin, Elia Bähler, Fabian Ospelt, Marco Tognon, Stefano Mintchev, Roland Siegwart

Abstract: This work presents the mechanical design and control of a novel small-size and lightweight Micro Aerial Vehicle (MAV) for aerial manipulation. To our knowledge, with a total take-off mass of only 2.0 kg, the proposed system is the most lightweight Aerial Manipulator (AM) with 8 independently controllable DOF: 5 for the aerial platform and 3 for the articulated arm. We designed the robot to be fully actuated in the body forward direction. This allows independent pitching and instantaneous force generation, improving the platform's performance during physical interaction. The robotic arm is an origami delta manipulator driven by three servomotors, enabling active motion compensation at the end-effector. Its composite multimaterial links help reduce the weight, while their flexibility allows for compliant aerial interaction with the environment. In particular, the arm's stiffness can be changed according to its configuration. We provide an in-depth discussion of the system design and characterize the stiffness of the delta arm. A control architecture that handles the platform's overactuation while exploiting the delta arm is presented. Its capabilities are experimentally illustrated both in free flight and during physical interaction, highlighting the advantages and disadvantages of the origami folding mechanism.

2.Enhancing Efficiency of Quadrupedal Locomotion over Challenging Terrains with Extensible Feet

Authors:Lokesh Kumar, Sarvesh Sortee, Titas Bera, Ranjan Dasgupta

Abstract: Recent advancements in legged locomotion research have made legged robots a preferred choice for navigating challenging terrains when compared to their wheeled counterparts. This paper presents a novel locomotion policy, trained using Deep Reinforcement Learning, for a quadrupedal robot equipped with an additional prismatic joint between the knee and foot of each leg. The training is performed in the NVIDIA Isaac Gym simulation environment. Our study investigates the impact of these joints on maintaining the quadruped's desired height and following commanded velocities while traversing challenging terrains. We provide comparison results, based on a Cost of Transport (CoT) metric, between quadrupeds with and without prismatic joints. The learned policy is evaluated on a set of challenging terrains using the CoT metric in simulation. Our results demonstrate that the added degrees of actuation give the locomotion policy more flexibility to use the extra joints to traverse terrains that would be deemed infeasible or prohibitively expensive for the conventional quadrupedal design, resulting in significantly improved efficiency.
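Cost of Transport is commonly defined as the energy spent per unit weight and distance, CoT = P / (m g v), with P the average power, m the robot mass, g gravitational acceleration, and v the forward speed. The abstract does not give the exact definition used in the study, so the sketch below assumes a mechanical-power version summed over all actuated joints; for a prismatic joint the same product is force times linear velocity. The function name and the numbers are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def mechanical_cost_of_transport(efforts, joint_vels, mass_kg, base_speed):
    """Dimensionless CoT from joint efforts and velocities.

    efforts, joint_vels: arrays of shape (T, J). Efforts are torques (N*m)
    for revolute joints and forces (N) for prismatic joints, paired with
    angular (rad/s) or linear (m/s) velocities respectively.
    base_speed: array of shape (T,) with forward speed in m/s.
    """
    power = np.sum(np.abs(efforts * joint_vels), axis=1)  # W at each step
    return float(np.mean(power) / (mass_kg * G * np.mean(base_speed)))


T = 100
torques = np.full((T, 12), 2.0)    # 12 revolute joints, 2 N*m each
omegas = np.full((T, 12), 1.5)     # 1.5 rad/s each
forces = np.full((T, 4), 10.0)     # 4 prismatic joints, 10 N each
lin_vels = np.full((T, 4), 0.05)   # 0.05 m/s each
cot = mechanical_cost_of_transport(
    np.hstack([torques, forces]), np.hstack([omegas, lin_vels]),
    mass_kg=15.0, base_speed=np.full(T, 1.0))
print(round(cot, 3))  # 0.258: 38 W / (15 kg * 9.81 m/s^2 * 1 m/s)
```

Because the prismatic joints enter the same power sum, a comparison with and without them stays on one scale; taking absolute values assumes no energy is recovered during negative work.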

3.Stochastic High Fidelity Autonomous Fixed Wing Aircraft Flight Simulator

Authors:Eduardo Gallo

Abstract: This document describes the architecture and algorithms of a high-fidelity fixed-wing flight simulator intended to test and validate novel guidance, navigation, and control (GNC) algorithms for autonomous aircraft. It aims to replicate the influence of as many factors as possible on aircraft performance, the Earth model, the physics of flight and the associated equations of motion, and in particular the behavior of the onboard sensors, limiting the assumptions to the bare minimum and including multiple relatively minor effects not usually considered in simulation that may cause the GNC algorithms not to perform as intended. The author releases the C++ implementation of the flight simulator as open-source software. The simulator's modular design enables the replacement of the standard GNC algorithms with the objective of evaluating their performance when subject to specific missions and meteorological conditions (atmospheric properties, wind field, air turbulence). Testing and evaluation are performed by means of Monte Carlo simulations, as most simulation modules (such as the aircraft mission, the meteorological conditions, the errors introduced by the sensors, and the initial conditions) are defined stochastically and hence vary in a pseudo-random way from one execution to the next according to certain user-defined input parameters, ensuring that the results are valid for a wide range of conditions. In addition to modeling the outputs of all sensors usually present onboard a fixed-wing platform, such as accelerometers, gyroscopes, magnetometers, Pitot tube, air vanes, and a Global Navigation Satellite System (GNSS) receiver, the simulator is also capable of generating realistic images of the Earth's surface that resemble what an onboard camera would record if following the resulting trajectory, enabling the use and evaluation of visual and visual-inertial navigation systems.

4.HD Map Generation from Noisy Multi-Route Vehicle Fleet Data on Highways with Expectation Maximization

Authors:Fabian Immel, Richard Fehler, Mohammad M. Ghanaat, Florian Ries, Martin Haueis, Christoph Stiller

Abstract: High Definition (HD) maps are necessary for many applications of automated driving (AD), but their manual creation and maintenance is very costly. Vehicle fleet data from series production vehicles can be used to automatically generate HD maps, but the data is often incomplete and noisy. We propose a system for the generation of HD maps from vehicle fleet data, which is tolerant to missing or misclassified detections and can handle drives with multiple routes, generating a single complete map, model-free and without prior reference lines. Using randomly selected drives as pivot drives, a step-wise lateral sampling of detections is performed. These sampled points are then clustered and aligned using Expectation Maximization (EM), estimating a lateral offset for each drive to compensate localization errors. The clustered points are replaced with the maxima of their probability density function (PDF) and connected to form polylines using a modified rectangular linear assignment algorithm. The data from vehicles on varying routes is then fused into a hierarchical singular map graph. The proposed approach achieves an average accuracy below 0.5 meters compared to a hand-annotated ground truth map, as well as correctly resolving lane splits and merges, demonstrating the feasibility of using vehicle fleet data for the generation of highway HD maps.

5.Locosim: an Open-Source Cross-Platform Robotics Framework

Authors:Michele Focchi, Francesco Roscia, Claudio Semini

Abstract: The architecture of a robotics software framework tremendously influences the effort and time it takes for end users to test new concepts in a simulation environment and to control real hardware. Many years of activity in the field allowed us to sort out crucial requirements for a framework tailored for robotics: modularity and extensibility, source code reusability, feature richness, and user-friendliness. We implemented these requirements and collected best practices in Locosim, a cross-platform framework for simulation and real hardware. In this paper, we describe the architecture of Locosim and illustrate some use cases that show its potential.

6.A Multi-step Dynamics Modeling Framework For Autonomous Driving In Multiple Environments

Authors:Jason Gibson, Bogdan Vlahov, David Fan, Patrick Spieler, Daniel Pastor, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou

Abstract: Modeling dynamics is often the first step to making a vehicle autonomous. While on-road autonomous vehicles have been extensively studied, off-road vehicles pose many challenging modeling problems. An off-road vehicle encounters highly complex and difficult-to-model terrain/vehicle interactions, as well as having complex vehicle dynamics of its own. These complexities can create challenges for effective high-speed control and planning. In this paper, we introduce a framework for multistep dynamics prediction that explicitly handles the accumulation of modeling error and remains scalable for sampling-based controllers. Our method uses a specially initialized Long Short-Term Memory (LSTM) network over a limited time horizon as the learned component in a hybrid model to predict the dynamics of a four-seat all-terrain vehicle (Polaris S4 1000 RZR) in two distinct environments. By having the LSTM predict only over a fixed time horizon, we avoid the need for long-term stability, which is often a challenge when training recurrent neural networks. Our framework is flexible as it only requires odometry information for labels. Through extensive experimentation, we show that our method is able to predict millions of possible trajectories in real time, with a time horizon of five seconds in challenging off-road driving scenarios.

7.Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter

Authors:Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden

Abstract: This paper investigates the distributed leader-follower formation control problem for multiple differentially driven mobile robots. A distributed estimator is first introduced that requires only the state information of each follower itself and its neighbors. Then, we propose a hybrid formation control method that combines bioinspired neural dynamics-based backstepping with sliding mode control, together with a proof of its stability. The proposed control strategy resolves the impractical speed jump issue that exists in the conventional backstepping design. Additionally, considering the system and measurement noises, the proposed control strategy not only removes the chattering issue existing in conventional sliding mode control but also provides smooth control input with extra robustness. After that, an adaptive sliding innovation filter is integrated with the proposed control to provide accurate state estimates that are robust to modeling uncertainties. Finally, we performed multiple simulations to demonstrate the efficiency and effectiveness of the proposed formation control strategy.

1.Sim2real and Digital Twins in Autonomous Driving: A Survey

Authors:Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Long Chen

Abstract: Safety and cost are two important concerns for the development of autonomous driving technologies. From academic research to commercial applications of autonomous vehicles, sufficient simulation and real-world testing are required. In general, large-scale testing is conducted in simulation and the learned driving knowledge is then transferred to the real world, so adapting driving knowledge learned in simulation to reality becomes a critical issue. However, the virtual simulation world differs from the real world in many aspects, such as lighting, textures, vehicle dynamics, and agents' behaviors, which makes it difficult to bridge the gap between the virtual and real worlds. This gap is commonly referred to as the reality gap (RG). In recent years, researchers have explored various approaches to address the reality gap, which can be broadly classified into two categories: transferring knowledge from simulation to reality (sim2real) and learning in digital twins (DTs). In this paper, we consider solutions based on sim2real and DT technologies and review important applications and innovations in the field of autonomous driving. We also present the state of the art from the perspectives of algorithms, models, and simulators, and elaborate on the development from sim2real to DTs. The survey also illustrates the far-reaching effects of the development of sim2real and DTs in autonomous driving.

2.Revisiting the Minimum Constraint Removal Problem in Mobile Robotics

Authors:Antony Thomas, Fulvio Mastrogiovanni, Marco Baglietto

Abstract: The minimum constraint removal problem seeks the minimum number of constraints, i.e., obstacles, that need to be removed to connect a start location to a goal location with a collision-free path. This problem is NP-hard and has been studied in robotics, wireless sensing, and computational geometry. This work contributes to the existing literature by presenting and discussing two results. The first result shows that minimum constraint removal is NP-hard for simply connected obstacles where each obstacle intersects a constant number of other obstacles. The second result demonstrates that, for $n$ simply connected obstacles in the plane, instances of the minimum constraint removal problem in which the minimum number of obstacles to remove is lower than $(n+1)/3$ can be solved in polynomial time. This result is also empirically validated on several instances of randomly sampled axis-parallel rectangles.
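
The problem definition can be made concrete with a brute-force sketch. It assumes a hypothetical discretization of the workspace into cells (a neighbour graph `adj`), with each cell labeled by the set of obstacles covering it. It enumerates removal sets in order of increasing size and runs a breadth-first search each time, so its running time is exponential in the number of obstacles; it only illustrates what is being minimized and is not the polynomial-time procedure from the paper.

```python
from collections import deque
from itertools import combinations


def min_constraint_removal(adj, covers, start, goal, n_obstacles):
    """Brute-force minimum constraint removal on a discretized workspace.

    adj:    dict mapping each cell to a list of neighbouring cells
    covers: dict mapping a cell to the set of obstacle ids covering it
    Returns a smallest set of obstacle ids whose removal connects start to
    goal, or None if no such set exists.
    """
    def reachable(removed):
        def free(cell):
            # A cell is traversable once every obstacle covering it is removed.
            return covers.get(cell, set()) <= removed

        if not (free(start) and free(goal)):
            return False
        seen, queue = {start}, deque([start])  # breadth-first search over free cells
        while queue:
            cell = queue.popleft()
            if cell == goal:
                return True
            for nxt in adj[cell]:
                if nxt not in seen and free(nxt):
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    for k in range(n_obstacles + 1):  # removal sets of size 0, 1, 2, ...
        for combo in combinations(range(n_obstacles), k):
            if reachable(set(combo)):
                return set(combo)
    return None
```

For example, with one route through a cell covered by obstacles 0 and 1 and another route through two cells covered only by obstacle 2, the sketch returns `{2}`: removing one obstacle is cheaper than removing two, even though that route passes through more blocked cells.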

3.HuNavSim: A ROS 2 Human Navigation Simulator for Benchmarking Human-Aware Robot Navigation

Authors:Noé Pérez-Higueras, Roberto Otero, Fernando Caballero, Luis Merino

Abstract: This work presents the Human Navigation Simulator (HuNavSim), a novel open-source tool for simulating different human-agent navigation behaviors in scenarios with mobile robots. The tool, the first of its kind developed for the ROS 2 framework, can be used together with well-known robotics simulators such as Gazebo. Its main goal is to ease the development and evaluation of human-aware robot navigation systems in simulation. Besides a general human-navigation model, HuNavSim includes, as a novelty, a rich set of individual and realistic human navigation behaviors and a complete set of metrics for social navigation benchmarking.

4.Safe Autonomous Driving in Adverse Weather: Sensor Evaluation and Performance Monitoring

Authors:Fatih Sezgin, Daniel Vriesman, Dagmar Steinhauser, Robert Lugner, Thomas Brandmeier

Abstract: A vehicle's perception sensors (radar, lidar, and camera), which must operate continuously and without restriction, especially for automated/autonomous driving, can lose performance in unfavourable weather conditions. This paper analyzes the signals of these three sensor technologies in rain and fog, by day and by night. A dataset with a driving test vehicle as the target object was recorded in a controlled environment with adjustable, defined, and reproducible weather conditions. Based on the sensor performance evaluation, a method was developed to detect sensor degradation, including determining the affected data regions and estimating the severity of the degradation. With this sensor monitoring, subsequent algorithms can take measures to reduce these influences, or safety and assistance systems can take them into account to avoid malfunctions.

5.Get Back Here: Robust Imitation by Return-to-Distribution Planning

Authors:Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi

Abstract: We consider the Imitation Learning (IL) setup in which expert data are not collected in the actual deployment environment but in a different version of it. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked with bringing the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm, POIR, can be trained offline and leverages online interactions to efficiently fine-tune its planner, improving performance over time. We test POIR on a variety of human-generated manipulation demonstrations in a realistic robotic manipulation simulator and show that the learned policy is robust to different initial state distributions and noisy dynamics.

6.A Mobile Quad-Arm Robot ARMS: Wheel-Legged Tripedal Mobility and Quad-Arm Manipulation

Authors:Hisayoshi Muramatsu, Keigo Kitagawa, Jun Watanabe, Ryohei Hisashiki

Abstract: This letter proposes ARMS, a mobile quad-arm robot that unifies wheel-legged tripedal mobility, wheeled mobility, and quad-arm manipulation. The four arms have different mechanics and are designed as general-purpose arms that enable both the wheel-legged hybrid mobilities and manipulation. The three-degree-of-freedom (DOF) front arm has an active wheel, which is used for wheel-legged tripedal walking and for wheel driving with passive wheels attached to the torso. The three-DOF rear arms are series elastic arms, which are used for wheel-legged tripedal walking, object grasping, and manipulation. The two-DOF upper arm is used for manipulation only; its position and orientation are determined by coordinating all arms. Each motor is controlled by an angle controller combined with trajectory modification under angle, angular velocity, angular acceleration, and torque constraints. ARMS was experimentally validated on four tasks: wheel-legged walking, wheel driving, wheel driving with grasping, and carrying a bag.

7.Extrinsic Infrastructure Calibration Using the Hand-Eye Robot-World Formulation

Authors:Markus Horn, Thomas Wodtko, Michael Buchholz, Klaus Dietmayer

Abstract: We propose a certifiably globally optimal approach for solving the hand-eye robot-world problem that supports multiple sensors and targets at once. Further, we leverage this formulation to estimate a geo-referenced calibration of infrastructure sensors. Since vehicle motion recorded by infrastructure sensors is mostly planar, obtaining a unique solution to the respective hand-eye robot-world problem is infeasible without incorporating additional knowledge. Hence, we extend our method to include a priori knowledge, namely the translation norm of the calibration targets, to yield a unique solution. Our approach achieves state-of-the-art results on simulated and real-world data. On real-world intersection data in particular, our approach using the translation norm is the only method that provides accurate results.

8.Borinot: an agile torque-controlled robot for hybrid flying and contact loco-manipulation (workshop version)

Authors:Josep Marti-Saumell, Joan Sola, Angel Santamaria-Navarro, Hugo Duarte

Abstract: This paper introduces Borinot, an open-source flying robotic platform designed to perform hybrid agile locomotion and manipulation. The platform features a compact and powerful hexarotor that can be outfitted with torque-actuated extremities of diverse architectures, allowing for whole-body dynamic control. As a result, Borinot can perform agile tasks, such as aggressive or acrobatic maneuvers, that involve the whole-body dynamics. The extremities attached to Borinot can be used in various ways: during contact, they can act as legs for contact-based locomotion or as arms to manipulate objects; in free flight, they can act as tails that contribute to the dynamics, mimicking the movements of many animals. This allows any hybridization of these dynamic modes, such as the jump-flight of chickens and locusts, making Borinot an ideal open-source platform for research on hybrid aerial-contact agile motion. To demonstrate the key capabilities of Borinot, we fitted it with a planar 2-DoF arm and implemented whole-body torque-level model predictive control. The result is a capable and adaptable platform that, we believe, opens up new avenues of research in the field of agile robotics.

9.An Efficient Multi-solution Solver for the Inverse Kinematics of 3-Section Constant-Curvature Robots

Authors:Ke Qiu, Jingyu Zhang, Danying Sun, Rong Xiong, Haojian Lu, Yue Wang

Abstract: Piecewise constant curvature is a popular kinematics framework for continuum robots. Computing the model parameters from the desired end pose, known as the inverse kinematics problem, is fundamental in manipulation, tracking, and planning tasks. In this paper, we propose an efficient multi-solution solver for the inverse kinematics problem of 3-section constant-curvature robots that bridges theoretical reduction and numerical correction. We derive analytical conditions that simplify the original problem into a one-dimensional problem, and we formalise the equivalence of the two problems. In addition, we introduce an approximation with bounded error so that the one dimension becomes traversable while the remaining parameters are analytically solvable. With these theoretical results, global search and numerical correction are employed to implement the solver. The experiments show that our solver is more efficient and has a higher success rate than numerical methods when one solution is required, and demonstrate its ability to obtain multiple solutions for optimal path planning in a space with obstacles.
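
The inverse kinematics problem inverts the piecewise-constant-curvature forward map. A minimal forward-kinematics sketch, assuming the common arc parameterization of each section by curvature κ (`kappa`), bending-plane angle φ (`phi`), and arc length ℓ (`length`), with each section's base tangent along +z. This is only the forward map that a solver like the one described would invert, not the authors' solver.

```python
import numpy as np


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def section_transform(kappa, phi, length):
    """4x4 homogeneous transform of one constant-curvature section."""
    theta = kappa * length  # total bending angle of the arc
    if abs(kappa) < 1e-9:
        # Straight-section limit of the arc formula.
        p_plane = np.array([0.0, 0.0, length])
    else:
        # Tip of a circular arc bending toward +x in the x-z plane.
        p_plane = np.array([(1 - np.cos(theta)) / kappa, 0.0, np.sin(theta) / kappa])
    T = np.eye(4)
    T[:3, :3] = rot_z(phi) @ rot_y(theta) @ rot_z(-phi)
    T[:3, 3] = rot_z(phi) @ p_plane  # rotate the arc into its bending plane
    return T


def forward_kinematics(sections):
    """Chain sections [(kappa, phi, length), ...] and return the tip pose."""
    T = np.eye(4)
    for kappa, phi, length in sections:
        T = T @ section_transform(kappa, phi, length)
    return T
```

For instance, three straight unit sections put the tip at (0, 0, 3), and two unit sections each bending a quarter turn in the same plane form a semicircle whose tip lies at (4/π, 0, 0). An IK solver searches for the nine section parameters that reproduce a desired tip pose, which is why reducing the problem to a single traversable dimension matters.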

10.An Adaptive Behaviour-Based Strategy for SARs interacting with Older Adults with MCI during a Serious Game Scenario

Authors:Eleonora Zedda, Marco Manca, Fabio Paterno, Carmen Santoro

Abstract: The monotonous nature of repetitive cognitive training may cause older adults to lose interest in it and drop out. This study introduces an adaptive technique that enables a Socially Assistive Robot (SAR) to select the most appropriate actions to maintain the engagement level of older adults while they play a serious game during cognitive training. The goal is to develop an adaptation strategy for changing the robot's behaviour that uses reinforcement learning to encourage the user to remain engaged. A reinforcement learning algorithm was implemented to determine the most effective adaptation strategy for the robot's actions, encompassing verbal and nonverbal interactions. The simulation results demonstrate that the learning algorithm converged and provide promising evidence of the strategy's effectiveness.

11.3D Laser-and-tissue Agnostic Data-driven Method for Robotic Laser Surgical Planning

Authors:Guangshen Ma, Ravi Prakash, Brian Mann, Weston Ross, Patrick Codd

Abstract: In robotic laser surgery, predicting the shape of a one-shot ablation cavity is an important problem for minimizing errant overcutting of healthy tissue during pathological tissue resection and precise tumor removal. Since it is difficult to physically model the laser-tissue interaction due to the variety of optical tissue properties, the complicated process of heat transfer, and uncertainty about the chemical reactions, we propose a 3D cavity prediction model based on an entirely data-driven method without any assumptions about laser settings or tissue properties. Based on the cavity prediction model, we formulate a novel robotic laser planning problem to determine the optimal laser incident configuration, which aims to create a cavity that aligns with the surface target (e.g., tumor, pathological tissue). To solve the one-shot ablation cavity prediction problem, we model the 3D geometric relation between the tissue surface and the laser energy profile as a non-linear regression problem that can be represented by a single-layer perceptron (SLP) network. The SLP network is encoded in a novel kinematic model to predict the shape of the post-ablation cavity for an arbitrary laser input. To estimate the SLP network parameters, we construct a dataset of one-shot laser-phantom cavities reconstructed from optical coherence tomography (OCT) B-scan images for the data-driven modelling. To verify the method, the learned cavity prediction model is applied to solve a simplified robotic laser planning problem, modelled as a surface alignment error minimization problem. The initial results report a 3D-cavity Intersection-over-Union (3D-cavity-IoU) of (91.1 ± 3.0)% for 3D cavity prediction and an average success rate of 97.9% for the simulated surface alignment experiments.
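
The 3D-cavity-IoU metric reported above can be illustrated on voxelized cavities. A minimal sketch, assuming both the predicted and the OCT-reconstructed cavities have already been voxelized onto the same boolean grid; the voxelization step and the function name `voxel_iou` are hypothetical, not taken from the paper.

```python
import numpy as np


def voxel_iou(pred, target):
    """Intersection-over-Union of two boolean occupancy grids of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("grids must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both cavities empty: treated as perfect agreement (a convention)
    return np.logical_and(pred, target).sum() / union


# Two 4x4x4 cavities that share one of their two depth slices:
# intersection = 16 voxels, union = 48 voxels, IoU = 1/3.
a = np.zeros((4, 4, 4), dtype=bool)
a[:, :, 0:2] = True
b = np.zeros((4, 4, 4), dtype=bool)
b[:, :, 1:3] = True
iou = voxel_iou(a, b)
```

Because IoU penalizes both missed and extra voxels, it captures over- and under-ablation in a single number, which suits a task where overcutting healthy tissue is the main concern.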

12.Touch and deformation perception of soft manipulators with capacitive e-skins and deep learning

Authors:Delin Hu, Zhou Chen, Paul Baisamy, Zhe Liu, Francesco Giorgio-Serchi, Yunjie Yang

Abstract: Tactile sensing in soft robots remains particularly challenging because of the coupling between contact and deformation information to which the sensor is subject during actuation and interaction with the environment. This often results in severe interference and makes disentangling tactile sensing from geometric deformation difficult. To address this problem, this paper proposes a soft capacitive e-skin with a sparse electrode distribution, combined with deep learning for information decoupling. Our approach successfully separates tactile sensing from geometric deformation, enabling touch recognition on a soft pneumatic actuator subject to both internal (actuation) and external (manual handling) forces. Using a multi-layer perceptron, the proposed e-skin achieves 99.88% accuracy in touch recognition across a range of deformations. When complemented with prior knowledge, a transformer-based architecture effectively tracks the deformation of the soft actuator. The average distance error in positional reconstruction of the manipulator is as low as 2.905 ± 2.207 mm, even under operating conditions with different inflation states and physical contacts, which lead to additional signal variations and consequently interfere with deformation tracking. These findings represent a tangible way forward in the development of e-skins that can endow soft robots with proprioception and exteroception.

13.On the Collaborative Object Transportation Using Leader Follower Approach

Authors:Sumanta Ghosh, Subhajit Nath, Sarvesh Sortee, Lokesh Kumar, Titas Bera

Abstract: In this paper, we address the multi-agent collaborative object transportation problem in a partially known environment with obstacles under a specified goal condition. We propose a leader-follower approach for two mobile manipulators collaboratively transporting an object along specified desired trajectories. The proposed approach treats the mobile manipulation system as two independent subsystems, a mobile platform and a manipulator arm, and uses their kinematic models for trajectory tracking. In this work, we consider a mobile platform subject to non-holonomic constraints, with a manipulator carrying a rigid load. The desired trajectories of the end points of the load are obtained from a Probabilistic RoadMap (PRM)-based planning approach. Our method combines a Proportional Navigation Guidance-based approach with a proposed Stop-and-Sync algorithm to bring each platform sufficiently close to the desired trajectory; the deviation due to the non-holonomic constraints is compensated by the manipulator arm. A leader-follower approach for computing the inverse kinematics solution for the end-effector position of the manipulator arm is proposed to maintain the load rigidity. Further, we compare the proposed approach with other approaches to analyse the efficacy of our algorithm.
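
Proportional navigation (PN) steers a vehicle so that its heading rate is proportional to the rotation rate of the line of sight (LOS) to its target, ψ̇ = N·λ̇. The sketch below assumes constant-speed unicycle kinematics and a discrete form of the law (heading change = N × LOS-angle change per step); the gain, speed, tolerance, and the idea of feeding it a point on the desired trajectory as `target(t)` are illustrative assumptions, and the paper's Stop-and-Sync logic is not modeled.

```python
import math


def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def pn_pursuit(start, heading, target, speed=1.0, gain=3.0, dt=0.01,
               tol=0.05, max_steps=10_000):
    """Steer a unicycle toward target(t) with discrete proportional navigation.

    Each step moves the vehicle along its heading, then changes the heading by
    gain * (change in LOS angle), a discrete form of psi_dot = N * lambda_dot.
    Returns the list of visited (x, y) positions.
    """
    x, y = start
    psi = heading
    tx, ty = target(0.0)
    los_prev = math.atan2(ty - y, tx - x)
    path = [(x, y)]
    for k in range(1, max_steps + 1):
        # Unicycle kinematics: constant forward speed along the current heading.
        x += speed * math.cos(psi) * dt
        y += speed * math.sin(psi) * dt
        path.append((x, y))
        tx, ty = target(k * dt)
        if math.hypot(tx - x, ty - y) < tol:
            break  # close enough to the target point
        los = math.atan2(ty - y, tx - x)
        psi += gain * wrap(los - los_prev)
        los_prev = los
    return path


path = pn_pursuit((0.0, 0.0), 0.5, lambda t: (5.0, 2.0))
```

For a stationary target, the angle between heading and LOS evolves as σ̇ = (N − 1)·λ̇ = −(N − 1)·V·sin σ / R, so any gain N > 1 drives the heading onto the LOS; a platform following it still deviates laterally because of its non-holonomic constraint, which is the residual the arm would absorb.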

14.FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow

Authors:Wenchao Ding, Jieru Zhao, Yubin Chu, Haihui Huang, Tong Qin, Chunjing Xu, Yuxiang Guan, Zhongxue Gan

Abstract: There is extensive literature on perceiving road structures by fusing various sensor inputs, such as lidar point clouds and camera images, using deep neural nets. Leveraging the latest advances in neural architectures (such as transformers) and bird's-eye-view (BEV) representations, road cognition accuracy keeps improving. However, how to cognize the "road" for automated vehicles where there are no well-defined "roads" remains an open problem. For example, finding paths inside intersections without HD maps is hard, since there is neither an explicit definition of "roads" nor explicit features such as lane markings. The idea of this paper comes from a proverb: a road comes into being when people walk on it. Although there are no "roads" in the sensor readings, there are "roads" in the tracks of other vehicles. In this paper, we propose FlowMap, a path generation framework for automated vehicles based on traffic flows. FlowMap is built by extending our previous work RoadMap, a lightweight semantic map, with an additional traffic flow layer. A path generation algorithm on traffic flow fields (TFFs) is proposed to generate human-like paths. The proposed framework is validated using real-world driving data and is amenable to generating paths for highly complicated intersections without using HD maps.
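
The idea of recovering "roads" from other vehicles' tracks can be sketched as a grid-based flow field. In this minimal illustration, each grid cell stores the average heading of observed track segments starting in it, and a path is generated by stepping along the local flow direction. The grid discretization, per-cell averaging, fixed-step streamline following, and all function names are assumptions for illustration; this is not FlowMap's TFF construction or its path generation algorithm.

```python
import numpy as np


def build_flow_field(tracks, shape, cell=1.0):
    """Average unit heading of observed vehicle tracks per grid cell.

    tracks: list of (N, 2) arrays of x, y positions; shape: (rows, cols).
    Returns a (rows, cols, 2) array of unit vectors (zero where unobserved).
    """
    acc = np.zeros(tuple(shape) + (2,))
    for track in tracks:
        steps = np.diff(track, axis=0)
        for (x, y), d in zip(track[:-1], steps):
            norm = np.linalg.norm(d)
            if norm == 0:
                continue  # skip stationary samples
            r, c = int(y // cell), int(x // cell)
            if 0 <= r < shape[0] and 0 <= c < shape[1]:
                acc[r, c] += d / norm
    mag = np.linalg.norm(acc, axis=2, keepdims=True)
    # Normalize non-empty cells; cells without traffic stay zero.
    return np.divide(acc, mag, out=np.zeros_like(acc), where=mag > 0)


def follow_field(field, start, step=0.5, n_steps=20, cell=1.0):
    """Generate a path by repeatedly stepping along the local flow direction."""
    p = np.asarray(start, dtype=float)
    path = [p.copy()]
    for _ in range(n_steps):
        r, c = int(p[1] // cell), int(p[0] // cell)
        if not (0 <= r < field.shape[0] and 0 <= c < field.shape[1]):
            break  # left the mapped area
        d = field[r, c]
        if not d.any():
            break  # no traffic observed in this cell
        p = p + step * d
        path.append(p.copy())
    return np.array(path)
```

With a single straight track along y = 0.5, every observed cell points in +x and the generated path runs straight along that track until it leaves the grid; with many tracks, averaging would produce the dominant flow through an intersection.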

15.More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

Authors:Huang Huang, Antonio Loquercio, Ashish Kumar, Neerja Thakkar, Ken Goldberg, Jitendra Malik

Abstract: Is a manipulator on a legged robot a liability or an asset for locomotion? Prior works mainly designed specific controllers to account for the added payload and inertia from a manipulator. In contrast, biological systems typically benefit from additional limbs, which can simplify postural control. For instance, cats use their tails to enhance the stability of their bodies and prevent falls under disturbances. In this work, we show that a manipulator can be an important asset for maintaining balance during locomotion. To do so, we train a sensorimotor policy using deep reinforcement learning to create a synergy between the robot's limbs. This policy enables the robot to maintain stability despite large disturbances. However, learning such a controller can be quite challenging. To account for these challenges, we propose a stage-wise training procedure to learn complex behaviors. Our proposed method decomposes this complex task into three stages and then incrementally learns these tasks to arrive at a single policy capable of solving the final control task, achieving a success rate up to 2.35 times higher than baselines in simulation. We deploy our learned policy in the real world and show stability during locomotion under strong disturbances.

1.Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving With Model Predictive Path Integral Control

Authors:Hojin Lee, Taekyung Kim, Jungwi Mun, Wonsuk Lee

Abstract: High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propose a method for learning a terrain-aware kinodynamic model that is conditioned on both proprioceptive and exteroceptive information. The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions without requiring ground-truth force data during training. This enables the design of a safe and robust model predictive controller through an appropriate cost function that penalizes sampled trajectories with unstable motion, unsafe interactions, and high levels of uncertainty derived from the model. We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline and ensures robust high-speed driving performance without control failure.
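
The Model Predictive Path Integral (MPPI) controller named in the title can be sketched in a few lines. This toy version assumes a 1D single integrator and a quadratic distance-to-goal cost in place of the learned terrain-aware model and the paper's stability, interaction, and uncertainty cost terms. It keeps only the core MPPI iteration: sample control perturbations, roll them out, weight each rollout by its exponentiated negative cost, and average the perturbations.

```python
import numpy as np

DT, GOAL = 0.1, 1.0  # toy time step and goal position (assumptions)


def dynamics(x, u):
    """Toy single integrator; works on scalars or arrays of sampled states."""
    return x + u * DT


def stage_cost(x):
    return (x - GOAL) ** 2


def mppi_update(u_nom, x0, rng, n_samples=256, sigma=1.0, lam=1.0):
    """One MPPI iteration: perturb, roll out, reweight, average."""
    horizon = u_nom.shape[0]
    eps = rng.normal(0.0, sigma, size=(n_samples, horizon))
    x = np.full(n_samples, x0, dtype=float)
    costs = np.zeros(n_samples)
    for t in range(horizon):  # all samples are rolled out in parallel
        x = dynamics(x, u_nom[t] + eps[:, t])
        costs += stage_cost(x)
    weights = np.exp(-(costs - costs.min()) / lam)  # subtract min for numerical stability
    weights /= weights.sum()
    return u_nom + weights @ eps


def run_mppi(x0=0.0, steps=100, horizon=10, seed=0):
    rng = np.random.default_rng(seed)
    u = np.zeros(horizon)
    x = x0
    for _ in range(steps):
        u = mppi_update(u, x, rng)
        x = dynamics(x, u[0])  # apply only the first control
        u = np.roll(u, -1)     # warm-start the next iteration
        u[-1] = 0.0
    return x
```

The temperature `lam` trades off between averaging many rollouts (large values) and following only the best few (small values). In the paper's setting, the rollouts would come from the learned model, and penalties on unstable motion or high model uncertainty would simply be added to `costs`.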

2.Learning Flight Control Systems from Human Demonstrations and Real-Time Uncertainty-Informed Interventions

Authors:Prashant Ganesh, J. Humberto Ramos, Vinicius G. Goecks, Jared Paquet, Matthew Longmire, Nicholas R. Waytowich, Kevin Brink

Abstract: This paper describes a methodology for learning flight control systems from human demonstrations and interventions while considering the estimated uncertainty in the learned models. The proposed approach uses human demonstrations to train an initial model via imitation learning and then iteratively improves its performance using real-time human interventions. The aim of the interventions is to correct undesired behaviors and adapt the model to changes in the task dynamics. The learned model's uncertainty is estimated in real time via Monte Carlo Dropout, and the human supervisor is cued to intervene via an audiovisual signal when this uncertainty exceeds a predefined threshold. The proposed approach is validated in an autonomous quadrotor landing task on both fixed and moving platforms. It is shown that, with this algorithm, a human can rapidly teach a flight task to an unmanned aerial vehicle by demonstrating expert trajectories and then adapt the learned model by intervening when the learned controller performs an undesired maneuver, the task changes, and/or the model uncertainty exceeds a threshold.
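
Monte Carlo Dropout estimates uncertainty by keeping dropout active at inference time and measuring the spread of repeated stochastic predictions. A minimal sketch with a hypothetical two-layer network (random weights stand in for a trained flight-control model) and a hypothetical threshold check `should_cue_operator` standing in for the audiovisual cue. The architecture, dropout rate, and threshold are assumptions, not the paper's.

```python
import numpy as np


def mc_dropout_predict(x, w1, b1, w2, b2, p_drop, n_samples, rng):
    """Run n_samples stochastic forward passes with dropout kept active.

    p_drop must lie in [0, 1). Returns the mean and standard deviation of
    the network outputs across passes.
    """
    outputs = []
    for _ in range(n_samples):
        h = np.tanh(x @ w1 + b1)
        mask = rng.random(h.shape) >= p_drop  # drop each hidden unit with prob p_drop
        h = h * mask / (1.0 - p_drop)         # inverted-dropout scaling
        outputs.append(h @ w2 + b2)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)


def should_cue_operator(std, threshold):
    """Cue the human supervisor when any output's uncertainty exceeds the threshold."""
    return bool(np.any(std > threshold))


rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 2)), np.zeros(2)
mean, std = mc_dropout_predict(np.ones(4), w1, b1, w2, b2, 0.5, 50, rng)
cue = should_cue_operator(std, threshold=0.5)
```

With `p_drop = 0` every pass is identical and the spread vanishes, so the cue never fires; the spread produced by dropout is what turns a single deterministic policy into an uncertainty signal that can trigger a human intervention.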

3.A Comparison of Pneumatic Actuators for Soft Growing Vine Robots

Authors:Alexander M. Kübler, Cosima du Pasquier, Andrew Low, Betim Djambazi, Nicolas Aymon, Julian Förster, Nathaniel Agharese, Roland Siegwart, Allison M. Okamura

Abstract: Soft pneumatic actuators are used to steer soft growing "vine" robots while being flexible enough to undergo the tip eversion required for growth. They also meet the requirements to steer soft growing vine robots through challenging terrain. In this study, we compared the performance of three types of pneumatic actuators in terms of their ability to perform eversion, bending, dynamic motion, and force: the pouch motor, the cylindrical pneumatic artificial muscle (cPAM), and the fabric pneumatic artificial muscle (fPAM). The pouch motor is advantageous for prototyping due to its simple manufacturing process. The cPAM exhibits superior bending behavior and produces the highest forces, while the fPAM actuates fastest and everts at the lowest pressure. We evaluated a similar range of dimensions for each actuator type. Larger actuators can produce more significant deformations and forces, but smaller actuators inflate more quickly and require a lower eversion pressure. Since vine robots are lightweight, the effect of gravity on the functionality of different actuators is minimal. We developed a new analytical model that predicts the pressure-to-bending behavior of vine robot actuators. Using the actuator results, we designed and demonstrated a 4.8 m long vine robot equipped with highly maneuverable 60x60 mm cPAMs in a three-dimensional obstacle course. The vine robot was able to move around sharp turns, travel through a passage smaller than its diameter, and lift itself against gravity.

1.Towards autonomous system: flexible modular production system enhanced with large language model agents

Authors:Yuchen Xia, Manthan Shenoy, Nasser Jazdi, Michael Weyrich

Abstract: In this paper, we present a novel framework that combines large language models (LLMs), digital twins, and an industrial automation system to enable intelligent planning and control of production processes. Our approach involves developing a digital twin system that contains descriptive information about the production and retrofitting the automation system to offer unified interfaces to fine-grained functionalities or skills executable by automation components or modules. Subsequently, LLM agents are designed to interpret the descriptive information in the digital twins and control the physical system through RESTful interfaces. These LLM agents serve as intelligent agents within an automation system, enabling autonomous planning and control of flexible production. Given a task instruction as input, the LLM agents orchestrate a sequence of atomic functionalities and skills to accomplish the task. We demonstrate how our implemented prototype can handle tasks that were not predefined, plan a production process, and execute the operations. This research highlights the potential of integrating LLMs into industrial automation systems for more agile, flexible, and adaptive production processes, while also underscoring critical insights and limitations for future work.

2.Adaptive Gravity Compensation Control of a Cable-Driven Upper-Arm Soft Exosuit

Authors:Joyjit Mukherjee, Ankit Chatterjee, Shreeshan Jena, Nitesh Kumar, Suriya Prakash Muthukrishnan, Sitikantha Roy, Shubhendu Bhasin

Abstract: This paper proposes an adaptive gravity compensation (AGC) control strategy for a cable-driven upper-limb exosuit intended to assist the wearer with lifting tasks. Unlike most model-based control techniques used for this human-robot interaction task, the proposed control design does not assume knowledge of the anthropometric parameters of the wearer's arm and the payload. Instead, the uncertainties in human arm parameters, such as mass, length, and payload, are estimated online using an indirect adaptive control law that compensates for the gravity moment about the elbow joint. Additionally, the AGC controller is agnostic to the desired joint trajectory followed by the human arm. For the purpose of controller design, the human arm is modeled using a 1-DOF manipulator model. Further, a cable-driven actuator model is proposed that maps the assistive elbow torque to the actuator torque. The performance of the proposed method is verified through a co-simulation, wherein the control input realized in MATLAB is applied to the human bio-mechanical model in OpenSim under varying payload conditions. Significant reductions in human effort in terms of human muscle torque and metabolic cost are observed with the proposed control strategy. Further, simulation results show that the performance of the AGC controller converges to that of the gravity compensation (GC) controller, demonstrating the efficacy of AGC-based online parameter learning.
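
For a 1-DOF forearm model, the gravity moment about the elbow is linear in the unknown mass and length terms, which is what makes online parameter estimation possible. The sketch below assumes the forearm angle q is measured from the horizontal, lumps the unknowns into one parameter p (e.g. forearm mass × center-of-mass distance + payload mass × forearm length), and uses a simple gradient update on the torque prediction error. It further assumes that a gravity-torque signal `tau_measured` is available, which the paper may not assume; the function names are hypothetical, and this is not the authors' adaptive law.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def gravity_regressor(q):
    """Regressor Y(q) such that the elbow gravity moment is tau_g = Y(q) * p.

    q: forearm angle from the horizontal (rad); p: lumped mass-length term (kg*m).
    """
    return G * math.cos(q)


def adapt(p_hat, q, tau_measured, gamma, dt):
    """One gradient step on the torque prediction error (indirect adaptation)."""
    y = gravity_regressor(q)
    error = tau_measured - y * p_hat
    return p_hat + gamma * y * error * dt


# Hypothetical arm: 1.5 kg forearm with its COM at 0.15 m, 2.0 kg payload at 0.30 m.
p_true = 1.5 * 0.15 + 2.0 * 0.30
p_hat, dt = 0.0, 0.01
for k in range(1000):
    q = 0.5 * math.sin(k * dt)  # the arm moves, keeping cos(q) well away from 0
    p_hat = adapt(p_hat, q, gravity_regressor(q) * p_true, gamma=0.05, dt=dt)
```

Each step shrinks the estimation error by the factor (1 − γ·dt·Y(q)²), so the estimate converges as long as the regressor stays away from zero (the arm is not held vertical); once it has converged, the compensating torque Y(q)·p̂ matches the true gravity moment, mirroring the reported convergence of AGC toward the GC controller.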

3.Using Large Language Models for Interpreting Autonomous Robots Behaviors

Authors:Miguel A. González-Santamarta, Laura Fernández-Becerra, David Sobrín-Hidalgo, Ángel Manuel Guerrero-Higueras, Irene González, Francisco J. Rodríguez Lera

Abstract: The deployment of autonomous robots in various domains has raised significant concerns about their trustworthiness and accountability. This study explores the potential of Large Language Models (LLMs) for analyzing ROS 2 logs generated by autonomous robots and proposes a framework for log analysis that categorizes log files into different aspects. The study evaluates the performance of three different language models in answering questions related to StartUp, Warning, and PDDL logs. The results suggest that GPT-4, a transformer-based model, outperforms the other models; however, their verbosity is not enough to answer why or how questions for all kinds of actors involved in the interaction.

4.Ensuring Reliable Robot Task Performance through Probabilistic Rare-Event Verification and Synthesis

Authors:Guy Scher, Sadra Sadraddini, Ariel Yadin, Hadas Kress-Gazit

Abstract: Providing guarantees on the safe operation of robots against edge cases is challenging, as testing methods such as traditional Monte Carlo sampling require too many samples to provide reasonable statistics. Building on recent advances in rare-event sampling, we present a model-based method to verify whether a robotic system satisfies a Signal Temporal Logic (STL) specification in the face of environment variations and sensor/actuator noise. Our method is efficient and applicable to linear, nonlinear, and even black-box systems with arbitrary but known uncertainty distributions. For linear systems with Gaussian uncertainties, we exploit a feature to find optimal parameters that minimize the probability of failure. We present illustrative examples of applying our approach to real-world autonomous robotic systems.
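
The sample-count problem with plain Monte Carlo, and why rare-event sampling helps, can be shown on a toy failure event: a standard normal variable exceeding 4, with probability of about 3.2 × 10⁻⁵. The sketch uses basic importance sampling with a mean-shifted Gaussian proposal. This is a much simpler technique than the rare-event sampling used in the paper, and the "system" here is only a scalar random variable, not an STL specification.

```python
import numpy as np


def tail_prob_mc(threshold, n, rng):
    """Plain Monte Carlo estimate of P(X > threshold) for X ~ N(0, 1)."""
    x = rng.standard_normal(n)
    return np.mean(x > threshold)


def tail_prob_is(threshold, n, rng):
    """Importance sampling with proposal N(threshold, 1), shifted onto the failure region.

    Each sample is reweighted by the likelihood ratio
    phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2).
    """
    x = rng.normal(threshold, 1.0, n)
    weights = np.exp(-threshold * x + 0.5 * threshold**2)
    return np.mean((x > threshold) * weights)


rng = np.random.default_rng(0)
estimate = tail_prob_is(4.0, 100_000, rng)
```

With 1,000 plain samples, the expected number of observed failures is only about 0.03, so plain Monte Carlo usually reports a failure probability of exactly zero. The shifted proposal places roughly half its samples in the failure region and corrects for this with the likelihood-ratio weights.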

5.Employing Socially Assistive Robots in Elderly Care (longer version)

Authors:Daniel Macis, Sara Perilli, Cristina Gena

Abstract: Recently, robotics has been considered as a way to face the aging of the world population. According to the WHO, by 2050 there will be about 2.1 billion people over 60 years old worldwide, causing a persistently growing need for assistance and a shortage of manpower to deliver adequate assistance. Therefore, seniors' quality of life (QoL) is continuously threatened. Socially Assistive Robotics presents itself as a solution. To improve the acceptability of socially assistive robots (SARs), it is necessary to tailor the system's characteristics to the target users' needs and issues through the analysis of previous and current studies in the HRI field. Through an examination of the state of the art of social robotics in elderly care, past case studies, and research papers on SARs' efficiency, two example solutions are proposed for two different scenarios, applying two different SARs: the Pepper and Nao robots.

1.A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

Authors:Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou

Abstract: In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.

2.Direct Visual Servoing Based on Discrete Orthogonal Moments

Authors:Yuhan Chen, Max Q. -H. Meng, Li Liu

Abstract: This paper proposes a new approach to direct visual servoing (DVS) based on discrete orthogonal moments (DOM). DVS bypasses the geometric-primitive extraction, matching, and tracking steps of the conventional feature-based visual servoing pipeline. Although DVS enables highly precise positioning, it suffers from a small convergence domain and poor robustness, due to the high non-linearity of the cost function to be minimized and the presence of redundant data between visual features. To tackle these issues, we propose a generic and augmented framework that uses DOM as visual features. Taking Tchebichef, Krawtchouk, and Hahn moments as examples, we not only present strategies for adaptively adjusting the parameters and orders of the visual features, but also derive the analytical formulation of the associated interaction matrix. Simulations demonstrate the robustness and accuracy of our method, as well as its advantages over the state of the art. Real experiments have also been performed to validate the effectiveness of our approach.

3.Current Safety Legislation of Food Processing Smart Robot Systems: The Red Meat Sector

Authors:Kristof Takacs, Alex Mason, Luis Eduardo Cordova-Lopez, Marta Alexy, Peter Galambos, Tamas Haidegger

Abstract: Ensuring the safety of the equipment, its environment and, most importantly, the operator during robot operations is of paramount importance. Robots and complex robotic systems are appearing in more and more industrial and professional service applications. However, while mechanical components and control systems are advancing rapidly, the legislative background and standards framework for such systems and machinery are lagging behind. As part of fundamental research targeting industrial robots and Industry 4.0 solutions for fully automated slaughtering, it was found that there are no specific standards addressing robotic systems applied to the agrifood domain. More specifically, within the agrifood sector, the only existing standards for the meat industry and the red meat sector are hygiene standards related to machinery. None of the identified standards or regulations considers the safety of autonomous robot operations or of human-robot collaboration in abattoirs. The goal of this paper is to provide a general overview of the regulations and standards (and similar guiding documents) relevant to such applications, which could be used as guidelines during the development of inherently safe robotic systems for abattoirs. Reviewing and summarizing the relevant standards and legislation landscape should also offer practical help with the foreseen certification procedure for meat processing robots and robot cells for slaughterhouses in the near future.

4.A Distributed Online Optimization Strategy for Cooperative Robotic Surveillance

Authors:Lorenzo Pichierri, Guido Carnevale, Lorenzo Sforni, Andrea Testa, Giuseppe Notarstefano

Abstract: In this paper, we propose a distributed algorithm to control a team of cooperating robots aiming to protect a target from a set of intruders. Specifically, we model the strategy of the defending team by means of an online optimization problem inspired by the emerging distributed aggregative framework. In particular, each defending robot determines its own position depending on (i) the relative position between an associated intruder and the target, (ii) its contribution to the barycenter of the team, and (iii) collision avoidance with its teammates. We highlight that each agent only has access to local, noisy measurements of the locations of its associated intruder and of the target. Thus, in each robot, our algorithm needs to (i) locally reconstruct globally unavailable quantities and (ii) predict its current objective function from the local measurements. The effectiveness of the proposed methodology is corroborated by simulations and experiments on a team of cooperating quadrotors.
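
Reconstructing a global quantity such as the team barycenter from local information is commonly handled with a dynamic average consensus (tracking) step. The sketch below, assuming NumPy, a ring communication graph and noise-free positions, shows that mechanism in isolation; it is a standard building block of distributed aggregative optimization, not necessarily the exact update used in the paper, and it omits measurement noise, intruder prediction and the defending cost itself. Function names are hypothetical.

```python
import numpy as np

def ring_weights(n: int) -> np.ndarray:
    """Doubly stochastic weights for a ring: each robot averages itself and its two neighbours."""
    if n < 3:
        raise ValueError("a ring with distinct neighbours needs at least 3 robots")
    w = np.zeros((n, n))
    for i in range(n):
        w[i, i] = w[i, (i + 1) % n] = w[i, (i - 1) % n] = 1.0 / 3.0
    return w

def track_barycenter(positions_over_time: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """positions_over_time has shape (T, n, d); returns each robot's barycenter estimate, shape (T, n, d)."""
    x_prev = positions_over_time[0]
    s = x_prev.copy()  # every robot starts from its own position
    history = [s.copy()]
    for x in positions_over_time[1:]:
        # Mix neighbours' estimates, then add the local change in position (innovation term).
        s = weights @ s + (x - x_prev)
        x_prev = x
        history.append(s.copy())
    return np.array(history)
```

Because the weights are doubly stochastic, the average of the estimates always equals the true barycenter; the consensus step then drives each robot's estimate toward that average.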

5.Improved path planning algorithms for non-holonomic autonomous vehicles in industrial environments with narrow corridors: Roadmap Hybrid A* and Waypoints Hybrid A*. Roadmap hybrid A* and Waypoints hybrid A* Pseudocodes

Authors:Alessandro Bonetti, Simone Guidetti, Lorenzo Sabattini

Abstract: This paper proposes two novel path planning algorithms, Roadmap Hybrid A* and Waypoints Hybrid A*, for car-like autonomous vehicles in logistics and industrial contexts with obstacles (e.g., pallets or containers) and narrow corridors. Roadmap Hybrid A* combines Hybrid A* with a graph search algorithm applied to a static roadmap: the former provides obstacle avoidance and flexibility, whereas the latter provides greater robustness, repeatability, and computational speed. Waypoints Hybrid A*, on the other hand, generates waypoints from a topological map of the environment to guide Hybrid A* to the target pose, reducing complexity and search time. Thanks to the roadmap and/or the waypoints, both algorithms allow the shape of selected parts of the path to be predetermined, for example, to obtain precise docking maneuvers for servicing machines and to eliminate the unnecessary steering changes that Hybrid A* produces in corridors. To evaluate the performance of these algorithms, we conducted a simulation study in an industrial plant where a robot must navigate narrow corridors to serve machines in different areas. Both algorithms outperformed standard Hybrid A* in terms of computational time, total path length, reverse path length, and other metrics.

6.An Overview of Robotic Grippers

Authors:Mr Thomas J. Cairnes, Mr Christopher J. Ford, Dr Efi Psomopoulou, Professor Nathan Lepora

Abstract: The development of robotic grippers is driven by the need to execute particular manual tasks or meet specific objectives in handling operations. Grippers with specific functions range from small, accurate and highly controllable ones, such as the surgical end effectors of the da Vinci robot (designed to be used as minimally invasive grippers controlled by a human operator during keyhole surgery), to larger, highly controllable grippers such as the Shadow Dexterous Hand (designed to recreate the hand motions of a human). Additionally, there are less finely controllable grippers, such as the iRobot-Harvard-Yale (iHY) Hand or the Istituto Italiano di Tecnologia-Pisa (IIT-Pisa) SoftHand, which instead leverage natural motions during grasping via designs inspired by observed biomechanical systems. As robotic systems become more autonomous and widely used, it is becoming increasingly important to consider the design, form and function of robotic grippers.

7.Energy Tank-based Control Framework for Satisfying the ISO/TS 15066 Constraint

Authors:Federico Benzi, Federica Ferraguti, Cristian Secchi

Abstract: The technical specification ISO/TS 15066 provides the foundational elements for assessing the safety of collaborative human-robot cells, which are the cornerstone of the modern industrial paradigm. The standard implementation of the ISO/TS 15066 procedure, however, often results in conservative robot motions and, consequently, low cell performance. In this paper, we propose an energy tank-based approach that directly satisfies the energetic bounds imposed by ISO/TS 15066, thus avoiding conservative modeling and assumptions. The proposed approach has been successfully validated in simulation.
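
To make the energy tank idea concrete, here is a minimal scalar sketch in Python, assuming the tank capacity is set to an energy bound that one would derive from ISO/TS 15066 (that derivation is not shown). Whenever the tank cannot cover the controller's requested power, the command is scaled down, so the energy injected never exceeds what is stored. The `EnergyTank` class and its refill rule are hypothetical simplifications, not the formulation in the paper.

```python
from dataclasses import dataclass

@dataclass
class EnergyTank:
    capacity: float  # hypothetical energy bound (J), e.g. derived from ISO/TS 15066 limits
    level: float     # energy currently stored in the tank (J)

    def scale_command(self, requested_power: float, dt: float) -> float:
        """Return a factor in [0, 1] for the control action so the tank never goes negative."""
        demand = requested_power * dt
        if demand <= 0.0:
            # The robot gives energy back (e.g. braking): store it, but never beyond capacity.
            self.level = min(self.capacity, self.level - demand)
            return 1.0
        alpha = min(1.0, self.level / demand)  # fraction of the request the tank can cover
        self.level -= alpha * demand
        return alpha
```

Scaling the command rather than clipping individual joints keeps the direction of the control action unchanged; whether returned energy may refill the tank is a design choice that affects how conservative the resulting bound is.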

8.Comparison of Optimization-Based Methods for Energy-Optimal Quadrotor Motion Planning

Authors:Welf Rehberg, Joaquim Ortiz-Haro, Marc Toussaint, Wolfgang Hönig

Abstract: Quadrotors are agile flying robots that are challenging to control. Considering the full dynamics of quadrotors during motion planning is crucial to achieving good solution quality and small tracking errors during flight. Optimization-based methods scale well with high-dimensional state spaces and can handle dynamic constraints directly, so they are often used in these scenarios. The resulting optimization problem is notoriously difficult to solve due to its nonconvex constraints. In this work, we present an analysis of four solvers for nonlinear trajectory optimization (KOMO, direct collocation with SCvx, direct collocation with CasADi, and Crocoddyl) and evaluate their performance in scenarios where they must find minimum-effort solutions to geometrically complex problems and to problems requiring highly dynamic solutions. Benchmarking these methods helps to identify the best algorithm structures for these kinds of problems.

9.SocNavGym: A Reinforcement Learning Gym for Social Navigation

Authors:Aditya Kapoor, Sushant Swamy, Luis Manso, Pilar Bachiller

Abstract: It is essential for autonomous robots to be socially compliant while navigating in human-populated environments. Machine Learning and, especially, Deep Reinforcement Learning (DRL) have recently gained considerable traction in the field of Social Navigation. This can be partially attributed to the resulting policies not being bound by human limitations in terms of code complexity or the number of variables that are handled. Unfortunately, the lack of safety guarantees and the large data requirements of DRL algorithms make learning in the real world infeasible. To bridge this gap, simulation environments are frequently used. We propose SocNavGym, an advanced simulation environment for social navigation that can generate a wide variety of social navigation scenarios and facilitates the development of intelligent social agents. SocNavGym is lightweight, fast, easy to use, and can be effortlessly configured to generate different types of social navigation scenarios. It can also be configured to work with different hand-crafted and data-driven social reward signals and to yield a variety of evaluation metrics to benchmark agents' performance. Further, we provide a case study where a Dueling-DQN agent is trained to learn social navigation policies using SocNavGym. The results provide evidence that SocNavGym can be used to train an agent from scratch to navigate in simple as well as complex social scenarios. Our experiments also show that agents trained with the data-driven reward function display more advanced social compliance than those trained with the heuristic-based reward function.
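
For readers unfamiliar with the Dueling-DQN agent used in the case study, its distinctive part is the output head, which splits Q-values into a state value and per-action advantages. A minimal NumPy sketch of the standard mean-subtracted aggregation follows; the network that produces `value` and `advantages` is assumed and not shown, and nothing here is taken from the SocNavGym code.

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    value: shape (batch,); advantages: shape (batch, n_actions).
    Subtracting the mean advantage makes V and A identifiable.
    """
    value = np.asarray(value, dtype=float)[:, None]
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=1, keepdims=True)
```

With this form the average Q-value over actions equals V(s), so the greedy action depends only on the advantages.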

10.The CRAM Cognitive Architecture for Robot Manipulation in Everyday Activities

Authors:Michael Beetz, Gayane Kazhoyan, David Vernon

Abstract: This paper presents a hybrid robot cognitive architecture, CRAM, that enables robot agents to accomplish everyday manipulation tasks. It addresses five key challenges that arise when carrying out everyday activities. These include (i) the underdetermined nature of task specification, (ii) the generation of context-specific behavior, (iii) the ability to make decisions based on knowledge, experience, and prediction, (iv) the ability to reason at the levels of motions and sensor data, and (v) the ability to explain actions and the consequences of these actions. We explore the computational foundations of the CRAM cognitive model: the self-programmability entailed by physical symbol systems, the CRAM plan language, generalized action plans and implicit-to-explicit manipulation, generative models, digital twin knowledge representation & reasoning, and narrative-enabled episodic memories. We describe the structure of the cognitive architecture and explain the process by which CRAM transforms generalized action plans into parameterized motion plans. It does this using knowledge and reasoning to identify the parameter values that maximize the likelihood of successfully accomplishing the action. We demonstrate the ability of a CRAM-controlled robot to carry out everyday activities in a kitchen environment. Finally, we consider future extensions that focus on achieving greater flexibility through transformational learning and metacognition.

11.Singularity Distance Computations for 3-RPR Manipulators using Extrinsic Metrics

Authors:Aditya Kapilavai, Georg Nawratil

Abstract: It is well known that parallel manipulators are prone to singularities. There is still a lack of distance evaluation functions, referred to as metrics, for computing the distance between two 3-RPR configurations. The presented extrinsic metrics take into account the combinatorial structure of the manipulator as well as different design options. Using these extrinsic metrics, we formulate constrained optimization problems that aim to find the closest singular configurations to a given non-singular configuration. The solution of the corresponding system of polynomial equations relies on algorithms from numerical algebraic geometry implemented in the software package Bertini. Moreover, we developed a computational pipeline for computing the singularity distance along a 1-parametric motion of the manipulator. To facilitate these computations for the user, an open-source interface has been developed between the software packages Maple, Bertini, and Paramotopy. The presented approach is demonstrated on a numerical example.

12.Deep Imitation Learning for Automated Drop-In Gamma Probe Manipulation

Authors:Kaizhong Deng, Baoru Huang, Daniel S. Elson

Abstract: The increasing prevalence of prostate cancer has led to the widespread adoption of Robotic-Assisted Surgery (RAS) as a treatment option. Sentinel lymph node biopsy (SLNB) is a crucial component of prostate cancer surgery and requires accurate diagnostic evidence. This procedure can be improved by using a drop-in gamma probe (the SENSEI system) to distinguish cancerous tissue from normal tissue. However, manual control of the probe using a live gamma-level display and audible feedback can be challenging for inexperienced surgeons, potentially leading to missed detections. In this study, a deep imitation training workflow is proposed to automate the radioactive node detection procedure. The proposed workflow uses simulation data to train an end-to-end, vision-based gamma probe manipulation agent. The evaluation results showed that the proposed approach was capable of predicting the next-step action and holds promise for further improvement and extension to a hardware setup.

13.Double-Deck Multi-Agent Pickup and Delivery: Multi-Robot Rearrangement in Large-Scale Warehouses

Authors:Baiyu Li, Hang Ma

Abstract: We introduce a new problem formulation, Double-Deck Multi-Agent Pickup and Delivery (DD-MAPD), which models the multi-robot shelf rearrangement problem in automated warehouses. DD-MAPD extends both Multi-Agent Pickup and Delivery (MAPD) and Multi-Agent Path Finding (MAPF) by allowing agents to move beneath shelves or lift and deliver a shelf to an arbitrary location, thereby changing the warehouse layout. We show that solving DD-MAPD is NP-hard. To tackle DD-MAPD, we propose MAPF-DECOMP, an algorithmic framework that decomposes a DD-MAPD instance into a MAPF instance for coordinating shelf trajectories and a subsequent MAPD instance with task dependencies for computing paths for agents. We also present an optimization technique to improve the performance of MAPF-DECOMP and demonstrate how to make MAPF-DECOMP complete for well-formed DD-MAPD instances, a realistic subclass of DD-MAPD instances. Our experimental results demonstrate the efficiency and effectiveness of MAPF-DECOMP, with the ability to compute high-quality solutions for large-scale instances with over one thousand shelves and hundreds of agents in just minutes of runtime.

14.SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

Authors:Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan

Abstract: With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

15.SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

Authors:John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Deva Ramanan, Zachary Manchester

Abstract: We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize offline a dynamically feasible reference trajectory for the robot that includes body and foot motion, as well as contact sequences, and that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.

16.Energy-based Models as Zero-Shot Planners for Compositional Scene Rearrangement

Authors:Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki

Abstract: Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then relocate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts.
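
Goal inference by gradient descent on a sum of energy functions can be sketched as follows, assuming NumPy, 2D object positions and two hand-written stand-in energies (`left_of`, `close_to`) in place of the learned, language-grounded energy functions described above; gradients are taken by central finite differences for brevity. The predicate forms, margins and step sizes are hypothetical.

```python
import numpy as np

def left_of(pos, a, b, margin=0.1):
    """Zero once object a lies at least `margin` to the left of object b (hypothetical form)."""
    return max(0.0, pos[a, 0] - pos[b, 0] + margin) ** 2

def close_to(pos, a, b, dist=0.2):
    """Zero when objects a and b are exactly `dist` apart (hypothetical form)."""
    return (np.linalg.norm(pos[a] - pos[b]) - dist) ** 2

def total_energy(pos, predicates):
    # One energy term per predicate in the parsed instruction, summed.
    return sum(fn(pos, *args) for fn, args in predicates)

def numerical_grad(f, pos, eps=1e-5):
    grad = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        step = np.zeros_like(pos)
        step[idx] = eps
        grad[idx] = (f(pos + step) - f(pos - step)) / (2 * eps)  # central difference
    return grad

def infer_goal(initial, predicates, lr=0.1, steps=500):
    """Plain gradient descent on the summed energy, starting from the current scene layout."""
    pos = initial.astype(float).copy()
    f = lambda p: total_energy(p, predicates)
    for _ in range(steps):
        pos -= lr * numerical_grad(f, pos)
    return pos
```

Adding a predicate only adds a term to the sum, which is what lets this style of planner compose constraints it has not seen together before.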

1.An Adaptive Control Strategy for Neural Network based Optimal Quadcopter Controllers

Authors:Robin Ferede, Guido C. H. E. de Croon, Christophe De Wagter, Dario Izzo

Abstract: Developing optimal controllers for aggressive high-speed quadcopter flight is a major challenge in the field of robotics. Recent work has shown that neural networks trained with supervised learning can achieve real-time optimal control in some specific scenarios. In these methods, the networks (termed G&CNets) are trained to learn the optimal state feedback from a dataset of optimal trajectories. An important problem with these methods is the reality gap encountered in the sim-to-real transfer. In this work, we trained G&CNets for energy-optimal end-to-end control on the Bebop drone and identified the unmodeled pitch moment as the main contributor to the reality gap. To mitigate this, we propose an adaptive control strategy that works by learning from optimal trajectories of a system affected by constant external pitch, roll and yaw moments. In real test flights, this model mismatch is estimated onboard and fed to the network to obtain the optimal rpm command. We demonstrate the effectiveness of our method by performing energy-optimal hover-to-hover flights with and without moment feedback. Finally, we compare the adaptive controller to a state-of-the-art differential-flatness-based controller in a consecutive waypoint flight and demonstrate the advantages of our method in terms of energy optimality and robustness.

2.Hydra-Multi: Collaborative Online Construction of 3D Scene Graphs with Multi-Robot Teams

Authors:Yun Chang, Nathan Hughes, Aaron Ray, Luca Carlone

Abstract: 3D scene graphs have recently emerged as an expressive high-level map representation that describes a 3D environment as a layered graph where nodes represent spatial concepts at multiple levels of abstraction (e.g., objects, rooms, buildings) and edges represent relations between concepts (e.g., inclusion, adjacency). This paper describes Hydra-Multi, the first multi-robot spatial perception system capable of constructing a multi-robot 3D scene graph online from sensor data collected by robots in a team. In particular, we develop a centralized system capable of constructing a joint 3D scene graph by taking incremental inputs from multiple robots, effectively finding the relative transforms between the robots' frames, and incorporating loop closure detections to correctly reconcile the scene graph nodes from different robots. We evaluate Hydra-Multi on simulated and real scenarios and show it is able to reconstruct accurate 3D scene graphs online. We also demonstrate Hydra-Multi's capability of supporting heterogeneous teams by fusing different map representations built by robots with different sensor suites.

3.Thermal Vision for Soil Assessment in a Multipurpose Environmental Chamber under Martian Conditions towards Robot Navigation

Authors:Raúl Castilla-Arquillo, Anthony Mandow, Carlos J. Pérez-del-Pulgar, César Álvarez-Llamas, José M. Vadillo, Javier Laserna

Abstract: Soil assessment is important for mobile robot planning and navigation in natural and planetary environments. Terramechanical characteristics can be inferred from the thermal behaviour of soils under the influence of sunlight using remote sensors such as Long-Wave Infrared cameras. However, this behaviour is greatly affected by the low atmospheric pressures of planets such as Mars, so practical models are needed to relate robot remote sensing data on Earth to target planetary exploration conditions. This article proposes a general framework based on multipurpose environmental chambers to generate representative diurnal cycle dataset pairs that can be useful to relate the thermal behaviour of a soil on Earth to the corresponding behaviour under planetary pressure conditions using remote sensing. Furthermore, we present an application of the proposed framework to generate datasets using the UMA-Laserlab chamber, which can replicate the atmospheric CO2 composition of Mars. In particular, we analyze the thermal behaviour of four soil samples of different granularity by comparing replicated Martian surface conditions and their Earth's diurnal cycle equivalent. Results indicate a correlation between granularity and thermal inertia that is consistent with available Mars surface measurements recorded by rovers. The resulting dataset pairs, consisting of representative diurnal cycle thermal images with heater, air, and subsurface temperatures, have been made available to the scientific community.

4.Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Authors:Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess

Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more, and transitions between them in a smooth, stable, and efficient manner, well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.

5.Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning

Authors:Selma Wanna, Fabian Parra, Robert Valner, Karl Kruusamäe, Mitch Pryor

Abstract: Recent advances in generative modeling have spurred a resurgence in the field of Embodied Artificial Intelligence (EAI). EAI systems typically deploy large language models to physical systems capable of interacting with their environment. In our exploration of EAI for industrial domains, we successfully demonstrate the feasibility of co-located, human-robot teaming. Specifically, we construct an experiment where an Augmented Reality (AR) headset mediates information exchange between an EAI agent and human operator for a variety of inspection tasks. To our knowledge the use of an AR headset for multimodal grounding and the application of EAI to industrial tasks are novel contributions within Embodied AI research. In addition, we highlight potential pitfalls in EAI's construction by providing quantitative and qualitative analysis on prompt robustness.

1.AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments

Authors:Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung

Abstract: In recent years, the demand for mapping construction sites or buildings using light detection and ranging (LiDAR) sensors has increased, in order to model environments for efficient site management. However, LiDAR-based approaches have been observed to diverge in narrow and confined environments, such as spiral stairs and corridors, because their parameters are fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly tuned for open space; if the same parameters are applied in a corridor-like scene, the odometry diverges, which is referred to as degeneracy. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry method called AdaLIO, which employs an adaptive parameter-setting strategy. To this end, we first detect degeneracy by checking whether the surroundings are corridor-like. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified on a public dataset, our proposed method showed promising performance in narrow and cramped environments, avoiding the degeneracy problem.
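
Why a finer voxelization yields more correspondences in corridor-like scenes can be illustrated with a simple voxel-grid downsampling routine: a smaller voxel keeps more points, and hence more candidates for matching. The NumPy sketch below assumes the corridor test has already been made and uses hypothetical voxel sizes; it is not AdaLIO's degeneracy detector or its actual parameter values.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep the first point that falls into each occupied voxel of an (N, 3) cloud."""
    keys = np.floor(points / voxel_size).astype(np.int64)  # integer voxel index per point
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

def select_voxel_size(corridor_like: bool, open_space: float = 0.5, narrow: float = 0.2) -> float:
    """Hypothetical rule: switch to a finer voxel when the scene looks like a corridor."""
    return narrow if corridor_like else open_space
```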

2.Using Intent Estimation and Decision Theory to Support Lifting Motions with a Quasi-Passive Hip Exoskeleton

Authors:Thomas Callens, Vincent Ducastel, Joris De Schutter, Erwin Aertbeliën

Abstract: This paper compares three controllers for quasi-passive exoskeletons. The Utility Maximizing Controller (UMC) uses intent estimation to recognize user motions and decision theory to activate the support mechanism. The intent estimation algorithm requires demonstrations for each motion to be recognized. Depending on what motion is recognized, different control signals are sent to the exoskeleton. The Extended UMC (E-UMC) adds a calibration step and a velocity module to trigger the UMC. As a benchmark, and to compare the behavior of the controllers irrespective of the hardware, a Passive Exoskeleton Controller (PEC) is developed as well. The controllers were implemented on a hip exoskeleton and evaluated in a user study consisting of two phases. First, demonstrations of three motions were recorded: squat, stoop left and stoop right. Afterwards, the controllers were evaluated. The E-UMC combines benefits from the UMC and the PEC, confirming the need for the two extensions. The E-UMC discriminates between the three motions and does not generate false positives for previously unseen motions such as stair walking. The proposed methods can also be applied to support other motions.
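
The decision-theoretic activation step can be sketched as expected-utility maximization over the intent estimator's class probabilities. In the NumPy sketch below, the motion classes follow the user study, but the extra "other" class, the two actions and the utility values are hypothetical; the real UMC and E-UMC (calibration step, velocity module) are not reproduced.

```python
import numpy as np

MOTIONS = ("squat", "stoop_left", "stoop_right", "other")
ACTIONS = ("support_off", "support_on")

# Hypothetical utility table U[action, motion]: support helps during lifts,
# but a false positive during any other motion (e.g. stair walking) is costly.
UTILITY = np.array([
    [0.0, 0.0, 0.0, 1.0],    # support_off
    [1.0, 1.0, 1.0, -2.0],   # support_on
])

def choose_action(motion_probs) -> str:
    """Pick the action with the highest expected utility under the estimated motion probabilities."""
    p = np.asarray(motion_probs, dtype=float)
    p = p / p.sum()               # normalize the estimator's output
    expected = UTILITY @ p        # expected utility of each action
    return ACTIONS[int(np.argmax(expected))]
```

Making false positives expensive in the utility table is one way such a controller can avoid triggering on motions it was not shown.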

3.Towards a generalizable simulation framework to study collisions between spacecraft and debris

Authors:Simone Asci, Angadh Nanjangud

Abstract: In recent years, computer simulators of rigid-body systems have been used successfully to advance the development of new space robots, becoming a leading tool for the preliminary investigation and evaluation of space robotic missions. However, the impressive progress in performance has not yet been matched by an improvement in modelling capabilities, which remain limited to very basic representations of real systems. We present a new approach to modelling and simulating collision-inclusive multibody dynamics by leveraging symbolic models generated by a computer algebra system (CAS). While similar investigations into contact dynamics in other domains exploit pre-existing models of common multibody systems (e.g., industrial robot arms, humanoids, and wheeled robots), our focus is on enabling researchers to develop models of novel system designs that are uncommon or yet to be fabricated, e.g., small spacecraft manipulators. In this paper, we demonstrate the usefulness of our approach for investigating spacecraft-debris collision dynamics.

4.Zero-shot Transfer Learning of Driving Policy via Socially Adversarial Traffic Flow

Authors:Dongkun Zhang, Jintao Xue, Yuxiang Cui, Yunkai Wang, Eryun Liu, Wei Jing, Junbo Chen, Rong Xiong, Yue Wang

Abstract: Acquiring driving policies that can transfer to unseen environments is challenging when driving in dense traffic flows. The design of the traffic flow is essential, and previous studies are unable to balance interaction and safety-criticality. To tackle this problem, we propose a socially adversarial traffic flow. We propose a Contextual Partially-Observable Stochastic Game to model traffic flow and assign Social Value Orientation (SVO) as context. We then adopt a two-stage framework. In Stage 1, each agent in our socially-aware traffic flow is driven by a hierarchical policy whose upper-level policy communicates the genuine SVOs of all agents, which the lower-level policy takes as input. In Stage 2, each agent in the socially adversarial traffic flow is driven by the hierarchical policy whose upper level communicates mistaken SVOs, which are taken as input by the lower-level policy trained in Stage 1. The driving policy is adversarially trained through a zero-sum game formulation with the upper-level policies, resulting in a policy with enhanced zero-shot transfer capability to unseen traffic flows. Comprehensive cross-validation experiments verify the superior zero-shot transfer performance of our method.

5.Direct Collocation Methods for Trajectory Optimization in Constrained Robotic Systems

Authors:Ricard Bordalba, Tobias Schoels, Lluís Ros, Josep M. Porta, Moritz Diehl

Abstract: Direct collocation methods are powerful tools to solve trajectory optimization problems in robotics. While their resulting trajectories tend to be dynamically accurate, they may also present large kinematic errors in the case of constrained mechanical systems, i.e., those whose state coordinates are subject to holonomic or nonholonomic constraints, like loop-closure or rolling-contact constraints. These constraints confine the robot trajectories to an implicitly-defined manifold, which complicates the computation of accurate solutions. Discretization errors inherent to the transcription of the problem easily make the trajectories drift away from this manifold, which results in physically inconsistent motions that are difficult to track with a controller. This paper reviews existing methods to deal with this problem and proposes new ones to overcome their limitations. Current approaches either disregard the kinematic constraints (which leads to drift accumulation) or modify the system dynamics to keep the trajectory close to the manifold (which adds artificial forces or energy dissipation to the system). The methods we propose, in contrast, achieve full drift elimination on the discrete trajectory, or even along the continuous one, without artificial modifications of the system dynamics. We illustrate and compare the methods using various examples of different complexity.
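To make the transcription step concrete, here is a minimal sketch of trapezoidal collocation defects for an unconstrained double integrator (q'' = u). An NLP solver would drive these defects to zero; the system, the helper names and the choice of the trapezoidal rule are illustrative assumptions, and the kinematic-constraint manifold and drift-elimination methods the paper proposes are not modelled.

```python
from typing import Callable, Sequence

State = tuple[float, float]  # (position q, velocity v)


def double_integrator(x: State, u: float) -> State:
    """Continuous dynamics x' = f(x, u) for q'' = u."""
    _q, v = x
    return (v, u)


def trapezoidal_defects(
    xs: Sequence[State],
    us: Sequence[float],
    h: float,
    f: Callable[[State, float], State] = double_integrator,
) -> list[State]:
    """Defects x_{k+1} - x_k - h/2 * (f(x_k, u_k) + f(x_{k+1}, u_{k+1})) per interval."""
    defects = []
    for k in range(len(xs) - 1):
        fk, fk1 = f(xs[k], us[k]), f(xs[k + 1], us[k + 1])
        defects.append(tuple(
            xs[k + 1][i] - xs[k][i] - 0.5 * h * (fk[i] + fk1[i]) for i in range(2)
        ))
    return defects
```

For constant acceleration the exact trajectory q = t²/2, v = t yields zero defects, because the trapezoidal rule integrates a linear velocity profile exactly.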

1.Interruption-Aware Cooperative Perception for V2X Communication-Aided Autonomous Driving

Authors:Shunli Ren, Zixing Lei, Zi Wang, Mehrdad Dianati, Yafei Wang, Siheng Chen, Wenjun Zhang

Abstract: Cooperative perception enabled by V2X communication technologies can significantly improve the perception performance of autonomous vehicles beyond the limited perception ability of individual vehicles, thereby improving the safety and efficiency of autonomous driving in intelligent transportation systems. However, to fully reap the benefits of cooperative perception in practice, the impacts of imperfect V2X communication, i.e., communication errors and disruptions, need to be understood, and effective remedies need to be developed to alleviate their adverse impacts. Motivated by this need, we propose a novel INterruption-aware robust COoperative Perception (V2X-INCOP) solution for V2X communication-aided autonomous driving, which leverages historical information to recover information lost due to interruption. To achieve comprehensive recovery, we design a communication-adaptive multi-scale spatial-temporal prediction model to extract multi-scale spatial-temporal features based on V2X communication conditions and capture the most significant information for predicting the missing information. To further improve recovery performance, we adopt a knowledge distillation framework to give direct supervision to the prediction model and a curriculum learning strategy to stabilize the training of the model. Our experiments on three public cooperative perception datasets demonstrate that our proposed method is effective in alleviating the impacts of communication interruption on cooperative perception.

2.Controlled illumination for perception and manipulation of Lambertian objects

Authors:Arkadeep Narayan Chaudhury, Christopher G. Atkeson

Abstract: Controlling illumination can generate high-quality information about object surface normals and depth discontinuities at a low computational cost. In this work, we demonstrate a controlled illumination approach scaled to the robot workspace that generates high-quality information about tabletop-scale objects for robotic manipulation. With our low-angle-of-incidence directional illumination approach, we can precisely capture surface normals and depth discontinuities of Lambertian objects. We demonstrate three use cases of our approach for robotic manipulation. We show that 1) using the captured information, we can perform general-purpose grasping with a single-point vacuum gripper, 2) we can visually measure the deformation of known objects, and 3) we can estimate the pose of known objects and track unknown objects in the robot's workspace. Additional demonstrations of the results presented in this work can be viewed on the project webpage.
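For a Lambertian surface, the classic three-light photometric stereo model, assumed here as background rather than taken from the paper, states I_k = ρ (l_k · n). Stacking three non-coplanar light vectors gives a 3×3 linear system for g = ρn, from which the albedo ρ = |g| and the unit normal n = g/|g| follow. The helper `lambertian_normal` is hypothetical and solves the system with Cramer's rule.

```python
import math

Vec3 = tuple[float, float, float]


def _det3(m) -> float:
    """Determinant of a 3x3 matrix given as nested sequences."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def lambertian_normal(lights: list[Vec3], intensities: list[float]) -> tuple[Vec3, float]:
    """Solve I_k = albedo * dot(l_k, n) for three lights; returns (unit normal, albedo).

    Each light vector encodes direction times source strength.
    """
    d = _det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be non-coplanar")
    g = []
    for col in range(3):
        # Cramer's rule: replace one column of the light matrix with the intensities.
        m = [list(row) for row in lights]
        for r in range(3):
            m[r][col] = intensities[r]
        g.append(_det3(m) / d)
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("all intensities are zero; normal is undefined")
    return (g[0] / albedo, g[1] / albedo, g[2] / albedo), albedo
```

The model breaks down in attached or cast shadows (where l · n would be negative), so pixels that are dark under any of the three lights are usually masked out before solving.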

3.Fault-tolerant Control of Over-actuated UAV Platform under Propeller Failure

Authors:Yao Su, Pengkang Yu, Matthew J. Gerber, Lecheng Ruan, Tsu-Chin Tsao

Abstract: Propeller failure is a major cause of multirotor Unmanned Aerial Vehicle (UAV) crashes. While conventional multirotors can barely handle this issue due to underactuation, over-actuated platforms can continue flying with proper fault-tolerant control (FTC). This paper investigates such a controller for an over-actuated multirotor aerial platform composed of quadcopters mounted on passive joints, with input redundancy in both the high-level vehicle control and the low-level quadcopter control of vectored thrusts. To fully utilize the input redundancies of the whole platform under propeller failure, our proposed FTC controller has a hierarchical control architecture with three main components: (i) a low-level adjustment strategy to avoid propeller-level thrust saturation; (ii) a compensation loop to attenuate the introduced disturbance; and (iii) a nullspace-based control allocation framework to avoid quadcopter-level thrust saturation. By reallocating actuator inputs in both the low-level and high-level control loops, the low-level quadcopter control can be maintained with up to two failed propellers, and the whole platform can be stabilized without crashing. The proposed controller is extensively studied in both simulation and real-world experiments to demonstrate its superior performance.

4.Open Continuum Robotics -- One Actuation Module to Create them All

Authors:Reinhard M. Grassmann, Chengnan Shentu, Taqi Hamoda, Puspita Triana Dewi, Jessica Burgner-Kahrs

Abstract: Experiments on physical continuum robots are the gold standard for evaluations. Currently, as no commercial continuum robot platform is available, a large variety of early-stage prototypes exists. These prototypes are developed by individual research groups and are often used for a single publication. Thus, a significant amount of time is devoted to creating proprietary hardware and software, hindering the development of a common platform and shifting scarce time and effort away from the main research challenges. We address this problem by proposing an open-source actuation module, which can be used to build different types of continuum robots. It consists of a high-torque brushless electric motor, a high-resolution optical encoder, and a low-gear-ratio transmission. For this letter, we create three different types of continuum robots. In addition, we illustrate, for the first time, that continuum robots built with our actuation module can proprioceptively detect external forces. Consequently, our approach opens untapped and under-investigated research directions related to the dynamics and advanced control of continuum robots, where sensing the generalized flow and effort is mandatory. Besides that, we democratize continuum robot research by providing open-source software and hardware through our initiative called the Open Continuum Robotics Project, to increase the accessibility and reproducibility of advanced methods.

5.A Spatial Calibration Method for Robust Cooperative Perception

Authors:Zhiying Song, Tenghui Xie, Hailiang Zhang, Fuxi Wen, Jun Li

Abstract: Cooperative perception is a promising technique for enhancing the perception capabilities of automated vehicles through vehicle-to-everything (V2X) cooperation, provided that accurate relative pose transforms are available. Nevertheless, obtaining precise positioning information often entails high costs associated with navigation systems. Moreover, signal drift resulting from factors such as occlusion and multipath effects can compromise the stability of the positioning information. Hence, a low-cost and robust method is required to calibrate relative pose information for multi-agent cooperative perception. In this paper, we propose a simple but effective inter-agent object association approach (CBM), which constructs contexts using the detected bounding boxes, followed by local context matching and global consensus maximization. Based on the matched correspondences, the optimal relative pose transform is estimated, followed by cooperative perception fusion. Extensive experimental studies are conducted on both simulated and real-world datasets: high object association precision and decimeter-level relative pose calibration accuracy are achieved among the cooperating agents even with large inter-agent localization errors. Furthermore, the proposed approach outperforms state-of-the-art methods in terms of object association and relative pose estimation accuracy, as well as the robustness of cooperative perception against the pose errors of the connected agents. The code will be available at
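Once correspondences between two agents' detections are known, a relative pose can be recovered by a least-squares rigid fit. The sketch below assumes a bird's-eye-view (2D) setting with matched box centres and uses the standard closed-form Procrustes solution; it is an illustration, not CBM's actual estimator, and the context matching and consensus maximization steps are not shown. `fit_rigid_2d` is a hypothetical name.

```python
import math

Point = tuple[float, float]


def fit_rigid_2d(src: list[Point], dst: list[Point]) -> tuple[float, float, float]:
    """Least-squares (theta, tx, ty) such that dst ~= R(theta) @ src + t."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_cos = 0.0
    s_sin = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        # Work with centred coordinates so translation drops out of the rotation fit.
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        s_cos += px * qx + py * qy
        s_sin += px * qy - py * qx
    theta = math.atan2(s_sin, s_cos)  # angle maximising sum of q' . R p'
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty
```

In practice a robust wrapper (e.g. RANSAC over candidate correspondences) would sit around this fit so that a few wrong associations do not corrupt the transform.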

6.When to Replan? An Adaptive Replanning Strategy for Autonomous Navigation using Deep Reinforcement Learning

Authors:Kohei Honda, Ryo Yonetani, Mai Nishimura, Tadashi Kozuno

Abstract: The hierarchy of global and local planners is one of the most commonly utilized system designs in robot autonomous navigation. While the global planner generates a reference path from the current to the goal location based on the pre-built static map, the local planner produces a collision-free, kinodynamic trajectory to follow the reference path while avoiding perceived obstacles. The reference path should be replanned regularly to accommodate new obstacles that were absent from the pre-built map, but when to execute replanning remains an open question. In this work, we conduct an extensive simulation experiment to compare various replanning strategies and confirm that effective strategies depend highly on the environment as well as on the global and local planners. We then propose a new adaptive replanning strategy based on deep reinforcement learning, in which an agent learns from experience to decide appropriate replanning timings in the given environment and planning setup. Our experimental results demonstrate that the proposed replanning agent can achieve performance on par with or even better than current best-performing strategies across multiple situations in terms of navigation robustness and efficiency.

7.Quality-Diversity Optimisation on a Physical Robot Through Dynamics-Aware and Reset-Free Learning

Authors:Simón C. Smith, Bryan Lim, Hannah Janmohamed, Antoine Cully

Abstract: Learning algorithms, like Quality-Diversity (QD), can be used to acquire repertoires of diverse robotics skills. This learning is commonly done via computer simulation due to the large number of evaluations required. However, training in a virtual environment generates a gap between simulation and reality. Here, we build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot. This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour and improve sample efficiency. A behaviour selection policy filters out uninteresting or unsafe policies predicted by the model. RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning. We demonstrate that our method enables a physical quadruped robot to learn a repertoire of behaviours in two hours without human supervision. We successfully test the solution repertoire using a maze navigation task. Finally, we compare our approach to the MAP-Elites algorithm. We show that dynamics awareness and a recovery policy are required for training on a physical robot for optimal archive generation. Video available at
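The MAP-Elites baseline mentioned above keeps one elite per cell of a discretised behaviour space. The sketch below is plain MAP-Elites on a toy objective, not RF-QD: there is no dynamics model, behaviour selection policy or recovery policy. The helper names, the [0, 1] descriptor range and the Gaussian mutation are assumptions for illustration.

```python
import random


def cell_index(desc: tuple[float, ...], bins: int) -> tuple[int, ...]:
    """Map a behaviour descriptor in [0, 1]^d to an integer grid cell."""
    return tuple(min(int(x * bins), bins - 1) for x in desc)


def try_insert(archive: dict, genome, fitness: float, desc, bins: int) -> bool:
    """Keep the genome if its cell is empty or it beats the current elite."""
    key = cell_index(desc, bins)
    if key not in archive or fitness > archive[key][1]:
        archive[key] = (genome, fitness)
        return True
    return False


def map_elites(evaluate, dim: int, bins: int = 10, iters: int = 2000,
               n_init: int = 50, sigma: float = 0.1, seed: int = 0) -> dict:
    """evaluate(genome) -> (fitness, descriptor); genomes live in [0, 1]^dim."""
    rng = random.Random(seed)
    archive: dict = {}
    for i in range(iters):
        if i < n_init or not archive:
            genome = [rng.random() for _ in range(dim)]  # random bootstrap
        else:
            parent, _ = rng.choice(list(archive.values()))
            # Gaussian mutation of a random elite, clipped to the genome bounds.
            genome = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent]
        fitness, desc = evaluate(genome)
        try_insert(archive, genome, fitness, desc, bins)
    return archive
```

On a physical robot every call to `evaluate` is a real rollout, which is why the paper's use of a learned dynamics model to predict behaviours before executing them matters for sample efficiency.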

8.UAV Tracking with Solid-State Lidars: Dynamic Multi-Frequency Scan Integration

Authors:Iacopo Catalano, Ha Sier, Xianjia Yu, Jorge Pena Queralta, Tomi Westerlund

Abstract: With the increasing use of drones across various industries, the navigation and tracking of these unmanned aerial vehicles (UAVs) in challenging environments (such as GNSS-denied environments) have become critical issues. In this paper, we propose a novel method for a ground-based UAV tracking system using a solid-state LiDAR, which dynamically adjusts the LiDAR frame integration time based on the distance to the UAV and its speed. Our method fuses two simultaneous scan integration frequencies for high accuracy and persistent tracking, enabling reliable estimates of the UAV state even in challenging scenarios. The use of the Inverse Covariance Intersection method and Kalman filters allows for better tracking accuracy and makes it possible to handle challenging tracking scenarios. We have performed a number of experiments to evaluate the performance of the proposed tracking system and identify its limitations. Our experimental results demonstrate that the proposed method achieves tracking performance comparable to the established baseline method, while also providing more reliable and accurate tracking when one of the two frequencies is unavailable or unreliable.
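Inverse Covariance Intersection fuses two estimates whose cross-correlation is unknown. A scalar sketch of the fusion rule as usually stated in the literature (fused information 1/A + 1/B − 1/(ωA + (1−ω)B), with gains that sum to one) is given below; the scalar setting, the fixed weight ω and the helper name are assumptions, and the paper's Kalman filter pipeline is not reproduced.

```python
def inverse_covariance_intersection(a: float, var_a: float, b: float, var_b: float,
                                    omega: float) -> tuple[float, float]:
    """Fuse scalar estimates (a, var_a) and (b, var_b) with ICI; returns (mean, variance)."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    mix = omega * var_a + (1.0 - omega) * var_b
    # Fused information: sum of both informations minus the (unknown) common part.
    info = 1.0 / var_a + 1.0 / var_b - 1.0 / mix
    var_c = 1.0 / info
    gain_a = var_c * (1.0 / var_a - omega / mix)
    gain_b = var_c * (1.0 / var_b - (1.0 - omega) / mix)  # gain_a + gain_b == 1
    return gain_a * a + gain_b * b, var_c
```

With two equally uncertain inputs (variance 2 each) and ω = 0.5, the fused variance stays at 2 rather than dropping to 1 as a naive fusion that assumes independence would claim. In practice ω is chosen to minimise the trace or determinant of the fused covariance; in the scalar case that choice simply keeps the tighter estimate, so the benefit shows up for vector states.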

9.Microgravity Induces Overconfidence in Perceptual Decision-making

Authors:Leyla Loued-Khenissi, Christian Pfeiffer, Rupal Saxena, Shivam Adarsh, Davide Scaramuzza

Abstract: Does gravity affect decision-making? This question comes into sharp focus as plans for interplanetary human space missions solidify. In the framework of Bayesian brain theories, gravity encapsulates a strong prior, anchoring agents to a reference frame via the vestibular system, informing their decisions and possibly their integration of uncertainty. What happens when such a strong prior is altered? We address this question using a self-motion estimation task in a space analog environment under conditions of altered gravity. Two participants were cast as remote drone operators orbiting Mars in a virtual reality environment on board a parabolic flight, where both hyper- and microgravity conditions were induced. From a first-person perspective, participants viewed a drone exiting a cave and had to first predict a collision and then provide a confidence estimate of their response. We evoked uncertainty in the task by manipulating the motion's trajectory angle. Post-decision subjective confidence reports were negatively predicted by stimulus uncertainty, as expected. Uncertainty alone did not impact overt behavioral responses (performance, choice) differentially across gravity conditions. However, microgravity predicted higher subjective confidence, especially in interaction with stimulus uncertainty. These results suggest that variables relating to uncertainty affect decision-making distinctly in microgravity, highlighting the possible need for automatized, compensatory mechanisms when considering human factors in space research.

10.USA-Net: Unified Semantic and Affordance Representations for Robot Memory

Authors:Benjamin Bolte, Austin Wang, Jimmy Yang, Mustafa Mukadam, Mrinal Kalakrishnan, Chris Paxton

Abstract: In order for robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner that can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories that are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners that don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website:

11.Model-Based Pose Estimation of Steerable Catheters under Bi-Plane Image Feedback

Authors:Jared Lawson, Rohan Chitale, Nabil Simaan

Abstract: Small catheters undergo significant torsional deflections during endovascular interventions. A key challenge in enabling robot control of these catheters is the estimation of their bending planes. This paper considers approaches for estimating these bending planes based on bi-plane image feedback. The proposed approaches attempt to minimize the error between either the direct (position-based) or instantaneous (velocity-based) kinematics and the kinematics reconstructed from bi-plane image feedback. A comparison between these methods is carried out on a setup using two cameras in lieu of a bi-plane fluoroscopy setup. The results show that the position-based approach is less susceptible to segmentation noise and works best when the segment is in a non-straight configuration. These results suggest that the bending planes can be estimated with errors under 30 degrees. Considering that the torsional buildup of these catheters can exceed 180 degrees, we believe that this method can be used for catheter control with improved safety due to the reduction of this uncertainty.

12.MOTLEE: Distributed Mobile Multi-Object Tracking with Localization Error Elimination

Authors:Mason B. Peterson, Parker C. Lusk, Jonathan P. How

Abstract: We present MOTLEE, a distributed mobile multi-object tracking algorithm that enables a team of robots to collaboratively track moving objects in the presence of localization error. Existing approaches to distributed tracking assume either a static sensor network or that perfect localization is available. Instead, we develop algorithms based on the Kalman-Consensus filter for distributed tracking that are uncertainty-aware and properly leverage localization uncertainty. Our method maintains an accurate understanding of dynamic objects in an environment by realigning robot frames and incorporating uncertainty of frame misalignment into our object tracking formulation. We evaluate our method in hardware on a team of three mobile ground robots tracking four people. Compared to previous works that do not account for localization error, we show that MOTLEE is resilient to localization uncertainties.

13.Mono Video-Based AI Corridor for Model-Free Detection of Collision-Relevant Obstacles

Authors:Thomas Michalke, Yassin Kaddar, Thomas Nürnberg, Linh Kästner, Jens Lambrecht

Abstract: The detection of previously unseen, unexpected obstacles on the road is a major challenge for automated driving systems. Unlike the detection of ordinary objects with pre-definable classes, detecting unexpected obstacles on the road cannot be solved by upscaling the sensor technology alone (e.g., high-resolution video imagers or radar antennas, denser LiDAR scan lines). This is because unexpected obstacles vary widely in type and do not share a common appearance (e.g., lost cargo such as a suitcase or bicycle, tire fragments, a tree stem). Adding object classes, or adding "all" of these objects to a common "unexpected obstacle" class, does not scale either. In this contribution, we study the feasibility of using a deep-learning, video-based lane corridor (called "AI ego-corridor") to ease the challenge by inverting the problem: instead of detecting a previously unseen object, the AI ego-corridor detects that the ego-lane ahead ends. A smart ground-truth definition enables an easy feature-based classification of an abrupt end of the ego-lane. We propose two neural network designs and investigate, among other things, the potential of training with synthetic data. We evaluate our approach on a test vehicle platform. The approach is shown to detect numerous previously unseen obstacles at distances of up to 300 m with a detection rate of 95%.

14.Robots Taking Initiative in Collaborative Object Manipulation: Lessons from Physical Human-Human Interaction

Authors:Zhanibek Rysbek, Ki Hwan Oh, Afagh Mehri Shervedani, Timotej Klemencic, Milos Zefran, Barbara Di Eugenio

Abstract: Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established. Nevertheless, communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, which is an integral part of successful collaboration. Our objective is to use the insights to inform the design of controllers for robot assistants. Specifically, we want to enable robots to take the lead in collaboration. To achieve this goal, we conducted a study to observe how humans behave during collaborative manipulation tasks. During our preliminary data analysis, we discovered several new features that help us better understand how the interaction progresses. From these features, we identified distinct patterns in the data that indicate when a participant is expressing their intent. Our study provides valuable insight into how humans collaborate physically, which can help us design robots that behave more like humans in such scenarios.

1.Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Authors:Hongcheng Wang, Yuxuan Wang, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong

Abstract: Visual-audio navigation (VAN) is attracting increasing attention from the robotics community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to the sound source using egocentric visual and audio observations. However, existing methods are limited in two respects: 1) poor generalization to unheard sound categories; 2) sample inefficiency in training. Focusing on these two problems, we propose a brain-inspired plug-and-play method to learn a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We meticulously design two auxiliary tasks that respectively accelerate the learning of representations with these two desired characteristics. With these two auxiliary tasks, the agent learns a spatially-correlated representation of visual and audio inputs that can be applied to environments with novel sounds and maps. Experimental results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when zero-shot transferred to scenes with unseen maps and unheard sound categories.

2.Contrastive Language, Action, and State Pre-training for Robot Learning

Authors:Krishan Rana, Andrew Melnik, Niko Sünderhauf

Abstract: In this paper, we introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning. Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment. By employing distributional outputs for both text and behaviour encoders, our model effectively associates diverse textual commands with a single behaviour and vice-versa. We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning. Our distributional encoders exhibit superior retrieval and captioning performance on unseen datasets, and the ability to generate meaningful exploratory behaviours from textual commands, capturing the intricate relationships between language, action, and state. This work represents an initial step towards developing a unified pre-trained model for robotics, with the potential to generalise to a broad range of downstream tasks.
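For reference, the deterministic CLIP objective that CLASP extends is a symmetric cross-entropy over a batch similarity matrix, where pair i is the positive for row i and column i. The sketch below implements only that standard loss in plain Python; the distributional encoders that CLASP adds are not shown, and the helper names and the 0.07 temperature are assumptions.

```python
import math


def _mean_diag_cross_entropy(logits: list[list[float]]) -> float:
    """Mean over rows of -log softmax(row)[i]: the matching pair is the target."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def _normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_style_loss(text_emb: list[list[float]], behaviour_emb: list[list[float]],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over cosine similarities (text-to-behaviour and back)."""
    t = [_normalize(v) for v in text_emb]
    b = [_normalize(v) for v in behaviour_emb]
    logits = [[sum(x * y for x, y in zip(ti, bj)) / temperature for bj in b] for ti in t]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_diag_cross_entropy(logits)
                  + _mean_diag_cross_entropy(logits_transposed))
```

Because each row has exactly one positive, this loss assumes a one-to-one text-behaviour pairing; the one-to-many relationships described in the abstract are the motivation for replacing the point embeddings with distributions.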

3.A Comprehensive Review on Ontologies for Scenario-based Testing in the Context of Autonomous Driving

Authors:Maximilian Zipfl, Nina Koch, J. Marius Zöllner

Abstract: The verification and validation of autonomous driving vehicles remains a major challenge due to the high complexity of autonomous driving functions. Scenario-based testing is a promising method for validating such a complex system. Ontologies can be utilized to produce test scenarios that are both meaningful and relevant. One crucial aspect of this process is selecting the appropriate method for describing the entities involved. The level of detail and specific entity classes required will vary depending on the system being tested. It is important to choose an ontology that properly reflects these needs. This paper summarizes key representative ontologies for scenario-based testing and related use cases in the field of autonomous driving. The considered ontologies are classified according to their level of detail for both static facts and dynamic aspects. Furthermore, the ontologies are evaluated based on the presence of important entity classes and the relations between them.

4.Inverse Universal Traffic Quality -- a Criticality Metric for Crowded Urban Traffic Scenes

Authors:Barbara Schütt, Maximilian Zipfl, J. Marius Zöllner, Eric Sax

Abstract: An essential requirement for scenario-based testing is the identification of critical scenes and their associated scenarios. However, critical scenes, such as collisions, occur comparatively rarely. Accordingly, large amounts of data must be examined. A further issue is that recorded real-world traffic often consists of scenes with a high number of vehicles, and it can be challenging to determine which vehicles are the most critical for the safety of an ego vehicle. Therefore, we present the inverse universal traffic quality, a criticality metric for urban traffic that is independent of predefined adversary vehicles and vehicle constellations such as intersection trajectories or car-following scenarios. Our metric is universally applicable to different urban traffic situations, e.g., intersections or roundabouts, and can be adjusted to specific situations if needed. Additionally, in this paper, we evaluate the proposed metric and compare its results to other well-known criticality metrics in this field, such as time-to-collision or post-encroachment time.
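The two baseline metrics named above have simple textbook forms. The sketch below assumes a one-dimensional car-following setting for time-to-collision and given conflict-area timestamps for post-encroachment time; it illustrates only these baselines, not the inverse universal traffic quality metric itself.

```python
import math


def time_to_collision(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Car-following TTC in seconds: bumper-to-bumper gap / closing speed.

    Returns infinity when the follower is not closing in on the leader.
    """
    closing = v_follow - v_lead
    return gap_m / closing if closing > 0 else math.inf


def post_encroachment_time(t_first_leaves: float, t_second_arrives: float) -> float:
    """PET in seconds: time between one road user leaving a conflict area
    and another road user entering it."""
    return t_second_arrives - t_first_leaves
```

For example, a 20 m gap closed at 5 m/s gives a TTC of 4 s. TTC needs a predefined follower-leader pair, which is exactly the kind of assumption the proposed metric is designed to avoid in crowded scenes.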

5.1001 Ways of Scenario Generation for Testing of Self-driving Cars: A Survey

Authors:Barbara Schütt, Joshua Ransiek, Thilo Braun, Eric Sax

Abstract: Scenario generation is one of the essential steps in scenario-based testing and, therefore, a significant part of the verification and validation of driver assistance functions and autonomous driving systems. However, the term scenario generation is used for many different methods, e.g., extraction of scenarios from naturalistic driving data or variation of scenario parameters. This survey aims to give a systematic overview of different approaches, establish different categories of scenario acquisition and generation, and show that each group of methods has typical input and output types. It shows that although the term is often used throughout the literature, the evaluated methods use different inputs, and the resulting scenarios differ in abstraction level and from a systematic point of view. Additionally, recent research and literature examples are given to underline this categorization.

6.AMP in the wild: Learning robust, agile, natural legged locomotion skills

Authors:Yikai Wang, Zheyuan Jiang, Jianyu Chen

Abstract: The successful transfer of a learned controller from simulation to the real world for a legged robot requires not only the ability to identify the system, but also accurate estimation of the robot's state. In this paper, we propose a novel algorithm that can infer not only information about the parameters of the dynamic system, but also estimate important information about the robot's state from previous observations. We integrate our algorithm with Adversarial Motion Priors and achieve a robust, agile, and natural gait in both simulation and on a Unitree A1 quadruped robot in the real world. Empirical results demonstrate that our proposed algorithm enables traversing challenging terrains with lower power consumption compared to the baselines. Both qualitative and quantitative results are presented in this paper.

7.Online Time-Optimal Trajectory Planning on Three-Dimensional Race Tracks

Authors:Matthias Rowold, Levent Ögretmen, Ulf Kasolowsky, Boris Lohmann

Abstract: We propose an online planning approach for racing that generates the time-optimal trajectory for the upcoming track section. The resulting trajectory takes the current vehicle state, effects caused by three-dimensional (3D) track geometries, and speed limits dictated by the race rules into account. In each planning step, an optimal control problem is solved, making a quasi-steady-state assumption with a point mass model constrained by gg-diagrams. For its online applicability, we propose an efficient representation of the gg-diagrams and identify negligible terms to reduce the computational effort. We demonstrate that the online planning approach can reproduce the lap times of an offline-generated racing line during single vehicle racing. Moreover, it finds a new time-optimal solution when a deviation from the original racing line is necessary, e.g., during an overtaking maneuver. Motivated by the application in a rule-based race, we also consider the scenario of a speed limit lower than the current vehicle velocity. We introduce an initializable slack variable to generate feasible trajectories despite the constraint violation while reducing the velocity to comply with the rules.

8.IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty

Authors:Dongliang Zheng, Panagiotis Tsiotras

Abstract: In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. We solve the deterministic planning problem using sampling-based methods such as PRM or RRG to construct a graph of nominal trajectories. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching over the graph using the proposed heuristic. IBBT interleaves batch state sampling, nominal trajectory graph construction, heuristic computation, and search over the graph to find belief-space motion plans. IBBT is an anytime, incremental algorithm. As more batches of samples are added to the graph, the algorithm finds motion plans that converge to the optimal one. IBBT is efficient because it reuses results between sequential iterations. The belief tree search is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster than previous similar methods.

9.RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments

Authors:Jianheng Liu, Xuanfu Li, Yueqian Liu, Haoyao Chen

Abstract: Current simultaneous localization and mapping (SLAM) algorithms perform well in static environments but easily fail in dynamic environments. Recent works introduce deep learning-based semantic information into SLAM systems to reduce the influence of dynamic objects. However, achieving robust localization in dynamic environments remains challenging for resource-restricted robots. This paper proposes a real-time RGB-D inertial odometry system for resource-restricted robots in dynamic environments, named Dynamic-VINS. Three main threads run in parallel: object detection, feature tracking, and state optimization. The proposed Dynamic-VINS combines object detection and depth information for dynamic feature recognition and achieves performance comparable to semantic segmentation. Dynamic-VINS adopts grid-based feature detection and proposes a fast and efficient method to extract high-quality FAST feature points. The IMU is used to predict motion for feature tracking and the moving consistency check. The proposed method is evaluated on both public datasets and real-world applications and shows competitive localization accuracy and robustness in dynamic environments. To the best of our knowledge, it is currently the best-performing real-time RGB-D inertial odometry for resource-restricted platforms in dynamic environments. The proposed system is open source at:
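Grid-based feature detection is commonly implemented by bucketing candidate corners into fixed-size image cells and keeping only the strongest ones per cell, which spreads features across the image. The sketch below assumes the corner candidates and their scores (e.g. FAST responses) are already computed and uses a hypothetical `grid_select` helper; it is not Dynamic-VINS's actual extractor, and the dynamic-object masking is not shown.

```python
def grid_select(keypoints: list[tuple[float, float, float]], width: int, height: int,
                cell: int = 32, per_cell: int = 1) -> list[tuple[float, float, float]]:
    """Keep the `per_cell` highest-scoring (x, y, score) keypoints in each grid cell."""
    buckets: dict[tuple[int, int], list[tuple[float, float, float]]] = {}
    for x, y, score in keypoints:
        if not (0 <= x < width and 0 <= y < height):
            continue  # drop candidates outside the image
        key = (int(x) // cell, int(y) // cell)
        buckets.setdefault(key, []).append((x, y, score))
    selected = []
    for key in sorted(buckets):  # deterministic cell order
        best = sorted(buckets[key], key=lambda kp: kp[2], reverse=True)[:per_cell]
        selected.extend(best)
    return selected
```

Capping the number of features per cell bounds the tracking cost regardless of texture, which is one reason the technique suits resource-restricted platforms.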

10.Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation

Authors:Iris Andrussow, Huanbo Sun, Katherine J. Kuchenbecker, Georg Martius

Abstract: Intelligent interaction with the physical world requires perceptual abilities beyond vision and hearing; vibrant tactile sensing is essential for autonomous robots to dexterously manipulate unfamiliar objects or safely contact humans. Therefore, robotic manipulators need high-resolution touch sensors that are compact, robust, inexpensive, and efficient. The soft vision-based haptic sensor presented herein is a miniaturized and optimized version of the previously published sensor Insight. Minsight has the size and shape of a human fingertip and uses machine learning methods to output high-resolution maps of 3D contact force vectors at 60 Hz. Experiments confirm its excellent sensing performance, with a mean absolute force error of 0.07 N and contact location error of 0.6 mm across its surface area. Minsight's utility is shown in two robotic tasks on a 3-DoF manipulator. First, closed-loop force control enables the robot to track the movements of a human finger based only on tactile data. Second, the informative value of the sensor output is shown by detecting whether a hard lump is embedded within a soft elastomer with an accuracy of 98%. These findings indicate that Minsight can give robots the detailed fingertip touch sensing needed for dexterous manipulation and physical human-robot interaction.

11.Multi-level decision framework collision avoidance algorithm in emergency scenarios

Authors:Guoying Chen, Xinyu Wang, Min Hua, Wei Liu

Abstract: With the rapid development of autonomous driving, academia has paid increasing attention to anti-collision systems for emergency scenarios, which have a crucial impact on driving safety. Although numerous anti-collision strategies have emerged in recent years, most of them consider only steering or only braking. The dynamic and complex nature of the driving environment makes it challenging to develop robust collision avoidance algorithms for emergency scenarios. To handle complex, dynamic obstacle scenes and improve lateral maneuverability, this paper establishes a multi-level decision-making obstacle avoidance framework that employs a safe distance model and integrates emergency steering and emergency braking to complete the obstacle avoidance process. This approach helps avoid the high risk of vehicle instability that can result from separating steering and braking actions. In the emergency steering algorithm, we define the collision hazard moment and propose a multi-constraint dynamic collision avoidance planning method that considers the driving area. Simulation results demonstrate that the decision-making collision avoidance logic can be applied to dynamic collision avoidance scenarios in complex traffic situations, effectively completing the obstacle avoidance task in emergency scenarios and improving the safety of autonomous driving.
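Safe distance models of the kind mentioned above typically combine a reaction-time term, a braking-distance term, and a static margin. The abstract does not give the paper's model, so the sketch below assumes one common textbook form, d_safe = v_rel * t_react + v_rel^2 / (2 * a_max) + d0, with illustrative parameter defaults.

```python
def safe_distance(v_rel: float, t_react: float = 0.5, a_max: float = 8.0,
                  d0: float = 2.0) -> float:
    """Distance (m) needed to stop when closing on an obstacle at v_rel (m/s).

    Reaction-time travel + braking distance at deceleration a_max + margin d0.
    Default values are illustrative assumptions, not values from the paper.
    """
    if v_rel <= 0.0:
        return d0  # not closing in on the obstacle
    return v_rel * t_react + v_rel ** 2 / (2.0 * a_max) + d0


def needs_emergency_action(gap: float, v_rel: float) -> bool:
    """Hypothetical trigger: act when the current gap is below the safe distance."""
    return gap < safe_distance(v_rel)


print(safe_distance(20.0))  # 20*0.5 + 400/16 + 2 = 37.0
```

A multi-level framework would then decide between braking, steering, or both once this trigger fires; that decision logic is not reproduced here.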

12.Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT

Authors:Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du

Abstract: Robot-based assembly in construction has emerged as a promising solution to challenges such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles to realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques or machine learning methods, are limited in their adaptability and scalability to dynamic construction environments. To extend the sequential-understanding ability of current robot systems, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning, and its feasibility and effectiveness are demonstrated through an experimental evaluation comprising two case studies and 80 trials on real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.
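One simple way to sanity-check an automatically generated assembly sequence, such as one proposed by a language model, is to test it against known precedence constraints. The sketch below assumes those constraints are available as hypothetical (before, after) pairs of step names; it is an illustration only and is not described as part of RoboGPT in the abstract.

```python
from typing import Iterable, List, Tuple


def violates_precedence(sequence: List[str],
                        precedence: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (before, after) pairs that the proposed sequence breaks."""
    position = {step: i for i, step in enumerate(sequence)}
    return [(a, b) for a, b in precedence
            if a in position and b in position and position[a] > position[b]]


rules = [("base", "column"), ("column", "beam")]  # hypothetical constraints
print(violates_precedence(["beam", "base", "column"], rules))
# [('column', 'beam')]
```

A planner could feed any violations back to the model as a correction prompt; whether RoboGPT does so is not stated in the abstract.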

1.Reinforcement Learning for Picking Cluttered General Objects with Dense Object Descriptors

Authors:Hoang-Giang Cao, Weihao Zeng, I-Chen Wu

Abstract: Picking cluttered general objects is a challenging task due to their complex geometries and varied stacking configurations. Many prior works rely on pose estimation for picking, but pose estimation is difficult for cluttered objects. In this paper, we propose Cluttered Objects Descriptors (CODs), a dense cluttered-object descriptor that can represent rich object structures, and use the pre-trained CODs network along with its intermediate outputs to train a picking policy. The policy is trained with reinforcement learning, which enables it to learn picking without supervision. Our experiments demonstrate that CODs consistently represent both seen and unseen cluttered objects, which allows the picking policy to pick cluttered general objects robustly. The resulting policy picks 96.69% of unseen objects in our experimental environment, which is twice as cluttered as the training scenarios.

2.Attitude-Estimation-Free GNSS and IMU Integration

Authors:Taro Suzuki

Abstract: A global navigation satellite system (GNSS) is a sensor that can acquire 3D position and velocity in an earth-fixed coordinate system and is widely used for outdoor position estimation of robots and vehicles. Various GNSS/inertial measurement unit (IMU) integration methods have been proposed to improve the accuracy and availability of GNSS positioning. However, all of them require adding the 3D attitude to the estimated state in order to fuse the IMU data. This study proposes a new optimization-based positioning method for combining GNSS and IMU that does not require attitude estimation. The proposed method uses two types of constraints: one is a constraint between states that uses only the magnitude of the 3D acceleration observed by an accelerometer, and the other is a constraint on the angle between velocity vectors that uses the amount of angular change measured by a gyroscope. Evaluation with simulated data shows that the proposed method maintains position estimation accuracy even as the IMU mounting position error increases, and improves accuracy when the GNSS observations contain multipath errors or missing data. The proposed method was also able to improve positioning accuracy in experiments using IMU data acquired in real environments.
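The two attitude-free constraints described above can be written as residuals between consecutive velocity states. A minimal sketch under stated assumptions: a local-level frame with constant gravity (Earth rotation and ECEF gravity modeling are ignored), a velocity direction fixed in the body frame for the angle constraint (e.g., no side slip), and small rotations per step. The accelerometer measures specific force f = R^T (a - g); because rotations preserve vector length, |f| can be compared with |a - g| without knowing the attitude R. Function names are hypothetical, and the optimizer that would minimize these residuals jointly with GNSS residuals is not shown.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # local-level frame, z up (assumption)


def accel_magnitude_residual(v_k, v_k1, dt, f_meas):
    """|a - g| from consecutive velocities minus |specific force| measured."""
    a = (np.asarray(v_k1) - np.asarray(v_k)) / dt  # finite-difference acceleration
    return np.linalg.norm(a - GRAVITY) - np.linalg.norm(f_meas)


def velocity_angle_residual(v_k, v_k1, omega_meas, dt):
    """Angle between consecutive velocity vectors minus gyro rotation angle.

    Assumes the velocity direction is fixed in the body frame and that the
    rotation over dt is approximated by |omega| * dt.
    """
    v_k, v_k1 = np.asarray(v_k, dtype=float), np.asarray(v_k1, dtype=float)
    cos_t = v_k @ v_k1 / (np.linalg.norm(v_k) * np.linalg.norm(v_k1))
    angle = np.arccos(np.clip(cos_t, -1.0, 1.0))
    return angle - np.linalg.norm(omega_meas) * dt


# Constant velocity: the accelerometer should read only gravity's reaction.
print(accel_magnitude_residual([10, 0, 0], [10, 0, 0], 0.1, [0, 0, 9.81]))
```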

3.On Quantification for SOTIF Validation of Automated Driving Systems

Authors:Lina Putze, Lukas Westhofen, Tjark Koopmann, Eckard Böde, Christian Neurohr

Abstract: Automated driving systems are safety-critical cyber-physical systems whose safety of the intended functionality (SOTIF) cannot be assumed without proper argumentation based on appropriate evidence. Recent advances in standards and regulations on the safety of driving automation are therefore intensely concerned with demonstrating that the intended functionality of these systems does not introduce unreasonable risks to stakeholders. In this work, we critically analyze the ISO 21448 standard, which contains requirements and guidance on how the SOTIF can be provably validated. The emphasis lies on developing a consistent terminology as a basis for subsequently defining a validation strategy that uses quantitative acceptance criteria. More broadly, we aim to achieve a well-defined risk decomposition that enables rigorous, quantitative validation approaches for the SOTIF of automated driving systems.
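Quantitative acceptance criteria are often stated as a maximum tolerable event rate. As a worked example of the arithmetic involved (a standard Poisson argument, not the validation strategy defined in the paper): if no failures are observed over an exposure T, then demonstrating a rate below λ with confidence C requires exp(-λT) ≤ 1 - C, that is, T ≥ -ln(1 - C) / λ. The function name is hypothetical.

```python
import math


def required_exposure(rate_limit: float, confidence: float) -> float:
    """Exposure needed, with zero observed failures, to claim rate < rate_limit.

    Under a Poisson model, P(0 failures | rate_limit, T) = exp(-rate_limit * T),
    which must be at most 1 - confidence.
    """
    return -math.log(1.0 - confidence) / rate_limit


hours = required_exposure(1e-8, 0.95)  # e.g. rate limit of 1e-8 per hour
print(f"{hours:.3g} hours")  # 3e+08 hours
```

The resulting exposure (about 3.0e8 hours here) illustrates why purely statistical validation of rare events is expensive, which is one motivation for decomposing risk into more tractable quantities.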

4.UAV-based Receding Horizon Control for 3D Inspection Planning

Authors:Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou

Abstract: Unmanned aerial vehicles (UAVs) are now used for a wide range of tasks, including infrastructure inspection, automated monitoring, and coverage. This paper investigates the problem of 3D inspection planning with an autonomous UAV agent subject to dynamical and sensing constraints. We propose a receding horizon 3D inspection planning control approach for generating optimal trajectories that enable an autonomous UAV agent to inspect a finite number of feature points scattered on the surface of a cuboid-like structure of interest. The inspection planning problem is formulated as a constrained open-loop optimal control problem and is solved using mixed integer programming (MIP) optimization. Quantitative and qualitative evaluations demonstrate the effectiveness of the proposed approach.
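The receding horizon principle above (optimize over a finite horizon, apply only the first control, then re-plan) can be sketched on a toy 2D grid. This sketch swaps the paper's MIP solver for exhaustive enumeration of short move sequences, and uses a hypothetical Manhattan-distance sensing model and a discounted coverage score so that earlier inspections are preferred; none of these modeling choices come from the paper.

```python
import itertools
from typing import List, Set, Tuple

Point = Tuple[int, int]
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # hypothetical action set


def visible(pos: Point, features: List[Point], radius: int) -> Set[Point]:
    """Features within Manhattan distance `radius` of pos (toy sensing model)."""
    return {f for f in features
            if abs(f[0] - pos[0]) + abs(f[1] - pos[1]) <= radius}


def plan_step(pos, features, seen, horizon=3, radius=1, discount=0.9):
    """Return the first move of the best-scoring length-`horizon` sequence."""
    best_score, best_first = -1.0, MOVES[0]
    for seq in itertools.product(MOVES, repeat=horizon):
        p, covered, score = pos, set(seen), 0.0
        for t, (dx, dy) in enumerate(seq):
            p = (p[0] + dx, p[1] + dy)
            new = visible(p, features, radius) - covered
            score += discount ** t * len(new)  # reward earlier inspections more
            covered |= new
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first


def inspect(start, features, steps=10, horizon=3, radius=1):
    """Closed loop: re-plan every step, execute only the first planned move."""
    pos, seen = start, visible(start, features, radius)
    for _ in range(steps):
        if len(seen) == len(features):
            break
        dx, dy = plan_step(pos, features, seen, horizon, radius)
        pos = (pos[0] + dx, pos[1] + dy)
        seen |= visible(pos, features, radius)
    return pos, seen


print(inspect((0, 0), [(3, 0), (6, 0)]))  # ((5, 0), {(3, 0), (6, 0)})
```

Enumeration grows as 5^horizon, which is why a real planner with dynamics and visibility constraints would use an optimization solver such as MIP instead.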

5.Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain

Authors:Philipp Rigoll, Patrick Petersen, Hanno Stage, Lennart Ries, Eric Sax

Abstract: Handling large amounts of data has become key to developing automated driving systems. Especially for highly automated driving functions, working with images has become increasingly challenging due to the sheer size of the required data. Such data has to satisfy different requirements to be usable in machine learning-based approaches. Thus, engineers need to fully understand their large image data sets when developing and testing machine learning algorithms. However, current approaches lack automatability, are not generic, and are limited in their expressiveness. Hence, this paper analyzes a state-of-the-art text and image embedding neural network and guides the reader through its application in the automotive domain. This approach enables searching both for similar images and by human-understandable text descriptions. Our experiments show the automatability and generalizability of the proposed method for handling large data sets in the automotive domain.
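In a CLIP-style search, text and images are embedded into a shared vector space and ranked by cosine similarity to the query embedding. The sketch below shows only the ranking step; the embeddings are toy vectors standing in for encoder outputs (running an actual CLIP model is not shown), and the function name is hypothetical. The same function covers image-to-image search by passing an image embedding as the query.

```python
import numpy as np


def rank_by_similarity(query_emb, image_embs, top_k=3):
    """Indices of the top_k images by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ q  # cosine similarity of each image to the query
    return np.argsort(-scores)[:top_k].tolist()


text_query = np.array([1.0, 0.0])                  # toy "text" embedding
images = np.array([[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]])  # toy image embeddings
print(rank_by_similarity(text_query, images, top_k=2))  # [1, 2]
```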

1.Torque-based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer

Authors:Donghyeon Kim, Glen Berseth, Mathew Schwartz, Jaeheung Park

Abstract: In this paper, we examine which action space is best suited for controlling a real biped robot in combination with Sim2Real training. Position control has been popular because it has been shown to be more sample efficient and more intuitive to combine with other planning algorithms. However, position control requires gain tuning to achieve the best possible policy performance. We show that a torque-based action space instead enables task-and-robot agnostic learning with less parameter tuning and mitigates the sim-to-real gap by taking advantage of the inherent compliance of torque control. We also accelerate torque-based policy training by pre-training the policy to remain upright by compensating for gravity. The paper showcases the first successful sim-to-real transfer of a torque-based deep reinforcement learning policy on a real human-sized biped robot. The video is available at
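The contrast between the two action spaces discussed above can be made concrete. With a position action space, the policy outputs joint targets that a PD loop converts into torques, so the gains kp and kd become extra parameters to tune. With a torque action space, the policy output is applied (after clipping) directly. The sketch below is a minimal illustration with hypothetical function names and gains, not the authors' controller.

```python
import numpy as np


def torque_from_position_action(q_target, q, qd, kp, kd):
    """Position action space: PD law tau = kp * (q_target - q) - kd * qd.

    kp and kd must be tuned for good policy performance.
    """
    return kp * (np.asarray(q_target) - np.asarray(q)) - kd * np.asarray(qd)


def torque_from_torque_action(action, tau_max):
    """Torque action space: the policy output is the joint torque, clipped."""
    return np.clip(action, -tau_max, tau_max)


print(torque_from_position_action([1.0], [0.5], [0.0], kp=100.0, kd=2.0))  # [50.]
print(torque_from_torque_action(np.array([10.0, -10.0, 1.0]), 5.0))       # [ 5. -5.  1.]
```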

2.Local object crop collision network for efficient simulation of non-convex objects in GPU-based simulators

Authors:Dongwon Son, Beomjoon Kim

Abstract: Our goal is to develop an efficient contact detection algorithm for large-scale GPU-based simulation of non-convex objects. Current GPU-based simulators such as IsaacGym and Brax must trade off speed against fidelity, generality, or both when simulating non-convex objects. Their main issue lies in contact detection (CD): existing CD algorithms, such as Gilbert-Johnson-Keerthi (GJK), must trade off computational speed against accuracy, and they become expensive as the number of collisions among non-convex objects increases. We propose a data-driven approach to CD whose accuracy depends only on the quality and quantity of the offline dataset rather than on online computation time. Unlike GJK, our method has an inherently uniform computational flow, which facilitates efficient GPU usage with advanced compilers such as XLA (Accelerated Linear Algebra). Further, we offer a data-efficient solution by learning the collision patterns of locally cropped object shapes rather than global object shapes, which are harder to learn. We demonstrate that our approach improves the efficiency of existing CD methods by a factor of 5-10 for non-convex objects with comparable accuracy. Building on previous work on contact resolution for neural-network-based contact detectors, we integrate our CD algorithm into the open-source GPU-based simulator Brax and show that it improves efficiency over IsaacGym and generality over standard Brax. We highly recommend the videos of our simulator included in the supplementary materials.
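The "local crop" idea above restricts a learned collision model to the region where two objects could actually touch. A minimal sketch, assuming objects are represented as point clouds and the crop region is the overlap of their axis-aligned bounding boxes (both assumptions; the paper's crop definition and network are not shown). The crops are padded or truncated to a fixed size with a validity mask, which keeps tensor shapes uniform across object pairs, the property that makes batched GPU evaluation straightforward.

```python
import numpy as np


def local_crops(points_a, points_b, n_max=64, margin=0.0):
    """Crop two point clouds to the overlap of their axis-aligned boxes.

    Returns ((crop_a, mask_a), (crop_b, mask_b)) with fixed (n_max, 3) arrays;
    points beyond n_max are simply truncated in this sketch.
    """
    lo = np.maximum(points_a.min(0), points_b.min(0)) - margin
    hi = np.minimum(points_a.max(0), points_b.max(0)) + margin

    def crop(points):
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        sel = points[inside][:n_max]
        out = np.zeros((n_max, 3))
        mask = np.zeros(n_max, dtype=bool)
        out[:len(sel)] = sel
        mask[:len(sel)] = True  # marks real points versus padding
        return out, mask

    # If the boxes do not overlap, lo > hi on some axis and both masks are empty.
    return crop(points_a), crop(points_b)


a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.9, 0.5, 0.5]])
b = np.array([[0.8, 0.0, 0.0], [2.0, 1.0, 1.0]])
(crop_a, mask_a), (crop_b, mask_b) = local_crops(a, b, n_max=4)
print(mask_a.sum(), mask_b.sum())  # 2 1
```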

3.Decentralized Multi-Agent Planning for Multirotors: a Fully online and Communication Latency Robust Approach

Authors:Charbel Toumieh

Abstract: There are many industrial, commercial, and social applications of multi-agent planning for multirotors, such as autonomous agriculture, infrastructure inspection, and search and rescue. Thus, advancing the state of the art in multi-agent planning to make it a viable real-world solution is of great benefit. In this work, we propose a new method for multi-agent planning in a static environment that improves on our previous work by making it fully online as well as robust to communication latency. The proposed framework generates a global path and a Safe Corridor to avoid static obstacles in an online fashion (they were generated offline in our previous work). It then generates a time-aware Safe Corridor that takes into account the future positions of other agents to avoid inter-agent collisions. The time-aware Safe Corridor, together with a local reference trajectory, is passed to an MIQP (Mixed-Integer Quadratic Problem)/MPC (Model Predictive Control) solver that outputs a safe and optimal trajectory. The planning frequency is adapted to account for communication delays. The proposed method is fully online, real-time, decentralized, and synchronous. It is compared with three recent state-of-the-art methods in simulation. It outperforms all of them in robustness, safety, and flight time, and it also outperforms the only other state-of-the-art latency-robust method in computation time.

4.Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand

Authors:Yongkang Luo, Wanyi Li, Peng Wang, Haonan Duan, Wei Wei, Jia Sun

Abstract: Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is extremely difficult because of the high-dimensional state and action spaces and the rich contact patterns between the fingers and objects. Although deep reinforcement learning has made moderate progress and demonstrated strong potential for manipulation, it still faces challenges such as large-scale data collection and high sample complexity. In particular, even a slight change in the scene often requires re-collecting vast amounts of data and carrying out numerous iterations of fine-tuning. Remarkably, humans can quickly transfer learned manipulation skills to different scenarios with little supervision. Inspired by this flexible human transfer-learning capability, we propose a novel progressive transfer learning framework (PTL) for dexterous in-hand manipulation that efficiently utilizes the collected trajectories and the source-trained dynamics model. The framework adopts progressive neural networks for dynamics model transfer learning on samples chosen by a new sample selection method based on the dynamics properties, rewards, and scores of the trajectories. Experimental results on contact-rich anthropomorphic hand manipulation tasks show that our method can efficiently and effectively learn in-hand manipulation skills in a new scene with a few online attempts and adjustment learning. Compared to learning from scratch, our method reduces training time costs by 95%.

5.Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control

Authors:Vishnu Rajendran S, Bappaditya Debnath, Sariah Mghames, Willow Mandil, Soran Parsa, Simon Parsons, Amir Ghalamzan-E

Abstract: This paper provides an overview of the current state of the art in selective harvesting robots (SHRs) and their potential for addressing the challenges of global food production. SHRs have the potential to increase productivity, reduce labour costs, and minimise food waste by selectively harvesting only ripe fruits and vegetables. The paper discusses the main components of SHRs, including perception, grasping, cutting, motion planning, and control. It also highlights the challenges in developing SHR technologies, particularly in the areas of robot design, motion planning, and control. The paper further discusses the potential benefits of integrating AI, soft robotics, and data-driven methods to enhance the performance and robustness of SHR systems. Finally, the paper identifies several open research questions in the field and highlights the need for further research and development to advance SHR technologies to meet the challenges of global food production. Overall, this paper provides a starting point for researchers and practitioners interested in developing SHRs.

6.Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning

Authors:Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou

Abstract: In this work, we propose a coverage planning control approach that allows a mobile agent, equipped with a controllable sensor (e.g., a camera) with a limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims to determine the agent's optimal control inputs over a finite planning horizon that minimize the coverage time. However, efficiently solving the resulting OCP is very challenging due to the non-convex and non-linear visibility constraints. To overcome this limitation, the problem is converted into a Markov decision process (MDP), which is then solved using reinforcement learning. In particular, we show that a controller that follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
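Off-policy temporal-difference control (Q-learning), as named above, updates Q(s, a) toward r + γ max_a' Q(s', a') regardless of which action the exploring policy takes next. The sketch below shows tabular Q-learning on a hypothetical toy corridor with a cost of 1 per time step, a stand-in for the paper's ray-traced coverage MDP, which is not reproduced here. All parameter values are illustrative assumptions.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-learning on a toy corridor whose last state is the goal."""
    rng = random.Random(seed)
    actions = (-1, +1)  # move left / move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            # Epsilon-greedy behaviour policy (ties go to the first action).
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            s_next = min(max(s + actions[a], 0), goal)
            r = -1.0  # each step costs 1, so the agent learns to finish quickly
            # Off-policy target: bootstrap from the greedy value in s_next.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q


q = q_learning()
policy = [max(range(2), key=lambda i: q[s][i]) for s in range(4)]
print(policy)
```

Here the greedy policy extracted from the learned table should choose the "move right" action (index 1) in every non-goal state, the time-minimizing behaviour.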

7.CASOG: Conservative Actor-critic with SmOoth Gradient for Skill Learning in Robot-Assisted Intervention

Authors:Hao Li, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Zeng-Guang Hou

Abstract: Robot-assisted intervention has been shown in clinical trials to reduce physicians' radiation exposure and improve precision. However, existing vascular robotic systems follow a master-slave control mode and rely entirely on manual commands. This paper proposes a novel offline reinforcement learning algorithm, Conservative Actor-critic with SmOoth Gradient (CASOG), to learn manipulation skills from human demonstrations on vascular robotic systems. The proposed algorithm conservatively estimates the Q-function and smooths the gradients of convolution layers to deal with distribution shift and overfitting. Furthermore, to focus on complex manipulations, transitions with larger temporal-difference errors are sampled with higher probability. Comparative experiments in a pre-clinical environment demonstrate that CASOG can deliver the guidewire to the target with a success rate of 94.00% and a mean of 14.07 backward steps, performing closer to humans and better than prior offline reinforcement learning methods. These results indicate that the proposed algorithm is promising for improving the autonomy of vascular robotic systems.
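Sampling transitions with larger temporal-difference errors more often, as described above, is commonly implemented as proportional prioritized replay, where P(i) is proportional to (|δ_i| + ε)^α. The abstract does not specify CASOG's exact scheme, so this sketch assumes that proportional form with illustrative values of α and ε, and omits the importance-sampling weights usually used to correct the resulting bias.

```python
import numpy as np


def sampling_probs(td_errors, alpha=0.6, eps=1e-3):
    """Proportional prioritisation: P(i) proportional to (|delta_i| + eps)^alpha."""
    p = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return p / p.sum()


def sample_batch(td_errors, batch_size, rng, alpha=0.6):
    """Draw transition indices (with replacement) according to their priority."""
    probs = sampling_probs(td_errors, alpha)
    return rng.choice(len(probs), size=batch_size, p=probs)


td = [0.1, 1.0, 5.0]  # hypothetical TD errors of three stored transitions
print(sampling_probs(td).round(3))
print(sample_batch(td, 8, np.random.default_rng(0)))
```

The small ε keeps transitions with near-zero TD error from never being replayed, and α interpolates between uniform sampling (α = 0) and fully greedy prioritization (α = 1).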

8.Autonomous Agent for Beyond Visual Range Air Combat: A Deep Reinforcement Learning Approach

Authors:Joao P. A. Dantas, Marcos R. O. A. Maximo, Takashi Yoneyama

Abstract: This work contributes to developing an agent based on deep reinforcement learning that can act in a beyond visual range (BVR) air combat simulation environment. The paper presents an overview of building an agent that represents a high-performance fighter aircraft and can learn and improve its role in BVR combat over time based on rewards calculated from operational metrics. Through self-play experiments, we also expect to generate new air combat tactics never seen before. Finally, we hope to use virtual simulation to examine a real pilot's ability to interact with the trained agent in the same environment and to compare their performances. This research will contribute to air combat training by developing agents that can interact with real pilots to improve their performance in air defense missions.

9.Losing Focus: Can It Be Useful in Robotic Laser Surgery?

Authors:Nicholas Pacheco, Yash Garje, Aakash Rohra, Loris Fichera

Abstract: This paper proposes a method to regulate the tissue temperature during laser surgery by robotically controlling the laser focus. Laser-tissue interactions are generally considered hard to control due to the inherent inhomogeneity of biological tissue, which can create significant variability in its thermal response to laser irradiation. In this study, we use methods from nonlinear control theory to synthesize a temperature controller capable of working on virtually any tissue type without any prior knowledge of its physical properties. The performance of the controller is evaluated in ex-vivo experiments.

10.A Mollification Scheme for Task and Motion Planning

Authors:Jimmy Envall, Roi Poranne, Stelian Coros

Abstract: Task and motion planning (TAMP) is one of the key problems in robotics today. It is often formulated as a discrete task allocation problem combined with continuous motion planning. Many existing approaches to TAMP involve explicit descriptions of task primitives that cause discrete changes in the kinematic relationship between the actor and the objects. In this work, we propose an alternative approach to TAMP that does not involve explicit enumeration of task primitives. Instead, the actions are represented implicitly as part of the solution to a nonlinear optimization problem. We focus on decision making for robotic manipulators, specifically for pick-and-place tasks, and show several possible extensions. We explore the efficacy of the model through a number of simulated experiments involving multiple robots, objects, and interactions with the environment.

11.A Multi-robot Coverage Path Planning Algorithm Based on Improved DARP Algorithm

Authors:Yufan Huang, Man Li, Tao Zhao

Abstract: Multi-robot coverage path planning (CPP) has been attracting increasing attention. To achieve efficient coverage, this paper proposes an improved DARP coverage algorithm. The improved DARP algorithm, based on the A* algorithm, assigns tasks to the robots and is then combined with an STC algorithm based on the Up-First algorithm to achieve full coverage of the task area. Compared with the original DARP algorithm, this algorithm achieves higher efficiency and a higher coverage rate.

12.FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

Authors:Kyle Stachowicz, Dhruv Shah, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine

Abstract: We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses with less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.

13.Learning and Adapting Agile Locomotion Skills by Transferring Experience

Authors:Laura Smith, J. Chase Kew, Tianyu Li, Linda Luu, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine

Abstract: Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world.

14.Patching Neural Barrier Functions Using Hamilton-Jacobi Reachability

Authors:Sander Tonkens, Alex Toofanian, Zhizhen Qin, Sicun Gao, Sylvia Herbert

Abstract: Learning-based control algorithms have led to major advances in robotics at the cost of decreased safety guarantees. Recently, neural networks have also been used to characterize safety through the use of barrier functions for complex nonlinear systems. Learned barrier functions approximately encode and enforce a desired safety constraint through a value function, but do not provide any formal guarantees. In this paper, we propose a local dynamic programming (DP) based approach to "patch" an almost-safe learned barrier at potentially unsafe points in the state space. This algorithm, HJ-Patch, obtains a novel barrier that provides formal safety guarantees, yet retains the global structure of the learned barrier. Our local DP based reachability algorithm, HJ-Patch, updates the barrier function "minimally" at points that both (a) neighbor the barrier safety boundary and (b) do not satisfy the safety condition. We view this as a key step to bridging the gap between learning-based barrier functions and Hamilton-Jacobi reachability analysis, providing a framework for further integration of these approaches. We demonstrate that for well-trained barriers we reduce the computational load by 2 orders of magnitude with respect to standard DP-based reachability, and demonstrate scalability to a 6-dimensional system, which is at the limit of standard DP-based reachability.

1.Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

Authors:Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn

Abstract: Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is to leverage large unlabeled datasets that contain many behaviors and then adapt a policy to a specific task using a small amount of task-specific human supervision (i.e., interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it learns more effectively from the mix of task-specific and offline data than naively mixing the data or using only the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See for videos and code.
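The querying step described above can be pictured as a similarity search: embed expert and offline transitions, then keep the offline transitions whose embedding is close to at least one expert transition. This is a minimal sketch under assumptions: the embeddings are given (how the paper learns them is not shown), similarity is cosine similarity, and the fixed threshold is hypothetical.

```python
import numpy as np


def retrieve(expert_embs, offline_embs, threshold=0.9):
    """Indices of offline transitions similar to at least one expert transition."""
    e = expert_embs / np.linalg.norm(expert_embs, axis=1, keepdims=True)
    o = offline_embs / np.linalg.norm(offline_embs, axis=1, keepdims=True)
    best = (o @ e.T).max(axis=1)  # best cosine match for each offline sample
    return np.flatnonzero(best >= threshold).tolist()


expert = np.array([[1.0, 0.0]])                           # toy expert embedding
offline = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.0]])  # toy offline embeddings
print(retrieve(expert, offline))  # [0, 2]
```

The retrieved indices would then be mixed with the expert data for joint training, as the abstract describes.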

2.Multi-robot Motion Planning based on Nets-within-Nets Modeling and Simulation

Authors:Sofia Hustiu, Eva Robillard, Joaquin Ezpeleta, Cristian Mahulea, Marius Kloetzer

Abstract: This paper focuses on designing motion plans for a heterogeneous team of robots that has to cooperate in fulfilling a global mission. The robots move in an environment containing some regions of interest, and the specification for the whole team can include avoidances, visits, or sequencing when entering these regions of interest. The specification is expressed in terms of a Petri net corresponding to an automaton, while each robot is modeled by a state machine Petri net. Compared with existing solutions for related problems, the current work brings the following contributions. First, we propose a novel model, denoted the High-Level robot team Petri Net (HLPN) system, for incorporating the specification and the robot models into the Nets-within-Nets paradigm. A guard function, named the Global Enabling Function (gef), is designed to synchronize the firing of transitions such that the robot motions do not violate the specification. Then, the solution is found by simulating the HLPN system in a specific software tool that accommodates Nets-within-Nets. An illustrative example based on a Linear Temporal Logic (LTL) mission is described throughout the paper, complementing the proposed rationale of the framework.
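The role of a guard in transition firing, as described above, can be illustrated with an ordinary (flat) Petri net: a transition is enabled only if its input places hold enough tokens and its guard evaluates to true on the current marking. This sketch does not model Nets-within-Nets, and the guard below is a hypothetical stand-in for the Global Enabling Function; place names are invented for the example.

```python
from typing import Callable, Dict

Marking = Dict[str, int]  # place name -> token count


class Transition:
    def __init__(self, pre: Marking, post: Marking,
                 guard: Callable[[Marking], bool] = lambda m: True):
        self.pre, self.post, self.guard = pre, post, guard

    def enabled(self, m: Marking) -> bool:
        # Enough tokens in every input place, and the guard allows firing.
        return all(m.get(p, 0) >= n for p, n in self.pre.items()) and self.guard(m)

    def fire(self, m: Marking) -> Marking:
        """Return the new marking; the input marking is left unchanged."""
        if not self.enabled(m):
            raise ValueError("transition not enabled")
        new = dict(m)
        for p, n in self.pre.items():
            new[p] -= n
        for p, n in self.post.items():
            new[p] = new.get(p, 0) + n
        return new


# Robot moves from region A to region B only when the (hypothetical)
# specification automaton is in state q1.
a_to_b = Transition(pre={"A": 1}, post={"B": 1},
                    guard=lambda m: m.get("q1", 0) > 0)
print(a_to_b.enabled({"A": 1, "q0": 1}))  # False
print(a_to_b.fire({"A": 1, "q1": 1}))     # {'A': 0, 'q1': 1, 'B': 1}
```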

3.Neuromorphic Control using Input-Weighted Threshold Adaptation

Authors:Stein Stroobants, Christophe De Wagter, Guido C. H. E. de Croon

Abstract: Neuromorphic processing promises high energy efficiency and rapid response rates, making it an ideal candidate for achieving autonomous flight of resource-constrained robots. It will be especially beneficial for complex neural networks such as those involved in high-level visual perception. However, fully neuromorphic solutions will also need to tackle low-level control tasks. Remarkably, it is still challenging to replicate even basic low-level controllers such as proportional-integral-derivative (PID) controllers; in particular, it is difficult to incorporate the integral and derivative parts. To address this problem, we propose a neuromorphic controller that incorporates proportional, integral, and derivative pathways during learning. Our approach includes a novel input threshold adaptation mechanism for the integral pathway. This Input-Weighted Threshold Adaptation (IWTA) introduces an additional weight per synaptic connection, which is used to adapt the threshold of the post-synaptic neuron. We tackle the derivative term by employing neurons with different time constants. We first analyze the performance and limits of the proposed mechanisms and then put our controller to the test by implementing it on a microcontroller connected to the tiny open-source Crazyflie quadrotor, replacing the innermost rate controller. We demonstrate the stability of our bio-inspired algorithm with flights in the presence of disturbances. The current work represents a substantial step towards controlling highly dynamic systems with neuromorphic algorithms, thus advancing neuromorphic processing and robotics. In addition, integration is an important part of any temporal task, so the proposed Input-Weighted Threshold Adaptation (IWTA) mechanism may have implications well beyond control tasks.
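Why neurons with different time constants can supply a derivative term, as stated above, can be seen with a rate-based (non-spiking) analogue: two first-order low-pass traces of the same input lag a ramp of slope c by amounts proportional to their time constants, so their difference divided by (τ_slow - τ_fast) recovers c. This sketch is an assumption-based illustration of that principle only; it is not the paper's spiking network, and it does not model IWTA.

```python
def derivative_from_two_traces(signal, dt, tau_fast, tau_slow):
    """Estimate dx/dt as (fast trace - slow trace) / (tau_slow - tau_fast).

    Each trace is a forward-Euler first-order low-pass filter; for a ramp, the
    steady-state lags differ by slope * (tau_slow - tau_fast).
    """
    fast = slow = signal[0]
    estimates = []
    for x in signal:
        fast += dt / tau_fast * (x - fast)
        slow += dt / tau_slow * (x - slow)
        estimates.append((fast - slow) / (tau_slow - tau_fast))
    return estimates


dt = 0.001
ramp = [2.0 * k * dt for k in range(5000)]  # x(t) = 2t, so dx/dt = 2
est = derivative_from_two_traces(ramp, dt, tau_fast=0.01, tau_slow=0.05)
print(round(est[-1], 6))  # 2.0
```

The time constants here are hypothetical; larger gaps between them trade responsiveness for noise smoothing, much like the filter on a conventional D term.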

4.Neuromorphic computing for attitude estimation onboard quadrotors

Authors:Stein Stroobants, Julien Dupeyroux, Guido C. H. E. de Croon

Abstract: Compelling evidence has been given for the high energy efficiency and update rates of neuromorphic processors, with performance beyond what standard von Neumann architectures can achieve. Such promising features could be advantageous in critical embedded systems, especially in robotics. To date, the constraints inherent in robots (e.g., size and weight, battery autonomy, available sensors, computing resources, processing time), and particularly in aerial vehicles, severely hamper the performance of fully autonomous onboard control, including sensor processing and state estimation. In this work, we propose a spiking neural network (SNN) capable of estimating the pitch and roll angles of a quadrotor in highly dynamic movements from 6-degree-of-freedom Inertial Measurement Unit (IMU) data. With only 150 neurons and a limited training dataset obtained using a quadrotor in a real-world setup, the network shows competitive results compared with state-of-the-art, non-neuromorphic attitude estimators. The proposed architecture was successfully tested on the Loihi neuromorphic processor onboard a quadrotor to estimate the attitude in flight. Our results show the robustness of neuromorphic attitude estimation and pave the way towards energy-efficient, fully autonomous control of quadrotors with dedicated neuromorphic computing systems.
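To make concrete what "pitch and roll from 6-DoF IMU data" involves, the sketch below shows a classic non-neuromorphic estimator of the kind the SNN is compared against: a complementary filter that blends integrated gyro rates with accelerometer tilt. It is a standard textbook baseline, not the paper's network. The blending factor alpha = 0.98, the axis convention (z axis reads +g when level and at rest), and the small-angle treatment of body rates are assumptions.

```python
import math

def accel_tilt(ax, ay, az):
    """Roll and pitch (rad) implied by the gravity direction in accelerometer data.
    Assumed convention: the z axis reads +g when the vehicle is level and at rest."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

def complementary_step(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One filter update. gyro = (p, q, r) body rates in rad/s; accel = (ax, ay, az).
    alpha = 0.98 is an assumed blending factor."""
    p, q, _ = gyro  # yaw rate is not needed for roll and pitch
    roll_acc, pitch_acc = accel_tilt(*accel)
    # Small-angle approximation: body rates are integrated directly as Euler-angle rates.
    roll = alpha * (roll + p * dt) + (1 - alpha) * roll_acc
    pitch = alpha * (pitch + q * dt) + (1 - alpha) * pitch_acc
    return roll, pitch
```

The gyro term tracks fast motion but drifts; the accelerometer term is drift-free but corrupted by linear accelerations, which is exactly what makes "highly dynamic movements" hard for such filters.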

5.Implicit representation priors meet Riemannian geometry for Bayesian robotic grasping

Authors:Norman Marlier, Julien Gustin, Gilles Louppe, Olivier Brüls

Abstract: Robotic grasping in highly noisy environments presents complex challenges, especially with limited prior knowledge about the scene. In particular, identifying good grasping poses with Bayesian inference becomes difficult for two reasons: i) generating data from uninformative priors is inefficient, and ii) the posterior is often a complex distribution defined on a Riemannian manifold. In this study, we explore the use of implicit representations to construct scene-dependent priors, thereby enabling the application of efficient simulation-based Bayesian inference algorithms for determining successful grasp poses in unstructured environments. Results from both simulation and physical benchmarks showcase the high success rate and promising potential of this approach.

6.Modal-Graph 3D Shape Servoing of Deformable Objects with Raw Point Clouds

Authors:Bohan Yang, Congying Sui, Fangxun Zhong, Yun-Hui Liu

Abstract: Deformable object manipulation (DOM) with point clouds has great potential, as non-rigid 3D shapes can be measured without detecting and tracking image features. However, robotic shape control of deformable objects with point clouds is challenging for two reasons: the unknown point-wise correspondences and noisy partial observability of raw point clouds, and the difficulty of modeling the relationship between point clouds and robot motions. To tackle these challenges, this paper introduces a novel modal-graph framework for the model-free shape servoing of deformable objects with raw point clouds. Unlike existing works that study the object's geometric structure, our method builds a low-frequency deformation structure for the DOM system, which is robust to measurement irregularities. The modal representation and graph structure enable us to extract low-dimensional deformation features directly from raw point clouds. This extraction requires no additional point processing such as registration, refinement, or occlusion removal. Moreover, to shape the object using the extracted features, we design an adaptive robust controller that is proved to be input-to-state stable (ISS) without offline learning or identification of either the physical or the geometric object model. Extensive simulations and experiments are conducted to validate the effectiveness of our method for linear, planar, tubular, and solid objects under different settings.
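A common way to obtain a low-frequency "modal" basis from an unordered point cloud is spectral: build a k-nearest-neighbour graph over the points and keep the eigenvectors of its graph Laplacian with the smallest eigenvalues. Projecting the point coordinates onto these modes yields a few coefficients that act as a low-pass summary, insensitive to per-point noise. The sketch below illustrates that generic idea under assumed choices (k = 8, Gaussian edge weights, six modes); the abstract does not specify the authors' construction, so this should not be read as their method.

```python
import numpy as np

def knn_laplacian(points, k=8):
    """Combinatorial Laplacian L = D - W of a symmetrised k-NN graph with Gaussian weights (assumed)."""
    n = len(points)
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    sigma2 = np.median(d2) + 1e-12           # bandwidth heuristic (assumed)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]    # index 0 is the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma2)
    W = np.maximum(W, W.T)                   # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def modal_features(points, n_modes=6, k=8):
    """Project coordinates onto the lowest-frequency Laplacian eigenvectors.
    Returns (n_modes, 3) coefficients and the low-pass reconstruction of the cloud."""
    L = knn_laplacian(points, k)
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    U = eigvecs[:, :n_modes]                 # low-frequency modes
    coeffs = U.T @ points
    return coeffs, U @ coeffs
```

Keeping all n modes reproduces the cloud exactly; keeping only a handful discards the high-frequency components where measurement irregularities tend to live.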

7.GoferBot: A Visual Guided Human-Robot Collaborative Assembly System

Authors:Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony

Abstract: The current transformation towards smart manufacturing has led to a growing demand for human-robot collaboration (HRC) in the manufacturing process. Perceiving and understanding the human co-worker's behaviour introduces challenges for collaborative robots to efficiently and effectively perform tasks in unstructured and dynamic environments. Integrating recent data-driven machine vision capabilities into HRC systems is a logical next step in addressing these challenges. However, off-the-shelf components struggle in such settings because of their limited generalisation. Real-world evaluation is required in order to fully appreciate the maturity and robustness of these approaches. Furthermore, understanding the limitations of pure vision is a crucial first step before combining multiple modalities. In this paper, we propose GoferBot, a novel vision-based semantic HRC system for a real-world assembly task. It is composed of a visual servoing module that reaches and grasps assembly parts in an unstructured, multi-instance and dynamic environment, an action recognition module that performs human action prediction for implicit communication, and a visual handover module that uses the perceptual understanding of human behaviour to produce an intuitive and efficient collaborative assembly experience. GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.

8.Robotic Gas Source Localization with Probabilistic Mapping and Online Dispersion Simulation

Authors:Pepe Ojeda, Javier Monroy, Javier Gonzalez-Jimenez

Abstract: Gas source localization (GSL) with an autonomous robot is a problem with many prospective applications, from finding pipe leaks to emergency-response scenarios. In this work, we present a new method to perform GSL in realistic indoor environments featuring obstacles and turbulent flow. Given the highly complex relationship between the source position and the measurements available to the robot (the single-point gas concentration and the wind vector), we propose an observation model derived from contrasting the online, real-time simulation of gas dispersion from each candidate source location against a gas concentration map built from sensor readings. To integrate both into a probabilistic estimation framework in a convenient and grounded way, we introduce the concept of probabilistic gas-hit maps, which provide a higher level of abstraction to model the time-dependent nature of gas dispersion. Results from both simulated and real experiments show the capability of our proposal to deal with source localization in complex indoor environments. To the best of our knowledge, this is the first work in olfactory robotics that does not make simplistic assumptions about environmental conditions, such as operating in open spaces and/or an unrealistic laminar wind flow.
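The estimation idea can be pictured as a Bayesian update over a grid of candidate source cells: for each candidate, a dispersion model predicts the probability of a gas "hit" at each observed cell, and the observed hit/miss map scores the candidate. The toy sketch below assumes a Bernoulli hit likelihood and replaces the paper's online dispersion simulation (which handles obstacles and wind) with a hypothetical isotropic `predict_hit_prob` stand-in, so it only illustrates the shape of the update.

```python
import math

def predict_hit_prob(source, cell, scale=2.0):
    """Hypothetical stand-in for an online dispersion simulation: the probability
    of a gas hit decays with distance from the candidate source (no wind, no obstacles)."""
    d = math.dist(source, cell)
    return min(0.95, max(0.05, math.exp(-d / scale)))  # clamp to avoid zero likelihoods

def update_posterior(prior, observations):
    """prior: {candidate_cell: probability}; observations: [(cell, hit: bool)].
    Multiplies the prior by an assumed Bernoulli likelihood and renormalises."""
    post = {}
    for cand, p in prior.items():
        like = 1.0
        for cell, hit in observations:
            q = predict_hit_prob(cand, cell)
            like *= q if hit else (1.0 - q)
        post[cand] = p * like
    z = sum(post.values())
    return {c: v / z for c, v in post.items()}
```

Example: with a uniform prior over a 5 x 5 grid, hits recorded at (1, 1), (1, 2), (2, 1) and misses at the far corners, the posterior peaks at (1, 1).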

9.Socially Assistive Robots as Decision Makers in the Wild: Insights from a Participatory Design Workshop

Authors:Eshtiak Ahmed, Laura Cosio, Juho Hamari, Oğuz 'Oz' Buruk

Abstract: Socially Assistive Robots (SARs) are becoming increasingly popular because of their effectiveness in handling social situations. Because social robots are perceived as intelligent, their decision-making process might have a significant effect on how they are perceived and how effective they are. In this paper, we present the findings from a participatory design (PD) study consisting of 5 design workshops with 30 participants, focusing on several decision-making scenarios of SARs in the wild. Based on the findings of the PD study, we discuss 5 directions that could aid the design of decision-making systems of SARs in the wild.

10.Autonomous Systems: Indoor Drone Navigation

Authors:Aswin Iyer, Santosh Narayan, Naren M, Manoj kumar Rajagopal

Abstract: Drones are a promising technology for autonomous data collection and indoor sensing. In situations where human-controlled UAVs may not be practical or dependable, such as in uncharted or dangerous locations, autonomous UAVs offer flexibility, cost savings, and reduced risk. The system creates a simulated quadcopter capable of travelling autonomously in an indoor environment using the Gazebo simulation tool and the ROS navigation framework known as Navigation2 (Nav2). While Nav2 has successfully demonstrated autonomous navigation for terrestrial robots and vehicles, the same has not yet been accomplished for unmanned aerial vehicles. The goal is to use the SLAM Toolbox for ROS and the Nav2 navigation framework to construct a simulated drone that can move autonomously in an indoor (GPS-denied) environment.

11.Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments

Authors:Mario A. V. Saucedo, Akash Patel, Rucha Sawlekar, Akshit Saradagi, Christoforos Kanellakis, Ali-Akbar Agha-Mohammadi, George Nikolakopoulos

Abstract: In this article, we propose a novel LiDAR and event camera fusion modality for subterranean (SubT) environments that enables fast and precise object and human detection in a wide variety of adverse lighting conditions, such as low or no light, high-contrast zones, and the presence of blinding light sources. In the proposed approach, information from the event camera and the LiDAR is fused to localize a human or an object of interest in the robot's local frame. The local detection is then transformed into the inertial frame and used to set references for a Nonlinear Model Predictive Controller (NMPC) for reactive tracking of humans or objects in SubT environments. The proposed fusion uses intensity filtering and K-means clustering on the LiDAR point cloud, and frequency filtering and connectivity clustering on the events induced in the event camera by the returning LiDAR beams. The centroids of the clusters in the event camera and LiDAR streams are then paired to localize reflective markers present on safety vests and signs in SubT environments. The efficacy of the proposed scheme has been experimentally validated in a real SubT environment (a mine) with a Pioneer 3AT mobile robot. The experimental results show real-time performance for human detection, and the NMPC-based controller allows reactive tracking of a human or object of interest, even in complete darkness.
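A simplified sketch of the LiDAR half of the pipeline and the pairing step: keep only high-intensity returns (retroreflective markers), cluster them with K-means, and pair each LiDAR centroid with the nearest event-camera centroid. It assumes both centroid sets are already expressed in a common frame (for instance, event centroids back-projected using the LiDAR range); the intensity threshold, deterministic farthest-point initialisation, greedy nearest-neighbour pairing, and distance gate are all illustrative assumptions. The event-side frequency filtering and connectivity clustering are not shown.

```python
import math

def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic farthest-point initialisation (assumed choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[j].append(p)
        # Keep the old centre if a cluster becomes empty.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def lidar_marker_centroids(cloud, k, min_intensity=0.8):
    """cloud: [(x, y, z, intensity)]; keep only highly reflective returns, then cluster."""
    bright = [(x, y, z) for x, y, z, i in cloud if i >= min_intensity]
    return kmeans(bright, k)

def pair_centroids(lidar_c, event_c, max_dist=0.5):
    """Greedy nearest-neighbour pairing, assuming both sets share one frame."""
    pairs = []
    for lc in lidar_c:
        ec = min(event_c, key=lambda e: math.dist(lc, e))
        if math.dist(lc, ec) <= max_dist:
            pairs.append((lc, ec))
    return pairs
```

Requiring a match in both streams is what rejects bright LiDAR returns that are not markers, and vice versa.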

12.A Hyper-network Based End-to-end Visual Servoing with Arbitrary Desired Poses

Authors:Hongxiang Yu, Anzhe Chen, Kechun Xu, Zhongxiang Zhou, Wei Jing, Yue Wang, Rong Xiong

Abstract: Recently, several works have achieved end-to-end visual servoing (VS) for robotic manipulation by replacing the traditional controller with differentiable neural networks, but they lose the ability to servo to arbitrary desired poses. This letter proposes a differentiable architecture for arbitrary pose servoing: a hyper-network based neural controller (HPN-NC). HPN-NC consists of a hyper net and a low-level controller: the hyper net learns to generate the parameters of the low-level controller, and the controller uses the 2D keypoint error for control, as in traditional image-based visual servoing (IBVS). HPN-NC can complete 6-degree-of-freedom visual servoing with a large initial offset. Taking advantage of the fully differentiable nature of HPN-NC, we provide a three-stage training procedure to servo to real-world objects. With self-supervised end-to-end training, the performance of the integrated model can be further improved in unseen scenes, and the amount of manual annotation can be significantly reduced.
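For reference, the classical IBVS law that the low-level controller is modelled on stacks the keypoint errors e = s - s*, builds the interaction matrix of each normalised image point (x, y) at depth Z, and commands the camera twist v = -lambda * pinv(L) * e. The sketch below is that textbook law, not HPN-NC: here the gain lambda is fixed and the depths are assumed known, whereas in HPN-NC the hyper net would supply the controller's parameters.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Interaction matrix (image Jacobian) of a normalised image point at depth Z.
    Maps the camera twist (vx, vy, vz, wx, wy, wz) to the point's image velocity."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(s, s_star, depths, gain=0.5):
    """s, s_star: (N, 2) current and desired keypoints; depths: N assumed-known depths.
    Returns the 6-DoF camera velocity v = -gain * pinv(L) @ (s - s_star)."""
    e = (np.asarray(s) - np.asarray(s_star)).reshape(-1)
    L = np.vstack([interaction_matrix(x, y, Z) for (x, y), Z in zip(s, depths)])
    return -gain * np.linalg.pinv(L) @ e
```

With four or more well-spread points, L has full column rank, so the commanded twist drives the stacked error down exponentially at rate lambda to first order.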

13.Autonomous Navigation in Rows of Trees and High Crops with Deep Semantic Segmentation

Authors:Alessandro Navone, Mauro Martini, Andrea Ostuni, Simone Angarano, Marcello Chiaberge

Abstract: Segmentation-based autonomous navigation has recently been proposed as a promising methodology to guide robotic platforms through crop rows without requiring precise GPS localization. However, existing methods are limited to scenarios where the centre of the row can be identified thanks to the sharp distinction between the plants and the sky. Yet GPS signal obstruction mainly occurs in tall, dense vegetation, such as high tree rows and orchards. In this work, we extend segmentation-based robotic guidance to scenarios where canopies and branches occlude the sky and hinder the use of GPS and of previous methods, increasing the overall robustness and adaptability of the control algorithm. Extensive experimentation in several realistic simulated tree fields and vineyards demonstrates the competitive advantages of the proposed solution.

14.Method for Comparison of Surrogate Safety Measures in Multi-Vehicle Scenarios

Authors:Enrico Del Re, Cristina Olaverri-Monreal

Abstract: With the race towards higher levels of automation in vehicles, it is imperative to guarantee the safety of all involved traffic participants. Yet, while high-risk traffic situations between two vehicles are well understood, traffic situations involving more vehicles lack the tools to be properly analyzed. This paper proposes a method to compare Surrogate Safety Measure values in highway multi-vehicle traffic situations, such as lane changes that involve three vehicles. This method allows for a comprehensive statistical analysis and highlights how the safety distance between vehicles is shifted in favor of the traffic conflict between the leading vehicle and the lane-changing vehicle.
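Time-to-collision (TTC) is one of the most widely used surrogate safety measures (SSMs). In a three-vehicle lane change it can be evaluated for each conflicting pair in the target lane: the new leader versus the lane-changing vehicle, and the lane-changing vehicle versus the new follower. The sketch below uses a 1D longitudinal model with constant speeds and a common vehicle length; the pair names and the 4.5 m length are assumptions, and the paper's specific set of SSMs and comparison method are not reproduced.

```python
import math

def ttc(gap_m, v_follower, v_leader):
    """Time to collision (s) of a follower closing on a leader; inf if not closing."""
    closing = v_follower - v_leader
    return gap_m / closing if closing > 0 else math.inf

def lane_change_ttcs(x_leader, x_changer, x_follower,
                     v_leader, v_changer, v_follower, length=4.5):
    """Front-bumper positions (m) along the target lane; speeds in m/s.
    Gaps are bumper-to-bumper, assuming all vehicles share one (assumed) length."""
    return {
        "leader-changer": ttc(x_leader - length - x_changer, v_changer, v_leader),
        "changer-follower": ttc(x_changer - length - x_follower, v_follower, v_changer),
    }
```

Example: leader at 50 m and 20 m/s, lane changer at 30 m and 25 m/s, follower at 10 m and 30 m/s gives a 15.5 m gap closing at 5 m/s for both pairs, i.e. a TTC of 3.1 s each.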

15.Continuous-Time Range-Only Pose Estimation

Authors:Abhishek Goudar, Timothy D. Barfoot, Angela P. Schoellig

Abstract: Range-only (RO) localization involves determining the position of a mobile robot by measuring the distance to specific anchors. RO localization is challenging since the measurements are low-dimensional and a single range sensor does not have enough information to estimate the full pose of the robot. As such, range sensors are typically coupled with other sensing modalities such as wheel encoders or inertial measurement units (IMUs) to estimate the full pose. In this work, we propose a continuous-time Gaussian process (GP)-based trajectory estimation method to estimate the full pose of a robot using only range measurements from multiple range sensors. Results from simulation and real experiments show that our proposed method, using off-the-shelf range sensors, achieves performance comparable to, and in some cases better than, alternative state-of-the-art sensor-fusion methods that use additional sensing modalities.
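The underlying measurement model is r_i = ||p - a_i|| + noise for a sensor at position p and anchor a_i. As a simpler point of reference, the sketch below recovers a single position from ranges to several known anchors with Gauss-Newton least squares. This is a static, discrete-time snapshot estimate, not the paper's continuous-time GP estimator, and it does not recover orientation (which the paper obtains by combining several range sensors with a motion prior). The anchor layout and initial guess in the example are hypothetical.

```python
import numpy as np

def multilaterate(anchors, ranges, p0, iters=20):
    """Gauss-Newton estimate of a 3D position from ranges to known anchors."""
    anchors = np.asarray(anchors, float)
    r = np.asarray(ranges, float)
    p = np.asarray(p0, float)
    for _ in range(iters):
        diff = p - anchors                     # (M, 3)
        pred = np.linalg.norm(diff, axis=1)    # predicted ranges
        J = diff / pred[:, None]               # Jacobian of ||p - a_i|| w.r.t. p
        dp, *_ = np.linalg.lstsq(J, r - pred, rcond=None)
        p = p + dp
    return p
```

With four non-coplanar anchors, noise-free ranges, and a reasonable initial guess, the iteration converges to the true position; with a single anchor the problem is underdetermined, which is the low-dimensionality issue the abstract describes.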

16.Designing the mobile robot Kevin for a life science laboratory

Authors:Sarah Kleine-Wechelmann, Kim Bastiaanse, Matthias Freundel, Christian Becker-Asano

Abstract: Laboratories are being increasingly automated. In small laboratories, individual processes can be fully automated, but this is usually not economically viable. Nevertheless, individual process steps can be performed by flexible, mobile robots to relieve the laboratory staff. To meet the requirements of a life science laboratory, the mobile, dextrous robot Kevin was designed by the Fraunhofer IPA research institute in Stuttgart, Germany. Kevin is a mobile service robot that is able to carry out non-value-adding activities such as the transportation of labware. This paper gives an overview of Kevin's functionalities and its development process, and presents a preliminary study on how its lights and sounds improve user interaction.

17.Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor

Authors:Zhenshan Bing, Aleksandr Mavrichev, Sicong Shen, Xiangtong Yao, Kejia Chen, Kai Huang, Alois Knoll

Abstract: Deep reinforcement learning (RL) is widely expected to tackle challenging manipulation tasks in an autonomous and self-directed fashion. Despite the significant strides made in the development of reinforcement learning, the practical deployment of this paradigm is hindered by at least two barriers, namely, the engineering of a reward function and ensuring the safety of learning-based controllers. In this paper, we address these limitations by proposing a framework that merges a reinforcement learning planner, trained using sparse rewards, with a model predictive controller (MPC) actor, thereby offering a safe policy. On the one hand, the RL planner learns from sparse rewards by selecting intermediate goals that are easy to achieve in the short term and promise to lead to the target goals in the long term. On the other hand, the MPC actor takes the intermediate goals suggested by the RL planner as input and predicts how the robot's actions will enable it to reach that goal while avoiding any obstacles over a short period of time. We evaluated our method on four challenging manipulation tasks with dynamic obstacles, and the results demonstrate that, by leveraging the complementary strengths of these two components, the agent can solve manipulation tasks in complex, dynamic environments safely with a 100% success rate. Videos are available at .
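The division of labour between the two components can be illustrated with a very small MPC actor: given an intermediate goal (which would come from the RL planner), it evaluates short-horizon action sequences, rejects any that enter an obstacle, and executes the first action of the cheapest one. The sketch below is a random-shooting MPC for a 2D point robot with single-integrator dynamics; the dynamics, horizon, sample count, terminal-distance cost, and disc obstacles are all assumptions, and neither the paper's MPC formulation nor its RL planner is reproduced.

```python
import random

def mpc_action(pos, subgoal, obstacles, horizon=10, samples=300, dt=0.1, vmax=1.0, seed=0):
    """Random-shooting MPC for an assumed point robot x_{t+1} = x_t + u_t * dt.
    obstacles: [(cx, cy, radius)] discs. Returns the first action of the lowest-cost
    collision-free rollout, or (0.0, 0.0) if every sampled rollout collides."""
    rng = random.Random(seed)
    best_cost, best_u = float("inf"), (0.0, 0.0)
    for _ in range(samples):
        x, y = pos
        seq, safe = [], True
        for _ in range(horizon):
            u = (rng.uniform(-vmax, vmax), rng.uniform(-vmax, vmax))
            x, y = x + u[0] * dt, y + u[1] * dt
            seq.append(u)
            if any((x - cx) ** 2 + (y - cy) ** 2 <= r ** 2 for cx, cy, r in obstacles):
                safe = False   # hard constraint: discard colliding rollouts
                break
        if not safe:
            continue
        cost = (x - subgoal[0]) ** 2 + (y - subgoal[1]) ** 2  # terminal distance to subgoal
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u
```

In a closed loop this would be called at every control step with the latest subgoal from the planner, which is how the short-horizon actor stays reactive to moving obstacles.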

18.Using simulation to design an MPC policy for field navigation using GPS sensing

Authors:Harry Zhang, Stefan Caldararu, Ishaan Mahajan, Shouvik Chatterjee, Thomas Hansen, Abhiraj Dashora, Sriram Ashokkumar, Luning Fang, Xiangru Xu, Shen He, Dan Negrut

Abstract: Modeling a robust control system with a precise GPS-based state estimation capability in simulation can be useful in field navigation applications, as it allows for testing and validation in a controlled environment. This testing process would enable navigation systems to be developed and optimized in simulation with direct transferability to real-world scenarios. The multi-physics simulation engine Chrono allows for the creation of scenarios that may be difficult or dangerous to replicate in the field, such as extreme weather or terrain conditions. The Autonomy Research Testbed (ART), a specialized robotics algorithm testbed, is used in conjunction with Chrono to develop an MPC control policy as well as an EKF state estimator. This platform enables users to easily integrate custom algorithms into the autonomy stack. The policy is first developed and used in simulation and then tested on a physical twin of the simulated vehicle, to demonstrate the transferability between simulation and reality (also known as Sim2Real).
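As an illustration of the kind of EKF state estimator mentioned above, the sketch below fuses GPS position fixes with a kinematic motion model. It assumes a unicycle state (x, y, heading) driven by speed and yaw-rate inputs, and a GPS fix already converted into a local metric frame; the model and the noise covariances are assumptions, not the ART stack's actual estimator.

```python
import numpy as np

def ekf_predict(x, P, v, omega, dt, Q):
    """Prediction with an assumed unicycle model; x = [px, py, theta]."""
    px, py, th = x
    x_new = np.array([px + v * np.cos(th) * dt,
                      py + v * np.sin(th) * dt,
                      th + omega * dt])
    # Jacobian of the motion model with respect to the state.
    F = np.array([[1.0, 0.0, -v * np.sin(th) * dt],
                  [0.0, 1.0,  v * np.cos(th) * dt],
                  [0.0, 0.0, 1.0]])
    return x_new, F @ P @ F.T + Q

def ekf_update_gps(x, P, z, R):
    """Correction with a GPS fix z = [px, py] (assumed already in a local metric frame)."""
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    y = z - H @ x                    # innovation
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    return x + K @ y, (np.eye(3) - K @ H) @ P
```

Because the GPS observes position only, heading is corrected indirectly through the position-heading correlation that the prediction step builds up in P.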

1.Enabling safe walking rehabilitation on the exoskeleton Atalante: experimental results

Authors:Maxime Brunet (CAS), Marine Pétriaux (CAS), Florent Di Meglio (CAS), Nicolas Petit (CAS)

Abstract: This paper presents a control architecture enabling rehabilitation of walking-impaired patients with the lower-limb exoskeleton Atalante. Atalante's control system is modified to allow the patient to contribute to the walking motion through their own efforts. Only the swing leg degree of freedom along the nominal path is relaxed. An online trajectory optimization checks that the muscle forces do not jeopardize stability. The optimization generates reference trajectories that satisfy several key constraints from the current point to the end of the step. One of the constraints requires that the center of pressure remain inside the support polygon, which ensures that the support leg subsystem successfully tracks the reference trajectory. As a result, the robot provides a non-zero force in the direction of motion only when required, helping the patient go fast enough to maintain balance (or preventing them from going too fast). Experimental results are reported. They illustrate that variations of ±50% in step duration can be achieved in response to the patient's efforts and that many steps are achieved without falling. A video of the experiments can be viewed at
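One of the optimisation's constraints is that the center of pressure (CoP) stays inside the support polygon. For a convex polygon with counter-clockwise vertices, membership reduces to the point lying on the left of (or on) every edge, which a 2D cross product tests. The sketch below shows only this geometric check; the rectangular foot outline in the example is hypothetical, and in the paper the condition is imposed inside an online trajectory optimisation rather than checked in isolation.

```python
def inside_convex_polygon(point, vertices):
    """True if point lies inside or on a convex polygon whose vertices are
    listed counter-clockwise (e.g. the support polygon of the stance foot)."""
    px, py = point
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        # z-component of (edge) x (point - edge start); negative means right of the edge.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0:
            return False
    return True
```

Example: for a hypothetical 0.25 m x 0.10 m footprint with corners (0, 0), (0.25, 0), (0.25, 0.1), (0, 0.1), a CoP at (0.1, 0.05) is accepted and one at (0.3, 0.05) is rejected.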

2.2D Forward Looking Sonar Simulation with Ground Echo Modeling

Authors:Yusheng Wang, Chujie Wu, Yonghoon Ji, Hiroshi Tsuchiya, Hajime Asama, Atsushi Yamashita

Abstract: Imaging sonar produces clear images in underwater environments, independent of water turbidity and lighting conditions. Next-generation 2D forward looking sonars are compact and able to generate high-resolution images, which facilitates underwater robotics research. Given the difficulty and expense of conducting experiments in underwater environments, much work has focused on sonar image simulation. However, sonar artifacts such as multi-path reflection, which cannot be ignored in water tank environments, have not been sufficiently discussed. In this paper, we focus on the influence of echoes from a flat ground. We propose a method to physically simulate the ground echo effect in acoustic images. We model multi-bounce situations using the single-bounce framework for computational efficiency. We compare real images captured in a water tank with synthetic images to validate the proposed method.
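Why a flat ground produces extra echoes can be seen with the classical mirror-image construction: a path that bounces once off a flat, specular ground has the same length as the straight line from the sonar's mirror image (reflected below the ground plane) to the target. The apparent range of each echo is half its round-trip path, so ground-bounce echoes appear as ghost returns at longer ranges than the direct echo. The sketch below assumes point-like sonar and target, a perfectly flat ground at z = ground_z, and ignores amplitude and beam pattern; it illustrates the geometry only and is not the paper's acoustic image simulator.

```python
import math

def echo_ranges(sonar, target, ground_z=0.0):
    """Apparent ranges (half the round-trip path) of the direct echo and of the
    ground-bounce multipath echoes, using the mirror-image construction."""
    sx, sy, sz = sonar
    mirror = (sx, sy, 2.0 * ground_z - sz)   # sonar reflected about the ground plane
    d_direct = math.dist(sonar, target)
    d_bounce = math.dist(mirror, target)     # equals sonar -> ground -> target path length
    return {
        "direct": d_direct,                          # out and back directly
        "one_bounce": (d_direct + d_bounce) / 2.0,   # one leg direct, one via the ground
        "two_bounce": d_bounce,                      # both legs via the ground
    }
```

Example: a sonar 1 m above the ground looking at a target 3 m away at the same height sees the direct echo at 3 m and the two-bounce ghost at sqrt(13) m, about 3.61 m.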

3.The Impact of Frame-Dropping on Performance and Energy Consumption for Multi-Object Tracking

Authors:Matti Henning, Michael Buchholz, Klaus Dietmayer

Abstract: The safety of automated vehicles (AVs) relies on the representation of their environment. Consequently, state-of-the-art AVs employ powerful sensor systems to achieve the best possible environment representation at all times. Although these high-performing systems achieve impressive results, they place significant demands on the processing capabilities and energy consumption of an AV's computational hardware. To enable dynamic adaptation of such perception systems to situational perception requirements, we introduce a model-agnostic method for the scalable employment of single-frame object detection models using frame-dropping in tracking-by-detection systems. We evaluate our approach on the KITTI 3D Tracking Benchmark, showing that significant energy savings can be achieved at an acceptable performance degradation, with up to a 28% reduction in energy consumption at a 6.6% decline in HOTA score.
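The basic mechanism of frame-dropping in tracking-by-detection: the detector runs only on every k-th frame, and the tracker bridges the skipped frames by prediction. The toy sketch below tracks a single 1D object with a constant-velocity model and counts detector calls; the `detect` callable, the fixed drop interval, and the motion model are hypothetical simplifications, whereas the paper's method is model-agnostic and evaluated with real 3D detectors on KITTI.

```python
def run_tracker(frames, detect, drop_interval):
    """Run the (hypothetical) detector on every drop_interval-th frame only and fill
    the gaps with constant-velocity predictions. Returns (estimates, detector_calls)."""
    last_z, last_t, vel = None, None, 0.0
    estimates, detector_calls = [], 0
    for t, frame in enumerate(frames):
        if t % drop_interval == 0:
            z = detect(frame)                 # expensive detector call
            detector_calls += 1
            if last_z is not None:
                vel = (z - last_z) / (t - last_t)
            last_z, last_t = z, t
            estimates.append(z)
        else:
            estimates.append(last_z + vel * (t - last_t))  # cheap prediction only
    return estimates, detector_calls
```

Example: for an object moving 2 units per frame over 10 frames with drop_interval = 3, the detector runs on 4 frames instead of 10; the estimates lag only until the second detection initialises the velocity.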

4.Underwater Autonomous Tank Cleaning Rover

Authors:Aditya Sundarajan, Jaideepnath Anand, Kevin Timothy Muller, Mangal Das

Abstract: To keep aquatic ecosystems safe and healthy, it is imperative that cleaning be done frequently. This research proposes the use of autonomous underwater rovers for effective underwater cleaning as a novel approach to this issue. The enhanced sensing and navigation capabilities of the autonomous rovers enable them to navigate underwater environments independently and to find and remove underwater garbage and uneaten fish feed, which can be recycled. The proposed solution not only removes the need for human divers but also provides a more effective and affordable technique for underwater cleaning. The paper also examines the development, testing, and potential of the autonomous underwater rovers.

5.Control and Coordination of a SWARM of Unmanned Surface Vehicles using Deep Reinforcement Learning in ROS

Authors:Shrudhi R S, Sreyash Mohanty, Dr. Susan Elias

Abstract: An unmanned surface vehicle (USV) can perform complex missions by continuously observing the state of its surroundings and taking action toward a goal. A SWARM of USVs working together can complete missions faster and more effectively than a single USV alone. In this paper, we propose an autonomous communication model for a swarm of USVs. The goal of this system is to implement a software system using the Robot Operating System (ROS) and Gazebo. With coordinated task completion as the main objective, a Markov decision process (MDP) provides the basis for formulating a task decision problem that achieves efficient localization and tracking in a highly dynamic water environment. To coordinate multiple USVs performing real-time target tracking, we propose an enhanced multi-agent reinforcement learning approach. Our proposed scheme uses MA-DDPG (Multi-Agent Deep Deterministic Policy Gradient), an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm that allows decentralized control of multiple agents in a cooperative environment. MA-DDPG's decentralized control allows each agent to make decisions based on its own observations and objectives, which can lead to better overall performance and improved stability. Additionally, it provides communication and coordination among agents through the use of collective readings and rewards.

6.ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding

Authors:Dustin Aganian, Benedict Stephan, Markus Eisenbach, Corinna Stretz, Horst-Michael Gross

Abstract: With the emergence of collaborative robots (cobots), human-robot collaboration in industrial manufacturing is coming into focus. For a cobot to act autonomously and as an assistant, it must understand human actions during assembly. To effectively train models for this task, a dataset containing suitable assembly actions in a realistic setting is crucial. For this purpose, we present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions monitored by three cameras, which represent potential viewpoints of a cobot. Since in an assembly context workers tend to perform different actions simultaneously with their two hands, we annotated the actions performed by each hand separately. As a result, more than 68% of the annotations in the ATTACH dataset overlap with other annotations, many times more than in related datasets, which typically feature more simplistic assembly tasks. For better generalization with respect to the background of the working area, we not only recorded color and depth images but also used the Azure Kinect body tracking SDK to estimate 3D skeletons of the worker. To create a first baseline, we report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs. The dataset is available at .
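The overlap statistic can be read as the share of annotations whose time interval intersects at least one other annotation, for example a left-hand and a right-hand action running at the same time. The abstract does not give the exact counting rule, so the sketch below is one plausible definition under that assumption, using half-open intervals so that back-to-back actions do not count as overlapping.

```python
def overlap_fraction(intervals):
    """intervals: [(start, end)] as half-open [start, end) with start < end.
    Returns the share of intervals that intersect at least one other interval
    (an assumed definition of annotation overlap)."""
    n = len(intervals)
    if n == 0:
        return 0.0
    hit = 0
    for i, (s1, e1) in enumerate(intervals):
        if any(s1 < e2 and s2 < e1 for j, (s2, e2) in enumerate(intervals) if j != i):
            hit += 1
    return hit / n
```

Example: left-hand actions [(0, 5), (6, 8)] and right-hand actions [(2, 4), (9, 10)] give an overlap fraction of 0.5, since only (0, 5) and (2, 4) coincide. The quadratic loop is fine for illustration; a sweep over sorted start times would scale to 95.2k annotations.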

7.Base Placement Optimization for Coverage Mobile Manipulation Tasks

Authors:Huiwen Zhang, Kai Mi, Zhijun Zhang

Abstract: Base placement optimization (BPO) is a fundamental capability for mobile manipulation and has been researched for decades. However, it remains very challenging for several reasons. First, compared with humans, current robots are extremely inflexible and therefore have higher requirements on the accuracy of base placements (BPs). Second, the BP and the task constraints are coupled: the optimal BP depends on the task constraints, and the BP in turn affects the task constraints. Trickier still, some task constraints are flexible and non-deterministic. Third, besides fulfilling tasks, other performance metrics such as optimal energy consumption and minimal execution time need to be considered, which makes the BPO problem even more complicated. In this paper, a Scale-like disc (SLD) representation of the workspace is used to decouple task constraints and BPs. To evaluate reachability and return the optimal working pose over SLDs, a reachability map (RM) is constructed offline. To optimize coverage, manipulability, and time cost simultaneously, this paper formulates BPO as a multi-objective optimization problem (MOOP). Among these objectives, the time-optimal objective is modeled as a traveling salesman problem (TSP), which better reflects the actual situation. An evolutionary method is used to solve the MOOP. In addition, to ensure the validity and optimality of the solution, collision detection is performed on the candidate BPs, and the solutions from BPO are further fine-tuned according to the specific task. Finally, the proposed method is applied to a real-world toilet coverage cleaning task. Experiments show that the optimized BPs can significantly improve the coverage and efficiency of the task.
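In a multi-objective formulation there is usually no single best base placement, only a set of non-dominated trade-offs. The sketch below shows the Pareto-dominance filter that multi-objective evolutionary solvers commonly build on, applied to candidate BPs scored on coverage (maximised), manipulability (maximised), and time cost (minimised). The candidate names and scores are hypothetical, and this is not the paper's evolutionary algorithm, only the dominance relation such solvers typically use.

```python
def dominates(a, b):
    """a, b: (coverage, manipulability, time_cost). Coverage and manipulability
    are maximised, time cost is minimised."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(candidates):
    """candidates: {name: objectives}; returns the names not dominated by any other."""
    return [n for n, s in candidates.items()
            if not any(dominates(t, s) for m, t in candidates.items() if m != n)]
```

Example: among hypothetical placements bp1 = (0.9, 0.5, 30), bp2 = (0.8, 0.6, 25), bp3 = (0.7, 0.4, 35) and bp4 = (0.9, 0.5, 28), the front is bp2 and bp4; bp4 dominates both bp1 and bp3.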

8.PaaS: Planning as a Service for reactive driving in CARLA Leaderboard

Authors:Truong Nhat Hao, Mai Huu Thien, Tran Tuan Anh, Tran Minh Quang, Nguyen Duc Duy, Pham Ngoc Viet Phuong

Abstract: End-to-end deep learning approaches have been shown to be efficient in autonomous driving and robotics. Because they use deep learning for decision-making, such systems are often referred to as black boxes whose results are driven by data. In this paper, we propose PaaS (Planning as a Service), a vanilla module that generates local trajectory plans for autonomous driving in the CARLA simulator. Our method was submitted to the International CARLA Autonomous Driving Leaderboard (CADL), a platform for evaluating the driving proficiency of autonomous agents in realistic traffic scenarios. Our approach focuses on reactive planning in the Frenet frame under the constraints of complex urban streets and driver comfort. The planner generates a collection of feasible trajectories and uses heuristic cost functions with a controllable driving-style factor to choose the optimal path that satisfies safe-travel criteria. PaaS handles challenging traffic situations in CADL well. Under the strict evaluation of the CADL Map Track, our approach ranked 3rd out of 9 submissions in driving score. However, owing to its focus on minimizing maneuver risk and ensuring passenger safety, our infraction penalty figures are better than those of the two leading submissions by 20%.
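Sampling-based reactive planning in the Frenet frame typically generates lateral manoeuvres d(t) as quintic polynomials between rest states, discards candidates that conflict with obstacles, and picks the cheapest according to a weighted cost. The sketch below covers only the lateral part under assumed choices: each candidate ends at rest at an offset d_end after T seconds, the comfort cost is the closed-form integral of squared jerk (720 * (d_end - d0)^2 / T^5 for this rest-to-rest quintic), the obstacle check is a crude gap test at the horizon, and `style` is a hypothetical stand-in for the abstract's driving-style factor. Longitudinal planning and PaaS's actual cost terms are not reproduced.

```python
def lateral_profile(d0, d_end, T):
    """Quintic lateral offset d(t) from rest at d0 to rest at d_end over T seconds."""
    def d(t):
        tau = min(max(t / T, 0.0), 1.0)
        return d0 + (d_end - d0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    return d

def select_trajectory(d0, candidates, T, obstacles_d, style=0.5, safe_gap=1.0):
    """candidates: end offsets (m); obstacles_d: lateral offsets blocked at the horizon.
    style in [0, 1] (hypothetical): higher values weight comfort (low jerk) more heavily
    than staying near the lane centre. Returns the best end offset or None."""
    best, best_cost = None, float("inf")
    for d_end in candidates:
        if any(abs(d_end - od) < safe_gap for od in obstacles_d):
            continue  # infeasible: ends too close to a blocked offset (crude check)
        jerk_cost = 720.0 * (d_end - d0) ** 2 / T**5  # integral of squared jerk
        centre_cost = d_end ** 2
        cost = style * jerk_cost + (1.0 - style) * centre_cost
        if cost < best_cost:
            best, best_cost = d_end, cost
    return best
```

With a free lane the planner stays centred (d_end = 0); if an obstacle blocks the centre, it shifts to one of the lateral candidates, and raising `style` penalises abrupt shifts more strongly.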

9.Integration of Reinforcement Learning Based Behavior Planning With Sampling Based Motion Planning for Automated Driving

Authors:Marvin Klimke, Benjamin Völz, Michael Buchholz

Abstract: Reinforcement learning has received high research interest for developing planning approaches in automated driving. Most prior works consider the end-to-end planning task that yields direct control commands and rarely deploy their algorithms to real vehicles. In this work, we propose a method to employ a trained deep reinforcement learning policy for dedicated high-level behavior planning. By populating an abstract objective interface, established motion planning algorithms can be leveraged, which derive smooth and drivable trajectories. Given the current environment model, we propose using a built-in simulator to predict the traffic scene for a given horizon into the future. The behavior of automated vehicles in mixed traffic is determined by querying the learned policy. To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner, and as such no state-of-the-art benchmark exists for comparison. Thus, we validate the proposed approach by comparing an idealistic single-shot plan with cyclic replanning through the learned policy. Experiments with a real test vehicle on proving grounds demonstrate the potential of our approach to narrow the sim-to-real gap of deep reinforcement learning based planning approaches. Additional simulation analyses reveal that more complex multi-agent maneuvers can be managed by employing the cyclic replanning approach.

10.Robust human position estimation in cooperative robotic cells

Authors:António Amorim, Diana Guimarães, Tiago Mendonça, Pedro Neto, Paulo Costa, António Paulo Moreira

Abstract: Robots are increasingly present in our lives, sharing the workspace and tasks with human co-workers. However, existing interfaces for human-robot interaction / cooperation (HRI/C) are of limited intuitiveness, and safety is a major concern when humans and robots share the same workspace. Often, this is due to the lack of a reliable estimate of the human pose in space, which is the primary input for calculating the human-robot minimum distance (required for safety and collision avoidance) and for HRI/C featuring machine learning algorithms that classify human behaviours / gestures. Each sensor type has its own characteristics, resulting in problems such as occlusions (vision) and drift (inertial) when used in isolation. In this paper, we propose a combined system that merges the human tracking provided by a 3D vision sensor with the pose estimation provided by a set of inertial measurement units (IMUs) placed on the human's limbs. The IMUs compensate for the gaps in occluded areas to maintain tracking continuity. To mitigate the lingering effects of the IMU offset, we propose a continuous online calculation of the offset value. Experimental tests were designed to simulate human motion in a human-robot collaborative environment where the robot moves away to avoid unexpected collisions with the human. Results indicate that our approach is able to capture the human's position, for example that of the forearm, with millimetre-range precision and robustness to occlusions.
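One simple way to realise a continuous online offset calculation: while the vision sensor sees the limb, keep an exponential moving average of the difference between the vision position and the IMU-derived position; during occlusion, output the IMU position plus the latest offset estimate. The abstract does not state the actual update rule, so the moving average and the smoothing factor alpha in the sketch below are assumptions.

```python
class OffsetCompensator:
    """Bridges vision occlusions with IMU positions corrected by an online offset.
    The EMA update and alpha are assumptions, not the paper's stated method."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # EMA smoothing factor (assumed)
        self.offset = None      # running estimate of (vision - IMU)

    def update(self, imu_pos, vision_pos=None):
        if vision_pos is not None:
            err = tuple(v - i for v, i in zip(vision_pos, imu_pos))
            if self.offset is None:
                self.offset = err
            else:
                self.offset = tuple((1 - self.alpha) * o + self.alpha * e
                                    for o, e in zip(self.offset, err))
            return vision_pos   # trust vision whenever it is available
        if self.offset is None:
            return imu_pos      # no offset estimate yet
        return tuple(i + o for i, o in zip(imu_pos, self.offset))  # occluded: corrected IMU
```

Updating the offset on every visible frame, rather than calibrating it once, is what keeps slow IMU drift from accumulating across repeated occlusions.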

11.Topology, dynamics, and control of an octopus-analog muscular hydrostat

Authors:Arman Tekinalp, Noel Naughton, Seung-Hyun Kim, Udit Halder, Rhanor Gillette, Prashant G. Mehta, William Kier, Mattia Gazzola

Abstract: Muscular hydrostats, such as octopus arms or elephant trunks, lack bones entirely, endowing them with exceptional dexterity and reconfigurability. Key to their unmatched ability to control nearly infinite degrees of freedom is the architecture into which muscle fibers are woven. Their arrangement is, effectively, the instantiation of a sophisticated mechanical program that mediates, and likely facilitates, the control and realization of complex, dynamic morphological reconfigurations. Here, by combining medical imaging, biomechanical data, live behavioral experiments and numerical simulations, we synthesize a model octopus arm entailing ~200 continuous muscle groups, and begin to unravel its complexity. We show how 3D arm motions can be understood in terms of storage, transport, and conversion of topological quantities, effected by simple muscle activation templates. These, in turn, can be composed into higher-level control strategies that, compounded by the arm's compliance, are demonstrated in a range of object manipulation tasks rendered additionally challenging by the need to appropriately align suckers, to sense and grasp. Overall, our work exposes broad design and algorithmic principles pertinent to muscular hydrostats, robotics, and dynamics, while significantly advancing our ability to model muscular structures from medical imaging, with potential implications for human health and care.

12.Applications of Uncalibrated Image Based Visual Servoing in Micro- and Macroscale Robotics

Authors:Yifan Yin, Yutai Wang, Yunpu Zhang, Russell H. Taylor, Balazs P. Vagvolgyi

Abstract: We present a robust markerless image-based visual servoing method that enables precise robot control without hand-eye and camera calibration in 1, 3, and 5 degrees of freedom. The system uses two cameras to observe the workspace and a combination of classical image processing algorithms and deep-learning-based methods to detect features in the camera images. The only restriction on the placement of the two cameras is that relevant image features must be visible in both views. The system enables precise robot-tool-to-workspace interactions even when the physical setup is disturbed, for example if the cameras are moved or the workspace shifts during manipulation. The usefulness of the visual servoing method is demonstrated and evaluated in two applications: the calibration of a micro-robotic system that dissects mosquitoes for the automated production of a malaria vaccine, and a macro-scale manipulation system for fastening screws using a UR10 robot. Evaluation results indicate that our image-based visual servoing method achieves human-like manipulation accuracy in challenging setups even without camera calibration.

13.Affordances from Human Videos as a Versatile Representation for Robotics

Authors:Shikhar Bahl, Russell Mendonca, Lili Chen, Unnat Jain, Deepak Pathak

Abstract: Building a robot that can understand and learn to interact by watching humans has inspired several vision problems. However, despite some successful results on static datasets, it remains unclear how current models can be used on a robot directly. In this paper, we aim to bridge this gap by leveraging videos of human interactions in an environment-centric manner. Using internet videos of human behavior, we train a visual affordance model that estimates where and how in the scene a human is likely to interact. The structure of these behavioral affordances directly enables the robot to perform many complex tasks. We show how to seamlessly integrate our affordance model with four robot learning paradigms: offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning. We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild. Results, visualizations and videos at

1.An NMPC-ECBF Framework for Dynamic Motion Planning and Execution in vision-based Human-Robot Collaboration

Authors:Dianhao Zhang, Mien Van, Pantelis Sopasakis, Seán McLoone

Abstract: To enable safe and effective human-robot collaboration (HRC) in smart manufacturing, seamless integration of sensing, cognition, and prediction into the robot controller is critical for real-time awareness, response, and communication inside a heterogeneous environment (robots, humans, and equipment). The proposed approach takes advantage of the prediction capabilities of nonlinear model predictive control (NMPC) to perform safe path planning based on feedback from a vision system. To satisfy the requirement of real-time path planning, an embedded solver based on a penalty method is applied. However, due to tight sampling times, NMPC solutions are approximate, and hence the safety of the system cannot be guaranteed. To address this, we formulate a novel safety-critical paradigm with an exponential control barrier function (ECBF) used as a safety filter. We also design a simple human-robot collaboration scenario using V-REP to evaluate the performance of the proposed controller and investigate whether integrating human pose prediction can help with safe and efficient collaboration. The robot uses OptiTrack cameras for perception and dynamically generates collision-free trajectories to the predicted target interactive position. Results for a number of different configurations confirm the efficiency of the proposed motion planning and execution framework, which yields a 19.8% reduction in execution time for the HRC task considered.
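
How an ECBF acts as a safety filter on top of an approximate controller can be shown on a toy problem. The sketch below assumes a 1-D double integrator (the robot accelerates along a line towards a static human), a barrier h = p_human - p - d_safe of relative degree two, and hypothetical gains k0 = 2, k1 = 3 (closed-loop roots at -1 and -2). It is not the paper's manipulator formulation, and the constant nominal command merely stands in for the NMPC output.

```python
import numpy as np


def ecbf_filter(u_nom, h, h_dot, k0=2.0, k1=3.0):
    """Closed-form ECBF safety filter for a 1-D double integrator (toy sketch).

    With p_ddot = u and h = p_human - p - d_safe we have h_ddot = -u, so the
    ECBF condition h_ddot + k1*h_dot + k0*h >= 0 becomes u <= k0*h + k1*h_dot.
    The minimally invasive correction of u_nom is therefore a simple clip.
    """
    return min(u_nom, k0 * h + k1 * h_dot)


def simulate(p0=0.0, p_human=6.0, d_safe=1.0, u_nom=1.0, dt=0.01, steps=2000):
    """Roll out the filtered system with explicit Euler steps; return h over time."""
    p, v = p0, 0.0
    h_history = []
    for _ in range(steps):
        h = p_human - p - d_safe
        h_dot = -v
        u = ecbf_filter(u_nom, h, h_dot)
        p, v = p + v * dt, v + u * dt
        h_history.append(p_human - p - d_safe)
    return np.array(h_history)


h_traj = simulate()
```

With the filter, `h_traj` never becomes negative and the robot settles at the safety margin, whereas the unfiltered constant acceleration would cross the margin after roughly 3 s.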

2.Study on Soft Robotic Pinniped Locomotion

Authors:Dimuthu D. K. Arachchige, Tanmay Varshney, Umer Huzaifa, Iyad Kanj, Thrishantha Nanayakkara, Yue Chen, Hunter B. Gilbert, Isuru S. Godage

Abstract: Legged locomotion is a highly promising but under-researched subfield of soft robotics. The compliant limbs of soft-limbed robots offer numerous benefits, including the ability to regulate impacts, tolerate falls, and navigate through tight spaces. These robots have the potential to be used for various applications, such as search and rescue, inspection, surveillance, and more. The state of the art still faces many challenges, including limited degrees of freedom, a lack of diversity in gait trajectories, insufficient limb dexterity, and limited payload capabilities. To address these challenges, we develop a modular soft-limbed robot that can mimic the locomotion of pinnipeds. By using a modular design approach, we aim to create a robot with improved degrees of freedom, gait trajectory diversity, limb dexterity, and payload capabilities. We derive a complete floating-base kinematic model of the proposed robot and use it to generate and experimentally validate a variety of locomotion gaits. Results show that the proposed robot is capable of replicating these gaits effectively. We compare the locomotion trajectories under different gait parameters against our modeling results to demonstrate the validity of our proposed gait models.

3.Collaborative Ground-Aerial Multi-Robot System for Disaster Response Missions with a Low-Cost Drone Add-On for Off-the-Shelf Drones

Authors:Shalutha Rajapakshe, Dilanka Wickramasinghe, Sahan Gurusinghe, Deepana Ishtaweera, Bhanuka Silva, Peshala Jayasekara, Nick Panitz, Paul Flick, Navinda Kottege

Abstract: In disaster-stricken environments, it is vital to assess the damage quickly, analyse the stability of the environment, and allocate resources to the most vulnerable areas where victims might be present. These missions are difficult and dangerous for humans to conduct directly. Exploiting the complementary capabilities of ground and aerial robots, we investigate a collaborative aerial-ground approach to this problem. With its wider field of view, higher speed, and compact size, the aerial robot explores the area and creates a 3D feature-based map graph of the environment while providing a live video stream to the ground control station. Once the aerial robot finishes the exploration run, the ground control station processes the map and sends it to the ground robot. The ground robot, with its longer operation time, static stability, payload delivery and teleconferencing capabilities, can then autonomously navigate to identified high-vulnerability locations. We conducted experiments using a quadcopter and a hexapod robot in an indoor modelled environment with obstacles and uneven ground. Additionally, we developed a low-cost drone add-on with value-added capabilities, such as victim detection, that can be attached to an off-the-shelf drone. The system was assessed for cost-effectiveness, energy efficiency, and scalability.

4.Near Field iToF LIDAR Depth Improvement from Limited Number of Shots

Authors:Mena Nagiub, Thorsten Beuth, Ganesh Sistu, Heinrich Gotzig, Ciarán Eising

Abstract: Indirect Time-of-Flight LiDARs calculate the scene's depth indirectly from the phase shift angle between transmitted and received laser signals whose amplitudes are modulated at a predefined frequency. Unfortunately, this method produces an ambiguity in the calculated depth when the phase shift angle exceeds $2\pi$. Current state-of-the-art methods use raw samples generated with two distinct modulation frequencies to overcome this ambiguity. However, this comes at the cost of increasing the laser components' stress and raising their temperature, which reduces their lifetime and increases power consumption. In our work, we study two different methods to recover the entire depth range of the LiDAR using fewer raw data sample shots from a single modulation frequency, supported by the sensor's grayscale output, in order to reduce the laser components' stress and power consumption.
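
The phase-wrapping ambiguity described above follows from the standard iToF relation d = c·φ / (4π·f): once the round-trip phase exceeds 2π, depths beyond c / (2f) alias back into the measurable range. The sketch below illustrates only this textbook relation, with an example modulation frequency of 20 MHz chosen for illustration; it does not implement either of the paper's recovery methods.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def max_unambiguous_range(f_mod_hz):
    """Largest depth before the round-trip phase shift wraps past 2*pi."""
    return C / (2.0 * f_mod_hz)


def phase_from_depth(depth_m, f_mod_hz):
    """Round-trip phase shift 4*pi*f*d/c, wrapped into [0, 2*pi)."""
    return (4.0 * math.pi * f_mod_hz * depth_m / C) % (2.0 * math.pi)


def depth_from_phase(phase_rad, f_mod_hz):
    """Naive single-frequency depth estimate from a wrapped phase."""
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)


f_mod = 20e6                                  # example modulation frequency (20 MHz)
d_max = max_unambiguous_range(f_mod)          # about 7.49 m
measured = depth_from_phase(phase_from_depth(10.0, f_mod), f_mod)
# A target at 10 m aliases to about 10 - 7.49 = 2.51 m.
```

Lowering the modulation frequency extends `d_max` but coarsens depth resolution, which is why dual-frequency schemes are common despite the extra laser shots.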

5.FM-Loc: Using Foundation Models for Improved Vision-based Localization

Authors:Reihaneh Mirjalili, Michael Krawez, Wolfram Burgard

Abstract: Visual place recognition is essential for vision-based robot localization and SLAM. Despite the tremendous progress made in recent years, place recognition in changing environments remains challenging. A promising approach to cope with appearance variations is to leverage high-level semantic features like objects or place categories. In this paper, we propose FM-Loc, a novel image-based localization approach based on Foundation Models that uses the Large Language Model GPT-3 in combination with the Visual-Language Model CLIP to construct a semantic image descriptor that is robust to severe changes in scene geometry and camera viewpoint. We deploy CLIP to detect objects in an image, GPT-3 to suggest potential room labels based on the detected objects, and CLIP again to propose the most likely location label. The object labels and the scene label constitute an image descriptor that we use to calculate a similarity score between the query and database images. We validate our approach on real-world data that exhibit significant changes in camera viewpoints and object placement between the database and query trajectories. The experimental results demonstrate that our method is applicable to a wide range of indoor scenarios without the need for training or fine-tuning.

6.A Framework for Fast Prototyping of Photo-realistic Environments with Multiple Pedestrians

Authors:Sara Casao, Andrés Otero, Álvaro Serra-Gómez, Ana C. Murillo, Javier Alonso-Mora, Eduardo Montijano

Abstract: Robotic applications involving people often require advanced perception systems to better understand complex real-world scenarios. To address this challenge, photo-realistic and physics simulators are gaining popularity as a means of generating accurately labeled data and designing scenarios for evaluating generalization capabilities, e.g., under lighting changes, camera movements or different weather conditions. We develop a photo-realistic framework built on Unreal Engine and AirSim to easily generate scenarios with pedestrians and mobile robots. The framework can generate random and customized trajectories for each person and provides up to 50 ready-to-use people models along with an API for retrieving their metadata. We demonstrate the usefulness of the proposed framework with a use case of multi-target tracking, a popular problem in real pedestrian scenarios. The notable feature variability in the obtained perception data is presented and evaluated.

7.Plant-inspired behavior-based controller to enable reaching in redundant continuum robot arms

Authors:Enrico Donato, Yasmin Tauqeer Ansari, Cecilia Laschi, Egidio Falotico

Abstract: Enabling reaching capabilities in highly redundant continuum robot arms is an active area of research. Existing solutions comprise task-space controllers whose proper functioning is still limited to laboratory environments. In contrast, this work proposes a novel plant-inspired behaviour-based controller that exploits information obtained from proximity sensing embedded near the end-effector to move towards a desired spatial target. The controller is tested on a 9-DoF modular cable-driven continuum arm for reaching multiple setpoints in space. The results are promising for the deployability of these systems in unstructured environments.

8.EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks

Authors:Ziyun Wang, Fernando Cladera Ojeda, Anthony Bisulco, Daewon Lee, Camillo J. Taylor, Kostas Daniilidis, M. Ani Hsieh, Daniel D. Lee, Volkan Isler

Abstract: Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms such as the Nvidia Jetson NX.
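
Going by its name and the description above, a Binary Event History Image can be read as a binary image in which a pixel is set once at least one event has occurred there within a time window, regardless of polarity or event count. The sketch below encodes that reading; the (x, y, t, polarity) event layout and the explicit `t_start`/`t_end` window are assumptions, not the authors' code.

```python
import numpy as np


def binary_event_history_image(events, height, width, t_start, t_end):
    """Binary image marking pixels that fired at least one event in a window.

    events: array of shape (N, 4) with columns (x, y, t, polarity).
    Polarity is ignored; only the occurrence of an event matters.
    """
    img = np.zeros((height, width), dtype=np.uint8)
    in_window = (events[:, 2] >= t_start) & (events[:, 2] <= t_end)
    xs = events[in_window, 0].astype(int)
    ys = events[in_window, 1].astype(int)
    img[ys, xs] = 1  # repeated events at one pixel still give 1
    return img


events = np.array([
    [1, 0, 0.10, 1],
    [2, 3, 0.50, -1],
    [1, 0, 0.20, -1],   # second event at an already-marked pixel
    [0, 2, 0.90, 1],    # falls outside the window used below
])
behi = binary_event_history_image(events, height=4, width=4, t_start=0.0, t_end=0.6)
```

Because each pixel is a single bit, the image can be updated incrementally as events arrive, which fits the low-latency goal stated in the abstract.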

9.Learning Perceptive Bipedal Locomotion over Irregular Terrain

Authors:Bart van Marum, Matthia Sabatelli, Hamidreza Kasaei

Abstract: In this paper, we propose a novel bipedal locomotion controller that uses noisy exteroception to traverse a wide variety of terrains. Building on cutting-edge advances in attention-based belief encoding for quadrupedal locomotion, our work extends these methods to the bipedal domain, resulting in a robust and reliable internal belief of the terrain ahead despite noisy sensor inputs. Additionally, we present a reward function that allows the controller to successfully traverse irregular terrain. We compare our method with a proprioceptive baseline and show that our method is able to traverse a wide variety of terrains and greatly outperforms the state of the art in terms of robustness, speed and efficiency.

10.An Open Source Design Optimization Toolbox Evaluated on a Soft Finger

Authors:Stefan Escaida Navarro, Tanguy Navez, Olivier Goury, Luis Molina, Christian Duriez

Abstract: In this paper, we introduce a novel open source toolbox for design optimization in Soft Robotics. We believe that design optimization is an important trend in Soft Robotics that is changing the way designs will be shared and adopted. We evaluate this toolbox on the example of a cable-driven, sensorized soft finger. For devices like these, which feature both actuation and sensing, the need for multi-objective optimization capabilities naturally arises, because at the very least a trade-off between these two aspects has to be found. Thus, multi-objective optimization capability is one of the central features of the proposed toolbox. We evaluate the optimization of the soft finger and show that extreme points of the optimization trade-off between sensing and actuation are indeed far apart on actually fabricated devices for the established metrics. Furthermore, we provide an in-depth analysis of the sim-to-real behavior of the example, taking into account factors such as the mesh density in the simulation, mechanical parameters and fabrication tolerances.
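
The "extreme points of the optimization trade-off" mentioned above are the ends of a Pareto front. The following generic sketch, which assumes every objective is minimized and uses made-up (actuation cost, sensing error) pairs, shows how a front and its extremes can be extracted; it does not use or reproduce the toolbox's API.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Keep the designs that no other design dominates (all objectives minimized)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


def extreme_points(front):
    """Best design for each single objective, i.e. the ends of the trade-off."""
    return [min(front, key=lambda d: d[i]) for i in range(len(front[0]))]


# Hypothetical (actuation cost, sensing error) pairs for four candidate fingers.
designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(designs)        # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
extremes = extreme_points(front)     # [(1.0, 5.0), (4.0, 1.0)]
```

The quadratic dominance check is fine for a few hundred candidates; larger design sets usually rely on the non-dominated sorting built into multi-objective optimizers.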

1.Survey on LiDAR Perception in Adverse Weather Conditions

Authors:Mariella Dreissig, Dominik Scheuble, Florian Piewak, Joschka Boedecker

Abstract: Autonomous vehicles rely on a variety of sensors to gather information about their surroundings. The vehicle's behavior is planned based on the environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition for environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance changes under adverse weather conditions like fog, snow or rain. This limitation recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gathered, analyzed, and discussed different aspects of dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms and sensor fusion to mitigate adverse weather induced shortcomings. We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.

2.Continual Learning of Hand Gestures for Human-Robot Interaction

Authors:Xavier Cucurull, Anaís Garrell

Abstract: In this paper, we present an efficient method to incrementally learn to classify static hand gestures. This method allows users to teach a robot to recognize new symbols in an incremental manner. Contrary to other works, which use special sensors or external devices such as color or data gloves, our proposed approach uses a single RGB camera to perform static hand gesture recognition from 2D images. Furthermore, our system is able to incrementally learn up to 38 new symbols using only 5 samples for each old class, achieving a final average accuracy of over 90%. In addition, the incremental training time can be reduced to 10% of the time required when using all available data.
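
Keeping "only 5 samples for each old class" amounts to a small rehearsal (exemplar) memory that is replayed alongside the data of each new class. The sketch below assumes random exemplar selection and a hypothetical `ExemplarMemory` class; the abstract does not say how the authors choose or use the stored samples.

```python
import random


class ExemplarMemory:
    """Hypothetical rehearsal memory keeping at most `per_class` samples per class."""

    def __init__(self, per_class=5, seed=0):
        self.per_class = per_class
        self.rng = random.Random(seed)
        self.store = {}

    def add_class(self, label, samples):
        samples = list(samples)
        k = min(self.per_class, len(samples))
        self.store[label] = self.rng.sample(samples, k)  # random selection (assumption)

    def training_set(self, new_label, new_samples):
        """All samples of the new class plus the stored exemplars of old classes."""
        data = [(s, new_label) for s in new_samples]
        for label, exemplars in self.store.items():
            data.extend((s, label) for s in exemplars)
        return data


memory = ExemplarMemory(per_class=5)
memory.add_class("A", range(100))
memory.add_class("B", range(100, 150))
batch = memory.training_set("C", range(200, 240))  # 40 new + 2 x 5 old = 50 samples
```

The training set for each increment grows with the number of classes, not with the amount of data seen so far, which is what keeps incremental training cheap.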

3.Contact Models in Robotics: a Comparative Analysis

Authors:Quentin Le Lidec, Wilson Jallet, Louis Montaut, Ivan Laptev, Cordelia Schmid, Justin Carpentier

Abstract: Physics simulation is ubiquitous in robotics. Whether in model-based approaches (e.g., trajectory optimization) or model-free algorithms (e.g., reinforcement learning), physics simulators are a central component of modern control pipelines in robotics. Over the past decades, several robotic simulators have been developed, each with dedicated contact modeling assumptions and algorithmic solutions. In this article, we survey the main contact models and the associated numerical methods commonly used in robotics for simulating advanced robot motions involving contact interactions. In particular, we recall the physical laws underlying contacts and friction (i.e., Signorini condition, Coulomb's law, and the maximum dissipation principle), and how they are transcribed in current simulators. For each physics engine, we expose its inherent physical relaxations along with its limitations due to the numerical techniques employed. Based on our study, we propose theoretically grounded quantitative criteria on which we build benchmarks assessing both the physical and computational aspects of simulation. We support our work with an open-source and efficient C++ implementation of the existing algorithmic variations. Our results demonstrate that some approximations or algorithms commonly used in robotics can severely widen the reality gap and impact target applications. We hope this work will help motivate the development of new contact models, contact solvers, and robotic simulators in general, at the root of recent progress in motion generation in robotics.

4.Anthropomorphic finger for grasping applications: 3D printed endoskeleton in a soft skin

Authors:Mahmoud Tavakoli, Andriy Sayuk, João Lourenço, Pedro Neto

Abstract: The application of soft and compliant joints in grasping mechanisms has received increasing attention in recent years. This article presents the design and development of a novel bio-inspired compliant finger composed of a 3D printed rigid endoskeleton covered by a soft material. The overall integrated system resembles a biological structure, giving the finger an anthropomorphic look. The mechanical properties of the structure are enhanced by optimizing the repetitive geometrical structures that form a flexure bearing acting as the finger joint. The endoskeleton is formed by additive manufacturing of these geometries with rigid materials. The geometry of the endoskeleton was studied by finite element analysis (FEA) to obtain the desired properties: high stiffness against lateral deflection and twisting, and low stiffness about the desired bending axis of the fingers. Results are validated by experimental analysis.

5.Communications-Aware Robotics: Challenges and Opportunities

Authors:Daniel Bonilla Licea, Giuseppe Silano, Mounir Ghogho, Martin Saska

Abstract: The use of Unmanned Ground Vehicles (UGVs) and Unmanned Aerial Vehicles (UAVs) has seen significant growth in the research community, industry, and society. Many of these agents are equipped with communication systems that are essential for completing certain tasks successfully. This has led to the emergence of a new interdisciplinary field at the intersection of robotics and communications, which has been further driven by the integration of UAVs into 5G and 6G communication networks. However, one of the main challenges in this research area is that many researchers tend to oversimplify either the robotics or the communications aspects, hindering the full potential of this new interdisciplinary field. In this paper, we present some of the necessary modeling tools for addressing these problems from both a robotics and a communications perspective, using the UAV communications relay as an example.

1.Vehicle Trajectory Prediction based Predictive Collision Risk Assessment for Autonomous Driving in Highway Scenarios

Authors:Dejian Meng, Wei Xiao, Lijun Zhang, Zhuang Zhang, Zihao Liu

Abstract: For driving safely and efficiently in highway scenarios, autonomous vehicles (AVs) must be able to predict future behaviors of surrounding object vehicles (OVs), and assess collision risk accurately for reasonable decision-making. Aiming at autonomous driving in highway scenarios, a predictive collision risk assessment method based on trajectory prediction of OVs is proposed in this paper. Firstly, the vehicle trajectory prediction is formulated as a sequence generation task with long short-term memory (LSTM) encoder-decoder framework. Convolutional social pooling (CSP) and graph attention network (GAN) are adopted for extracting local spatial vehicle interactions and distant spatial vehicle interactions, respectively. Then, two basic risk metrics, time-to-collision (TTC) and minimal distance margin (MDM), are calculated between the predicted trajectory of OV and the candidate trajectory of AV. Consequently, a time-continuous risk function is constructed with temporal and spatial risk metrics. Finally, the vehicle trajectory prediction model CSP-GAN-LSTM is evaluated on two public highway datasets. The quantitative results indicate that the proposed CSP-GAN-LSTM model outperforms the existing state-of-the-art (SOTA) methods in terms of position prediction accuracy. Besides, simulation results in typical highway scenarios further validate the feasibility and effectiveness of the proposed predictive collision risk assessment method.
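
The two basic risk metrics named above can be sketched with their textbook definitions: time-to-collision as gap divided by closing speed, and a minimal distance margin as the smallest distance between two trajectories sampled at the same timestamps. These simplified forms are assumptions for illustration; the paper's exact formulation and its time-continuous risk function are not reproduced here.

```python
import math

import numpy as np


def time_to_collision(gap_m, v_follower, v_leader):
    """Classic car-following TTC: gap divided by closing speed (inf if not closing)."""
    closing_speed = v_follower - v_leader
    return gap_m / closing_speed if closing_speed > 0 else math.inf


def minimal_distance_margin(traj_a, traj_b):
    """Smallest Euclidean distance between two trajectories sampled at the same times."""
    diffs = np.asarray(traj_a, dtype=float) - np.asarray(traj_b, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))


ttc = time_to_collision(gap_m=30.0, v_follower=25.0, v_leader=20.0)  # 6.0 s
mdm = minimal_distance_margin([[0, 0], [1, 0], [2, 0]],
                              [[4, 0], [3, 0], [3, 0]])             # 1.0 m
```

In a predictive setting, `traj_a` would be a candidate AV trajectory and `traj_b` the predicted OV trajectory over the same horizon.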

2.Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives

Authors:Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati, Homayoun Najjaran

Abstract: Finding an efficient way to adapt robot trajectories is a priority for improving the overall performance of robots. One approach to trajectory planning is transferring human-like skills to robots through Learning from Demonstrations (LfD), in which the human demonstration is treated as the target motion to mimic. However, human motion is typically optimal for the human embodiment but not for robots, because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution to this limitation of LfD, but it requires tuning the second-order dynamics in its formulation. Our contribution is a systematic method to extract the dynamic features from human demonstrations to auto-tune the parameters of the DMP framework. Beyond LfD, the proposed method can also readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore possible trajectories more efficiently and by significantly increasing robot compliance. We introduce a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented on an actual human-robot setup to extract human dynamic features, which were then used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in stable robot performance, maintaining a high degree of human-likeness, measured by accumulated distance error, as good as that of the best heuristic tuning.
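
The "second-order dynamics" that need tuning are the spring-damper gains of a DMP. The sketch below rolls out a standard 1-D discrete DMP (tau*z' = alpha_z*(beta_z*(g - y) - z) + f(x), tau*y' = z, tau*x' = -alpha_x*x) with explicit Euler steps. The default gains, the basis-width heuristic, and the weights are hypothetical; the paper's method for extracting these parameters from human demonstrations is not implemented here.

```python
import numpy as np


def dmp_rollout(y0, g, weights, alpha_z=25.0, beta_z=None, alpha_x=4.0,
                tau=1.0, dt=0.001, duration=3.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps."""
    if beta_z is None:
        beta_z = alpha_z / 4.0  # critical damping of the spring-damper system
    n = len(weights)
    # Gaussian basis centres spread along the exponentially decaying phase x.
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))
    widths = n ** 1.5 / centers  # common width heuristic (assumption)
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(duration / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        # Forcing term vanishes as x -> 0, so the goal attractor takes over.
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)
        dz = (alpha_z * (beta_z * (g - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        traj.append(y)
    return np.array(traj)


plain = dmp_rollout(0.0, 1.0, np.zeros(10))                 # pure spring-damper
shaped = dmp_rollout(0.0, 1.0, np.linspace(-50.0, 50.0, 10))  # shaped by forcing term
```

Both rollouts converge to the goal; `alpha_z` and `beta_z` set how stiff and how damped that convergence is, which is the part the abstract proposes to tune from demonstrations rather than by hand.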

3.RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields

Authors:Xiao Han, Houxuan Liu, Yunchao Ding, Lu Yang

Abstract: Accurate perception of objects in the environment is important for improving the scene understanding capability of SLAM systems. In robotic and augmented reality applications, object maps with semantic and metric information show attractive advantages. In this paper, we present RO-MAP, a novel multi-object mapping pipeline that does not rely on 3D priors. Given only monocular input, we use neural radiance fields to represent objects and couple them with a lightweight object SLAM based on multi-view geometry, to simultaneously localize objects and implicitly learn their dense geometry. We create separate implicit models for each detected object and train them dynamically and in parallel as new observations are added. Experiments on synthetic and real-world datasets demonstrate that our method can generate a semantic object map with shape reconstruction and is competitive with offline methods while achieving real-time performance (25 Hz). The code and dataset will be available at:

4.Force Map: Learning to Predict Contact Force Distribution from Vision

Authors:Ryo Hanai, Yukiyasu Domae, Ixchel G. Ramirez-Alpizar, Bruno Leme, Tetsuya Ogata

Abstract: When humans see a scene, they can roughly imagine the forces applied to objects based on their experience and use them to handle the objects properly. This paper considers transferring this "force-visualization" ability to robots. We hypothesize that a rough force distribution (named "force map") can be utilized for object manipulation strategies even if accurate force estimation is impossible. Based on this hypothesis, we propose a training method to predict the force map from vision. To investigate this hypothesis, we generated scenes where objects were stacked in bulk through simulation and trained a model to predict the contact force from a single image. We further applied domain randomization to make the trained model function on real images. The experimental results showed that the model trained using only synthetic images could predict approximate patterns representing the contact areas of the objects even for real images. Then, we designed a simple algorithm to plan a lifting direction using the predicted force distribution. We confirmed that using the predicted force distribution contributes to finding natural lifting directions for typical real-world scenes. Furthermore, the evaluation through simulations showed that the disturbance caused to surrounding objects was reduced by 26 % (translation displacement) and by 39 % (angular displacement) for scenes where objects were overlapping.

5.A Palm-Shape Variable-Stiffness Gripper based on 3D-Printed Fabric Jamming

Authors:Yuchen Zhao, Yifan Wang

Abstract: Soft grippers have excellent adaptability for a variety of objects and tasks. Jamming-based variable stiffness materials can further increase soft grippers' gripping force and capacity. Previous universal grippers enabled by granular jamming have shown great capability in handling objects of various shapes and weights. However, they require a large pushing force on the object during gripping, which is not suitable for very soft or free-hanging objects. In this paper, we create a novel palm-shape anthropomorphic variable-stiffness gripper enabled by jamming of 3D printed fabrics. This gripper is conformable and gentle to objects with different shapes, requires little pushing force, and increases gripping strength only when necessary. We present the design, fabrication and performance of this gripper and test its conformability and gripping capacity. Our design utilizes soft pneumatic actuators to drive two wide palms to enclose objects, thanks to the excellent conformability of the structured fabrics. While the pinch force is low, the palm can significantly increase its stiffness to lift heavy objects, with a maximum gripping force of $17\,$N and a grip-to-pinch force ratio of $42$. We also explore different variable-stiffness materials in the gripper, including sheets for layer jamming, to compare their performance. We conduct gripping tests on standard objects and daily items to demonstrate the capacity of our gripper design.

6.Measuring a Soft Resistive Strain Sensor Array by Solving the Resistor Network Inverse Problem

Authors:Yuchen Zhao, Choo Kean Khaw, Yifan Wang

Abstract: Soft robotics is applicable to a variety of domains due to the adaptability offered by soft and compliant materials. To develop future intelligent soft robots, soft sensors that can capture deformation with nearly infinite degrees of freedom are necessary. Soft sensor networks can address this problem; however, measuring all sensor values throughout the body requires excessive wiring and complex fabrication that may hinder robot performance. We circumvent these challenges by developing a non-invasive measurement technique based on an algorithm that solves the inverse problem of a resistor network, and we implement this algorithm on a soft resistive strain sensor network. Our algorithm iteratively computes the resistor values from the applied boundary voltages and the resulting current responses, and we analyze the reconstruction error of the algorithm as a function of network size and measurement error. We further develop an electronics setup to implement our algorithm on a stretchable resistive strain sensor network made of soft conductive silicone, and show the response of the measured network to different deformation modes. Our work opens a new path to address the challenge of measuring many sensor values in soft sensors, and could be applied to soft robotic sensor systems.
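
The abstract does not give the inversion algorithm itself, but any iterative scheme of this kind repeatedly evaluates the Kirchhoff forward model that maps resistor values to boundary responses. The sketch below shows only that forward model, using a weighted graph Laplacian (conductance matrix); for simplicity it injects currents and solves for node voltages, which is the same linear system viewed from the other side. Function names are hypothetical.

```python
import numpy as np


def laplacian(n_nodes, edges, resistances):
    """Weighted graph Laplacian (conductance matrix) of a resistor network."""
    L = np.zeros((n_nodes, n_nodes))
    for (i, j), r in zip(edges, resistances):
        g = 1.0 / r
        L[i, i] += g
        L[j, j] += g
        L[i, j] -= g
        L[j, i] -= g
    return L


def node_voltages(L, injected_currents, ground=0):
    """Solve Kirchhoff's current law L v = I with the `ground` node held at 0 V."""
    keep = [k for k in range(L.shape[0]) if k != ground]
    v = np.zeros(L.shape[0])
    currents = np.asarray(injected_currents, dtype=float)
    v[keep] = np.linalg.solve(L[np.ix_(keep, keep)], currents[keep])
    return v


# Two resistors in series (1 ohm, then 2 ohm); 1 A enters node 2 and leaves node 0.
L = laplacian(3, [(0, 1), (1, 2)], [1.0, 2.0])
v = node_voltages(L, [-1.0, 0.0, 1.0])  # node voltages 0 V, 1 V, 3 V
```

An inversion would adjust the resistances until the simulated boundary responses match the measured ones; grounding one node removes the Laplacian's null space so the reduced system is solvable.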

7.NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning

Authors:Weizheng Wang, Ruiqi Wang, Le Mao, Byung-Cheol Min

Abstract: Developing robotic technologies for use in human society requires ensuring the safety of robots' navigation behaviors while adhering to pedestrians' expectations and social norms. However, maintaining real-time communication between robots and pedestrians to avoid collisions can be challenging. To address these challenges, we propose a novel socially-aware navigation benchmark called NaviSTAR, which utilizes a hybrid Spatio-Temporal grAph tRansformer (STAR) to understand interactions in human-rich environments by fusing potential multi-modal crowd information. We leverage an off-policy reinforcement learning algorithm with preference learning to train a policy and a reward function network under supervisor guidance. Additionally, we design a social score function to evaluate the overall performance of social navigation. For comparison, we train and test our algorithm and other state-of-the-art methods independently in both simulated and real-world scenarios. Our results show that NaviSTAR outperforms previous methods. The source code and experiment videos of this work are available at:

1.Scalable Real-Time Vehicle Deformation for Interactive Environments

Authors:Ben Kenwright

Abstract: This paper proposes a real-time physically-based method for simulating vehicle deformation. Our system synthesizes vehicle deformation characteristics using a low-dimensional coupled vehicle body technique. We simulate the motion and crumbling behavior of vehicles smashing into rigid objects. We explain and demonstrate a combined approach built on a reduced-complexity non-linear finite element system that is scalable and computationally efficient. We use an explicit position-based integration scheme to improve simulation speed while remaining stable and preserving modeling accuracy. We demonstrate our approach on a variety of vehicle deformation test cases simulated in real time.
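An explicit position-based integration step can be sketched as follows: velocities are advanced explicitly, positions are predicted, constraint violations are projected out directly on the positions, and velocities are then recomputed from the corrected positions. This is a minimal 2D sketch that assumes simple distance constraints between mass points as a stand-in for the paper's reduced non-linear finite element model; `pbd_step` and its parameters are hypothetical.

```python
import numpy as np

def pbd_step(x, v, inv_mass, edges, rest, dt, gravity=(0.0, -9.81), iters=10):
    """One position-based dynamics step for 2D points joined by distance constraints.

    x, v: (N, 2) positions and velocities; inv_mass: (N,) with 0 for pinned points;
    edges: list of (i, j) index pairs; rest: rest length of each edge.
    """
    free = (inv_mass > 0)[:, None]                    # pinned points ignore gravity
    v = v + dt * np.asarray(gravity) * free           # explicit velocity update
    p = x + dt * v                                    # predicted positions
    for _ in range(iters):                            # Gauss-Seidel constraint projection
        for (i, j), rest_len in zip(edges, rest):
            w = inv_mass[i] + inv_mass[j]
            d = p[i] - p[j]
            length = np.linalg.norm(d)
            if w == 0.0 or length < 1e-12:
                continue
            corr = (length - rest_len) / (length * w) * d
            p[i] -= inv_mass[i] * corr                # heavier points move less
            p[j] += inv_mass[j] * corr
    v = (p - x) / dt                                  # velocities follow corrected positions
    return p, v

# A pinned point and a free point form a 1 m pendulum that swings under gravity.
x = np.array([[0.0, 0.0], [1.0, 0.0]])
v = np.zeros_like(x)
inv_mass = np.array([0.0, 1.0])
for _ in range(120):
    x, v = pbd_step(x, v, inv_mass, edges=[(0, 1)], rest=[1.0], dt=1 / 60)
print(np.linalg.norm(x[1] - x[0]))  # stays at the 1 m rest length
```

Because constraints are enforced on positions rather than through stiff forces, the step stays stable at large time steps, which is the usual reason for choosing this family of integrators in real-time settings.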

2.Real-Time Character Rise Motions

Authors:Ben Kenwright

Abstract: This paper presents an uncomplicated dynamic controller for generating physically-plausible three-dimensional full-body biped character rise motions on-the-fly at run-time. Our low-dimensional controller uses fundamental reference information (e.g., center-of-mass, hands, and feet locations) to produce balanced biped get-up poses by means of a real-time physically-based simulation. The key idea is to use a simple approximate model (i.e., similar to the inverted-pendulum stepping model) to create continuous reference trajectories that can be seamlessly tracked by an articulated biped character to create balanced rise motions. Our approach does not use any key-framed data or any computationally expensive processing (e.g., offline optimization or search algorithms). We demonstrate the effectiveness and ease of our technique through examples (e.g., a biped character picking itself up from different lying positions).
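The inverted-pendulum idea can be illustrated with the textbook linear inverted pendulum model (LIPM): with a constant centre-of-mass (COM) height, the COM motion has a closed-form solution, and the "capture point" is the support location at which the pendulum comes to rest. This is a hedged sketch of that standard model only; the abstract does not specify the paper's reference generator, and the function names, the constant height `z_c` and the numbers are illustrative assumptions.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def lipm_state(x0, v0, pivot, z_c, t):
    """COM position and velocity of a linear inverted pendulum after time t.

    Dynamics: x'' = (g / z_c) * (x - pivot), with constant COM height z_c.
    """
    tc = math.sqrt(z_c / G)
    c, s = math.cosh(t / tc), math.sinh(t / tc)
    x = pivot + (x0 - pivot) * c + tc * v0 * s
    v = (x0 - pivot) / tc * s + v0 * c
    return x, v

def capture_point(x0, v0, z_c):
    """Support (step) location that brings the LIPM to rest over the pivot."""
    return x0 + v0 * math.sqrt(z_c / G)

# Hypothetical numbers: COM at 0 m moving at 0.5 m/s, COM height 0.9 m.
x0, v0, z_c = 0.0, 0.5, 0.9
p = capture_point(x0, v0, z_c)
for t in (0.0, 0.5, 1.0, 2.0):
    x, v = lipm_state(x0, v0, p, z_c, t)
    print(f"t={t:.1f}s  x={x:.4f} m  v={v:.4f} m/s")  # x approaches p, v decays towards 0
```

Placing the support at the capture point yields a smooth, exponentially converging COM reference, the kind of continuous trajectory a tracking controller can follow without key-framed data.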

3.Simulation Analysis of Exploration Strategies and UAV Planning for Search and Rescue

Authors:Phuoc Nguyen Thuan, Jorge Peña Queralta, Tomi Westerlund

Abstract: Aerial scans with unmanned aerial vehicles (UAVs) are becoming more widely adopted across industries, from smart farming to urban mapping. An application area that can leverage the strength of such systems is search and rescue (SAR) operations. However, given the vast variability in strategies and in the topology of application scenarios, as well as the difficulty of setting up real-world UAV-aided SAR operations for testing, designing an optimal flight pattern to search for and detect all victims can be challenging. Specifically, the deployed UAV should be able to scan the area in the shortest amount of time while maintaining a high victim detection recall rate. Therefore, a low probability of false negatives (i.e., high recall) is more important than precision in this case. To address these issues, we have developed a simulation environment that emulates different SAR scenarios and allows experimentation with flight missions to provide insight into their efficiency. The solution was developed with the open-source ROS framework and the Gazebo simulator, with PX4 as the autopilot system for flight control and YOLO as the object detector.
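The recall-versus-precision argument can be made concrete with a short worked example, using recall = TP / (TP + FN) and precision = TP / (TP + FP). The counts below are invented for illustration and are not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical mission over an area containing 10 victims.
# Pattern A: no false alarms, but 3 victims are missed.
print(precision_recall(tp=7, fp=0, fn=3))   # precision 1.0, recall 0.7
# Pattern B: every victim is found, at the cost of 5 false alarms.
print(precision_recall(tp=10, fp=5, fn=0))  # precision ~0.67, recall 1.0
```

For SAR, pattern B is preferable: a false alarm costs a second look, whereas a missed victim may never be found.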

4.Evaluation of Differentially Constrained Motion Models for Graph-Based Trajectory Prediction

Authors:Theodor Westny, Joel Oskarsson, Björn Olofsson, Erik Frisk

Abstract: Given their adaptability and encouraging performance, deep-learning models are becoming standard for motion prediction in autonomous driving. However, with great flexibility comes a lack of interpretability and possible violations of physical constraints. Accompanying these data-driven methods with differentially-constrained motion models to provide physically feasible trajectories is a promising future direction. The foundation for this work is a previously introduced graph-neural-network-based model, MTP-GO, in which the neural network learns to compute the inputs to an underlying motion model to provide physically feasible trajectories. This research investigates the performance of various motion models in combination with numerical solvers for the prediction task. The study shows that simpler models, such as low-order integrator models, are preferred over more complex ones, e.g., kinematic models, for achieving accurate predictions. Further, the choice of numerical solver can have a substantial impact on performance, which argues against commonly used first-order methods such as forward Euler. Instead, a second-order method such as Heun's can significantly improve predictions.
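The solver effect can be reproduced on a toy problem: integrate a kinematic unicycle with constant speed and yaw rate (whose exact path is a circular arc) with forward Euler and with Heun's method, then compare the final position error. This is an illustrative sketch, not MTP-GO code; the model, inputs, horizon and step size are assumptions.

```python
import math

def unicycle(state, v, omega):
    """State derivative for (x, y, heading) under speed v and yaw rate omega."""
    _, _, th = state
    return (v * math.cos(th), v * math.sin(th), omega)

def euler_step(state, dt, v, omega):
    d = unicycle(state, v, omega)
    return tuple(s + dt * ds for s, ds in zip(state, d))

def heun_step(state, dt, v, omega):
    k1 = unicycle(state, v, omega)
    pred = tuple(s + dt * ds for s, ds in zip(state, k1))                  # Euler predictor
    k2 = unicycle(pred, v, omega)
    return tuple(s + 0.5 * dt * (a + b) for s, a, b in zip(state, k1, k2))  # trapezoidal corrector

def final_error(step, dt=0.2, horizon=5.0, v=10.0, omega=0.3):
    state = (0.0, 0.0, 0.0)
    for _ in range(round(horizon / dt)):
        state = step(state, dt, v, omega)
    # Exact solution for constant inputs, starting at the origin heading along +x.
    th = omega * horizon
    exact = (v / omega * math.sin(th), v / omega * (1 - math.cos(th)))
    return math.hypot(state[0] - exact[0], state[1] - exact[1])

print(f"forward Euler error: {final_error(euler_step):.3f} m")
print(f"Heun error:          {final_error(heun_step):.3f} m")
```

Heun's method costs one extra derivative evaluation per step, but its error shrinks quadratically rather than linearly with the step size.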

5.Dexterous In-Hand Manipulation of Slender Cylindrical Objects through Deep Reinforcement Learning with Tactile Sensing

Authors:Wenbin Hu, Bidan Huang, Wang Wei Lee, Sicheng Yang, Yu Zheng, Zhibin Li

Abstract: Continuous in-hand manipulation is an important physical interaction skill, where tactile sensing provides indispensable contact information to enable dexterous manipulation of small objects. This work proposed a framework for end-to-end policy learning with tactile feedback and sim-to-real transfer, which achieved fine in-hand manipulation: it controlled the pose of a thin cylindrical object, such as a long stick, to track various continuous trajectories through multiple contacts with the three fingertips of a dexterous robot hand equipped with tactile sensor arrays. We estimated the central contact position between the stick and each fingertip from the high-dimensional tactile information and showed that the learned policies achieved effective manipulation performance with the processed tactile feedback. The policies were trained with deep reinforcement learning in simulation and successfully transferred to real-world experiments using coordinated model calibration and domain randomization. We evaluated the effectiveness of tactile information via comparative studies and validated the sim-to-real performance through real-world experiments.
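One simple way to reduce a high-dimensional tactile array reading to a single contact position is a pressure-weighted centroid of the taxels above a noise threshold. The abstract does not say which estimator the authors use, so this is a hedged illustration; the taxel layout, the `threshold` value and the function name are hypothetical.

```python
import numpy as np

def contact_centroid(pressures, taxel_xy, threshold=0.05):
    """Pressure-weighted centroid of active taxels, or None if nothing is touched.

    pressures: (N,) taxel readings; taxel_xy: (N, 2) taxel positions on the fingertip.
    """
    pressures = np.asarray(pressures, dtype=float)
    taxel_xy = np.asarray(taxel_xy, dtype=float)
    active = pressures > threshold          # ignore taxels below the (hypothetical) noise floor
    if not active.any():
        return None
    w = pressures[active]
    return (w[:, None] * taxel_xy[active]).sum(axis=0) / w.sum()

# Hypothetical 3x3 taxel grid with 2 mm pitch.
xs, ys = np.meshgrid(np.arange(3) * 2.0, np.arange(3) * 2.0)
taxel_xy = np.column_stack([xs.ravel(), ys.ravel()])
reading = np.zeros(9)
reading[[4, 5]] = [1.0, 1.0]  # equal pressure on the centre taxel and its right neighbour
print(contact_centroid(reading, taxel_xy))  # midway between them: [3. 2.]
```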

6.Simultaneous localization and mapping by using Low-Cost Ultrasonic Sensor for Underwater crawler

Authors:Trish Velan Dcruz, Cicero Estibeiro, Anil Shankar, Mangal Das

Abstract: Autonomous robots can help people explore parts of the ocean that would be hard or impossible to reach otherwise. The increasing availability of low-cost components has made it possible to innovate and to design and implement new ideas for underwater robotics. Cost-effective and open solutions available today can be used to replace expensive robot systems. This research presents the prototype of an autonomous robot system that operates in brackish waterways in settings such as fish hatcheries. The system uses low-cost ultrasonic sensors with a SLAM algorithm to map and move through the environment. Whereas previous studies used lidar sensors, this configuration was chosen to keep costs down. A comparison between ultrasonic and lidar sensors is presented, showing their respective pros and cons.
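The mapping half of such a SLAM system is often an occupancy grid updated in log-odds form. The sketch below applies a simple inverse sensor model along a single ultrasonic beam: cells the echo passed through become more likely free, and the cell at the measured range becomes more likely occupied. The abstract does not name the SLAM algorithm, so this is a generic, hedged illustration; the increments `L_FREE`/`L_OCC`, the cell size and the 1D beam are assumptions (a real sonar cone is wide and would update an arc of cells).

```python
import numpy as np

L_FREE, L_OCC = -0.4, 0.85   # hypothetical log-odds increments
CELL = 0.1                   # hypothetical cell size [m]

def update_beam(log_odds, range_m, max_range):
    """Update, in place, a 1D grid of cells lying along one ultrasonic beam."""
    hit = int(range_m / CELL)
    log_odds[:min(hit, len(log_odds))] += L_FREE   # space the echo travelled through
    if range_m < max_range and hit < len(log_odds):
        log_odds[hit] += L_OCC                     # cell that returned the echo
    return log_odds

def probability(log_odds):
    """Convert log-odds back to occupancy probability."""
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds))

grid = np.zeros(40)                    # 4 m beam, prior p = 0.5 everywhere
for r in (2.05, 2.04, 2.06):           # three noisy readings of a wall near 2 m
    update_beam(grid, r, max_range=4.0)
print(np.round(probability(grid[[5, 19, 20, 30]]), 2))  # free, free, occupied, unobserved
```

Working in log-odds turns repeated Bayesian updates into additions, which keeps the per-reading cost low on cheap hardware.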

7.TrajFlow: Learning the Distribution over Trajectories

Authors:Anna Mészáros, Javier Alonso-Mora, Jens Kober

Abstract: Predicting the future behaviour of people remains an open challenge for the development of risk-aware autonomous vehicles. An important aspect of this challenge is effectively capturing the uncertainty which is inherent to human behaviour. This paper studies an approach for probabilistic motion forecasting with improved accuracy in the predicted sample likelihoods. We are able to learn multi-modal distributions over the motions of an agent solely from data, while also providing predictions in real time. Our approach achieves state-of-the-art results on the inD dataset when evaluated with the standard metrics employed for motion forecasting. Furthermore, our approach also achieves state-of-the-art results when evaluated with respect to the likelihoods it assigns to its generated trajectories. Evaluations on artificial datasets indicate that the distributions learned by our model closely correspond to the true distributions observed in data and are less prone to being over-confident in a single outcome in the face of uncertainty.
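Standard metrics for multi-modal forecasting are typically minADE_k and minFDE_k: the average and the final displacement error of the best of k sampled trajectories against the ground truth. A minimal NumPy sketch follows; the abstract does not state which k or exact metric definitions the paper uses, so the shapes and values here are assumptions.

```python
import numpy as np

def min_ade_fde(samples, truth):
    """minADE_k and minFDE_k for k sampled trajectories.

    samples: (k, T, 2) predicted positions; truth: (T, 2) ground-truth positions.
    Each minimum is taken independently over the k samples.
    """
    dist = np.linalg.norm(samples - truth[None], axis=-1)  # (k, T) per-step errors
    return dist.mean(axis=1).min(), dist[:, -1].min()

truth = np.stack([np.arange(5.0), np.zeros(5)], axis=1)   # walking straight along x
samples = np.stack([
    truth + [0.0, 1.0],                                    # right direction, 1 m lateral offset
    np.stack([np.arange(5.0), np.arange(5.0)], axis=1),    # veers off diagonally
])
ade, fde = min_ade_fde(samples, truth)
print(ade, fde)  # → 1.0 1.0
```

Because only the best sample counts, these metrics reward covering the plausible modes but say nothing about how much probability each mode receives, which is why likelihood-based evaluation is reported alongside them.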

8.Feed-forward Disturbance Compensation for Station Keeping in Wave-dominated Environments

Authors:Kyle L. Walker, Adam A. Stokes, Aristides Kiprakis, Francesco Giorgio-Serchi

Abstract: When deploying robots in shallow ocean waters, wave disturbances can be significant and highly dynamic, and they pose problems when operating near structures; this is a key limitation of current control strategies, restricting the range of conditions in which subsea vehicles can be deployed. To improve dynamic control and offer a higher level of robustness, this work proposes a Cascaded Proportional-Derivative (C-PD) with Feed-forward (FF) control scheme for disturbance mitigation, exploring the concept of explicitly using disturbance estimations to counteract state perturbations. Results demonstrate that the proposed controller outperforms a standard C-PD controller, with an average reduction of ~48% across various sea states. Additional analysis investigated performance when using coarse estimations featuring inaccuracies; average improvements of ~17% demonstrate the ability of the proposed strategy to handle these uncertainties. The proposal shows promise for improved control without a drastic increase in required computing power; if coupled with sufficient sensors, state estimation techniques and prediction algorithms, feed-forward compensating control actions offer a potential solution to improve vehicle control under wave-induced disturbances.
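The idea of adding a feed-forward term to a cascaded PD loop can be sketched on a 1D station-keeping toy problem: a unit mass held at the origin against a sinusoidal wave force. The outer PD loop turns position error into a velocity reference, the inner loop turns velocity error into thrust, and the feed-forward term subtracts the estimated disturbance force. Everything here (gains, mass, the sinusoidal disturbance, the scaled-down "coarse" estimate) is a hypothetical assumption; the paper's vehicle model, disturbance estimator and gains are not reproduced.

```python
import math

def simulate(ff_scale, t_end=40.0, dt=1e-3):
    """RMS station-keeping error of a 1D toy vehicle under a sinusoidal wave force.

    ff_scale = 0 disables feed-forward, 1 uses a perfect disturbance estimate,
    values in between mimic a coarse (under-estimated) disturbance estimate.
    """
    m = 1.0                    # vehicle mass (hypothetical)
    kp_x, kd_x = 2.0, 0.5      # outer PD: position error -> velocity reference
    kp_v, kd_v = 10.0, 0.2     # inner PD: velocity error -> thrust (D acts on measured accel.)
    x = v = a = 0.0
    sq_err, n = 0.0, 0
    for k in range(int(t_end / dt)):
        t = k * dt
        d = 2.0 * math.sin(1.0 * t)           # wave-induced force (toy disturbance)
        v_ref = kp_x * (0.0 - x) - kd_x * v   # hold position x = 0
        u = kp_v * (v_ref - v) - kd_v * a     # cascaded PD thrust command
        u -= ff_scale * d                     # feed-forward: cancel the estimated disturbance
        a = (u + d) / m
        v += a * dt                           # semi-implicit Euler integration
        x += v * dt
        if t > 10.0:                          # ignore the start-up transient
            sq_err += x * x
            n += 1
    return math.sqrt(sq_err / n)

for label, scale in (("C-PD only", 0.0), ("C-PD + FF, exact estimate", 1.0), ("C-PD + FF, 70% estimate", 0.7)):
    print(f"{label:28s} RMS error {simulate(scale):.4f} m")
```

Because this toy plant is linear, a 70% accurate estimate leaves roughly 30% of the uncompensated error; the numbers illustrate the mechanism only and are unrelated to the ~48% and ~17% figures reported in the paper.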

9.Diagnosing and Augmenting Feature Representations in Correctional Inverse Reinforcement Learning

Authors:Inês Lourenço, Andreea Bobu, Cristian R. Rojas, Bo Wahlberg

Abstract: Robots have become increasingly better at performing tasks for humans by learning from their feedback, but they still often suffer from model misalignment due to missing or incorrectly learned features. When the features the robot needs to learn to perform its task are missing or do not generalize well to new settings, the robot will not be able to learn the task the human wants and, even worse, may learn a completely different and undesired behavior. Prior work shows how the robot can detect when its representation is missing some feature and can thus ask the human to teach it the new feature; however, these works do not differentiate between features that are completely missing and those that exist but do not generalize to new environments. In the latter case, the robot would detect misalignment and simply learn a new feature, leading to an arbitrarily growing feature representation that can, in turn, lead to spurious correlations and incorrect learning down the line. In this work, we propose separating the two sources of misalignment: we present a framework for determining whether a feature the robot needs is incorrectly learned and does not generalize to new environment setups, or is entirely missing from the robot's representation. Once we detect the source of error, we show how the human can initiate the realignment process for the model: if the feature is missing, we follow prior work for learning new features; however, if the feature exists but does not generalize, we use data augmentation to expand its training and, thus, complete the correction. We demonstrate the proposed approach in experiments with a simulated 7DoF robot manipulator and physical human corrections.

10.TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain

Authors:Alexey I. Boyko, Anastasiia Kornilova, Rahim Tariverdizadeh, Mirfarid Musavian, Larisa Markeeva, Ivan Oseledets, Gonzalo Ferrer

Abstract: This paper addresses the following research question: "Can one compress a detailed 3D representation and use it directly for point cloud registration?" Map compression of the scene can be achieved by the tensor train (TT) decomposition of the signed distance function (SDF) representation, where the amount of data reduction is regulated by the so-called TT-ranks. Using this representation, we propose an algorithm, TT-SDF2PC, that directly registers a point cloud to the compressed SDF by efficiently computing its derivatives in the TT domain, saving computation and memory. We compare TT-SDF2PC with state-of-the-art local and global registration methods on a synthetic dataset and a real dataset and show on-par performance while requiring significantly fewer resources.
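How TT-ranks trade memory for accuracy can be seen with the standard TT-SVD algorithm, which factorizes a d-way array (e.g. an SDF sampled on a voxel grid) into a chain of 3-way cores by successive truncated SVDs. This NumPy sketch caps all ranks at a single `max_rank`; it illustrates the decomposition only, not the TT-SDF2PC registration or its derivative computations, and the sphere SDF and grid size are illustrative assumptions.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Tensor-train decomposition by sequential truncated SVDs (TT-SVD)."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor
    for n_k in shape[:-1]:
        mat = mat.reshape(r_prev * n_k, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                  # truncate to the TT-rank cap
        cores.append(u[:, :r].reshape(r_prev, n_k, r))
        mat = s[:r, None] * vt[:r]                 # carry the remainder to the next mode
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores back into a dense array."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(out.shape[1:-1])

# Signed distance to a sphere of radius 0.5, sampled on a 32^3 grid (illustrative).
g = np.linspace(-1.0, 1.0, 32)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
sdf = np.sqrt(X**2 + Y**2 + Z**2) - 0.5
for rank in (2, 4, 8):
    cores = tt_svd(sdf, rank)
    err = np.abs(tt_full(cores) - sdf).max()
    stored = sum(core.size for core in cores)
    print(f"rank {rank}: {stored} values vs {sdf.size} dense, max error {err:.3g}")
```

Storage grows roughly as n·r² per core instead of n³ for the dense grid, so the rank cap directly sets the memory budget.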

1.PoseFusion: Robust Object-in-Hand Pose Estimation with SelectLSTM

Authors:Yuyang Tu, Junnan Jiang, Shuang Li, Norman Hendrich, Miao Li, Jianwei Zhang

Abstract: Accurate estimation of the relative pose between an object and a robot hand is critical for many manipulation tasks. However, most existing object-in-hand pose datasets use two-finger grippers and also assume that the object remains fixed in the hand without any relative movement, which is not representative of real-world scenarios. To address this issue, a 6D object-in-hand pose dataset is proposed, collected using a teleoperation method with an anthropomorphic Shadow Dexterous hand. Our dataset comprises RGB-D images, proprioception and tactile data, covering diverse grasping poses, finger contact states, and object occlusions. To overcome the significant hand occlusion and limited tactile sensor contact in real-world scenarios, we propose PoseFusion, a hybrid multi-modal fusion approach that integrates the information from visual and tactile perception channels. PoseFusion generates three candidate object poses from three estimators (tactile only, visual only, and visuo-tactile fusion), which are then filtered by a SelectLSTM network to select the optimal pose, avoiding inferior fusion poses resulting from modality collapse. Extensive experiments demonstrate the robustness and advantages of our framework. All data and code are available on the project website:

2.Learning a Universal Human Prior for Dexterous Manipulation from Human Preference

Authors:Zihan Ding, Yuanpei Chen, Allen Z. Ren, Shixiang Shane Gu, Hao Dong, Chi Jin

Abstract: Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands. Even in simulation with no sample constraints, scripting controllers is intractable due to the high number of degrees of freedom, and manual reward engineering can also be hard and lead to unrealistic motions. Leveraging recent progress in Reinforcement Learning from Human Feedback (RLHF), we propose a framework that learns a universal human prior from direct human preference feedback over videos, for efficiently tuning the RL policy on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. One task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over their trajectories; it is then applied to regularize the behavior of policies in the fine-tuning stage. Empirically, our method produces more human-like behaviors on robot hands across diverse tasks, including unseen ones, indicating its generalization capability.
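Reward models trained from pairwise human preferences are commonly fitted with a Bradley-Terry model, P(a ≻ b) = σ(R(a) − R(b)), by minimizing the logistic loss over labelled pairs. The abstract does not give the paper's loss or architecture, so this is a hedged NumPy sketch with a linear reward over hypothetical per-trajectory features and simulated preference labels, not the authors' reward model.

```python
import numpy as np

def fit_preference_reward(feats_a, feats_b, prefer_a, lr=0.5, steps=500):
    """Fit a linear reward w . phi from pairwise preferences (Bradley-Terry model).

    feats_a, feats_b: (N, D) trajectory features; prefer_a: (N,) 1.0 if a was preferred.
    """
    diff = feats_a - feats_b
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-diff @ w))          # predicted P(a preferred over b)
        grad = diff.T @ (p - prefer_a) / len(diff)   # gradient of the mean logistic loss
        w -= lr * grad
    return w

# Simulated "human" who prefers trajectories scoring higher under a hidden weight vector.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])                  # hypothetical ground-truth preference
fa, fb = rng.normal(size=(300, 3)), rng.normal(size=(300, 3))
labels = ((fa - fb) @ w_true > 0).astype(float)
w = fit_preference_reward(fa, fb, labels)
accuracy = np.mean(((fa - fb) @ w > 0) == labels.astype(bool))
print(w / np.linalg.norm(w), accuracy)
```

Only comparisons are needed, never absolute scores, which is what makes video-based preference feedback practical for humans to provide; the learned reward is identifiable only up to scale, hence the normalization in the printout.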

3.NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping

Authors:Xuan Yu, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yiyi Liao, Yue Wang

Abstract: LiDAR mapping has been a long-standing problem in robotics. Recent progress in neural implicit representation has brought new opportunities to robotic mapping. In this paper, we propose multi-volume neural feature fields, called NF-Atlas, which bridge neural feature volumes with pose graph optimization. By treating the neural feature volumes as pose graph nodes and the relative poses between volumes as pose graph edges, the entire neural feature field becomes both locally rigid and globally elastic. Locally, each neural feature volume employs a sparse feature Octree and a small MLP to encode the submap SDF, optionally with semantics. Learning the map with this structure allows end-to-end solving of maximum a posteriori (MAP) based probabilistic mapping. Globally, the map is built volume by volume independently, avoiding catastrophic forgetting when mapping incrementally. Furthermore, when a loop closure occurs, the elastic pose-graph-based representation only requires updating the origins of the neural volumes, without remapping. Finally, we validate these functionalities of NF-Atlas: thanks to the sparsity and the optimization-based formulation, NF-Atlas shows competitive performance in terms of accuracy, efficiency and memory usage on both simulation and real-world datasets.
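The "locally rigid, globally elastic" idea can be illustrated by storing each submap's SDF in its own local frame and querying it through that volume's pose: after a loop closure, only the pose (the volume origin) changes, and the local content is reused without remapping. In this hedged 2D sketch an analytic circle stands in for the Octree-plus-MLP encoding, and the class and field names are hypothetical.

```python
import numpy as np

class Volume:
    """A submap whose SDF lives in a local frame anchored at a 2D pose (x, y, yaw)."""

    def __init__(self, pose, local_sdf):
        self.pose = np.asarray(pose, dtype=float)  # pose-graph node: volume origin in world
        self.local_sdf = local_sdf                 # stand-in for the sparse Octree + MLP

    def to_local(self, p_world):
        x, y, yaw = self.pose
        c, s = np.cos(yaw), np.sin(yaw)
        dx, dy = p_world[0] - x, p_world[1] - y
        return np.array([c * dx + s * dy, -s * dx + c * dy])  # R^T (p - t)

    def query(self, p_world):
        return self.local_sdf(self.to_local(p_world))

def circle(q):
    """Local map: circular obstacle of radius 1 m centred 2 m ahead of the volume origin."""
    return np.hypot(q[0] - 2.0, q[1]) - 1.0

vol = Volume(pose=(0.0, 0.0, 0.0), local_sdf=circle)
probe = np.array([2.0, 0.5])
print(vol.query(probe))       # → -0.5 (inside the obstacle)

# A loop closure shifts the volume origin by 0.5 m in y; only the pose is updated.
vol.pose = np.array([0.0, 0.5, 0.0])
print(vol.query(probe))       # → -1.0 (the whole submap moved rigidly)
```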