arXiv daily: Robotics

arXiv daily: Robotics (cs.RO)

1.Asynchronous Spatial Allocation Protocol for Trajectory Planning of Heterogeneous Multi-Agent Systems

Authors:Yuda Chen, Haoze Dong, Zhongkui Li

Abstract: To plan the trajectories of a large and heterogeneous swarm, sequential or synchronous distributed methods usually become intractable, due to the lack of global connectivity and clock synchronization, Moreover, the existing asynchronously distributed schemes usually require recheck-like mechanisms instead of inherently considering the other' moving tendency. To this end, we propose a novel asynchronous protocol to allocate the agents' derivable space in a distributed way, by which each agent can replan trajectory depending on its own timetable. Properties such as collision avoidance and recursive feasibility are theoretically shown and a lower bound of protocol updating is provided. Comprehensive simulations and comparisons with five state-of-the-art methods validate the effectiveness of our method and illustrate the improvement in both the completion time and the moving distance. Finally, hardware experiments are carried out, where 8 heterogeneous unmanned ground vehicles with onboard computation navigate in cluttered scenarios at a high agility.

2.A Delay Compensation Framework Based on Eye-Movement for Teleoperated Ground Vehicles

Authors:Qiang Zhang, Lingfang Yang, Zhi Huang, Xiaolin Song

Abstract: An eye-movement-based predicted trajectory guidance control (ePTGC) is proposed to mitigate the maneuverability degradation of a teleoperated ground vehicle caused by communication delays. Human sensitivity to delays is the main reason for the performance degradation of a ground vehicle teleoperation system. The proposed framework extracts human intention from eye-movement. Then, it combines it with contextual constraints to generate an intention-compliant guidance trajectory, which is then employed to control the vehicle directly. The advantage of this approach is that the teleoperator is removed from the direct control loop by using the generated trajectories to guide vehicle, thus reducing the adverse sensitivity to delay. The delay can be compensated as long as the prediction horizon exceeds the delay. A human-in-loop simulation platform is designed to evaluate the teleoperation performance of the proposed method at different delay levels. The results are analyzed by repeated measures ANOVA, which shows that the proposed method significantly improves maneuverability and cognitive burden at large delay levels (>200 ms). The overall performance is also much better than the PTGC which does not employ the eye-movement feature.

3.Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects

Authors:Chuanruo Ning, Ruihai Wu, Haoran Lu, Kaichun Mo, Hao Dong

Abstract: Articulated object manipulation is a fundamental yet challenging task in robotics. Due to significant geometric and semantic variations across object categories, previous manipulation models struggle to generalize to novel categories. Few-shot learning is a promising solution for alleviating this issue by allowing robots to perform a few interactions with unseen objects. However, extant approaches often necessitate costly and inefficient test-time interactions with each unseen instance. Recognizing this limitation, we observe that despite their distinct shapes, different categories often share similar local geometries essential for manipulation, such as pullable handles and graspable edges - a factor typically underutilized in previous few-shot learning works. To harness this commonality, we introduce 'Where2Explore', an affordance learning framework that effectively explores novel categories with minimal interactions on a limited number of instances. Our framework explicitly estimates the geometric similarity across different categories, identifying local areas that differ from shapes in the training categories for efficient exploration while concurrently transferring affordance knowledge to similar parts of the objects. Extensive experiments in simulated and real-world environments demonstrate our framework's capacity for efficient few-shot exploration and generalization.

4.Self-Supervised Prediction of the Intention to Interact with a Service Robot

Authors:Gabriele Abbate, Alessandro Giusti, Viktor Schmuck, Oya Celiktutan, Antonio Paolillo

Abstract: A service robot can provide a smoother interaction experience if it has the ability to proactively detect whether a nearby user intends to interact, in order to adapt its behavior e.g. by explicitly showing that it is available to provide a service. In this work, we propose a learning-based approach to predict the probability that a human user will interact with a robot before the interaction actually begins; the approach is self-supervised because after each encounter with a human, the robot can automatically label it depending on whether it resulted in an interaction or not. We explore different classification approaches, using different sets of features considering the pose and the motion of the user. We validate and deploy the approach in three scenarios. The first collects $3442$ natural sequences (both interacting and non-interacting) representing employees in an office break area: a real-world, challenging setting, where we consider a coffee machine in place of a service robot. The other two scenarios represent researchers interacting with service robots ($200$ and $72$ sequences, respectively). Results show that, even in challenging real-world settings, our approach can learn without external supervision, and can achieve accurate classification (i.e. AUROC greater than $0.9$) of the user's intention to interact with an advance of more than $3$s before the interaction actually occurs.

5.Comparison of DDS, MQTT, and Zenoh in Edge-to-Edge/Cloud Communication with ROS 2

Authors:Jiaqiang Zhang, Xianjia Yu, Sier Ha, Jorge Pena Queralta, Tomi Westerlund

Abstract: With the development of IoT and edge computing, there is a need for efficient and reliable middleware to handle the communication among Edge devices or between Edge and Cloud. Meanwhile, ROS\,2 is more commonly used in robotic systems, but there is no comparison study of middleware using ROS Messages. In this study, we compared the middlewares that are commonly used in ROS\,2 systems, including DDS, Zenoh, and MQTT. In order to evaluate the performance of the middleware in Edge-to-Edge and Edge-to-Cloud scenarios, we conducted the experiments in a multi-host environment and compared the latency and throughput of the middlewares with different types and sizes of ROS Messages in three network setups including Ethernet, Wi-Fi, and 4G. Additionally, we implemented different middlewares on a real robot platform, TurtleBot 4, and sent commands from a host to the robot to run a square-shaped trajectory. With the Optitrack Motion Capture system, we recorded the robot trajectories and analyzed the drift error. The results showed that CycloneDDS performs better under Ethernet, and Zenoh performs better under Wifi and 4G. In the actual robot test, Zenoh's trajectory drift error was the smallest.

6.Connected Autonomous Vehicle Motion Planning with Video Predictions from Smart, Self-Supervised Infrastructure

Authors:Jiankai Sun, Shreyas Kousik, David Fridovich-Keil, Mac Schwager

Abstract: Connected autonomous vehicles (CAVs) promise to enhance safety, efficiency, and sustainability in urban transportation. However, this is contingent upon a CAV correctly predicting the motion of surrounding agents and planning its own motion safely. Doing so is challenging in complex urban environments due to frequent occlusions and interactions among many agents. One solution is to leverage smart infrastructure to augment a CAV's situational awareness; the present work leverages a recently proposed "Self-Supervised Traffic Advisor" (SSTA) framework of smart sensors that teach themselves to generate and broadcast useful video predictions of road users. In this work, SSTA predictions are modified to predict future occupancy instead of raw video, which reduces the data footprint of broadcast predictions. The resulting predictions are used within a planning framework, demonstrating that this design can effectively aid CAV motion planning. A variety of numerical experiments study the key factors that make SSTA outputs useful for practical CAV planning in crowded urban environments.

7.Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions

Authors:Kai Cheng, Ruihai Wu, Yan Shen, Chuanruo Ning, Guanqi Zhan, Hao Dong

Abstract: Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing works primarily focus on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agent's morphology, e.g., occlusions and physical limitations. In this paper, we propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints. Unlike object-centric affordance approaches, learning environment-aware affordance faces the challenge of combinatorial explosion due to the complexity of various occlusions, characterized by their quantities, geometries, positions and poses. To address this and enhance data efficiency, we introduce a novel contrastive affordance learning framework capable of training on scenes containing a single occluder and generalizing to scenes with complex occluder combinations. Experiments demonstrate the effectiveness of our proposed approach in learning affordance considering environment constraints.

8.Naturalistic Robot Arm Trajectory Generation via Representation Learning

Authors:Jayjun Lee, Adam J. Spiers

Abstract: The integration of manipulator robots in household environments suggests a need for more predictable and human-like robot motion. This holds especially true for wheelchair-mounted assistive robots that can support the independence of people with paralysis. One method of generating naturalistic motion trajectories is via the imitation of human demonstrators. This paper explores a self-supervised imitation learning method using an autoregressive spatio-temporal graph neural network for an assistive drinking task. We address learning from diverse human motion trajectory data that were captured via wearable IMU sensors on a human arm as the action-free task demonstrations. Observed arm motion data from several participants is used to generate natural and functional drinking motion trajectories for a UR5e robot arm.

9.Dubins Curve Based Continuous-Curvature Trajectory Planning for Autonomous Mobile Robots

Authors:Xuanhao Huang, Chao-Bo Yan

Abstract: AMR is widely used in factories to replace manual labor to reduce costs and improve efficiency. However, it is often difficult for logistics robots to plan the optimal trajectory and unreasonable trajectory planning can lead to low transport efficiency and high energy consumption. In this paper, we propose a method to directly calculate the optimal trajectory for short distance on the basis of the Dubins set, which completes the calculation of the Dubins path. Additionally, as an improvement of Dubins path, we smooth the Dubins path based on clothoid, which makes the curvature varies linearly. AMR can adjust the steering wheels while following this trajectory. The experiments show that the Dubins path can be calculated quickly and well smoothed.

10.Learning Quasi-Static 3D Models of Markerless Deformable Linear Objects for Bimanual Robotic Manipulation

Authors:Piotr Kicki, Michał Bidziński, Krzysztof Walas

Abstract: The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and challenging task that is important in many practical applications. Classical model-based approaches to this problem require an accurate model to capture how robot motions affect the deformation of the DLO. Nowadays, data-driven models offer the best tradeoff between quality and computation time. This paper analyzes several learning-based 3D models of the DLO and proposes a new one based on the Transformer architecture that achieves superior accuracy, even on the DLOs of different lengths, thanks to the proposed scaling method. Moreover, we introduce a data augmentation technique, which improves the prediction performance of almost all considered DLO data-driven models. Thanks to this technique, even a simple Multilayer Perceptron (MLP) achieves close to state-of-the-art performance while being significantly faster to evaluate. In the experiments, we compare the performance of the learning-based 3D models of the DLO on several challenging datasets quantitatively and demonstrate their applicability in the task of shaping a DLO.

11.Neural Field Representations of Articulated Objects for Robotic Manipulation Planning

Authors:Phillip Grote, Joaquim Ortiz-Haro, Marc Toussaint, Ozgur S. Oguz

Abstract: Traditional approaches for manipulation planning rely on an explicit geometric model of the environment to formulate a given task as an optimization problem. However, inferring an accurate model from raw sensor input is a hard problem in itself, in particular for articulated objects (e.g., closets, drawers). In this paper, we propose a Neural Field Representation (NFR) of articulated objects that enables manipulation planning directly from images. Specifically, after taking a few pictures of a new articulated object, we can forward simulate its possible movements, and, therefore, use this neural model directly for planning with trajectory optimization. Additionally, this representation can be used for shape reconstruction, semantic segmentation and image rendering, which provides a strong supervision signal during training and generalization. We show that our model, which was trained only on synthetic images, is able to extract a meaningful representation for unseen objects of the same class, both in simulation and with real images. Furthermore, we demonstrate that the representation enables robotic manipulation of an articulated object in the real world directly from images.

12.Evolutionary-Based Online Motion Planning Framework for Quadruped Robot Jumping

Authors:Linzhu Yue, Zhitao Song, Hongbo Zhang, Xuanqi Zeng, Lingwei Zhang, Yun-Hui Liu

Abstract: Offline evolutionary-based methodologies have supplied a successful motion planning framework for the quadrupedal jump. However, the time-consuming computation caused by massive population evolution in offline evolutionary-based jumping framework significantly limits the popularity in the quadrupedal field. This paper presents a time-friendly online motion planning framework based on meta-heuristic Differential evolution (DE), Latin hypercube sampling, and Configuration space (DLC). The DLC framework establishes a multidimensional optimization problem leveraging centroidal dynamics to determine the ideal trajectory of the center of mass (CoM) and ground reaction forces (GRFs). The configuration space is introduced to the evolutionary optimization in order to condense the searching region. Latin hypercube sampling offers more uniform initial populations of DE under limited sampling points, accelerating away from a local minimum. This research also constructs a collection of pre-motion trajectories as a warm start when the objective state is in the neighborhood of the pre-motion state to drastically reduce the solving time. The proposed methodology is successfully validated via real robot experiments for online jumping trajectory optimization with different jumping motions (e.g., ordinary jumping, flipping, and spinning).

13.Towards Safer Robot-Assisted Surgery: A Markerless Augmented Reality Framework

Authors:Ziyang Chen, Laura Cruciani, Ke Fan, Matteo Fontana, Elena Lievore, Ottavio De Cobelli, Gennaro Musi, Giancarlo Ferrigno, Elena De Momi

Abstract: Robot-assisted surgery is rapidly developing in the medical field, and the integration of augmented reality shows the potential of improving the surgeons' operation performance by providing more visual information. In this paper, we proposed a markerless augmented reality framework to enhance safety by avoiding intra-operative bleeding which is a high risk caused by the collision between the surgical instruments and the blood vessel. Advanced stereo reconstruction and segmentation networks are compared to find out the best combination to reconstruct the intra-operative blood vessel in the 3D space for the registration of the pre-operative model, and the minimum distance detection between the instruments and the blood vessel is implemented. A robot-assisted lymphadenectomy is simulated on the da Vinci Research Kit in a dry lab, and ten human subjects performed this operation to explore the usability of the proposed framework. The result shows that the augmented reality framework can help the users to avoid the dangerous collision between the instruments and the blood vessel while not introducing an extra load. It provides a flexible framework that integrates augmented reality into the medical robot platform to enhance safety during the operation.

14.Aerial Manipulator Force Control Using Control Barrier Functions

Authors:Dimitris Chaikalis, Vinicius Goncalves, Anthony Tzes, Farshad Khorrami

Abstract: This article studies the problem of applying normal forces on a surface, using an underactuated aerial vehicle equipped with a dexterous robotic arm. A force-motion high-level controller is designed based on a Lyapunov function encompassing alignment and exerted force errors. This controller is coupled with a Control Barrier Function constraint under an optimization scheme using Quadratic Programming. This aims to enforce a prescribed relationship between the approaching motion for the end-effector and its alignment with the surface, thus ensuring safe operation. An adaptive low-level controller is devised for the aerial vehicle, capable of tracking velocity commands generated by the high-level controller. Simulations are presented to demonstrate the force exertion stability and safety of the controller in cases of large disturbances.

15.Shared Telemanipulation with VR controllers in an anti slosh scenario

Authors:Max Grobbel, Balint Varga, Sören Hohmann

Abstract: Telemanipulation has become a promising technology that combines human intelligence with robotic capabilities to perform tasks remotely. However, it faces several challenges such as insufficient transparency, low immersion, and limited feedback to the human operator. Moreover, the high cost of haptic interfaces is a major limitation for the application of telemanipulation in various fields, including elder care, where our research is focused. To address these challenges, this paper proposes the usage of nonlinear model predictive control for telemanipulation using low-cost virtual reality controllers, including multiple control goals in the objective function. The framework utilizes models for human input prediction and taskrelated models of the robot and the environment. The proposed framework is validated on an UR5e robot arm in the scenario of handling liquid without spilling. Further extensions of the framework such as pouring assistance and collision avoidance can easily be included.

16.Heuristic Satisficing Inferential Decision Making in Human and Robot Active Perception

Authors:Yucheng Chen, Pingping Zhu, Anthony Alers, Tobias Egner, Marc A. Sommer, Silvia Ferrari

Abstract: Inferential decision-making algorithms typically assume that an underlying probabilistic model of decision alternatives and outcomes may be learned a priori or online. Furthermore, when applied to robots in real-world settings they often perform unsatisfactorily or fail to accomplish the necessary tasks because this assumption is violated and/or they experience unanticipated external pressures and constraints. Cognitive studies presented in this and other papers show that humans cope with complex and unknown settings by modulating between near-optimal and satisficing solutions, including heuristics, by leveraging information value of available environmental cues that are possibly redundant. Using the benchmark inferential decision problem known as ``treasure hunt", this paper develops a general approach for investigating and modeling active perception solutions under pressure. By simulating treasure hunt problems in virtual worlds, our approach learns generalizable strategies from high performers that, when applied to robots, allow them to modulate between optimal and heuristic solutions on the basis of external pressures and probabilistic models, if and when available. The result is a suite of active perception algorithms for camera-equipped robots that outperform treasure-hunt solutions obtained via cell decomposition, information roadmap, and information potential algorithms, in both high-fidelity numerical simulations and physical experiments. The effectiveness of the new active perception strategies is demonstrated under a broad range of unanticipated conditions that cause existing algorithms to fail to complete the search for treasures, such as unmodelled time constraints, resource constraints, and adverse weather (fog).

17.GRID: Scene-Graph-based Instruction-driven Robotic Task Planning

Authors:Zhe Ni, Xiao-Xin Deng, Cong Tai, Xin-Yue Zhu, Xiang Wu, Yong-Jin Liu, Long Zeng

Abstract: Recent works have shown that Large Language Models (LLMs) can promote grounding instructions to robotic task planning. Despite the progress, most existing works focused on utilizing raw images to help LLMs understand environmental information, which not only limits the observation scope but also typically requires massive multimodal data collection and large-scale models. In this paper, we propose a novel approach called Graph-based Robotic Instruction Decomposer (GRID), leverages scene graph instead of image to perceive global scene information and continuously plans subtask in each stage for a given instruction. Our method encodes object attributes and relationships in graphs through an LLM and Graph Attention Networks, integrating instruction features to predict subtasks consisting of pre-defined robot actions and target objects in the scene graph. This strategy enables robots to acquire semantic knowledge widely observed in the environment from the scene graph. To train and evaluate GRID, we build a dataset construction pipeline to generate synthetic datasets in graph-based robotic task planning. Experiments have shown that our method outperforms GPT-4 by over 25.4% in subtask accuracy and 43.6% in task accuracy. Experiments conducted on datasets of unseen scenes and scenes with different numbers of objects showed that the task accuracy of GRID declined by at most 3.8%, which demonstrates its good cross-scene generalization ability. We validate our method in both physical simulation and the real world.

18.Imitation Learning-based Visual Servoing for Tracking Moving Objects

Authors:Rocco Felici, Matteo Saveriano, Loris Roveda, Antonio Paolillo

Abstract: In everyday life collaboration tasks between human operators and robots, the former necessitate simple ways for programming new skills, the latter have to show adaptive capabilities to cope with environmental changes. The joint use of visual servoing and imitation learning allows us to pursue the objective of realizing friendly robotic interfaces that (i) are able to adapt to the environment thanks to the use of visual perception and (ii) avoid explicit programming thanks to the emulation of previous demonstrations. This work aims to exploit imitation learning for the visual servoing paradigm to address the specific problem of tracking moving objects. In particular, we show that it is possible to infer from data the compensation term required for realizing the tracking controller, avoiding the explicit implementation of estimators or observers. The effectiveness of the proposed method has been validated through simulations with a robotic manipulator.

19.VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning

Authors:Kasun Weerakoon, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Dinesh Manocha

Abstract: We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using Offline Reinforcement Learning (RL). Our method trains a novel RL policy from unlabeled data collected in real outdoor vegetation. This policy uses height and intensity-based cost maps derived from 3D LiDAR point clouds, a goal cost map, and processed proprioception data as state inputs, and learns the physical and geometric properties of the surrounding vegetation such as height, density, and solidity/stiffness for navigation. Instead of using end-to-end policy actions, the fully-trained RL policy's Q network is used to evaluate dynamically feasible robot actions generated from a novel adaptive planner capable of navigating through dense narrow passages and preventing entrapment in vegetation such as tall grass and bushes. We demonstrate our method's capabilities on a legged robot in complex outdoor vegetation. We observe an improvement in success rates, a decrease in average power consumption, and decrease in normalized trajectory length compared to both existing end-to-end offline RL and outdoor navigation methods.

20.A Unified Perspective on Multiple Shooting In Differential Dynamic Programming

Authors:He Li, Wenhao Yu, Tingnan Zhang, Patrick M. Wensing

Abstract: Differential Dynamic Programming (DDP) is an efficient computational tool for solving nonlinear optimal control problems. It was originally designed as a single shooting method and thus is sensitive to the initial guess supplied. This work considers the extension of DDP to multiple shooting (MS), improving its robustness to initial guesses. A novel derivation is proposed that accounts for the defect between shooting segments during the DDP backward pass, while still maintaining quadratic convergence locally. The derivation enables unifying multiple previous MS algorithms, and opens the door to many smaller algorithmic improvements. A penalty method is introduced to strategically control the step size, further improving the convergence performance. An adaptive merit function and a more reliable acceptance condition are employed for globalization. The effects of these improvements are benchmarked for trajectory optimization with a quadrotor, an acrobot, and a manipulator. MS-DDP is also demonstrated for use in Model Predictive Control (MPC) for dynamic jumping with a quadruped robot, showing its benefits over a single shooting approach.

21.Ca$^2$Lib: Simple and Accurate LiDAR-RGB Calibration using Small Common Markers

Authors:Emanuele Giacomini, Leonardo Brizi, Luca Di Giammarino, Omar Salem, Patrizio Perugini, Giorgio Grisetti

Abstract: In many fields of robotics, knowing the relative position and orientation between two sensors is a mandatory precondition to operate with multiple sensing modalities. In this context, the pair LiDAR-RGB cameras offer complementary features: LiDARs yield sparse high quality range measurements, while RGB cameras provide a dense color measurement of the environment. Existing techniques often rely either on complex calibration targets that are expensive to obtain, or extracted virtual correspondences that can hinder the estimate's accuracy. In this paper we address the problem of LiDAR-RGB calibration using typical calibration patterns (i.e. A3 chessboard) with minimal human intervention. Our approach exploits the planarity of the target to find correspondences between the sensors measurements, leading to features that are robust to LiDAR noise. Moreover, we estimate a solution by solving a joint non-linear optimization problem. We validated our approach by carrying on quantitative and comparative experiments with other state-of-the-art approaches. Our results show that our simple schema performs on par or better than other approches using complex calibration targets. Finally, we release an open-source C++ implementation at \url{}

22.Physically Plausible Full-Body Hand-Object Interaction Synthesis

Authors:Jona Braun, Sammy Christen, Muhammed Kocabas, Emre Aksan, Otmar Hilliges

Abstract: We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. In contrast, our proposed method embraces reinforcement learning (RL) and physics simulation to mitigate the limitations of data-driven approaches. Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting. The generic skill priors learn to decode a latent skill embedding into the motion of the underlying part. A high-level policy then controls hand-object interactions in these pretrained latent spaces, guided by task objectives of grasping and 3D target trajectory following. It is trained using a novel reward function that combines an adversarial style term with a task reward, encouraging natural motions while fulfilling the task incentives. Our method successfully accomplishes the complete interaction task, from approaching an object to grasping and subsequent manipulation. We compare our approach against kinematics-based baselines and show that it leads to more physically plausible motions.

1.Solar-powered shape-changing origami microfliers

Authors:Kyle Johnson, Vicente Arroyos, Amélie Ferran, Tilboon Elberier, Raul Villanueva, Dennis Yin, Alberto Aliseda, Sawyer Fuller, Vikram Iyer, Shyamnath Gollakota

Abstract: Using wind to disperse microfliers that fall like seeds and leaves can help automate large-scale sensor deployments. Here, we present battery-free microfliers that can change shape in mid-air to vary their dispersal distance. We design origami microfliers using bi-stable leaf-out structures and uncover an important property: a simple change in the shape of these origami structures causes two dramatically different falling behaviors. When unfolded and flat, the microfliers exhibit a tumbling behavior that increases lateral displacement in the wind. When folded inward, their orientation is stabilized, resulting in a downward descent that is less influenced by wind. To electronically transition between these two shapes, we designed a low-power electromagnetic actuator that produces peak forces of up to 200 millinewtons within 25 milliseconds while powered by solar cells. We fabricated a circuit directly on the folded origami structure that includes a programmable microcontroller, Bluetooth radio, solar power harvesting circuit, a pressure sensor to estimate altitude and a temperature sensor. Outdoor evaluations show that our 414 milligram origami microfliers are able to electronically change their shape mid-air, travel up to 98 meters in a light breeze, and wirelessly transmit data via Bluetooth up to 60 meters away, using only power collected from the sun.

2.Hierarchical Time-Optimal Planning for Multi-Vehicle Racing

Authors:Georg Jank, Matthias Rowold, Boris Lohmann

Abstract: This paper presents a hierarchical planning algorithm for racing with multiple opponents. The two-stage approach consists of a high-level behavioral planning step and a low-level optimization step. By combining discrete and continuous planning methods, our algorithm encourages global time optimality without being limited by coarse discretization. In the behavioral planning step, the fastest behavior is determined with a low-resolution spatio-temporal visibility graph. Based on the selected behavior, we calculate maneuver envelopes that are subsequently applied as constraints in a time-optimal control problem. The performance of our method is comparable to a parallel approach that selects the fastest trajectory from multiple optimizations with different behavior classes. However, our algorithm can be executed on a single core. This significantly reduces computational requirements, especially when multiple opponents are involved. Therefore, the proposed method is an efficient and practical solution for real-time multi-vehicle racing scenarios.

3.Time-Optimal Gate-Traversing Planner for Autonomous Drone Racing

Authors:Chao Qin, Maxime S. J. Michet, Jingxiang Chen, Hugh H. -T. Liu

Abstract: Time-minimum trajectories through race tracks are determined by the drone's capability as well as the configuration of all gates (e.g., their shapes, sizes, and orientations). However, prior works neglect the impact of the gate configuration and formulate drone racing as a waypoint flight task, leading to conservative waypoint selection through each gate. We present a novel time-optimal planner that can account for gate constraints explicitly, enabling quadrotors to follow the most time-efficient waypoints at their single-rotor-thrust limits in tracks with hybrid gate types. Our approach provides comparable solution quality to the state-of-the-art but with a computation time orders of magnitude faster. Furthermore, the proposed framework allows users to customize gate constraints such as tunnels by concatenating existing gate classes, enabling high-fidelity race track modeling. Owing to the superior computation efficiency and flexibility, we can generate optimal racing trajectories for complex race tracks with tens or even hundreds of gates with distinct shapes. We validate our method in real-world flights and demonstrate that faster lap times can be produced by using gate constraints instead of waypoint constraints.

4.Stepwise Model Reconstruction of Robotic Manipulator Based on Data-Driven Method

Authors:Dingxu Guo, Jian xu, Shu Zhang

Abstract: Research on dynamics of robotic manipulators provides promising support for model-based control. In general, rigorous first-principles-based dynamics modeling and accurate identification of mechanism parameters are critical to achieving high precision in model-based control, while data-driven model reconstruction provides alternative approaches of the above process. Taking the level of activation of data as an indicator, this paper classifies the collected robotic manipulator data by means of K-means clustering algorithm. With the fundamental prior knowledge, we find the corresponding dynamical properties behind the classified data separately. Afterwards, the sparse identification of nonlinear dynamics (SINDy) method is used to reconstruct the dynamics model of the robotic manipulator step by step according to the activation level of the classified data. The simulation results show that the proposed method not only reduces the complexity of the basis function library, enabling the application of SINDy method to multi-degree-of-freedom robotic manipulators, but also decreases the influence of data noise on the regression results. Finally, the dynamic control based on the reconfigured model is deployed on the experimental platform, and the experimental results prove the effectiveness of the proposed method.

5.Lavender Autonomous Navigation with Semantic Segmentation at the Edge

Authors:Alessandro Navone, Fabrizio Romanelli, Marco Ambrosio, Mauro Martini, Simone Angarano, Marcello Chiaberge

Abstract: Achieving success in agricultural activities heavily relies on precise navigation in row crop fields. Recently, segmentation-based navigation has emerged as a reliable technique when GPS-based localization is unavailable or higher accuracy is needed due to vegetation or unfavorable weather conditions. It also comes in handy when plants are growing rapidly and require an online adaptation of the navigation algorithm. This work applies a segmentation-based visual agnostic navigation algorithm to lavender fields, considering both simulation and real-world scenarios. The effectiveness of this approach is validated through a wide set of experimental tests, which show the capability of the proposed solution to generalize over different scenarios and provide highly-reliable results.

6.Towards Connecting Control to Perception: High-Performance Whole-Body Collision Avoidance Using Control-Compatible Obstacles

Authors:Moritz Eckhoff, Dennis Knobbe, Henning Zwirnmann, Abdalla Swikir, Sami Haddadin

Abstract: One of the most important aspects of autonomous systems is safety. This includes ensuring safe human-robot and safe robot-environment interaction when autonomously performing complex tasks or in collaborative scenarios. Although several methods have been introduced to tackle this, most are unsuitable for real-time applications and require carefully hand-crafted obstacle descriptions. In this work, we propose a method combining high-frequency and real-time self and environment collision avoidance of a robotic manipulator with low-frequency, multimodal, and high-resolution environmental perceptions accumulated in a digital twin system. Our method is based on geometric primitives, so-called primitive skeletons. These, in turn, are information-compressed and real-time compatible digital representations of the robot's body and environment, automatically generated from ultra-realistic virtual replicas of the real world provided by the digital twin. Our approach is a key enabler for closing the loop between environment perception and robot control by providing the millisecond real-time control stage with a current and accurate world description, empowering it to react to environmental changes. We evaluate our whole-body collision avoidance on a 9-DOFs robot system through five experiments, demonstrating the functionality and efficiency of our framework.

7.Utilizing Hybrid Trajectory Prediction Models to Recognize Highly Interactive Traffic Scenarios

Authors:Maximilian Zipfl, Sven Spickermann, J. Marius Zöllner

Abstract: Autonomous vehicles hold great promise in improving the future of transportation. The driving models used in these vehicles are based on neural networks, which can be difficult to validate. However, ensuring the safety of these models is crucial. Traditional field tests can be costly, time-consuming, and dangerous. To address these issues, scenario-based closed-loop simulations can simulate many hours of vehicle operation in a shorter amount of time and allow for specific investigation of important situations. Nonetheless, the detection of relevant traffic scenarios that also offer substantial testing benefits remains a significant challenge. To address this need, in this paper we build an imitation learning based trajectory prediction for traffic participants. We combine an image-based (CNN) approach to represent spatial environmental factors and a graph-based (GNN) approach to specifically represent relations between traffic participants. In our understanding, traffic scenes that are highly interactive due to the network's significant utilization of the social component are more pertinent for a validation process. Therefore, we propose to use the activity of such sub networks as a measure of interactivity of a traffic scene. We evaluate our model using a motion dataset and discuss the value of the relationship information with respect to different traffic situations.

8.3D Active Metric-Semantic SLAM

Authors:Yuezhan Tao, Xu Liu, Igor Spasojevic, Saurav Agarwa, Vijay Kumar

Abstract: In this letter, we address the problem of exploration and metric-semantic mapping of multi-floor GPS-denied indoor environments using Size Weight and Power (SWaP) constrained aerial robots. Most previous work in exploration assumes that robot localization is solved. However, neglecting the state uncertainty of the agent can ultimately lead to cascading errors both in the resulting map and in the state of the agent itself. Furthermore, actions that reduce localization errors may be at direct odds with the exploration task. We propose a framework that balances the efficiency of exploration with actions that reduce the state uncertainty of the agent. In particular, our algorithmic approach for active metric-semantic SLAM is built upon sparse information abstracted from raw problem data, to make it suitable for SWaP-constrained robots. Furthermore, we integrate this framework within a fully autonomous aerial robotic system that achieves autonomous exploration in cluttered, 3D environments. From extensive real-world experiments, we showed that by including Semantic Loop Closure (SLC), we can reduce the robot pose estimation errors by over 90% in translation and approximately 75% in yaw, and the uncertainties in pose estimates and semantic maps by over 70% and 65%, respectively. Although discussed in the context of indoor multi-floor exploration, our system can be used for various other applications, such as infrastructure inspection and precision agriculture where reliable GPS data may not be available.

9.Real-Time Motion Planning for In-Hand Manipulation with a Multi-Fingered Hand

Authors:Xiao Gao, Kunpeng Yao, Farshad Khadivar, Aude Billard

Abstract: Dexterous manipulation of objects once held in hand remains a challenge. Such skills are, however, necessary for robotics to move beyond gripper-based manipulation and use all the dexterity offered by anthropomorphic robotic hands. One major challenge when manipulating an object within the hand is that fingers must move around the object while avoiding collision with other fingers or the object. Such collision-free paths must be computed in real-time, as the smallest deviation from the original plan can easily lead to collisions. We present a real-time approach to computing collision-free paths in a high-dimensional space. To guide the exploration, we learn an explicit representation of the free space, retrievable in real-time. We further combine this representation with closed-loop control via dynamical systems and sampling-based motion planning and show that the combination increases performance compared to alternatives, offering efficient search of feasible paths and real-time obstacle avoidance in a multi-fingered robotic hand.

10.Learning to Explore Indoor Environments using Autonomous Micro Aerial Vehicles

Authors:Yuezhan Tao, Eran Iceland, Beiming Li, Elchanan Zwecher, Uri Heinemann, Avraham Cohen, Amir Avni, Oren Gal, Ariel Barel, Vijay Kumar

Abstract: In this paper, we address the challenge of exploring unknown indoor aerial environments using autonomous aerial robots with Size Weight and Power (SWaP) constraints. The SWaP constraints induce limits on mission time requiring efficiency in exploration. We present a novel exploration framework that uses Deep Learning (DL) to predict the most likely indoor map given the previous observations, and Deep Reinforcement Learning (DRL) for exploration, designed to run on modern SWaP constraints neural processors. The DL-based map predictor provides a prediction of the occupancy of the unseen environment while the DRL-based planner determines the best navigation goals that can be safely reached to provide the most information. The two modules are tightly coupled and run onboard allowing the vehicle to safely map an unknown environment. Extensive experimental and simulation results show that our approach surpasses state-of-the-art methods by 50-60% in efficiency, which we measure by the fraction of the explored space as a function of the length of the trajectory traveled.

11.Using Lidar Intensity for Robot Navigation

Authors:Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Mohamed Elnoor, Dinesh Manocha

Abstract: We present Multi-Layer Intensity Map, a novel 3D object representation for robot perception and autonomous navigation. They consist of multiple stacked layers of 2D grid maps each derived from reflected point cloud intensities corresponding to a certain height interval. The different layers of the intensity maps can be used to simultaneously estimate obstacles' height, solidity/density, and opacity. We demonstrate that they can help accurately differentiate obstacles that are safe to navigate through (e.g. beaded/string curtains, pliable tall grass), from ones that must be avoided (e.g. transparent surfaces such as glass walls, bushes, trees, etc.) in indoor and outdoor environments. Further, to handle narrow passages, and navigate through non-solid obstacles in dense environments, we propose an approach to adaptively inflate or enlarge the obstacles detected on intensity maps based on their solidity, and the robot's preferred velocity direction. We demonstrate these improved navigation capabilities in real-world narrow, dense environments using a real Turtlebot and Boston Dynamics Spot. We observe significant increases in success rates (up to 50%), a 9.55% decrease in trajectory length, and up to a 10.9% increase in the F-score compared to current navigation methods using other sensor modalities.

12.Efficient Reinforcement Learning for Jumping Monopods

Authors:Riccardo Bussola, Michele Focchi, Andrea Del Prete, Daniele Fontanelli, Luigi Palopoli

Abstract: In this work, we consider the complex control problem of making a monopod reach a target with a jump. The monopod can jump in any direction and the terrain underneath its foot can be uneven. This is a template of a much larger class of problems, which are extremely challenging and computationally expensive to solve using standard optimisation-based techniques. Reinforcement Learning (RL) could be an interesting alternative, but the application of an end-to-end approach in which the controller must learn everything from scratch, is impractical. The solution advocated in this paper is to guide the learning process within an RL framework by injecting physical knowledge. This expedient brings to widespread benefits, such as a drastic reduction of the learning time, and the ability to learn and compensate for possible errors in the low-level controller executing the motion. We demonstrate the advantage of our approach with respect to both optimization-based and end-to-end RL approaches.

13.Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes

Authors:Kalvik Jakkala, Srinivas Akella

Abstract: This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots in order to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for multi-robot IPP. The proposed approach is demonstrated to be fast and accurate on real-world data.

14.CLiFF-LHMP: Using Spatial Dynamics Patterns for Long-Term Human Motion Prediction

Authors:Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Luigi Palmieri, Kai O. Arras, Achim J. Lilienthal, Martin Magnusson

Abstract: Human motion prediction is important for mobile service robots and intelligent vehicles to operate safely and smoothly around people. The more accurate predictions are, particularly over extended periods of time, the better a system can, e.g., assess collision risks and plan ahead. In this paper, we propose to exploit maps of dynamics (MoDs, a class of general representations of place-dependent spatial motion patterns, learned from prior observations) for long-term human motion prediction (LHMP). We present a new MoD-informed human motion prediction approach, named CLiFF-LHMP, which is data efficient, explainable, and insensitive to errors from an upstream tracking system. Our approach uses CLiFF-map, a specific MoD trained with human motion data recorded in the same environment. We bias a constant velocity prediction with samples from the CLiFF-map to generate multi-modal trajectory predictions. In two public datasets we show that this algorithm outperforms the state of the art for predictions over very extended periods of time, achieving 45% more accurate prediction performance at 50s compared to the baseline.

15.RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline

Authors:Mirko Usuelli, Matteo Frosi, Paolo Cudrano, Simone Mentasti, Matteo Matteucci

Abstract: Loop Closure Detection (LCD) is an essential task in robotics and computer vision, serving as a fundamental component for various applications across diverse domains. These applications encompass object recognition, image retrieval, and video analysis. LCD consists in identifying whether a robot has returned to a previously visited location, referred to as a loop, and then estimating the related roto-translation with respect to the analyzed location. Despite the numerous advantages of radar sensors, such as their ability to operate under diverse weather conditions and provide a wider range of view compared to other commonly used sensors (e.g., cameras or LiDARs), integrating radar data remains an arduous task due to intrinsic noise and distortion. To address this challenge, this research introduces RadarLCD, a novel supervised deep learning pipeline specifically designed for Loop Closure Detection using the FMCW Radar (Frequency Modulated Continuous Wave) sensor. RadarLCD, a learning-based LCD methodology explicitly designed for radar systems, makes a significant contribution by leveraging the pre-trained HERO (Hybrid Estimation Radar Odometry) model. Being originally developed for radar odometry, HERO's features are used to select key points crucial for LCD tasks. The methodology undergoes evaluation across a variety of FMCW Radar dataset scenes, and it is compared to state-of-the-art systems such as Scan Context for Place Recognition and ICP for Loop Closure. The results demonstrate that RadarLCD surpasses the alternatives in multiple aspects of Loop Closure Detection.

1.Digital Twin System for Home Service Robot Based on Motion Simulation

Authors:Zhengsong Jiang, Guohui Tian, Yongcheng Cui, Tiantian Liu, Yu Gu, Yifei Wang

Abstract: In order to improve the task execution capability of home service robot, and to cope with the problem that purely physical robot platforms cannot sense the environment and make decisions online, a method for building digital twin system for home service robot based on motion simulation is proposed. A reliable mapping of the home service robot and its working environment from physical space to digital space is achieved in three dimensions: geometric, physical and functional. In this system, a digital space-oriented URDF file parser is designed and implemented for the automatic construction of the robot geometric model. Next, the physical model is constructed from the kinematic equations of the robot and an improved particle swarm optimization algorithm is proposed for the inverse kinematic solution. In addition, to adapt to the home environment, functional attributes are used to describe household objects, thus improving the semantic description of the digital space for the real home environment. Finally, through geometric model consistency verification, physical model validity verification and virtual-reality consistency verification, it shows that the digital twin system designed in this paper can construct the robot geometric model accurately and completely, complete the operation of household objects successfully, and the digital twin system is effective and practical.

2.Gait Design of a Novel Arboreal Concertina Locomotion for Snake-like Robots

Authors:Shuoqi Chen, Aaron Roth

Abstract: In this paper, we propose a novel strategy for a snake robot to move straight up a cylindrical surface. Prior works on pole-climbing for a snake robot mainly utilized a rolling helix gait, and although proven to be efficient, it does not reassemble movements made by a natural snake. We take inspiration from nature and seek to imitate the Arboreal Concertina Locomotion (ACL) from real-life serpents. In order to represent the 3D curves that make up the key motion patterns of ACL, we establish a set of parametric equations that identify periodic functions, which produce a sequence of backbone curves. We then build up the gait equation using the curvature integration method, and finally, we propose a simple motion estimation strategy using virtual chassis and non-slip model assumptions. We present experimental results using a 20-DOF snake robot traversing outside of a straight pipe.

3.Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping

Authors:Tianhao Wu, Mingdong Wu, Jiyao Zhang, Yunchong Gan, Hao Dong

Abstract: The use of anthropomorphic robotic hands for assisting individuals in situations where human hands may be unavailable or unsuitable has gained significant importance. In this paper, we propose a novel task called human-assisting dexterous grasping that aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects. Unlike conventional dexterous grasping, this task presents a more complex challenge as the policy needs to adapt to diverse user intentions, in addition to the object's geometry. We address this challenge by proposing an approach consisting of two sub-modules: a hand-object-conditional grasping primitive called Grasping Gradient Field~(GraspGF), and a history-conditional residual policy. GraspGF learns `how' to grasp by estimating the gradient from a success grasping example set, while the residual policy determines `when' and at what speed the grasping action should be executed based on the trajectory history. Experimental results demonstrate the superiority of our proposed method compared to baselines, highlighting the user-awareness and practicality in real-world applications. The codes and demonstrations can be viewed at "".

4.GVD-Exploration: An Efficient Autonomous Robot Exploration Framework Based on Fast Generalized Voronoi Diagram Extraction

Authors:Dingfeng Chen, Anxing Xiao, Meiyuan Zou, Wenzheng Chi, Jiankun Wang, Lining Sun

Abstract: Rapidly-exploring Random Trees (RRTs) are a popular technique for autonomous exploration of mobile robots. However, the random sampling used by RRTs can result in inefficient and inaccurate frontiers extraction, which affects the exploration performance. To address the issues of slow path planning and high path cost, we propose a framework that uses a generalized Voronoi diagram (GVD) based multi-choice strategy for robot exploration. Our framework consists of three components: a novel mapping model that uses an end-to-end neural network to construct GVDs of the environments in real time; a GVD-based heuristic scheme that accelerates frontiers extraction and reduces frontiers redundancy; and a multi-choice frontiers assignment scheme that considers different types of frontiers and enables the robot to make rational decisions during the exploration process. We evaluate our method on simulation and real-world experiments and show that it outperforms RRT-based exploration methods in terms of efficiency and robustness.

5.Inspection planning under execution uncertainty

Authors:Shmuel David Alpert, Kiril Solovey, Itzik Klein, Oren Salzman

Abstract: Autonomous inspection tasks necessitate effective path-planning mechanisms to efficiently gather observations from points of interest (POI). However, localization errors commonly encountered in urban environments can introduce execution uncertainty, posing challenges to the successful completion of such tasks. To tackle these challenges, we present IRIS-under uncertainty (IRIS-U^2), an extension of the incremental random inspection-roadmap search (IRIS) algorithm, that addresses the offline planning problem via an A*-based approach, where the planning process occurs prior the online execution. The key insight behind IRIS-U^2 is transforming the computed localization uncertainty, obtained through Monte Carlo (MC) sampling, into a POI probability. IRIS-U^2 offers insights into the expected performance of the execution task by providing confidence intervals (CI) for the expected coverage, expected path length, and collision probability, which becomes progressively tighter as the number of MC samples increase. The efficacy of IRIS-U^2 is demonstrated through a case study focusing on structural inspections of bridges. Our approach exhibits improved expected coverage, reduced collision probability, and yields increasingly-precise CIs as the number of MC samples grows. Furthermore, we emphasize the potential advantages of computing bounded sub-optimal solutions to reduce computation time while still maintaining the same CI boundaries.

6.An Efficient Trajectory Planner for Car-like Robots on Uneven Terrain

Authors:Long Xu, Kaixin Chai, Zhichao Han, Hong Liu, Chao Xu, Yanjun Cao, Fei Gao

Abstract: Autonomous navigation of ground robots on uneven terrain is being considered in more and more tasks. However, uneven terrain will bring two problems to motion planning: how to assess the traversability of the terrain and how to cope with the dynamics model of the robot associated with the terrain. The trajectories generated by existing methods are often too conservative or cannot be tracked well by the controller since the second problem is not well solved. In this paper, we propose terrain pose mapping to describe the impact of terrain on the robot. With this mapping, we can obtain the SE(3) state of the robot on uneven terrain for a given state in SE(2). Then, based on it, we present a trajectory optimization framework for car-like robots on uneven terrain that can consider both of the above problems. The trajectories generated by our method conform to the dynamics model of the system without being overly conservative and yet able to be tracked well by the controller. We perform simulations and real-world experiments to validate the efficiency and trajectory quality of our algorithm.

7.Predicting Routine Object Usage for Proactive Robot Assistance

Authors:Maithili Patel, Aswin Prakash, Sonia Chernova

Abstract: Proactivity in robot assistance refers to the robot's ability to anticipate user needs and perform assistive actions without explicit requests. This requires understanding user routines, predicting consistent activities, and actively seeking information to predict inconsistent behaviors. We propose SLaTe-PRO (Sequential Latent Temporal model for Predicting Routine Object usage), which improves upon prior state-of-the-art by combining object and user action information, and conditioning object usage predictions on past history. Additionally, we find some human behavior to be inherently stochastic and lacking in contextual cues that the robot can use for proactive assistance. To address such cases, we introduce an interactive query mechanism that can be used to ask queries about the user's intended activities and object use to improve prediction. We evaluate our approach on longitudinal data from three households, spanning 24 activity classes. SLaTe-PRO performance raises the F1 score metric to 0.57 without queries, and 0.60 with user queries, over a score of 0.43 from prior work. We additionally present a case study with a fully autonomous household robot.

8.Lighter-Than-Air Autonomous Ball Capture and Scoring Robot -- Design, Development, and Deployment

Authors:Joseph Prince Mathew, Dinesh Karri, James Yang, Kevin Zhu, Yojan Gautam, Kentaro Nojima-Schmunk, Daigo Shishika, Ningshi Yao, Cameron Nowzari

Abstract: This paper describes the full end-to-end design of our primary scoring agent in an aerial autonomous robotics competition from April 2023. As open-ended robotics competitions become more popular, we wish to begin documenting successful team designs and approaches. The intended audience of this paper is not only any future or potential participant in this particular national Defend The Republic (DTR) competition, but rather anyone thinking about designing their first robot or system to be entered in a competition with clear goals. Future DTR participants can and should either build on the ideas here, or find new alternate strategies that can defeat the most successful design last time. For non-DTR participants but students interested in robotics competitions, identifying the minimum viable system needed to be competitive is still important in helping manage time and prioritizing tasks that are crucial to competition success first.

9.Human-Centered Autonomy for Autonomous sUAS Target Searching

Authors:Hunter M. Ray, Zakariya Laouar, Zachary Sunberg, Nisar Ahmed

Abstract: Deploying robots that operate in dynamic, uncertain environments, such as Uncrewed Aerial Systems in search \& rescue missions, require nearly continuous human supervision for vehicle guidance and operation. Without approaches that consider high level mission context, operational methods of autonomous flying necessitate cumbersome manual operation or inefficient exhaustive search patterns. To facilitate more effective use of autonomy, we present a human-centered autonomous system that infers geospatial mission context through dynamic features sets, which then guides a probabilistic target search planner. Operators provide a limited set of diverse inputs, including priority definition, spatial semantic observations over ad-hoc geographical areas, and reference waypoints, which are probabilistically fused with geographical database information and condensed into a discretized value map representing an operator's preferences over an operational area. An online, POMDP-based planner, optimized for target searching, is augmented with this value map to generate an operator-constrained vehicle waypoint guidance plan. We validate the system by gathering input from five first responders trained in search \& rescue and compare simulated system performance against current operational methods for autonomous missions. These results display effective task mental model alignment and more efficient guidance plans, resulting in faster rescue times.

10.LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning

Authors:Kenneth Shaw, Ananye Agarwal, Deepak Pathak

Abstract: Dexterous manipulation has been a long-standing challenge in robotics. While machine learning techniques have shown some promise, results have largely been currently limited to simulation. This can be mostly attributed to the lack of suitable hardware. In this paper, we present LEAP Hand, a low-cost dexterous and anthropomorphic hand for machine learning research. In contrast to previous hands, LEAP Hand has a novel kinematic structure that allows maximal dexterity regardless of finger pose. LEAP Hand is low-cost and can be assembled in 4 hours at a cost of 2000 USD from readily available parts. It is capable of consistently exerting large torques over long durations of time. We show that LEAP Hand can be used to perform several manipulation tasks in the real world -- from visual teleoperation to learning from passive video data and sim2real. LEAP Hand significantly outperforms its closest competitor Allegro Hand in all our experiments while being 1/8th of the cost. We release detailed assembly instructions, the Sim2Real pipeline and a development platform with useful APIs on our website at

1.Evaluating Visual Odometry Methods for Autonomous Driving in Rain

Authors:Yu Xiang Tan, Marcel Bartholomeus Prasetyo, Mohammad Alif Daffa, Deshpande Sunny Nitin, Malika Meghjani

Abstract: The increasing demand for autonomous vehicles has created a need for robust navigation systems that can also operate effectively in adverse weather conditions. Visual odometry is a technique used in these navigation systems, enabling the estimation of vehicle position and motion using input from onboard cameras. However, visual odometry accuracy can be significantly impacted in challenging weather conditions, such as heavy rain, snow, or fog. In this paper, we evaluate a range of visual odometry methods, including our DROIDSLAM based heuristic approach. Specifically, these algorithms are tested on both clear and rainy weather urban driving data to evaluate their robustness. We compiled a dataset comprising of a range of rainy weather conditions from different cities. This includes, the Oxford Robotcar dataset from Oxford, the 4Seasons dataset from Munich and an internal dataset collected in Singapore. We evaluated different visual odometry algorithms for both monocular and stereo camera setups using the Absolute Trajectory Error (ATE). Our evaluation suggests that the Depth and Flow for Visual Odometry (DF-VO) algorithm with monocular setup worked well for short range distances (< 500m) and our proposed DROID-SLAM based heuristic approach for the stereo setup performed relatively well for long-term localization. Both algorithms performed consistently well across all rain conditions.

2.Real-Time Parallel Trajectory Optimization with Spatiotemporal Safety Constraints for Autonomous Driving in Congested Traffic

Authors:Lei Zheng, Rui Yang, Zengqi Peng, Haichao Liu, Michael Yu Wang, Jun Ma

Abstract: Multi-modal behaviors exhibited by surrounding vehicles (SVs) can typically lead to traffic congestion and reduce the travel efficiency of autonomous vehicles (AVs) in dense traffic. This paper proposes a real-time parallel trajectory optimization method for the AV to achieve high travel efficiency in dynamic and congested environments. A spatiotemporal safety module is developed to facilitate the safe interaction between the AV and SVs in the presence of trajectory prediction errors resulting from the multi-modal behaviors of the SVs. By leveraging multiple shooting and constraint transcription, we transform the trajectory optimization problem into a nonlinear programming problem, which allows for the use of optimization solvers and parallel computing techniques to generate multiple feasible trajectories in parallel. Subsequently, these spatiotemporal trajectories are fed into a multi-objective evaluation module considering both safety and efficiency objectives, such that the optimal feasible trajectory corresponding to the optimal target lane can be selected. The proposed framework is validated through simulations in a dense and congested driving scenario with multiple uncertain SVs. The results demonstrate that our method enables the AV to safely navigate through a dense and congested traffic scenario while achieving high travel efficiency and task accuracy in real time.

3.Unsupervised human-to-robot motion retargeting via expressive latent space

Authors:Yashuai Yan, Esteve Valls Mascaro, Dongheui Lee

Abstract: This paper introduces a novel approach for human-to-robot motion retargeting, enabling robots to mimic human motion with precision while preserving the semantics of the motion. For that, we propose a deep learning method for direct translation from human to robot motion. Our method does not require annotated paired human-to-robot motion data, which reduces the effort when adopting new robots. To this end, we first propose a cross-domain similarity metric to compare the poses from different domains (i.e., human and robot). Then, our method achieves the construction of a shared latent space via contrastive learning and decodes latent representations to robot motion control commands. The learned latent space exhibits expressiveness as it captures the motions precisely and allows direct motion control in the latent space. We showcase how to generate in-between motion through simple linear interpolation in the latent space between two projected human poses. Additionally, we conducted a comprehensive evaluation of robot control using diverse modality inputs, such as texts, RGB videos, and key-poses, which enhances the ease of robot control to users of all backgrounds. Finally, we compare our model with existing works and quantitatively and qualitatively demonstrate the effectiveness of our approach, enhancing natural human-robot communication and fostering trust in integrating robots into daily life.

4.PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics

Authors:Claus Smitt, Michael Halstead, Patrick Zimmer, Thomas Läbe, Esra Guclu, Cyrill Stachniss, Chris McCool

Abstract: Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF which is a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained using an image sequence with noisy robot odometry poses and automatic panoptic predictions with inconsistent IDs between frames. Despite this noisy input, our system is able to output scene geometry, photo-realistic renders and 3D consistent panoptic representations with consistent instance IDs. We evaluate this novel system in a very challenging horticultural scenario and in doing so demonstrate an end-to-end trainable system that can make use of noisy robot poses rather than precise poses that have to be pre-calculated. Compared to a baseline approach the peak signal to noise ratio is improved from 21.34dB to 23.37dB while the panoptic quality improves from 56.65% to 70.08%. Furthermore, our approach is faster and can be tuned to improve inference time by more than a factor of 2 while being memory efficient with approximately 12 times fewer parameters.

5.A survey on real-time 3D scene reconstruction with SLAM methods in embedded systems

Authors:Quentin Picard, Stephane Chevobbe, Mehdi Darouich, Jean-Yves Didier

Abstract: The 3D reconstruction of simultaneous localization and mapping (SLAM) is an important topic in the field for transport systems such as drones, service robots and mobile AR/VR devices. Compared to a point cloud representation, the 3D reconstruction based on meshes and voxels is particularly useful for high-level functions, like obstacle avoidance or interaction with the physical environment. This article reviews the implementation of a visual-based 3D scene reconstruction pipeline on resource-constrained hardware platforms. Real-time performances, memory management and low power consumption are critical for embedded systems. A conventional SLAM pipeline from sensors to 3D reconstruction is described, including the potential use of deep learning. The implementation of advanced functions with limited resources is detailed. Recent systems propose the embedded implementation of 3D reconstruction methods with different granularities. The trade-off between required accuracy and resource consumption for real-time localization and reconstruction is one of the open research questions identified and discussed in this paper.

6.Incipient Slip-Based Rotation Measurement via Visuotactile Sensing During In-Hand Object Pivoting

Authors:Mingxuan Li, Yen Hang Zhou, Tiemin Li, Yao Jiang

Abstract: In typical in-hand manipulation tasks represented by object pivoting, the real-time perception of rotational slippage has been proven beneficial for improving the dexterity and stability of robotic hands. An effective strategy is to obtain the contact properties for measuring rotation angle through visuotactile sensing. However, existing methods for rotation estimation did not consider the impact of the incipient slip during the pivoting process, which introduces measurement errors and makes it hard to determine the boundary between stable contact and macro slip. This paper describes a generalized 2-d contact model under pivoting, and proposes a rotation measurement method based on the line-features in the stick region. The proposed method was applied to the Tac3D vision-based tactile sensors using continuous marker patterns. Experiments show that the rotation measurement system could achieve an average static measurement error of 0.17 degree and an average dynamic measurement error of 1.34 degree. Besides, the proposed method requires no training data and can achieve real-time sensing during the in-hand object pivoting.

7.Grabbing power line conductors based on the measurements of the magnetic field strength

Authors:Goran Vasiljevic, Dean Martinovic, Matko Orsag, Stjepan Bogdan

Abstract: This paper presents the method for the localization and grabbing of the long straight conductor based only on the magnetic field generated by the alternating current flowing through the conductor. The method uses two magnetometers mounted on the robot arm end-effector for localization. This location is then used to determine needed robot movement in order to grab the conductor. The method was tested in the laboratory conditions using the Schunk LWA 4P 6-axis robot arm.

8.Design and Validation of a Wireless Drone Docking Station

Authors:Dario Stuhne, Goran Vasiljevic, Stjepan Bogdan, Zdenko Kovacic

Abstract: Drones are increasingly operating autonomously, and the need for extending drone power autonomy is rapidly increasing. One of the most promising solutions to extend drone power autonomy is the use of docking stations to support both landing and recharging of the drone. To this end, we introduce a novel wireless drone docking station with three commercial wireless charging modules. We have developed two independent units, both in mechanical and electrical aspects: the energy transmitting unit and the energy receiving unit. We have also studied the efficiency of wireless power transfer and demonstrated the advantages of connecting three receiver modules connected in series and parallel. We have achieved maximum output power of 96.5 W with a power transfer efficiency of 56.6% for the series connection of coils. Finally, we implemented the system in practice on a drone and tested both energy transfer and landing.

9.A Comparison between Frame-based and Event-based Cameras for Flapping-Wing Robot Perception

Authors:Raul Tapia, Juan Pablo Rodríguez-Gómez, Juan Antonio Sanchez-Diaz, Francisco Javier Gañán, Iván Gutierrez Rodríguez, Javier Luna-Santamaria, José Ramiro Martínez-de Dios, Anibal Ollero

Abstract: Perception systems for ornithopters face severe challenges. The harsh vibrations and abrupt movements caused during flapping are prone to produce motion blur and strong lighting condition changes. Their strict restrictions in weight, size, and energy consumption also limit the type and number of sensors to mount onboard. Lightweight traditional cameras have become a standard off-the-shelf solution in many flapping-wing designs. However, bioinspired event cameras are a promising solution for ornithopter perception due to their microsecond temporal resolution, high dynamic range, and low power consumption. This paper presents an experimental comparison between frame-based and an event-based camera. Both technologies are analyzed considering the particular flapping-wing robot specifications and also experimentally analyzing the performance of well-known vision algorithms with data recorded onboard a flapping-wing robot. Our results suggest event cameras as the most suitable sensors for ornithopters. Nevertheless, they also evidence the open challenges for event-based vision on board flapping-wing robots.

10.STAR-loc: Dataset for STereo And Range-based localization

Authors:Frederike Dümbgen, Mohammed A. Shalaby, Connor Holmes, Charles C. Cossette, James R. Forbes, Jerome Le Ny, Timothy D. Barfoot

Abstract: This document contains a detailed description of the STAR-loc dataset. For a quick starting guide please refer to the associated Github repository ( The dataset consists of stereo camera data (rectified/raw images and inertial measurement unit measurements) and ultra-wideband (UWB) data (range measurements) collected on a sensor rig in a Vicon motion capture arena. The UWB anchors and visual landmarks (Apriltags) are of known position, so the dataset can be used for both localization and Simultaneous Localization and Mapping (SLAM).

11.Undergraduate Research of Decentralized Localization of Roombas Through Usage of Wall-Finding Software

Authors:Madeline Corvin, Johnathan McDowell, Timothy Anglea, Yongqiang Wang

Abstract: This paper introduces the research effort of an undergraduate research team in realizing robot localization. More specifically, the undergraduate research team developed and tested wall-following software that allowed a ground robot Roombas to independently find their positions within a defined space. The software also allows a robot to send its localized position to other Roombas, so that each Roomba knows its relative location to realize robot cooperation.

12.Task-Oriented Cross-System Design for Timely and Accurate Modeling in the Metaverse

Authors:Zhen Meng, Kan Chen, Yufeng Diao, Changyang She, Guodong Zhao, Muhammad Ali Imran, Branka Vucetic

Abstract: In this paper, we establish a task-oriented cross-system design framework to minimize the required packet rate for timely and accurate modeling of a real-world robotic arm in the Metaverse, where sensing, communication, prediction, control, and rendering are considered. To optimize a scheduling policy and prediction horizons, we design a Constraint Proximal Policy Optimization(C-PPO) algorithm by integrating domain knowledge from relevant systems into the advanced reinforcement learning algorithm, Proximal Policy Optimization(PPO). Specifically, the Jacobian matrix for analyzing the motion of the robotic arm is included in the state of the C-PPO algorithm, and the Conditional Value-at-Risk(CVaR) of the state-value function characterizing the long-term modeling error is adopted in the constraint. Besides, the policy is represented by a two-branch neural network determining the scheduling policy and the prediction horizons, respectively. To evaluate our algorithm, we build a prototype including a real-world robotic arm and its digital model in the Metaverse. The experimental results indicate that domain knowledge helps to reduce the convergence time and the required packet rate by up to 50%, and the cross-system design framework outperforms a baseline framework in terms of the required packet rate and the tail distribution of the modeling error.

13.MAPS$^2$: Multi-Robot Anytime Motion Planning under Signal Temporal Logic Specifications

Authors:Mayank Sewlia, Christos K. Verginis, Dimos V. Dimarogonas

Abstract: This article presents MAPS$^2$ : a distributed algorithm that allows multi-robot systems to deliver coupled tasks expressed as Signal Temporal Logic (STL) constraints. Classical control theoretical tools addressing STL constraints either adopt a limited fragment of the STL formula or require approximations of min/max operators, whereas works maximising robustness through optimisation-based methods often suffer from local minima, relaxing any completeness arguments due to the NP-hard nature of the problem. Endowed with probabilistic guarantees, MAPS$^2$ provides an anytime algorithm that iteratively improves the robots' trajectories. The algorithm selectively imposes spatial constraints by taking advantage of the temporal properties of the STL. The algorithm is distributed, in the sense that each robot calculates its trajectory by communicating only with its immediate neighbours as defined via a communication graph. We illustrate the efficiency of MAPS$^2$ by conducting extensive simulation and experimental studies, verifying the generation of STL satisfying trajectories.

14.Dynamic Handover: Throw and Catch with Bimanual Hands

Authors:Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang

Abstract: Humans throw and catch objects all the time. However, such a seemingly common skill introduces a lot of challenges for robots to achieve: The robots need to operate such dynamic actions at high-speed, collaborate precisely, and interact with diverse objects. In this paper, we design a system with two multi-finger hands attached to robot arms to solve this problem. We train our system using Multi-Agent Reinforcement Learning in simulation and perform Sim2Real transfer to deploy on the real robots. To overcome the Sim2Real gap, we provide multiple novel algorithm designs including learning a trajectory prediction model for the object. Such a model can help the robot catcher has a real-time estimation of where the object will be heading, and then react accordingly. We conduct our experiments with multiple objects in the real-world system, and show significant improvements over multiple baselines. Our project page is available at \url{}.

15.ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion

Authors:Hongyu Li, Snehal Dikhale, Soshi Iba, Nawid Jamali

Abstract: In this letter, we introduce ViHOPE, a novel framework for estimating the 6D pose of an in-hand object using visuotactile perception. Our key insight is that the accuracy of the 6D object pose estimate can be improved by explicitly completing the shape of the object. To this end, we introduce a novel visuotactile shape completion module that uses a conditional Generative Adversarial Network to complete the shape of an in-hand object based on volumetric representation. This approach improves over prior works that directly regress visuotactile observations to a 6D pose. By explicitly completing the shape of the in-hand object and jointly optimizing the shape completion and pose estimation tasks, we improve the accuracy of the 6D object pose estimate. We train and test our model on a synthetic dataset and compare it with the state-of-the-art. In the visuotactile shape completion task, we outperform the state-of-the-art by 265% using the Intersection of Union metric and achieve 88% lower Chamfer Distance. In the visuotactile pose estimation task, we present results that suggest our framework reduces position and angular errors by 35% and 64%, respectively. Furthermore, we ablate our framework to confirm the gain on the 6D object pose estimate from explicitly completing the shape. Ultimately, we show that our framework produces models that are robust to sim-to-real transfer on a real-world robot platform.

16.Robot Parkour Learning

Authors:Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, Hang Zhao

Abstract: Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments. Existing methods can generate either diverse but blind locomotion skills or vision-based but specialized skills by using reference animal data or complex rewards. However, autonomous parkour requires robots to learn generalizable skills that are both vision-based and diverse to perceive and react to various scenarios. In this work, we propose a system for learning a single end-to-end vision-based parkour policy of diverse parkour skills using a simple reward without any reference motion data. We develop a reinforcement learning method inspired by direct collocation to generate parkour skills, including climbing over high obstacles, leaping over large gaps, crawling beneath low barriers, squeezing through thin slits, and running. We distill these skills into a single vision-based parkour policy and transfer it to a quadrupedal robot using its egocentric depth camera. We demonstrate that our system can empower two different low-cost robots to autonomously select and execute appropriate parkour skills to traverse challenging real-world environments.

1.Proprioceptive External Torque Learning for Floating Base Robot and its Applications to Humanoid Locomotion

Authors:Daegyu Lim, Myeong-Ju Kim, Junhyeok Cha, Donghyeon Kim, Jaeheung Park

Abstract: The estimation of external joint torque and contact wrench is essential for achieving stable locomotion of humanoids and safety-oriented robots. Although the contact wrench on the foot of humanoids can be measured using a force-torque sensor (FTS), FTS increases the cost, inertia, complexity, and failure possibility of the system. This paper introduces a method for learning external joint torque solely using proprioceptive sensors (encoders and IMUs) for a floating base robot. For learning, the GRU network is used and random walking data is collected. Real robot experiments demonstrate that the network can estimate the external torque and contact wrench with significantly smaller errors compared to the model-based method, momentum observer (MOB) with friction modeling. The study also validates that the estimated contact wrench can be utilized for zero moment point (ZMP) feedback control, enabling stable walking. Moreover, even when the robot's feet and the inertia of the upper body are changed, the trained network shows consistent performance with a model-based calibration. This result demonstrates the possibility of removing FTS on the robot, which reduces the disadvantages of hardware sensors. The summary video is available at

2.A novel method for layer jamming-based continuum robots

Authors:Bowen Yi, Yeman Fan, Dikai Liu

Abstract: Continuum robots with variable stiffness have gained wide popularity in the last decade. Layer jamming (LJ) has emerged as a simple and efficient technique to achieve tunable stiffness for continuum robots. Despite its merits, the development of a control-oriented dynamical model tailored for this specific class of robots remains an open problem in the literature. This paper aims to present the first solution, to the best of our knowledge, to close the gap. We propose an energy-based model that is integrated with the LuGre frictional model for LJ-based continuum robots. Then, we take a comprehensive theoretical analysis for this model, focusing on two fundamental characteristics of LJ-based continuum robots: shape locking and adjustable stiffness. To validate the modeling approach and theoretical results, a series of experiments using our \textit{OctRobot-I} continuum robotic platform was conducted. The results show that the proposed model is capable of interpreting and predicting the dynamical behaviors in LJ-based continuum robots.

3.Predictive and Robust Robot Assistance for Sequential Manipulation

Authors:Theodoros Stouraitis, Michael Gienger

Abstract: This paper presents a novel concept to support physically impaired humans in daily object manipulation tasks with a robot. Given a user's manipulation sequence, we propose a predictive model that uniquely casts the user's sequential behavior as well as a robot support intervention into a hierarchical multi-objective optimization problem. A major contribution is the prediction formulation, which allows to consider several different future paths concurrently. The second contribution is the encoding of a general notion of constancy constraints, which allows to consider dependencies between consecutive or far apart keyframes (in time or space) of a sequential task. We perform numerical studies, simulations and robot experiments to analyse and evaluate the proposed method in several table top tasks where a robot supports impaired users by predicting their posture and proactively re-arranging objects.

4.Toward Certifying Maps for Safe Localization Under Adversarial Corruption

Authors:Johann Laconte, Daniil Lisus, Timothy D. Barfoot

Abstract: In this paper, we propose a way to model the resilience of the Iterative Closest Point (ICP) algorithm in the presence of corrupted measurements. In the context of autonomous vehicles, certifying the safety of the localization process poses a significant challenge. As robots evolve in a complex world, various types of noise can impact the measurements. Conventionally, this noise has been assumed to be distributed according to a zero-mean Gaussian distribution. However, this assumption does not hold in numerous scenarios, including adverse weather conditions, occlusions caused by dynamic obstacles, or long-term changes in the map. In these cases, the measurements are instead affected by a large, deterministic fault. This paper introduces a closed-form formula approximating the highest pose error caused by corrupted measurements using the ICP algorithm. Using this formula, we develop a metric to certify and pinpoint specific regions within the environment where the robot is more vulnerable to localization failures in the presence of faults in the measurements.

5.A Tutorial on Distributed Optimization for Cooperative Robotics: from Setups and Algorithms to Toolboxes and Research Directions

Authors:Andrea Testa, Guido Carnevale, Giuseppe Notarstefano

Abstract: Several interesting problems in multi-robot systems can be cast in the framework of distributed optimization. Examples include multi-robot task allocation, vehicle routing, target protection and surveillance. While the theoretical analysis of distributed optimization algorithms has received significant attention, its application to cooperative robotics has not been investigated in detail. In this paper, we show how notable scenarios in cooperative robotics can be addressed by suitable distributed optimization setups. Specifically, after a brief introduction on the widely investigated consensus optimization (most suited for data analytics) and on the partition-based setup (matching the graph structure in the optimization), we focus on two distributed settings modeling several scenarios in cooperative robotics, i.e., the so-called constraint-coupled and aggregative optimization frameworks. For each one, we consider use-case applications, and we discuss tailored distributed algorithms with their convergence properties. Then, we revise state-of-the-art toolboxes allowing for the implementation of distributed schemes on real networks of robots without central coordinators. For each use case, we discuss their implementation in these toolboxes and provide simulations and real experiments on networks of heterogeneous robots.

6.The use of deception in dementia-care robots: Should robots tell "white lies" to limit emotional distress?

Authors:Samuel Rhys Cox, Grace Cheong, Wei Tsang Ooi

Abstract: With projections of ageing populations and increasing rates of dementia, there is need for professional caregivers. Assistive robots have been proposed as a solution to this, as they can assist people both physically and socially. However, caregivers often need to use acts of deception (such as misdirection or white lies) in order to ensure necessary care is provided while limiting negative impacts on the cared-for such as emotional distress or loss of dignity. We discuss such use of deception, and contextualise their use within robotics.

7.Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

Authors:Leonard Bärmann, Rainer Kartmann, Fabian Peller-Konrad, Alex Waibel, Tamim Asfour

Abstract: Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory, and thus retrieved on similar requests. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and real-world) by demonstrating generalized incrementally-learned knowledge.

8.Seeing-Eye Quadruped Navigation with Force Responsive Locomotion Control

Authors:David DeFazio, Eisuke Hirota, Shiqi Zhang

Abstract: Seeing-eye robots are very useful tools for guiding visually impaired people, potentially producing a huge societal impact given the low availability and high cost of real guide dogs. Although a few seeing-eye robot systems have already been demonstrated, none considered external tugs from humans, which frequently occur in a real guide dog setting. In this paper, we simultaneously train a locomotion controller that is robust to external tugging forces via Reinforcement Learning (RL), and an external force estimator via supervised learning. The controller ensures stable walking, and the force estimator enables the robot to respond to the external forces from the human. These forces are used to guide the robot to the global goal, which is unknown to the robot, while the robot guides the human around nearby obstacles via a local planner. Experimental results in simulation and on hardware show that our controller is robust to external forces, and our seeing-eye system can accurately detect force direction. We demonstrate our full seeing-eye robot system on a real quadruped robot with a blindfolded human. The video can be seen at our project page:

9.Data-Driven Batch Localization and SLAM Using Koopman Linearization

Authors:Zi Cong Guo, Frederike Dümbgen, James R. Forbes, Timothy D. Barfoot

Abstract: We present a framework for model-free batch localization and SLAM. We use lifting functions to map a control-affine system into a high-dimensional space, where both the process model and the measurement model are rendered bilinear. During training, we solve a least-squares problem using groundtruth data to compute the high-dimensional model matrices associated with the lifted system purely from data. At inference time, we solve for the unknown robot trajectory and landmarks through an optimization problem, where constraints are introduced to keep the solution on the manifold of the lifting functions. The problem is efficiently solved using a sequential quadratic program (SQP), where the complexity of an SQP iteration scales linearly with the number of timesteps. Our algorithms, called Reduced Constrained Koopman Linearization Localization (RCKL-Loc) and Reduced Constrained Koopman Linearization SLAM (RCKL-SLAM), are validated experimentally in simulation and on two datasets: one with an indoor mobile robot equipped with a laser rangefinder that measures range to cylindrical landmarks, and one on a golf cart equipped with RFID range sensors. We compare RCKL-Loc and RCKL-SLAM with classic model-based nonlinear batch estimation. While RCKL-Loc and RCKL-SLAM have similar performance compared to their model-based counterparts, they outperform the model-based approaches when the prior model is imperfect, showing the potential benefit of the proposed data-driven technique.

10.Realistic pedestrian behaviour in the CARLA simulator using VR and mocap

Authors:Sergio Martín Serrano, David Fernández Llorca, Iván García Daza, Miguel Ángel Sotelo Vázquez

Abstract: Simulations are gaining increasingly significance in the field of autonomous driving due to the demand for rapid prototyping and extensive testing. Employing physics-based simulation brings several benefits at an affordable cost, while mitigating potential risks to prototypes, drivers, and vulnerable road users. However, there exit two primary limitations. Firstly, the reality gap which refers to the disparity between reality and simulation and prevents the simulated autonomous driving systems from having the same performance in the real world. Secondly, the lack of empirical understanding regarding the behavior of real agents, such as backup drivers or passengers, as well as other road users such as vehicles, pedestrians, or cyclists. Agent simulation is commonly implemented through deterministic or randomized probabilistic pre-programmed models, or generated from real-world data; but it fails to accurately represent the behaviors adopted by real agents while interacting within a specific simulated scenario. This paper extends the description of our proposed framework to enable real-time interaction between real agents and simulated environments, by means immersive virtual reality and human motion capture systems within the CARLA simulator for autonomous driving. We have designed a set of usability examples that allow the analysis of the interactions between real pedestrians and simulated autonomous vehicles and we provide a first measure of the user's sensation of presence in the virtual environment.

11.Comparative Study of Visual SLAM-Based Mobile Robot Localization Using Fiducial Markers

Authors:Jongwon Lee, Su Yeon Choi, David Hanley, Timothy Bretl

Abstract: This paper presents a comparative study of three modes for mobile robot localization based on visual SLAM using fiducial markers (i.e., square-shaped artificial landmarks with a black-and-white grid pattern): SLAM, SLAM with a prior map, and localization with a prior map. The reason for comparing the SLAM-based approaches leveraging fiducial markers is because previous work has shown their superior performance over feature-only methods, with less computational burden compared to methods that use both feature and marker detection without compromising the localization performance. The evaluation is conducted using indoor image sequences captured with a hand-held camera containing multiple fiducial markers in the environment. The performance metrics include absolute trajectory error and runtime for the optimization process per frame. In particular, for the last two modes (SLAM and localization with a prior map), we evaluate their performances by perturbing the quality of prior map to study the extent to which each mode is tolerant to such perturbations. Hardware experiments show consistent trajectory error levels across the three modes, with the localization mode exhibiting the shortest runtime among them. Yet, with map perturbations, SLAM with a prior map maintains performance, while localization mode degrades in both aspects.

12.Multi-contact Stochastic Predictive Control for Legged Robots with Contact Locations Uncertainty

Authors:Ahmad Gazar, Majid Khadiv, Andrea Del Prete, Ludovic Righetti

Abstract: Trajectory optimization under uncertainties is a challenging problem for robots in contact with the environment. Such uncertainties are inevitable due to estimation errors, control imperfections, and model mismatches between planning models used for control and the real robot dynamics. This induces control policies that could violate the contact location constraints by making contact at unintended locations, and as a consequence leading to unsafe motion plans. This work addresses the problem of robust kino-dynamic whole-body trajectory optimization using stochastic nonlinear model predictive control (SNMPC) by considering additive uncertainties on the model dynamics subject to contact location chance-constraints as a function of robot's full kinematics. We demonstrate the benefit of using SNMPC over classic nonlinear MPC (NMPC) for whole-body trajectory optimization in terms of contact location constraint satisfaction (safety). We run extensive Monte-Carlo simulations for a quadruped robot performing agile trotting and bounding motions over small stepping stones, where contact location satisfaction becomes critical. Our results show that SNMPC is able to perform all motions safely with 100% success rate, while NMPC failed 48.3% of all motions.

1.A method for Selecting Scenes and Emotion-based Descriptions for a Robot's Diary

Authors:Aiko Ichikura, Kento Kawaharazuka, Yoshiki Obinata, Kei Okada, Masayuki Inaba

Abstract: In this study, we examined scene selection methods and emotion-based descriptions for a robot's daily diary. We proposed a scene selection method and an emotion description method that take into account semantic and affective information, and created several types of diaries. Experiments were conducted to examine the change in sentiment values and preference of each diary, and it was found that the robot's feelings and impressions changed more from date to date when scenes were selected using the affective captions. Furthermore, we found that the robot's emotion generally improves the preference of the robot's diary regardless of the scene it describes. However, presenting negative or mixed emotions at once may decrease the preference of the diary or reduce the robot's robot-likeness, and thus the method of presenting emotions still needs further investigation.

2.Deep Imitation Learning for Humanoid Loco-manipulation through Human Teleoperation

Authors:Mingyo Seo, Steve Han, Kyutae Sim, Seung Hyeon Bang, Carlos Gonzalez, Luis Sentis, Yuke Zhu

Abstract: We tackle the problem of developing humanoid loco-manipulation skills with deep imitation learning. The difficulty of collecting task demonstrations and training policies for humanoids with a high degree of freedom presents substantial challenges. We introduce TRILL, a data-efficient framework for training humanoid loco-manipulation policies from human demonstrations. In this framework, we collect human demonstration data through an intuitive Virtual Reality (VR) interface. We employ the whole-body control formulation to transform task-space commands by human operators into the robot's joint-torque actuation while stabilizing its dynamics. By employing high-level action abstractions tailored for humanoid loco-manipulation, our method can efficiently learn complex sensorimotor skills. We demonstrate the effectiveness of TRILL in simulation and on a real-world robot for performing various loco-manipulation tasks. Videos and additional materials can be found on the project page:

3.Graph-Based Interaction-Aware Multimodal 2D Vehicle Trajectory Prediction using Diffusion Graph Convolutional Networks

Authors:Keshu Wu, Yang Zhou, Haotian Shi, Xiaopeng Li, Bin Ran

Abstract: Predicting vehicle trajectories is crucial for ensuring automated vehicle operation efficiency and safety, particularly on congested multi-lane highways. In such dynamic environments, a vehicle's motion is determined by its historical behaviors as well as interactions with surrounding vehicles. These intricate interactions arise from unpredictable motion patterns, leading to a wide range of driving behaviors that warrant in-depth investigation. This study presents the Graph-based Interaction-aware Multi-modal Trajectory Prediction (GIMTP) framework, designed to probabilistically predict future vehicle trajectories by effectively capturing these interactions. Within this framework, vehicles' motions are conceptualized as nodes in a time-varying graph, and the traffic interactions are represented by a dynamic adjacency matrix. To holistically capture both spatial and temporal dependencies embedded in this dynamic adjacency matrix, the methodology incorporates the Diffusion Graph Convolutional Network (DGCN), thereby providing a graph embedding of both historical states and future states. Furthermore, we employ a driving intention-specific feature fusion, enabling the adaptive integration of historical and future embeddings for enhanced intention recognition and trajectory prediction. This model gives two-dimensional predictions for each mode of longitudinal and lateral driving behaviors and offers probabilistic future paths with corresponding probabilities, addressing the challenges of complex vehicle interactions and multi-modality of driving behaviors. Validation using real-world trajectory datasets demonstrates the efficiency and potential.

4.AutonomROS: A ReconROS-based Autonomonous Driving Unit

Authors:Christian Lienen, Mathis Brede, Daniel Karger, Kevin Koch, Dalisha Logan, Janet Mazur, Alexander Philipp Nowosad, Alexander Schnelle, Mohness Waizy, Marco Platzner

Abstract: Autonomous driving has become an important research area in recent years, and the corresponding system creates an enormous demand for computations. Heterogeneous computing platforms such as systems-on-chip that combine CPUs with reprogrammable hardware offer both computational performance and flexibility and are thus interesting targets for autonomous driving architectures. The de-facto software architecture standard in robotics, including autonomous driving systems, is ROS 2. ReconROS is a framework for creating robotics applications that extends ROS 2 with the possibility of mapping compute-intense functions to hardware. This paper presents AutonomROS, an autonomous driving unit based on the ReconROS framework. AutonomROS serves as a blueprint for a larger robotics application developed with ReconROS and demonstrates its suitability and extendability. The application integrates the ROS 2 package Navigation 2 with custom-developed software and hardware-accelerated functions for point cloud generation, obstacle detection, and lane detection. In addition, we detail a new communication middleware for shared memory communication between software and hardware functions. We evaluate AutonomROS and show the advantage of hardware acceleration and the new communication middleware for improving turnaround times, achievable frame rates, and, most importantly, reducing CPU load.

5.A Quantitative Method to Determine What Collisions Are Reasonably Foreseeable and Preventable

Authors:Erwin de Gelder, Olaf Op den Camp

Abstract: The development of Automated Driving Systems (ADSs) has made significant progress in the last years. To enable the deployment of Automated Vehicles (AVs) equipped with such ADSs, regulations concerning the approval of these systems need to be established. In 2021, the World Forum for Harmonization of Vehicle Regulations has approved a new United Nations regulation concerning the approval of Automated Lane Keeping Systems (ALKSs). An important aspect of this regulation is that "the activated system shall not cause any collisions that are reasonably foreseeable and preventable." The phrasing of "reasonably foreseeable and preventable" might be subjected to different interpretations and, therefore, this might result in disagreements among AV developers and the authorities that are requested to approve AVs. The objective of this work is to propose a method for quantifying what is "reasonably foreseeable and preventable". The proposed method considers the Operational Design Domain (ODD) of the system and can be applied to any ODD. Having a quantitative method for determining what is reasonably foreseeable and preventable provides developers, authorities, and the users of ADSs a better understanding of the residual risks to be expected when deploying these systems in real traffic. Using our proposed method, we can estimate what collisions are reasonably foreseeable and preventable. This will help in setting requirements regarding the safety of ADSs and can lead to stronger justification for design decisions and test coverage for developing ADSs.

6.Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments

Authors:Haozhe Lei, Quanyan Zhu

Abstract: In the area of learning-driven artificial intelligence advancement, the integration of machine learning (ML) into self-driving (SD) technology stands as an impressive engineering feat. Yet, in real-world applications outside the confines of controlled laboratory scenarios, the deployment of self-driving technology assumes a life-critical role, necessitating heightened attention from researchers towards both safety and efficiency. To illustrate, when a self-driving model encounters an unfamiliar environment in real-time execution, the focus must not solely revolve around enhancing its anticipated performance; equal consideration must be given to ensuring its execution or real-time adaptation maintains a requisite level of safety. This study introduces an algorithm for online meta-reinforcement learning, employing lookahead symbolic constraints based on \emph{Neurosymbolic Meta-Reinforcement Lookahead Learning} (NUMERLA). NUMERLA proposes a lookahead updating mechanism that harmonizes the efficiency of online adaptations with the overarching goal of ensuring long-term safety. Experimental results demonstrate NUMERLA confers the self-driving agent with the capacity for real-time adaptability, leading to safe and self-adaptive driving under non-stationary urban human-vehicle interaction scenarios.

7.A Lightweight and Transferable Design for Robust LEGO Manipulation

Authors:Ruixuan Liu, Yifan Sun, Changliu Liu

Abstract: LEGO is a well-known platform for prototyping pixelized objects. However, robotic LEGO prototyping (i.e. manipulating LEGO bricks) is challenging due to the tight connections and accuracy requirement. This paper investigates safe and efficient robotic LEGO manipulation. In particular, this paper reduces the complexity of the manipulation by hardware-software co-design. An end-of-arm tool (EOAT) is designed, which reduces the problem dimension and allows large industrial robots to easily manipulate LEGO bricks. In addition, this paper uses evolution strategy to safely optimize the robot motion for LEGO manipulation. Experiments demonstrate that the EOAT performs reliably in manipulating LEGO bricks and the learning framework can effectively and safely improve the manipulation performance to a 100\% success rate. The co-design is deployed to multiple robots (i.e. FANUC LR-mate 200id/7L and Yaskawa GP4) to demonstrate its generalizability and transferability. In the end, we show that the proposed solution enables sustainable robotic LEGO prototyping, in which the robot can repeatedly assemble and disassemble different prototypes.

8.Magnetic Navigation using Attitude-Invariant Magnetic Field Information for Loop Closure Detection

Authors:Natalia Pavlasek, Charles Champagne Cossette, David Roy-Guay, James Richard Forbes

Abstract: Indoor magnetic fields are a combination of Earth's magnetic field and disruptions induced by ferromagnetic objects, such as steel structural components in buildings. As a result of these disruptions, pervasive in indoor spaces, magnetic field data is often omitted from navigation algorithms in indoor environments. This paper leverages the spatially-varying disruptions to Earth's magnetic field to extract positional information for use in indoor navigation algorithms. The algorithm uses a rate gyro and an array of four magnetometers to estimate the robot's pose. Additionally, the magnetometer array is used to compute attitude-invariant measurements associated with the magnetic field and its gradient. These measurements are used to detect loop closure points. Experimental results indicate that the proposed approach can estimate the pose of a ground robot in an indoor environment within meter accuracy.

1.End-to-end Lidar-Driven Reinforcement Learning for Autonomous Racing

Authors:Meraj Mammadov

Abstract: Reinforcement Learning (RL) has emerged as a transformative approach in the domains of automation and robotics, offering powerful solutions to complex problems that conventional methods struggle to address. In scenarios where the problem definitions are elusive and challenging to quantify, learning-based solutions such as RL become particularly valuable. One instance of such complexity can be found in the realm of car racing, a dynamic and unpredictable environment that demands sophisticated decision-making algorithms. This study focuses on developing and training an RL agent to navigate a racing environment solely using feedforward raw lidar and velocity data in a simulated context. The agent's performance, trained in the simulation environment, is then experimentally evaluated in a real-world racing scenario. This exploration underlines the feasibility and potential benefits of RL algorithm enhancing autonomous racing performance, especially in the environments where prior map information is not available.

2.Deep Segmented DMP Networks for Learning Discontinuous Motions

Authors:Edgar Anarossi, Hirotaka Tahara, Naoto Komeno, Takamitsu Matsubara

Abstract: Discontinuous motion which is a motion composed of multiple continuous motions with sudden change in direction or velocity in between, can be seen in state-aware robotic tasks. Such robotic tasks are often coordinated with sensor information such as image. In recent years, Dynamic Movement Primitives (DMP) which is a method for generating motor behaviors suitable for robotics has garnered several deep learning based improvements to allow associations between sensor information and DMP parameters. While the implementation of deep learning framework does improve upon DMP's inability to directly associate to an input, we found that it has difficulty learning DMP parameters for complex motion which requires large number of basis functions to reconstruct. In this paper we propose a novel deep learning network architecture called Deep Segmented DMP Network (DSDNet) which generates variable-length segmented motion by utilizing the combination of multiple DMP parameters predicting network architecture, double-stage decoder network, and number of segments predictor. The proposed method is evaluated on both artificial data (object cutting & pick-and-place) and real data (object cutting) where our proposed method could achieve high generalization capability, task-achievement, and data-efficiency compared to previous method on generating discontinuous long-horizon motions.

3.Implementing BDI Continual Temporal Planning for Robotic Agents

Authors:Alex Zanetti, Devis Dal Moro, Redi Vreto, Marco Robol, Marco Roveri, Paolo Giorgini

Abstract: Making autonomous agents effective in real-life applications requires the ability to decide at run-time and a high degree of adaptability to unpredictable and uncontrollable events. Reacting to events is still a fundamental ability for an agent, but it has to be boosted up with proactive behaviors that allow the agent to explore alternatives and decide at run-time for optimal solutions. This calls for a continuous planning as part of the deliberation process that makes an agent able to reconsider plans on the base of temporal constraints and changes of the environment. Online planning literature offers several approaches used to select the next action on the base of a partial exploration of the solution space. In this paper, we propose a BDI continuous temporal planning framework, where interleave planning and execution loop is used to integrate online planning with the BDI control-loop. The framework has been implemented with the ROS2 robotic framework and planning algorithms offered by JavaFF.

4.Learning State-Space Models for Mapping Spatial Motion Patterns

Authors:Junyi Shi, Tomasz Piotr Kucner

Abstract: Mapping the surrounding environment is essential for the successful operation of autonomous robots. While extensive research has focused on mapping geometric structures and static objects, the environment is also influenced by the movement of dynamic objects. Incorporating information about spatial motion patterns can allow mobile robots to navigate and operate successfully in populated areas. In this paper, we propose a deep state-space model that learns the map representations of spatial motion patterns and how they change over time at a certain place. To evaluate our methods, we use two different datasets: one generated dataset with specific motion patterns and another with real-world pedestrian data. We test the performance of our model by evaluating its learning ability, mapping quality, and application to downstream tasks. The results demonstrate that our model can effectively learn the corresponding motion pattern, and has the potential to be applied to robotic application tasks.

5.Learning-based NLOS Detection and Uncertainty Prediction of GNSS Observations with Transformer-Enhanced LSTM Network

Authors:Haoming Zhang, Zhanxin Wang, Heike Vallery

Abstract: The global navigation satellite systems (GNSS) play a vital role in transport systems for accurate and consistent vehicle localization. However, GNSS observations can be distorted due to multipath effects and non-line-of-sight (NLOS) receptions in challenging environments such as urban canyons. In such cases, traditional methods to classify and exclude faulty GNSS observations may fail, leading to unreliable state estimation and unsafe system operations. This work proposes a Deep-Learning-based method to detect NLOS receptions and predict GNSS pseudorange errors by analyzing GNSS observations as a spatio-temporal modeling problem. Compared to previous works, we construct a transformer-like attention mechanism to enhance the long short-term memory (LSTM) networks, improving model performance and generalization. For the training and evaluation of the proposed network, we used labeled datasets from the cities of Hong Kong and Aachen. We also introduce a dataset generation process to label the GNSS observations using lidar maps. In experimental studies, we compare the proposed network with a deep-learning-based model and classical machine-learning models. Furthermore, we conduct ablation studies of our network components and integrate the NLOS detection with data out-of-distribution in a state estimator. As a result, our network presents improved precision and recall ratios compared to other models. Additionally, we show that the proposed method avoids trajectory divergence in real-world vehicle localization by classifying and excluding NLOS observations.

6.Towards a Reduced Dependency Framework for Autonomous Unified Inspect-Explore Missions

Authors:Vignesh Kottayam Viswanathan, Sumeet Gajanan Satpute, Ali-akbar Agha-mohammadi, George Nikolakopoulos

Abstract: The task of establishing and maintaining situational awareness in an unknown environment is a critical step to fulfil in a mission related to the field of rescue robotics. Predominantly, the problem of visual inspection of urban structures is dealt with view-planning being addressed by map-based approaches. In this article, we propose a novel approach towards effective use of Micro Aerial Vehicles (MAVs) for obtaining a 3-D shape of an unknown structure of objects utilizing a map-independent planning framework. The problem is undertaken via a bifurcated approach to address the task of executing a closer inspection of detected structures with a wider exploration strategy to identify and locate nearby structures, while being equipped with limited sensing capability. The proposed framework is evaluated experimentally in a controlled indoor environment in presence of a mock-up environment validating the efficacy of the proposed inspect-explore policy.

7.Powder-Bot: A Modular Autonomous Multi-Robot Workflow for Powder X-Ray Diffraction

Authors:Amy M. Lunt, Hatem Fakhruldeen, Gabriella Pizzuto, Louis Longley, Alexander White, Nicola Rankin, Rob Clowes, Ben M. Alston, Andrew I. Cooper, Samantha Y. Chong

Abstract: Powder X-ray diffraction (PXRD) is a key technique for the structural characterisation of solid-state materials, but compared with tasks such as liquid handling, its end-to-end automation is highly challenging. This is because coupling PXRD experiments with crystallisation comprises multiple solid handling steps that include sample recovery, sample preparation by grinding, sample mounting and, finally, collection of X-ray diffraction data. Each of these steps has individual technical challenges from an automation perspective, and hence no commercial instrument exists that can grow crystals, process them into a powder, mount them in a diffractometer, and collect PXRD data in an autonomous, closed-loop way. Here we present an automated robotic workflow to carry out autonomous PXRD experiments. The PXRD data collected for polymorphs of small organic compounds is comparable to that collected under the same conditions manually. Beyond accelerating PXRD experiments, this workflow involves 13 component steps and integrates three different types of robots, each from a separate supplier, illustrating the power of flexible, modular automation in complex, multitask laboratories.

1.A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems

Authors:Satoshi Yamamori, Jun Morimoto

Abstract: In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.

2.Inter-finger Small Object Manipulation with DenseTact Optical Tactile Sensor

Authors:Won Kyung Do, Bianca Aumann, Camille Chungyoun, Monroe Kennedy III

Abstract: The ability to grasp and manipulate small objects in cluttered environments remains a significant challenge. This paper introduces a novel approach that utilizes a tactile sensor-equipped gripper with eight degrees of freedom to overcome these limitations. We employ DenseTact 2.0 for the gripper, enabling precise control and improved grasp success rates, particularly for small objects ranging from 5mm to 25mm. Our integrated strategy incorporates the robot arm, gripper, and sensor to manipulate and orient small objects for subsequent classification effectively. We contribute a specialized dataset designed for classifying these objects based on tactile sensor output and a new control algorithm for in-hand orientation tasks. Our system demonstrates 88% of successful grasp and successfully classified small objects in cluttered scenarios.

3.A Customizable Conflict Resolution and Attribute-Based Access Control Framework for Multi-Robot Systems

Authors:Salma Salimi, Farhad Keramat, Tomi Westerlund, Jorge Peña Queralta

Abstract: As multi-robot systems continue to advance and become integral to various applications, managing conflicts and ensuring secure access control are critical challenges that need to be addressed. Access control is essential in multi-robot systems to ensure secure and authorized interactions among robots, protect sensitive data, and prevent unauthorized access to resources. This paper presents a novel framework for customizable conflict resolution and attribute-based access control in multi-robot systems for ROS 2 leveraging the Hyperledger Fabric blockchain. We introduce an attribute-based access control (ABAC) Fabric-ROS 2 bridge to enable secure communication and control between users and robots. By defining conflict resolution policies based on task priorities, robot capabilities, and user-defined constraints, our framework offers a flexible way to resolve conflicts. Additionally, it incorporates attribute-based access control, granting access rights based on user and robot attributes. ABAC offers a modular approach to control access compared to existing access control approaches in ROS 2, such as SROS2. Through this framework, multi-robot systems can be managed efficiently, securely, and adaptably, ensuring controlled access to resources and managing conflicts. Our experimental evaluation shows that our framework marginally improves latency and throughput over exiting Fabric and ROS 2 integration solutions. At higher network load, it is the only solution to operate reliably without a diverging transaction commitment latency. We also demonstrate how conflicts arising from simultaneous control or a robot by two users are resolved in real-time and motion distortion is effectively eliminated.

4.Graph-based SLAM-Aware Exploration with Prior Topo-Metric Information

Authors:Ruofei Bai, Hongliang Guo, Wei-Yun Yau, Lihua Xie

Abstract: Autonomous exploration requires the robot to explore an unknown environment while constructing an accurate map with the SLAM (Simultaneous Localization and Mapping) techniques. Without prior information, the exploratory performance is usually conservative due to the limited planning horizon. This paper exploits a prior topo-metric graph of the environment to benefit both the exploration efficiency and the pose graph accuracy in SLAM. Based on recent advancements in relating pose graph reliability with graph topology, we are able to formulate both objectives into a SLAM-aware path planning problem over the prior graph, which finds a fast exploration path with informative loop closures that globally stabilize the pose graph. Furthermore, we derive theoretical thresholds to speed up the greedy algorithm to the problem, which significantly prune non-optimal loop closures in iterations. The proposed planner is incorporated into a hierarchical exploration framework, with flexible features including path replanning and online prior map update that adds additional information to the prior graph. Extensive experiments indicate that our method has comparable exploration efficiency to others while consistently maintaining higher mapping accuracy in various environments. Our implementations will be open-source on GitHub.

5.Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models

Authors:Yoon Kyung Lee, Yoonwon Jung, Gyuyi Kang, Sowon Hahn

Abstract: We propose augmenting the empathetic capacities of social robots by integrating non-verbal cues. Our primary contribution is the design and labeling of four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot. These cues are generated using a Large Language Model (LLM). We developed an LLM-based conversational system for the robot and assessed its alignment with social cues as defined by human counselors. Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures. Despite these tendencies, our approach has led to the development of a social robot capable of context-aware and more authentic interactions. Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.

6.On a Connection between Differential Games, Optimal Control, and Energy-based Models for Multi-Agent Interactions

Authors:Christopher Diehl, Tobias Klosek, Martin Krüger, Nils Murzyn, Torsten Bertram

Abstract: Game theory offers an interpretable mathematical framework for modeling multi-agent interactions. However, its applicability in real-world robotics applications is hindered by several challenges, such as unknown agents' preferences and goals. To address these challenges, we show a connection between differential games, optimal control, and energy-based models and demonstrate how existing approaches can be unified under our proposed Energy-based Potential Game formulation. Building upon this formulation, this work introduces a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, acting as an inductive bias. The experiments using simulated mobile robot pedestrian interactions and real-world automated driving data provide empirical evidence that the game-theoretic layer improves the predictive performance of various neural network backbones.

7.A Remote Sim2real Aerial Competition: Fostering Reproducibility and Solutions' Diversity in Robotics Challenges

Authors:Spencer Teetaert University of Toronto Institute for Aerospace Studies, Wenda Zhao University of Toronto Institute for Aerospace Studies, Niu Xinyuan Team H2, Hashir Zahir Team H2, Huiyu Leong Team H2, Michel Hidalgo Team Ekumen, Gerardo Puga Team Ekumen, Tomas Lorente Team Ekumen, Nahuel Espinosa Team Ekumen, John Alejandro Duarte Carrasco Team Ekumen, Kaizheng Zhang University of Science and Technology of China, Jian Di University of Science and Technology of China, Tao Jin University of Science and Technology of China, Xiaohan Li University of Science and Technology of China, Yijia Zhou University of Science and Technology of China, Xiuhua Liang University of Science and Technology of China, Chenxu Zhang University of Science and Technology of China, Antonio Loquercio University of California Berkeley, Siqi Zhou University of Toronto Institute for Aerospace Studies Technical University of Munich, Lukas Brunke University of Toronto Institute for Aerospace Studies Technical University of Munich, Melissa Greeff University of Toronto Institute for Aerospace Studies, Wolfgang Hoenig Technical University of Berlin, Jacopo Panerati University of Toronto Institute for Aerospace Studies, Angela P. Schoellig University of Toronto Institute for Aerospace Studies Technical University of Munich

Abstract: Shared benchmark problems have historically been a fundamental driver of progress for scientific communities. In the context of academic conferences, competitions offer the opportunity to researchers with different origins, backgrounds, and levels of seniority to quantitatively compare their ideas. In robotics, a hot and challenging topic is sim2real-porting approaches that work well in simulation to real robot hardware. In our case, creating a hybrid competition with both simulation and real robot components was also dictated by the uncertainties around travel and logistics in the post-COVID-19 world. Hence, this article motivates and describes an aerial sim2real robot competition that ran during the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, from the specification of the competition task, to the details of the software infrastructure supporting simulation and real-life experiments, to the approaches of the top-placed teams and the lessons learned by participants and organizers.

8.A Novel Mapping and Navigation Framework for Robot Autonomy in Orchards

Authors:Yaoqiang Pan, Hao Cao, Kewei Hu, Hanwen Kang, Xing Wang

Abstract: Target detection is a basic task to divide the object types in the orchard point cloud global map, which is used to count the overall situation of the orchard. And provide necessary information for unmanned navigation planning of agricultural vehicles. In order to divide the fruit trees and the ground in the point cloud global map of the standardized orchard, and provide the orchard overall information for the path planning of autonomous vehicles in the natural orchard environment. A fruit tree detection method based on the Yolo-V7 network is proposed, which can effectively detect fruit tree targets from multi-sensor fused radar point cloud, reduce the 3D point cloud information of the point cloud map to 2D for the fruit tree point cloud in the Yolo-V7 network detection map, and project the prediction results into the point cloud map. Generally, the target detection network based on PointNet has the problem of low speed and large computational load. The method proposed in this paper is fast and low computational load and is suitable for deployment in mobile robots. From the experimental results, the recall rate and accuracy rate of the proposed method in orchard fruit tree detection are 0.4 and 0.696 respectively, and its weight and reasoning time are 7.4 M and 28 ms respectively. The experimental results show that this method can achieve the robustness and efficiency of real-time detection of orchard fruit trees.

9.Reinforcement learning for safety-critical control of an automated vehicle

Authors:Florian Thaler Virtual Vehicle Research GmbH, Franz Rammerstorfer Virtual Vehicle Research GmbH, Jon Ander Gomez Solver Intelligent Analytics, Raul Garcia Crespo Solver Intelligent Analytics, Leticia Pasqual Solver Intelligent Analytics, Markus Postl Virtual Vehicle Research GmbH

Abstract: We present our approach for the development, validation and deployment of a data-driven decision-making function for the automated control of a vehicle. The decisionmaking function, based on an artificial neural network is trained to steer the mobile robot SPIDER towards a predefined, static path to a target point while avoiding collisions with obstacles along the path. The training is conducted by means of proximal policy optimisation (PPO), a state of the art algorithm from the field of reinforcement learning. The resulting controller is validated using KPIs quantifying its capability to follow a given path and its reactivity on perceived obstacles along the path. The corresponding tests are carried out in the training environment. Additionally, the tests shall be performed as well in the robotics situation Gazebo and in real world scenarios. For the latter the controller is deployed on a FPGA-based development platform, the FRACTAL platform, and integrated into the SPIDER software stack.

10.Learning Whole-body Manipulation for Quadrupedal Robot

Authors:Seunghun Jeon, Moonkyu Jung, Suyoung Choi, Beomjoon Kim, Jemin Hwangbo

Abstract: We propose a learning-based system for enabling quadrupedal robots to manipulate large, heavy objects using their whole body. Our system is based on a hierarchical control strategy that uses the deep latent variable embedding which captures manipulation-relevant information from interactions, proprioception, and action history, allowing the robot to implicitly understand object properties. We evaluate our framework in both simulation and real-world scenarios. In the simulation, it achieves a success rate of 93.6 % in accurately re-positioning and re-orienting various objects within a tolerance of 0.03 m and 5 {\deg}. Real-world experiments demonstrate the successful manipulation of objects such as a 19.2 kg water-filled drum and a 15.3 kg plastic box filled with heavy objects while the robot weighs 27 kg. Unlike previous works that focus on manipulating small and light objects using prehensile manipulation, our framework illustrates the possibility of using quadrupeds for manipulating large and heavy objects that are ungraspable with the robot's entire body. Our method does not require explicit object modeling and offers significant computational efficiency compared to optimization-based methods. The video can be found at $\href{}{this \ http \ URL}$.

11.Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization

Authors:Wissam Kontar, Xinzhi Zhong, Soyoung Ahn

Abstract: This paper describes a framework for learning Automated Vehicles (AVs) driver models via knowledge sharing between vehicles and personalization. The innate variability in the transportation system makes it exceptionally challenging to expose AVs to all possible driving scenarios during empirical experimentation or testing. Consequently, AVs could be blind to certain encounters that are deemed detrimental to their safe and efficient operation. It is then critical to share knowledge across AVs that increase exposure to driving scenarios occurring in the real world. This paper explores a method to collaboratively train a driver model by sharing knowledge and borrowing strength across vehicles while retaining a personalized model tailored to the vehicle's unique conditions and properties. Our model brings a federated learning approach to collaborate between multiple vehicles while circumventing the need to share raw data between them. We showcase our method's performance in experimental simulations. Such an approach to learning finds several applications across transportation engineering including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication. Code and sample dataset are made available at the project page

12.D-VAT: End-to-End Visual Active Tracking for Micro Aerial Vehicles

Authors:Alberto Dionigi, Simone Felicioni, Mirko Leomanni, Gabriele Costante

Abstract: Visual active tracking is a growing research topic in robotics due to its key role in applications such as human assistance, disaster recovery, and surveillance. In contrast to passive tracking, active tracking approaches combine vision and control capabilities to detect and actively track the target. Most of the work in this area focuses on ground robots, while the very few contributions on aerial platforms still pose important design constraints that limit their applicability. To overcome these limitations, in this paper we propose D-VAT, a novel end-to-end visual active tracking methodology based on deep reinforcement learning that is tailored to micro aerial vehicle platforms. The D-VAT agent computes the vehicle thrust and angular velocity commands needed to track the target by directly processing monocular camera measurements. We show that the proposed approach allows for precise and collision-free tracking operations, outperforming different state-of-the-art baselines on simulated environments which differ significantly from those encountered during training.

13.GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Authors:Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang

Abstract: It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present $\textbf{GNFactor}$, a visual behavior cloning agent for multi-task robotic manipulation with $\textbf{G}$eneralizable $\textbf{N}$eural feature $\textbf{F}$ields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model ($\textit{e.g.}$, Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is .

14.Language-Conditioned Path Planning

Authors:Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

Abstract: Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO) a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.

1.Learning the References of Online Model Predictive Control for Urban Self-Driving

Authors:Yubin Wang, Zengqi Peng, Hakim Ghazzai, Jun Ma

Abstract: In this work, we propose a novel learning-based online model predictive control (MPC) framework for motion synthesis of self-driving vehicles. In this framework, the decision variables are generated as instantaneous references to modulate the cost functions of online MPC, where the constraints of collision avoidance and drivable surface boundaries are latently represented in the soft form. Hence, the embodied maneuvers of the ego vehicle are empowered to adapt to complex and dynamic traffic environments, even with unmodeled uncertainties of other traffic participants. Furthermore, we implement a deep reinforcement learning (DRL) framework for policy search to cast the step actions as the decision variables, where the practical and lightweight observations are considered as the input features of the policy network. The proposed approach is implemented in the high-fidelity simulator involving compound-complex urban driving scenarios, and the results demonstrate that the proposed development manifests remarkable adaptiveness to complex and dynamic traffic environments with a success rate of 85%. Also, its advantages in terms of safety, maneuverability, and robustness are illustrated.

2.Learning to Navigate from Scratch using World Models and Curiosity: the Good, the Bad, and the Ugly

Authors:Daria de Tinguy, Sven Remmery, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt

Abstract: Learning to navigate unknown environments from scratch is a challenging problem. This work presents a system that integrates world models with curiosity-driven exploration for autonomous navigation in new environments. We evaluate performance through simulations and real-world experiments of varying scales and complexities. In simulated environments, the approach rapidly and comprehensively explores the surroundings. Real-world scenarios introduce additional challenges. Despite demonstrating promise in a small controlled environment, we acknowledge that larger and dynamic environments can pose challenges for the current system. Our analysis emphasizes the significance of developing adaptable and robust world models that can handle environmental changes to prevent repetitive exploration of the same areas.

3.Sarrus-inspired Deployable Polyhedral Mechanisms

Authors:Yuanqing Gu, Xiao Zhang, Guowu Wei, Yan Chen

Abstract: Deployable polyhedral mechanisms (DPMs) have witnessed flourishing growth in recent years because of their potential applications in robotics, space exploration, structure engineering, etc. This paper firstly presents the construction, mobility and kinematics of a family of Sarrus-inspired deployable polyhedral mechanisms. By carrying out expansion operation and implanting Sarrus linkages along the straight-line motion paths, deployable tetrahedral, cubic and dodecahedral mechanisms are identified and constructed following tetrahedral, octahedral and icosahedral symmetry, respectively. Three paired transformations with synchronized radial motion between Platonic and Archimedean polyhedrons are revealed, and their significant symmetric properties are perfectly remained in each work configuration. Subsequently, with assistant of equivalent prismatic joints, the equivalent analysis strategy for mobility of multiloop polyhedral mechanisms is proposed to significantly simplify the calculation process. This paper hence presents the construction method and equivalent analysis of the Sarrus-inspired DPMs that are not only valuable in theoretical investigation, but also have great potential in practical applications such as mechanical metamaterials, deployable architectures and space exploration.

4.Sparse Waypoint Validity Checking for Self-Entanglement-Free Tethered Path Planning

Authors:Tong Yang, Jiangpin Liu, Yue Wang, Rong Xiong

Abstract: A novel mechanism to derive self-entanglement-free (SEF) path for tethered differential-driven robots is proposed in this work. The problem is tailored to the deployment of tethered differential-driven robots in situations where an omni-directional tether re-tractor is not available. This is frequently encountered when it is impractical to concurrently equip an omni-directional tether retracting mechanism with other geometrically intricate devices, such as a manipulator, which is notably relevant in applications like disaster recovery, spatial exploration, etc. Without specific attention to the spatial relation between the shape of the tether and the pose of the mobile unit, the issue of self-entanglement arises when the robot moves, resulting in unsafe robot movements and the risk of damaging the tether. In this paper, the SEF constraint is first formulated as the boundedness of a relative angle function which characterises the angular difference between the tether stretching direction and the robot's heading direction. Then, a constrained searching-based path planning algorithm is proposed which produces a path that is sub-optimal whilst ensuring the avoidance of tether self-entanglement. Finally, the algorithmic efficiency of the proposed path planner is further enhanced by proving the conditioned sparsity of the primitive path validity checking module. The effectiveness of the proposed algorithm is assessed through case studies, comparing its performance against untethered differential-driven planners in challenging planning scenarios. A comparative analysis is further conducted between the normal node expansion module and the improved node expansion module which incorporates sparse waypoint validity checking. Real-world tests are also conducted to validate the algorithm's performance. An open-source implementation has also made available for the benefit of the robotics community.

5.High Performance Networking Layer for Simulation Applications

Authors:Amir Mohammad Zarif Shahsavan Nejad, Amir Mahdi Zarif Shahsavan Nejad, Amirali Setayeshi, Soroush Sadeghnejad

Abstract: Autonomous vehicles are one of the most popular and also fast-growing technologies in the world. As we go further, there are still a lot of challenges that are unsolved and may cause problems in the future when it comes to testing in real world. Simulations on the other hand have always had a huge impact in the fields of science, technology, physics, etc. The simulation also powers real-world Autonomous Vehicles nowadays. Therefore, We have built an Autonomous Vehicle Simulation Software - called AVIS Engine - that provides tools and features that help develop autonomous vehicles in various environments. AVIS Engine features an advanced input and output system for the vehicle and includes a traffic system and vehicle sensor system which can be communicated using the fast networking system and ROS Bridge.

6.WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model

Authors:Tianyu Wang, Yifan Li, Haitao Lin, Xiangyang Xue, Yanwei Fu

Abstract: Enabling robots to understand language instructions and react accordingly to visual perception has been a long-standing goal in the robotics research community. Achieving this goal requires cutting-edge advances in natural language processing, computer vision, and robotics engineering. Thus, this paper mainly investigates the potential of integrating the most recent Large Language Models (LLMs) and existing visual grounding and robotic grasping system to enhance the effectiveness of the human-robot interaction. We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration. The system utilizes the LLM of ChatGPT to summarize the preference object of the users as a target instruction via the multi-round interactive dialogue. The target instruction is then forwarded to a visual grounding system for object pose and size estimation, following which the robot grasps the object accordingly. We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task. The further experimental results on various real-world scenarios demonstrated the feasibility and efficacy of our proposed framework.

7.RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

Authors:Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz

Abstract: For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.

8.Predicting Energy Consumption and Traversal Time of Ground Robots for Outdoor Navigation on Multiple Types of Terrain

Authors:Matthias Eder, Gerald Steinbauer-Wagner

Abstract: The outdoor navigation capabilities of ground robots have improved significantly in recent years, opening up new potential applications in a variety of settings. Cost-based representations of the environment are frequently used in the path planning domain to obtain an optimized path based on various objectives, such as traversal time or energy consumption. However, obtaining such cost representations is still cumbersome, particularly in outdoor settings with diverse terrain types and slope angles. In this paper, we address this problem by using a data-driven approach to develop a cost representation for various outdoor terrain types that supports two optimization objectives, namely energy consumption and traversal time. We train a supervised machine learning model whose inputs consists of extracted environment data along a path and whose outputs are the predicted energy consumption and traversal time. The model is based on a ResNet neural network architecture and trained using field-recorded data. The error of the proposed method on different types of terrain is within 11\% of the ground truth data. To show that it performs and generalizes better than currently existing approaches on various types of terrain, a comparison to a baseline method is made.

9.DRL-Based Trajectory Tracking for Motion-Related Modules in Autonomous Driving

Authors:Yinda Xu, Lidong Yu

Abstract: Autonomous driving systems are always built on motion-related modules such as the planner and the controller. An accurate and robust trajectory tracking method is indispensable for these motion-related modules as a primitive routine. Current methods often make strong assumptions about the model such as the context and the dynamics, which are not robust enough to deal with the changing scenarios in a real-world system. In this paper, we propose a Deep Reinforcement Learning (DRL)-based trajectory tracking method for the motion-related modules in autonomous driving systems. The representation learning ability of DL and the exploration nature of RL bring strong robustness and improve accuracy. Meanwhile, it enhances versatility by running the trajectory tracking in a model-free and data-driven manner. Through extensive experiments, we demonstrate both the efficiency and effectiveness of our method compared to current methods.

10.EnsembleFollower: A Hybrid Car-Following Framework Based On Reinforcement Learning and Hierarchical Planning

Authors:Xu Han, Xianda Chen, Meixin Zhu, Pinlong Cai, Jianshan Zhou, Xiaowen Chu

Abstract: Car-following models have made significant contributions to our understanding of longitudinal driving behavior. However, they often exhibit limited accuracy and flexibility, as they cannot fully capture the complexity inherent in car-following processes, or may falter in unseen scenarios due to their reliance on confined driving skills present in training data. It is worth noting that each car-following model possesses its own strengths and weaknesses depending on specific driving scenarios. Therefore, we propose EnsembleFollower, a hierarchical planning framework for achieving advanced human-like car-following. The EnsembleFollower framework involves a high-level Reinforcement Learning-based agent responsible for judiciously managing multiple low-level car-following models according to the current state, either by selecting an appropriate low-level model to perform an action or by allocating different weights across all low-level components. Moreover, we propose a jerk-constrained kinematic model for more convincing car-following simulations. We evaluate the proposed method based on real-world driving data from the HighD dataset. The experimental results illustrate that EnsembleFollower yields improved accuracy of human-like behavior and achieves effectiveness in combining hybrid models, demonstrating that our proposed framework can handle diverse car-following conditions by leveraging the strengths of various low-level models.

11.Learning Vision-based Pursuit-Evasion Robot Policies

Authors:Andrea Bajcsy, Antonio Loquercio, Ashish Kumar, Jitendra Malik

Abstract: Learning strategic robot behavior -- like that required in pursuit-evasion interactions -- under real-world constraints is extremely challenging. It requires exploiting the dynamics of the interaction, and planning through both physical state and latent intent uncertainty. In this paper, we transform this intractable problem into a supervised learning problem, where a fully-observable robot policy generates supervision for a partially-observable one. We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior and the strength of the modeling assumptions in the fully-observable policy. We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild. Despite all the challenges, the sensing constraints bring about creativity: the robot is pushed to gather information when uncertain, predict intent from noisy measurements, and anticipate in order to intercept. Project webpage:

1.R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics

Authors:Zexin Li, Aritra Samanta, Yufei Li, Andrea Soltoggio, Hyoseung Kim, Cong Liu

Abstract: Autonomous robotic systems, like autonomous vehicles and robotic search and rescue, require efficient on-device training for continuous adaptation of Deep Reinforcement Learning (DRL) models in dynamic environments. This research is fundamentally motivated by the need to understand and address the challenges of on-device real-time DRL, which involves balancing timing and algorithm performance under memory constraints, as exposed through our extensive empirical studies. This intricate balance requires co-optimizing two pivotal parameters of DRL training -- batch size and replay buffer size. Configuring these parameters significantly affects timing and algorithm performance, while both (unfortunately) require substantial memory allocation to achieve near-optimal performance. This paper presents R^3, a holistic solution for managing timing, memory, and algorithm performance in on-device real-time DRL training. R^3 employs (i) a deadline-driven feedback loop with dynamic batch sizing for optimizing timing, (ii) efficient memory management to reduce memory footprint and allow larger replay buffer sizes, and (iii) a runtime coordinator guided by heuristic analysis and a runtime profiler for dynamically adjusting memory resource reservations. These components collaboratively tackle the trade-offs in on-device DRL training, improving timing and algorithm performance while minimizing the risk of out-of-memory (OOM) errors. We implemented and evaluated R^3 extensively across various DRL frameworks and benchmarks on three hardware platforms commonly adopted by autonomous robotic systems. Additionally, we integrate R^3 with a popular realistic autonomous car simulator to demonstrate its real-world applicability. Evaluation results show that R^3 achieves efficacy across diverse platforms, ensuring consistent latency performance and timing predictability with minimal overhead.

2.Motion Priority Optimization Framework towards Automated and Teleoperated Robot Cooperation in Industrial Recovery Scenarios

Authors:Shunki Itadera, Yukiyasu Domae

Abstract: In this study, we present an optimization framework for efficient motion priority design between automated and teleoperated robots in an industrial recovery scenario. Although robots have recently become increasingly common in industrial sites, there are still challenges in achieving human-robot collaboration/cooperation (HRC), where human workers and robots are engaged in collaborative and cooperative tasks in a shared workspace. For example, the corresponding factory cell must be suspended for safety when an industrial robot drops an assembling part in the workspace. After that, a human worker is allowed to enter the robot workspace to address the robot recovery. This process causes non-continuous manufacturing, which leads to a productivity reduction. Recently, robotic teleoperation technology has emerged as a promising solution to enable people to perform tasks remotely and safely. This technology can be used in the recovery process in manufacturing failure scenarios. Our proposition involves the design of an appropriate priority function that aids in collision avoidance between the manufacturing and recovery robots and facilitates continuous processes with minimal production loss within an acceptable risk level. This paper presents a framework, including an HRC simulator and an optimization formulation, for finding optimal parameters of the priority function. Through quantitative and qualitative experiments, we address the proof of our novel concept and demonstrate its feasibility.

3.AIoT-Based Drum Transcription Robot using Convolutional Neural Networks

Authors:Yukun Su, Yi Yang

Abstract: With the development of information technology, robot technology has made great progress in various fields. These new technologies enable robots to be used in industry, agriculture, education and other aspects. In this paper, we propose a drum robot that can automatically complete music transcription in real-time, which is based on AIoT and fog computing technology. Specifically, this drum robot system consists of a cloud node for data storage, edge nodes for real-time computing, and data-oriented execution application nodes. In order to analyze drumming music and realize drum transcription, we further propose a light-weight convolutional neural network model to classify drums, which can be more effectively deployed in terminal devices for fast edge calculations. The experimental results show that the proposed system can achieve more competitive performance and enjoy a variety of smart applications and services.

4.GPS-aided Visual Wheel Odometry

Authors:Junlin Song, Pedro J. Sanchez-Cuevas, Antoine Richard, Miguel Olivares-Mendez

Abstract: This paper introduces a novel GPS-aided visual-wheel odometry (GPS-VWO) for ground robots. The state estimation algorithm tightly fuses visual, wheeled encoder and GPS measurements in the way of Multi-State Constraint Kalman Filter (MSCKF). To avoid accumulating calibration errors over time, the proposed algorithm calculates the extrinsic rotation parameter between the GPS global coordinate frame and the VWO reference frame online as part of the estimation process. The convergence of this extrinsic parameter is guaranteed by the observability analysis and verified by using real-world visual and wheel encoder measurements as well as simulated GPS measurements. Moreover, a novel theoretical finding is presented that the variance of unobservable state could converge to zero for specific Kalman filter system. We evaluate the proposed system extensively in large-scale urban driving scenarios. The results demonstrate that better accuracy than GPS is achieved through the fusion of GPS and VWO. The comparison between extrinsic parameter calibration and non-calibration shows significant improvement in localization accuracy thanks to the online calibration.

5.Lifelike Agility and Play on Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models

Authors:Lei Han, Qingxu Zhu, Jiapeng Sheng, Chong Zhang, Tingguang Li, Yizheng Zhang, He Zhang, Yuzhen Liu, Cheng Zhou, Rui Zhao, Jie Li, Yufeng Zhang, Rui Wang, Wanchao Chi, Xiong Li, Yonghui Zhu, Lingzhu Xiang, Xiao Teng, Zhengyou Zhang

Abstract: Summarizing knowledge from animals and human beings inspires robotic innovations. In this work, we propose a framework for driving legged robots act like real animals with lifelike agility and strategy in complex environments. Inspired by large pre-trained models witnessed with impressive performance in language and image understanding, we introduce the power of advanced deep generative models to produce motor control signals stimulating legged robots to act like real animals. Unlike conventional controllers and end-to-end RL methods that are task-specific, we propose to pre-train generative models over animal motion datasets to preserve expressive knowledge of animal behavior. The pre-trained model holds sufficient primitive-level knowledge yet is environment-agnostic. It is then reused for a successive stage of learning to align with the environments by traversing a number of challenging obstacles that are rarely considered in previous approaches, including creeping through narrow spaces, jumping over hurdles, freerunning over scattered blocks, etc. Finally, a task-specific controller is trained to solve complex downstream tasks by reusing the knowledge from previous stages. Enriching the knowledge regarding each stage does not affect the usage of other levels of knowledge. This flexible framework offers the possibility of continual knowledge accumulation at different levels. We successfully apply the trained multi-level controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles, and play in a designed challenging multi-agent Chase Tag Game, where lifelike agility and strategy emerge on the robots. The present research pushes the frontier of robot control with new insights on reusing multi-level pre-trained knowledge and solving highly complex downstream tasks in the real world.

6.In-hand manipulation planning using human motion dictionary

Authors:Ali Hammoud, Valerio Belcamino, Alessandro Carfi, Veronique Perdereau, Fulvio Mastrogiovanni

Abstract: Dexterous in-hand manipulation is a peculiar and useful human skill. This ability requires the coordination of many senses and hand motion to adhere to many constraints. These constraints vary and can be influenced by the object characteristics or the specific application. One of the key elements for a robotic platform to implement reliable inhand manipulation skills is to be able to integrate those constraints in their motion generations. These constraints can be implicitly modelled, learned through experience or human demonstrations. We propose a method based on motion primitives dictionaries to learn and reproduce in-hand manipulation skills. In particular, we focused on fingertip motions during the manipulation, and we defined an optimization process to combine motion primitives to reach specific fingertip configurations. The results of this work show that the proposed approach can generate manipulation motion coherent with the human one and that manipulation constraints are inherited even without an explicit formalization.

7.Dynamic Collaborative Path Planning for Remote Assistance of Highly-Automated Vehicles

Authors:Domagoj Majstorovic, Frank Diermeyer

Abstract: Given its increasing popularity in recent years, teleoperation technology is now recognized as a robust fallback solution for Automated Driving (AD). Remote Assistance (RA) represents an event-driven class of teleoperation with a distinct division of tasks between the Autonomous Vehicle (AV) and the remote human operator. This paper presents a novel approach for RA of AVs in urban environments. The concept draws inspiration from the potential synergy between highly-automated systems and human operators to collaboratively solve complex driving situations. Utilizing a hybrid algorithm that makes use of the Operational Design Domain (ODD) modification idea, it considers actions that go beyond the nominal operational space. Combined with the advanced cognitive reasoning of the human remote operator, the concept offers features that hold the potential to significantly improve both RA and AD user experiences.

8.Collision-Free Inverse Kinematics Through QP Optimization (iKinQP)

Authors:Julia Ashkanazy, Ariana Spalter, Joe Hays, Laura Hiatt, Roxana Leontie, C. Glen Henshaw

Abstract: Robotic manipulators are often designed with more actuated degrees-of-freedom than required to fully control an end effector's position and orientation. These "redundant" manipulators can allow infinite joint configurations that satisfy a particular task-space position and orientation, providing more possibilities for the manipulator to traverse a smooth collision-free trajectory. However, finding such a trajectory is non-trivial because the inverse kinematics for redundant manipulators cannot typically be solved analytically. Many strategies have been developed to tackle this problem, including Jacobian pseudo-inverse method, rapidly-expanding-random tree (RRT) motion planning, and quadratic programming (QP) based methods. Here, we present a flexible inverse kinematics-based QP strategy (iKinQP). Because it is independent of robot dynamics, the algorithm is relatively light-weight, and able to run in real-time in step with torque control. Collisions are defined as kinematic trees of elementary geometries, making the algorithm agnostic to the method used to determine what collisions are in the environment. Collisions are treated as hard constraints which guarantees the generation of collision-free trajectories. Trajectory smoothness is accomplished through the QP optimization. Our algorithm was evaluated for computational efficiency, smoothness, and its ability to provide trackable trajectories. It was shown that iKinQP is capable of providing smooth, collision-free trajectories at real-time rates.

9.Enhancing Robot Learning through Learned Human-Attention Feature Maps

Authors:Daniel Scheuchenstuhl, Stefan Ulmer, Felix Resch, Luigi Berducci, Radu Grosu

Abstract: Robust and efficient learning remains a challenging problem in robotics, in particular with complex visual inputs. Inspired by human attention mechanism, with which we quickly process complex visual scenes and react to changes in the environment, we think that embedding auxiliary information about focus point into robot learning would enhance efficiency and robustness of the learning process. In this paper, we propose a novel approach to model and emulate the human attention with an approximate prediction model. We then leverage this output and feed it as a structured auxiliary feature map into downstream learning tasks. We validate this idea by learning a prediction model from human-gaze recordings of manual driving in the real world. We test our approach on two learning tasks - object detection and imitation learning. Our experiments demonstrate that the inclusion of predicted human attention leads to improved robustness of the trained models to out-of-distribution samples and faster learning in low-data regime settings. Our work highlights the potential of incorporating structured auxiliary information in representation learning for robotics and opens up new avenues for research in this direction. All code and data are available online.

10.Ego-Motion Estimation and Dynamic Motion Separation from 3D Point Clouds for Accumulating Data and Improving 3D Object Detection

Authors:Patrick Palmer, Martin Krueger, Richard Altendorfer, Torsten Bertram

Abstract: New 3+1D high-resolution radar sensors are gaining importance for 3D object detection in the automotive domain due to their relative affordability and improved detection compared to classic low-resolution radar sensors. One limitation of high-resolution radar sensors, compared to lidar sensors, is the sparsity of the generated point cloud. This sparsity could be partially overcome by accumulating radar point clouds of subsequent time steps. This contribution analyzes limitations of accumulating radar point clouds on the View-of-Delft dataset. By employing different ego-motion estimation approaches, the dataset's inherent constraints, and possible solutions are analyzed. Additionally, a learning-based instance motion estimation approach is deployed to investigate the influence of dynamic motion on the accumulated point cloud for object detection. Experiments document an improved object detection performance by applying an ego-motion estimation and dynamic motion correction approach.

11.RED: A Systematic Real-Time Scheduling Approach for Robotic Environmental Dynamics

Authors:Zexin Li, Tao Ren, Xiaoxi He, Cong Liu

Abstract: Intelligent robots are designed to effectively navigate dynamic and unpredictable environments laden with moving mechanical elements and objects. Such environment-induced dynamics, including moving obstacles, can readily alter the computational demand (e.g., the creation of new tasks) and the structure of workloads (e.g., precedence constraints among tasks) during runtime, thereby adversely affecting overall system performance. This challenge is amplified when multi-task inference is expected on robots operating under stringent resource and real-time constraints. To address such a challenge, we introduce RED, a systematic real-time scheduling approach designed to support multi-task deep neural network workloads in resource-limited robotic systems. It is designed to adaptively manage the Robotic Environmental Dynamics (RED) while adhering to real-time constraints. At the core of RED lies a deadline-based scheduler that employs an intermediate deadline assignment policy, effectively managing to change workloads and asynchronous inference prompted by complex, unpredictable environments. This scheduling framework also facilitates the flexible deployment of MIMONet (multi-input multi-output neural networks), which are commonly utilized in multi-tasking robotic systems to circumvent memory bottlenecks. Building on this scheduling framework, RED recognizes and leverages a unique characteristic of MIMONet: its weight-shared architecture. To further accommodate and exploit this feature, RED devises a novel and effective workload refinement and reconstruction process. This process ensures the scheduling framework's compatibility with MIMONet and maximizes efficiency.

1.End-to-End Driving via Self-Supervised Imitation Learning Using Camera and LiDAR Data

Authors:Jin Bok Park, Jinkyu Lee, Muhyun Back, Hyunmin Han, David T. Ma, Sang Min Won, Sung Soo Hwang, Il Yong Chun

Abstract: In autonomous driving, the end-to-end (E2E) driving approach that predicts vehicle control signals directly from sensor data is rapidly gaining attention. To learn a safe E2E driving system, one needs an extensive amount of driving data and human intervention. Vehicle control data is constructed by many hours of human driving, and it is challenging to construct large vehicle control datasets. Often, publicly available driving datasets are collected with limited driving scenes, and collecting vehicle control data is only available by vehicle manufacturers. To address these challenges, this paper proposes the first self-supervised learning framework, self-supervised imitation learning (SSIL), that can learn E2E driving networks without using driving command data. To construct pseudo steering angle data, proposed SSIL predicts a pseudo target from the vehicle's poses at the current and previous time points that are estimated with light detection and ranging sensors. Our numerical experiments demonstrate that the proposed SSIL framework achieves comparable E2E driving accuracy with the supervised learning counterpart. In addition, our qualitative analyses using a conventional visual explanation tool show that trained NNs by proposed SSIL and the supervision counterpart attend similar objects in making predictions.

2.Geometric Mechanics of Simultaneous Nonslip Contact in a Planar Quadruped

Authors:Hari Krishna Hari Prasad, Kaushik Jayaram

Abstract: In this paper, we develop a geometric framework for generating non-slip quadrupedal two-beat gaits. We consider a four-bar mechanism as a surrogate model for a contact state and develop the geometric tools such as shape-change basis to aid in gait generation, local connection as the matrix-equation of motion, and stratified panels to model net locomotion in line with previous work\cite{prasad2023contactswitch}. Standard two-beat gaits in quadrupedal systems like trot divide the shape space into two equal, decoupled subspaces. The subgaits generated in each subspace space are designed independently and when combined with appropriate phasing generate a two-beat gait where the displacements add up due to the geometric nature of the system. By adding ``scaling" and ``sliding" control knobs to subgaits defined as flows over the shape-change basis, we continuously steer an arbitrary, planar quadrupedal system. This exhibits translational anisotropy when modulated using the scaling inputs. To characterize the steering induced by sliding inputs, we define an average path curvature function analytically and show that the steering gaits can be generated using a geometric nonslip contact modeling framework.

3.Data-Efficient Online Learning of Ball Placement in Robot Table Tennis

Authors:Philip Tobuschat, Hao Ma, Dieter Büchler, Bernhard Schölkopf, Michael Muehlebach

Abstract: We present an implementation of an online optimization algorithm for hitting a predefined target when returning ping-pong balls with a table tennis robot. The online algorithm optimizes over so-called interception policies, which define the manner in which the robot arm intercepts the ball. In our case, these are composed of the state of the robot arm (position and velocity) at interception time. Gradient information is provided to the optimization algorithm via the mapping from the interception policy to the landing point of the ball on the table, which is approximated with a black-box and a grey-box approach. Our algorithm is applied to a robotic arm with four degrees of freedom that is driven by pneumatic artificial muscles. As a result, the robot arm is able to return the ball onto any predefined target on the table after about 2-5 iterations. We highlight the robustness of our approach by showing rapid convergence with both the black-box and the grey-box gradients. In addition, the small number of iterations required to reach close proximity to the target also underlines the sample efficiency. A demonstration video can be found here:

4.Quantitative Data Analysis: CRASAR Small Unmanned Aerial Systems at Hurricane Ian

Authors:Thomas Manzini, Robin Murphy, David Merrick

Abstract: This paper provides a summary of the 281 sorties that were flown by the 10 different models of small unmanned aerial systems (sUAS) at Hurricane Ian, and the failures made in the field. These 281 sorties, supporting 44 missions, represents the largest use of sUAS in a disaster to date (previously Hurricane Florence with 260 sorties). The sUAS operations at Hurricane Ian differ slightly from prior operations as they included the first documented uses of drones performing interior search for victims, and the first use of a VTOL fixed wing aircraft during a large scale disaster. However, there are substantive similarities to prior drone operations. Most notably, rotorcraft continue to perform the vast majority of flights, wireless data transmission capacity continues to be a limitation, and the lack of centralized control for unmanned and manned aerial systems continues to cause operational friction. This work continues by documenting the failures, both human and technological made in the field and concludes with a discussion summarizing potential areas for further work to improve sUAS response to large scale disasters.

5.Towards Standardized Disturbance Rejection Testing of Legged Robot Locomotion with Linear Impactor: A Preliminary Study, Observations, and Implications

Authors:Bowen Weng, Guillermo A. Castillo, Yun-Seok Kang, Ayonga Hereid

Abstract: Dynamic locomotion in legged robots is close to industrial collaboration, but a lack of standardized testing obstructs commercialization. The issues are not merely political, theoretical, or algorithmic but also physical, indicating limited studies and comprehension regarding standard testing infrastructure and equipment. For decades, the approaches we have been testing legged robots were rarely standardizable with hand-pushing, foot-kicking, rope-dragging, stick-poking, and ball-swinging. This paper aims to bridge the gap by proposing the use of the linear impactor, a well-established tool in other standardized testing disciplines, to serve as an adaptive, repeatable, and fair disturbance rejection testing equipment for legged robots. A pneumatic linear impactor is also adopted for the case study involving the humanoid robot Digit. Three locomotion controllers are examined, including a commercial one, using a walking-in-place task against frontal impacts. The statistically best controller was able to withstand the impact momentum (26.376 kg$\cdot$m/s) on par with a reported average effective momentum from straight punches by Olympic boxers (26.506 kg$\cdot$m/s). Moreover, the case study highlights other anti-intuitive observations, demonstrations, and implications that, to the best of the authors' knowledge, are first-of-its-kind revealed in real-world testing of legged robots.

6.Human Comfortability Index Estimation in Industrial Human-Robot Collaboration Task

Authors:Celal Savur, Jamison Heard, Ferat Sahin

Abstract: Fluent human-robot collaboration requires a robot teammate to understand, learn, and adapt to the human's psycho-physiological state. Such collaborations require a computing system that monitors human physiological signals during human-robot collaboration (HRC) to quantitatively estimate a human's level of comfort, which we have termed in this research as comfortability index (CI) and uncomfortability index (unCI). Subjective metrics (surprise, anxiety, boredom, calmness, and comfortability) and physiological signals were collected during a human-robot collaboration experiment that varied robot behavior. The emotion circumplex model is adapted to calculate the CI from the participant's quantitative data as well as physiological data. To estimate CI/unCI from physiological signals, time features were extracted from electrocardiogram (ECG), galvanic skin response (GSR), and pupillometry signals. In this research, we successfully adapt the circumplex model to find the location (axis) of 'comfortability' and 'uncomfortability' on the circumplex model, and its location match with the closest emotions on the circumplex model. Finally, the study showed that the proposed approach can estimate human comfortability/uncomfortability from physiological signals.

7.Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera

Authors:Jun Yang, Jian Yao, Steven L. Waslander

Abstract: 6D pose estimation of textureless shiny objects has become an essential problem in many robotic applications. Many pose estimators require high-quality depth data, often measured by structured light cameras. However, when objects have shiny surfaces (e.g., metal parts), these cameras fail to sense complete depths from a single viewpoint due to the specular reflection, resulting in a significant drop in the final pose accuracy. To mitigate this issue, we present a complete active vision framework for 6D object pose refinement and next-best-view prediction. Specifically, we first develop an optimization-based pose refinement module for the structured light camera. Our system then selects the next best camera viewpoint to collect depth measurements by minimizing the predicted uncertainty of the object pose. Compared to previous approaches, we additionally predict measurement uncertainties of future viewpoints by online rendering, which significantly improves the next-best-view prediction performance. We test our approach on the challenging real-world ROBI dataset. The results demonstrate that our pose refinement method outperforms the traditional ICP-based approach when given the same input depth data, and our next-best-view strategy can achieve high object pose accuracy with significantly fewer viewpoints than the heuristic-based policies.

8.Symmetric Models for Visual Force Policy Learning

Authors:Colin Kohler, Anuj Shrivatsav Srikanth, Eshan Arora, Robert Platt

Abstract: While it is generally acknowledged that force feedback is beneficial to robotic control, applications of policy learning to robotic manipulation typically only leverage visual feedback. Recently, symmetric neural models have been used to significantly improve the sample efficiency and performance of policy learning across a variety of robotic manipulation domains. This paper explores an application of symmetric policy learning to visual-force problems. We present Symmetric Visual Force Learning (SVFL), a novel method for robotic control which leverages visual and force feedback. We demonstrate that SVFL can significantly outperform state of the art baselines for visual force learning and report several interesting empirical findings related to the utility of learning force feedback control policies in both general manipulation tasks and scenarios with low visual acuity.

1.Design and Control of a Bio-inspired Wheeled Bipedal Robot

Authors:Haizhou Zhao, Lei Yu, Siying Qin, Yurui Jin, Yuqing Chen

Abstract: Wheeled bipedal robots have the capability to execute agile and versatile locomotion tasks in unknown terrains, with balancing being a key criteria in evaluating their dynamic performance. This paper focuses on enhancing the balancing performance of wheeled bipedal robots through innovations in both hardware and software aspects. A bio-inspired mechanical design, inspired by the human barbell squat, is proposed and implemented to achieve an efficient distribution of load onto the limb joints. This design improves knee torque joint efficiency and facilitates control over the distribution of the center of mass (CoM). Meanwhile, a customized balance model, namely the wheeled linear inverted pendulum (wLIP), is developed. The wLIP surpasses other alternatives by providing a more accurate estimation of wheeled robot dynamics while ensuring balancing stability. Experimental results demonstrate that the robot is capable of maintaining balance while manipulating pelvis states and CoM velocity; furthermore, it exhibits robustness against external disturbances and unknown terrains.

2.WSTac: Interactive Surface Perception based on Whisker-Inspired and Self-Illuminated Vision-Based Tactile Sensor

Authors:Kai Chong Lei, Kit Wa Sou, Wang Sing Chan, Jiayi Yan, Siqi Ping, Dengfeng Peng, Wenbo Ding, Xiao-Ping Zhang

Abstract: Modern Visual-Based Tactile Sensors (VBTSs) use cost-effective cameras to track elastomer deformation, but struggle with ambient light interference. Solutions typically involve using internal LEDs and blocking external light, thus adding complexity. Creating a VBTS resistant to ambient light with just a camera and an elastomer remains a challenge. In this work, we introduce WStac, a self-illuminating VBTS comprising a mechanoluminescence (ML) whisker elastomer, camera, and 3D printed parts. The ML whisker elastomer, inspired by the touch sensitivity of vibrissae, offers both light isolation and high ML intensity under stress, thereby removing the necessity for additional LED modules. With the incorporation of machine learning, the sensor effectively utilizes the dynamic contact variations of 25 whiskers to successfully perform tasks like speed regression, directional identification, and texture classification. Videos are available at:

3.Asch Meets HRI: Human Conformity to Robot Groups

Authors:Jasmina Bernotat, Doreen Jirak, Eduardo Benitez Sandoval, Francisco Cruz

Abstract: We present a research outline that aims at investigating group dynamics and peer pressure in the context of industrial robots. Our research plan was motivated by the fact that industrial robots became already an integral part of human-robot co-working. However, industrial robots have been sparsely integrated into research on robot credibility, group dynamics, and potential users' tendency to follow a robot's indication. Therefore, we aim to transfer the classic Asch experiment (see \cite{Asch_51}) into HRI with industrial robots. More precisely, we will test to what extent participants follow a robot's response when confronted with a group (vs. individual) industrial robot arms (vs. human) peers who give a false response. We are interested in highlighting the effects of group size, perceived robot credibility, psychological stress, and peer pressure in the context of industrial robots. With the results of this research, we hope to highlight group dynamics that might underlie HRI in industrial settings in which numerous robots already work closely together with humans in shared environments.

4.iCub Detecting Gazed Objects: A Pipeline Estimating Human Attention

Authors:Shiva Hanifi, Elisa Maiettini, Maria Lombardi, Lorenzo Natale

Abstract: This paper explores the role of eye gaze in human-robot interactions and proposes a novel system for detecting objects gazed by the human using solely visual feedback. The system leverages on face detection, human attention prediction, and online object detection, and it allows the robot to perceive and interpret human gaze accurately, paving the way for establishing joint attention with human partners. Additionally, a novel dataset collected with the humanoid robot iCub is introduced, comprising over 22,000 images from ten participants gazing at different annotated objects. This dataset serves as a benchmark for evaluating the performance of the proposed pipeline. The paper also includes an experimental analysis of the pipeline's effectiveness in a human-robot interaction setting, examining the performance of each component. Furthermore, the developed system is deployed on the humanoid robot iCub, and a supplementary video showcases its functionality. The results demonstrate the potential of the proposed approach to enhance social awareness and responsiveness in social robotics, as well as improve assistance and support in collaborative scenarios, promoting efficient human-robot collaboration. The code and the collected dataset will be released upon acceptance.

5.Small Celestial Body Exploration with CubeSat Swarms

Authors:Emmanuel Blazquez, Dario Izzo, Francesco Biscani, Roger Walker, Franco Perez-Lissi

Abstract: This work presents a large-scale simulation study investigating the deployment and operation of distributed swarms of CubeSats for interplanetary missions to small celestial bodies. Utilizing Taylor numerical integration and advanced collision detection techniques, we explore the potential of large CubeSat swarms in capturing gravity signals and reconstructing the internal mass distribution of a small celestial body while minimizing risks and Delta V budget. Our results offer insight into the applicability of this approach for future deep space exploration missions.

6.Unlocking the Performance of Proximity Sensors by Utilizing Transient Histograms

Authors:Carter Sifferman, Yeping Wang, Mohit Gupta, Michael Gleicher

Abstract: We provide methods which recover planar scene geometry by utilizing the transient histograms captured by a class of close-range time-of-flight (ToF) distance sensor. A transient histogram is a one dimensional temporal waveform which encodes the arrival time of photons incident on the ToF sensor. Typically, a sensor processes the transient histogram using a proprietary algorithm to produce distance estimates, which are commonly used in several robotics applications. Our methods utilize the transient histogram directly to enable recovery of planar geometry more accurately than is possible using only proprietary distance estimates, and consistent recovery of the albedo of the planar surface, which is not possible with proprietary distance estimates alone. This is accomplished via a differentiable rendering pipeline, which simulates the transient imaging process, allowing direct optimization of scene geometry to match observations. To validate our methods, we capture 3,800 measurements of eight planar surfaces from a wide range of viewpoints, and show that our method outperforms the proprietary-distance-estimate baseline by an order of magnitude in most scenarios. We demonstrate a simple robotics application which uses our method to sense the distance to and slope of a planar surface from a sensor mounted on the end effector of a robot arm.

7.Towards Optimal Head-to-head Autonomous Racing with Curriculum Reinforcement Learning

Authors:Dvij Kalaria, Qin Lin, John M. Dolan

Abstract: Head-to-head autonomous racing is a challenging problem, as the vehicle needs to operate at the friction or handling limits in order to achieve minimum lap times while also actively looking for strategies to overtake/stay ahead of the opponent. In this work we propose a head-to-head racing environment for reinforcement learning which accurately models vehicle dynamics. Some previous works have tried learning a policy directly in the complex vehicle dynamics environment but have failed to learn an optimal policy. In this work, we propose a curriculum learning-based framework by transitioning from a simpler vehicle model to a more complex real environment to teach the reinforcement learning agent a policy closer to the optimal policy. We also propose a control barrier function-based safe reinforcement learning algorithm to enforce the safety of the agent in a more effective way while not compromising on optimality.

8.MRNAV: Multi-Robot Aware Planning and Control Stack for Collision and Deadlock-free Navigation in Cluttered Environments

Authors:Baskın Şenbaşlar, Pilar Luiz, Wolfgang Hönig, Gaurav S. Sukhatme

Abstract: Multi-robot collision-free and deadlock-free navigation in cluttered environments with static and dynamic obstacles is a fundamental problem for many applications. We introduce MRNAV, a framework for planning and control to effectively navigate in such environments. Our design utilizes short, medium, and long horizon decision making modules with qualitatively different properties, and defines the responsibilities of them. The decision making modules complement each other and provide the effective navigation capability. MRNAV is the first hierarchical approach combining these three levels of decision making modules and explicitly defining their responsibilities. We implement our design for simulated multi-quadrotor flight. In our evaluations, we show that all three modules are required for effective navigation in diverse situations. We show the long-term executability of our approach in an eight hour long wall time (six hour long simulation time) uninterrupted simulation without collisions or deadlocks.

1.Joint Intrinsic and Extrinsic LiDAR-Camera Calibration in Targetless Environments Using Plane-Constrained Bundle Adjustment

Authors:Liang Li, Haotian Li, Xiyuan Liu, Dongjiao He, Ziliang Miao, Fanze Kong, Rundong Li, Zheng Liu, Fu Zhang

Abstract: This paper introduces a novel targetless method for joint intrinsic and extrinsic calibration of LiDAR-camera systems using plane-constrained bundle adjustment (BA). Our method leverages LiDAR point cloud measurements from planes in the scene, alongside visual points derived from those planes. The core novelty of our method lies in the integration of visual BA with the registration between visual points and LiDAR point cloud planes, which is formulated as a unified optimization problem. This formulation achieves concurrent intrinsic and extrinsic calibration, while also imparting depth constraints to the visual points to enhance the accuracy of intrinsic calibration. Experiments are conducted on both public data sequences and self-collected dataset. The results showcase that our approach not only surpasses other state-of-the-art (SOTA) methods but also maintains remarkable calibration accuracy even within challenging environments. For the benefits of the robotics community, we have open sourced our codes.

2.Potato: A Data-Oriented Programming 3D Simulator for Large-Scale Heterogeneous Swarm Robotics

Authors:Jinjie Li, Liang Han, Haoyang Yu, Zhaotian Wang, Pengzhi Yang, Ziwei Yan, Zhang Ren

Abstract: Large-scale simulation with realistic nonlinear dynamic models is crucial for algorithms development for swarm robotics. However, existing platforms are mainly developed based on Object-Oriented Programming (OOP) and either use simple kinematic models to pursue a large number of simulating nodes or implement realistic dynamic models with limited simulating nodes. In this paper, we develop a simulator based on Data-Oriented Programming (DOP) that utilizes GPU parallel computing to achieve large-scale swarm robotic simulations. Specifically, we use a multi-process approach to simulate heterogeneous agents and leverage PyTorch with GPU to simulate homogeneous agents with a large number. We test our approach using a nonlinear quadrotor model and demonstrate that this DOP approach can maintain almost the same computational speed when quadrotors are less than 5,000. We also provide two examples to present the functionality of the platform.

3.Reinforcement learning informed evolutionary search for autonomous systems testing

Authors:Dmytro Humeniuk, Foutse Khomh, Giuliano Antoniol

Abstract: Evolutionary search-based techniques are commonly used for testing autonomous robotic systems. However, these approaches often rely on computationally expensive simulator-based models for test scenario evaluation. To improve the computational efficiency of the search-based testing, we propose augmenting the evolutionary search (ES) with a reinforcement learning (RL) agent trained using surrogate rewards derived from domain knowledge. In our approach, known as RIGAA (Reinforcement learning Informed Genetic Algorithm for Autonomous systems testing), we first train an RL agent to learn useful constraints of the problem and then use it to produce a certain part of the initial population of the search algorithm. By incorporating an RL agent into the search process, we aim to guide the algorithm towards promising regions of the search space from the start, enabling more efficient exploration of the solution space. We evaluate RIGAA on two case studies: maze generation for an autonomous ant robot and road topology generation for an autonomous vehicle lane keeping assist system. In both case studies, RIGAA converges faster to fitter solutions and produces a better test suite (in terms of average test scenario fitness and diversity). RIGAA also outperforms the state-of-the-art tools for vehicle lane keeping assist system testing, such as AmbieGen and Frenetic.

4.Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

Authors:Taisuke Kobayashi

Abstract: Robot control using reinforcement learning has become popular, but its learning process generally terminates halfway through an episode for safety and time-saving reasons. This study addresses the problem of the most popular exception handling that temporal-difference (TD) learning performs at such termination. That is, by forcibly assuming zero value after termination, unintentionally implicit underestimation or overestimation occurs, depending on the reward design in the normal states. When the episode is terminated due to task failure, the failure may be highly valued with the unintentional overestimation, and the wrong policy may be acquired. Although this problem can be avoided by paying attention to the reward design, it is essential in practical use of TD learning to review the exception handling at termination. This paper therefore proposes a method to intentionally underestimate the value after termination to avoid learning failures due to the unintentional overestimation. In addition, the degree of underestimation is adjusted according to the degree of stationarity at termination, thereby preventing excessive exploration due to the intentional underestimation. Simulations and real robot experiments showed that the proposed method can stably obtain the optimal policies for various tasks and reward designs.

5.TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search

Authors:Licheng Wen, Ze Fu, Pinlong Cai, Daocheng Fu, Song Mao, Botian Shi

Abstract: Digital twins for intelligent transportation systems are currently attracting great interests, in which generating realistic, diverse, and human-like traffic flow in simulations is a formidable challenge. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adaptability. In this paper, we introduce TrafficMCTS, an innovative framework that harnesses the synergy of groupbased Monte Carlo tree search (MCTS) and Social Value Orientation (SVO) to engender a multifaceted traffic flow replete with varying driving styles and cooperative tendencies. Anchored by a closed-loop architecture, our framework enables vehicles to dynamically adapt to their environment in real time, and ensure feasible collision-free trajectories. Through comprehensive comparisons with state-of-the-art methods, we illuminate the advantages of our approach in terms of computational efficiency, planning success rate, intent completion time, and diversity metrics. Besides, we simulate highway and roundabout scenarios to illustrate the effectiveness of the proposed framework and highlight its ability to induce diverse social behaviors within the traffic flow. Finally, we validate the scalability of TrafficMCTS by showcasing its prowess in simultaneously mass vehicles within a sprawling road network, cultivating a landscape of traffic flow that mirrors the intricacies of human behavior.

6.Actuator Trajectory Planning for UAVs with Overhead Manipulator using Reinforcement Learning

Authors:Hazim Alzorgan, Abolfazl Razi, Ata Jahangir Moshayedi

Abstract: In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom to carry out actuation tasks on the fly. Our solution is based on employing a Q-learning method to control the trajectory of the tip of the arm, also called \textit{end-effector}. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator's reachability. Additionally, we utilize a model-based Q-learning model to independently track and control the desired trajectory of the manipulator's end-effector, given an arbitrary baseline trajectory for the UAV platform. Such a combination enables a variety of actuation tasks such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, sky scrapper cleaning, and power line maintenance in hard-to-reach and risky environments while retaining compatibility with flight control firmware. Our RL-based control mechanism results in a robust control strategy that can handle uncertainties in the motion of the UAV, offering promising performance. Specifically, our method achieves 92\% accuracy in terms of average displacement error (i.e. the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes

7.Object level footprint uncertainty quantification in infrastructure based sensing

Authors:Arpan Kusari, Asma Almutairi, Mark E. Gilbert, David J. LeBlanc

Abstract: We examine the problem of estimating footprint uncertainty of objects imaged using the infrastructure based camera sensing. A closed form relationship is established between the ground coordinates and the sources of the camera errors. Using the error propagation equation, the covariance of a given ground coordinate can be measured as a function of the camera errors. The uncertainty of the footprint of the bounding box can then be given as the function of all the extreme points of the object footprint. In order to calculate the uncertainty of a ground point, the typical error sizes of the error sources are required. We present a method of estimating the typical error sizes from an experiment using a static, high-precision LiDAR as the ground truth. Finally, we present a simulated case study of uncertainty quantification from infrastructure based camera in CARLA to provide a sense of how the uncertainty changes across a left turn maneuver.

8.BridgeData V2: A Dataset for Robot Learning at Scale

Authors:Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at

1.Value of Assistance for Mobile Agents

Authors:Adi Amuzig, David Dovrat, Sarah Keren

Abstract: Mobile robotic agents often suffer from localization uncertainty which grows with time and with the agents' movement. This can hinder their ability to accomplish their task. In some settings, it may be possible to perform assistive actions that reduce uncertainty about a robot's location. For example, in a collaborative multi-robot system, a wheeled robot can request assistance from a drone that can fly to its estimated location and reveal its exact location on the map or accompany it to its intended location. Since assistance may be costly and limited, and may be requested by different members of a team, there is a need for principled ways to support the decision of which assistance to provide to an agent and when, as well as to decide which agent to help within a team. For this purpose, we propose Value of Assistance (VOA) to represent the expected cost reduction that assistance will yield at a given point of execution. We offer ways to compute VOA based on estimations of the robot's future uncertainty, modeled as a Gaussian process. We specify conditions under which our VOA measures are valid and empirically demonstrate the ability of our measures to predict the agent's average cost reduction when receiving assistance in both simulated and real-world robotic settings.

2.Multi-Modal Multi-Task (3MT) Road Segmentation

Authors:Erkan Milli, Özgür Erkent, Asım Egemen Yılmaz

Abstract: Multi-modal systems have the capacity of producing more reliable results than systems with a single modality in road detection due to perceiving different aspects of the scene. We focus on using raw sensor inputs instead of, as it is typically done in many SOTA works, leveraging architectures that require high pre-processing costs such as surface normals or dense depth predictions. By using raw sensor inputs, we aim to utilize a low-cost model thatminimizes both the pre-processing andmodel computation costs. This study presents a cost-effective and highly accurate solution for road segmentation by integrating data from multiple sensorswithin a multi-task learning architecture.Afusion architecture is proposed in which RGB and LiDAR depth images constitute the inputs of the network. Another contribution of this study is to use IMU/GNSS (inertial measurement unit/global navigation satellite system) inertial navigation system whose data is collected synchronously and calibrated with a LiDAR-camera to compute aggregated dense LiDAR depth images. It has been demonstrated by experiments on the KITTI dataset that the proposed method offers fast and high-performance solutions. We have also shown the performance of our method on Cityscapes where raw LiDAR data is not available. The segmentation results obtained for both full and half resolution images are competitive with existing methods. Therefore, we conclude that our method is not dependent only on raw LiDAR data; rather, it can be used with different sensor modalities. The inference times obtained in all experiments are very promising for real-time experiments.

3.MARC: Multipolicy and Risk-aware Contingency Planning for Autonomous Driving

Authors:Tong Li, Lu Zhang, Sikang Liu, Shaojie Shen

Abstract: Generating safe and non-conservative behaviors in dense, dynamic environments remains challenging for automated vehicles due to the stochastic nature of traffic participants' behaviors and their implicit interaction with the ego vehicle. This paper presents a novel planning framework, Multipolicy And Risk-aware Contingency planning (MARC), that systematically addresses these challenges by enhancing the multipolicy-based pipelines from both behavior and motion planning aspects. Specifically, MARC realizes a critical scenario set that reflects multiple possible futures conditioned on each semantic-level ego policy. Then, the generated policy-conditioned scenarios are further formulated into a tree-structured representation with a dynamic branchpoint based on the scene-level divergence. Moreover, to generate diverse driving maneuvers, we introduce risk-aware contingency planning, a bi-level optimization algorithm that simultaneously considers multiple future scenarios and user-defined risk tolerance levels. Owing to the more unified combination of behavior and motion planning layers, our framework achieves efficient decision-making and human-like driving maneuvers. Comprehensive experimental results demonstrate superior performance to other strong baselines in various environments.

4.Identifying Reaction-Aware Driving Styles of Stochastic Model Predictive Controlled Vehicles by Inverse Reinforcement Learning

Authors:Ni Dang, Tao Shi, Zengjie Zhang, Wanxin Jin, Marion Leibold, Martin Buss

Abstract: The driving style of an Autonomous Vehicle (AV) refers to how it behaves and interacts with other AVs. In a multi-vehicle autonomous driving system, an AV capable of identifying the driving styles of its nearby AVs can reliably evaluate the risk of collisions and make more reasonable driving decisions. However, there has not been a consistent definition of driving styles for an AV in the literature, although it is considered that the driving style is encoded in the AV's trajectories and can be identified using Maximum Entropy Inverse Reinforcement Learning (ME-IRL) methods as a cost function. Nevertheless, an important indicator of the driving style, i.e., how an AV reacts to its nearby AVs, is not fully incorporated in the feature design of previous ME-IRL methods. In this paper, we describe the driving style as a cost function of a series of weighted features. We design additional novel features to capture the AV's reaction-aware characteristics. Then, we identify the driving styles from the demonstration trajectories generated by the Stochastic Model Predictive Control (SMPC) using a modified ME-IRL method with our newly proposed features. The proposed method is validated using MATLAB simulation and an off-the-shelf experiment.

5.Path-Constrained State Estimation for Rail Vehicles

Authors:Cornelius von Einem, Andrei Cramariuc, Roland Siegwart, Cesar Cadena, Florian Tschopp

Abstract: Globally rising demand for transportation by rail is pushing existing infrastructure to its capacity limits, necessitating the development of accurate, robust, and high-frequency positioning systems to ensure safe and efficient train operation. As individual sensor modalities cannot satisfy the strict requirements of robustness and safety, a combination thereof is required. We propose a path-constrained sensor fusion framework to integrate various modalities while leveraging the unique characteristics of the railway network. To reflect the constrained motion of rail vehicles along their tracks, the state is modeled in 1D along the track geometry. We further leverage the limited action space of a train by employing a novel multi-hypothesis tracking to account for multiple possible trajectories a vehicle can take through the railway network. We demonstrate the reliability and accuracy of our fusion framework on multiple tram datasets recorded in the city of Zurich, utilizing Visual-Inertial Odometry for local motion estimation and a standard GNSS for global localization. We evaluate our results using ground truth localizations recorded with a RTK-GNSS, and compare our method to standard baselines. A Root Mean Square Error of 4.78 m and a track selectivity score of up to 94.9 % have been achieved.

6.Trajectory Tracking Control of Dual-PAM Soft Actuator with Hysteresis Compensator

Authors:Junyi Shen, Tetsuro Miyazaki, Shingo Ohno, Maina Sogabe, Kenji Kawashima

Abstract: Soft robotics is an emergent and swiftly evolving field. Pneumatic actuators are suitable for driving soft robots because of their superior performance. However, their control is not easy due to their hysteresis characteristics. In response to these challenges, we propose an adaptive control method to compensate hysteresis of a soft actuator. Employing a novel dual pneumatic artificial muscle (PAM) bending actuator, the innovative control strategy abates hysteresis effects by dynamically modulating gains within a traditional PID controller corresponding with the predicted motion of the reference trajectory. Through comparative experimental evaluation, we found that the new control method outperforms its conventional counterparts regarding tracking accuracy and response speed. Our work reveals a new direction for advancing control in soft actuators.

7.Constrained Stein Variational Trajectory Optimization

Authors:Thomas Power, Dmitry Berenson

Abstract: We present Constrained Stein Variational Trajectory Optimization (CSVTO), an algorithm for performing trajectory optimization with constraints on a set of trajectories in parallel. We frame constrained trajectory optimization as a novel form of constrained functional minimization over trajectory distributions, which avoids treating the constraints as a penalty in the objective and allows us to generate diverse sets of constraint-satisfying trajectories. Our method uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints. CSVTO is applicable to problems with arbitrary equality and inequality constraints and includes a novel particle resampling step to escape local minima. By explicitly generating diverse sets of trajectories, CSVTO is better able to avoid poor local minima and is more robust to initialization. We demonstrate that CSVTO outperforms baselines in challenging highly-constrained tasks, such as a 7DoF wrench manipulation task, where CSVTO succeeds in 20/20 trials vs 13/20 for the closest baseline. Our results demonstrate that generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization over baselines.

8.Multi-UAV Deployment in Obstacle-Cluttered Environments with LOS Connectivity

Authors:Yuda Chen, Meng Guo

Abstract: A reliable communication network is essential for multiple UAVs operating within obstacle-cluttered environments, where limited communication due to obstructions often occurs. A common solution is to deploy intermediate UAVs to relay information via a multi-hop network, which introduces two challenges: (i) how to design the structure of multi-hop networks; and (ii) how to maintain connectivity during collaborative motion. To this end, this work first proposes an efficient constrained search method based on the minimum-edge RRT$^\star$ algorithm, to find a spanning-tree topology that requires a less number of UAVs for the deployment task. To achieve this deployment, a distributed model predictive control strategy is proposed for the online motion coordination. It explicitly incorporates not only the inter-UAV and UAV-obstacle distance constraints, but also the line-of-sight (LOS) connectivity constraint. These constraints are well-known to be nonlinear and often tackled by various approximations. In contrast, this work provides a theoretical guarantee that all agent trajectories are ensured to be collision-free with a team-wise LOS connectivity at all time. Numerous simulations are performed in 3D valley-like environments, while hardware experiments validate its dynamic adaptation when the deployment position changes online.

9.In-Hand Cube Reconfiguration: Simplified

Authors:Sumit Patidar, Adrian Sieler, Oliver Brock

Abstract: We present a simple approach to in-hand cube reconfiguration. By simplifying planning, control, and perception as much as possible, while maintaining robust and general performance, we gain insights into the inherent complexity of in-hand cube reconfiguration. We also demonstrate the effectiveness of combining GOFAI-based planning with the exploitation of environmental constraints and inherently compliant end-effectors in the context of dexterous manipulation. The proposed system outperforms a substantially more complex system for cube reconfiguration based on deep learning and accurate physical simulation, contributing arguments to the discussion about what the most promising approach to general manipulation might be. Project website:

10.A Heuristic Informative-Path-Planning Algorithm for Autonomous Mapping of Unknown Areas

Authors:Mobolaji O. Orisatoki, Mahdi Amouzadi, Arash M. Dizqah

Abstract: Informative path planning algorithms are of paramount importance in applications like disaster management to efficiently gather information through a priori unknown environments. This is, however, a complex problem that involves finding a globally optimal path that gathers the maximum amount of information (e.g., the largest map with a minimum travelling distance) while using partial and uncertain local measurements. This paper addresses this problem by proposing a novel heuristic algorithm that continuously estimates the potential mapping gain for different sub-areas across the partially created map, and then uses these estimations to locally navigate the robot. Furthermore, this paper presents a novel algorithm to calculate a benchmark solution, where the map is a priori known to the planar, to evaluate the efficacy of the developed heuristic algorithm over different test scenarios. The findings indicate that the efficiency of the proposed algorithm, measured in terms of the mapped area per unit of travelling distance, ranges from 70% to 80% of the benchmark solution in various test scenarios. In essence, the algorithm demonstrates the capability to generate paths that come close to the globally optimal path provided by the benchmark solution.

11.Electromagnets Under the Table: an Unobtrusive Magnetic Navigation System for Microsurgery

Authors:Adam Schonewille, Changyan He, Cameron Forbrigger, Nancy Wu, James Drake, Thomas Looi, Eric Diller

Abstract: Miniature magnetic tools have the potential to enable minimally invasive surgical techniques to be applied to space-restricted surgical procedures in areas such as neurosurgery. However, typical magnetic navigation systems, which create the magnetic fields to drive such tools, either cannot generate large enough fields, or surround the patient in a way that obstructs surgeon access to the patient. This paper introduces the design of a magnetic navigation system with eight electromagnets arranged completely under the operating table, to endow the system with maximal workspace accessibility, which allows the patient to lie down on the top surface of the system without any constraints. The found optimal geometric layout of the electromagnets maximizes the field strength and uniformity over a reasonable neurosurgical operating volume. The system can generate non-uniform magnetic fields up to 38 mT along the x and y axes and 47 mT along the z axis at a working distance of 120 mm away from the actuation system workbench, deep enough to deploy magnetic microsurgical tools in the brain. The forces which can be exerted on millimeter-scale magnets used in prototype neurosurgical tools are validated experimentally. Due to its large workspace, this system could be used to control milli-robots in a variety of surgical applications.

12.NimbRo wins ANA Avatar XPRIZE Immersive Telepresence Competition: Human-Centric Evaluation and Lessons Learned

Authors:Christian Lenz, Max Schwarz, Andre Rochow, Bastian Pätzold, Raphael Memmesheimer, Michael Schreiber, Sven Behnke

Abstract: Robotic avatar systems can enable immersive telepresence with locomotion, manipulation, and communication capabilities. We present such an avatar system, based on the key components of immersive 3D visualization and transparent force-feedback telemanipulation. Our avatar robot features an anthropomorphic upper body with dexterous hands. The remote human operator drives the arms and fingers through an exoskeleton-based operator station, which provides force feedback both at the wrist and for each finger. The robot torso is mounted on a holonomic base, providing omnidirectional locomotion on flat floors, controlled using a 3D rudder device. Finally, the robot features a 6D movable head with stereo cameras, which stream images to a VR display worn by the operator. Movement latency is hidden using spherical rendering. The head also carries a telepresence screen displaying an animated image of the operator's face, enabling direct interaction with remote persons. Our system won the \$10M ANA Avatar XPRIZE competition, which challenged teams to develop intuitive and immersive avatar systems that could be operated by briefly trained judges. We analyze our successful participation in the semifinals and finals and provide insight into our operator training and lessons learned. In addition, we evaluate our system in a user study that demonstrates its intuitive and easy usability.

13.Operational requirements for localization in autonomous vehicles

Authors:Arpan Kusari, Satabdi Saha

Abstract: Autonomous vehicles (AVs) need to determine their position and orientation accurately with respect to global coordinate system or local features under different scene geometries, traffic conditions and environmental conditions. \cite{reid2019localization} provides a comprehensive framework for the localization requirements for AVs. However, the framework is too restrictive whereby - (a) only a very small deviation from the lane is tolerated (one every $10^{8}$ hours), (b) all roadway types are considered same without any attention to restriction provided by the environment onto the localization and (c) the temporal nature of the location and orientation is not considered in the requirements. In this research, we present a more practical view of the localization requirement aimed at keeping the AV safe during an operation. We present the following novel contributions - (a) we propose a deviation penalty as a cumulative distribution function of the Weibull distribution which starts from the adjacent lane boundary, (b) we customize the parameters of the deviation penalty according to the current roadway type, particular lane boundary that the ego vehicle is against and roadway curvature and (c) we update the deviation penalty based on the available gap in the adjacent lane. We postulate that this formulation can provide a more robust and achievable view of the localization requirements than previous research while focusing on safety.

1.VIO-DualProNet: Visual-Inertial Odometry with Learning Based Process Noise Covariance

Authors:Dan Solodar, Itzik Klein

Abstract: Visual-inertial odometry (VIO) is a vital technique used in robotics, augmented reality, and autonomous vehicles. It combines visual and inertial measurements to accurately estimate position and orientation. Existing VIO methods assume a fixed noise covariance for the inertial uncertainty. However, accurately determining in real-time the noise variance of the inertial sensors presents a significant challenge as the uncertainty changes throughout the operation leading to suboptimal performance and reduced accuracy. To circumvent this, we propose VIO-DualProNet, a novel approach that utilizes deep learning methods to dynamically estimate the inertial noise uncertainty in real-time. By designing and training a deep neural network to predict inertial noise uncertainty using only inertial sensor measurements, and integrating it into the VINS-Mono algorithm, we demonstrate a substantial improvement in accuracy and robustness, enhancing VIO performance and potentially benefiting other VIO-based systems for precise localization and mapping across diverse conditions.

2.ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

Authors:Bilel Benjdira, Anis Koubaa, Anas M. Ali

Abstract: In this paper, we argue that the next generation of robots can be commanded using only Language Models' prompts. Every prompt interrogates separately a specific Robotic Modality via its Modality Language Model (MLM). A central Task Modality mediates the whole communication to execute the robotic mission via a Large Language Model (LLM). This paper gives this new robotic design pattern the name of: Prompting Robotic Modalities (PRM). Moreover, this paper applies this PRM design pattern in building a new robotic framework named ROSGPT_Vision. ROSGPT_Vision allows the execution of a robotic task using only two prompts: a Visual and an LLM prompt. The Visual Prompt extracts, in natural language, the visual semantic features related to the task under consideration (Visual Robotic Modality). Meanwhile, the LLM Prompt regulates the robotic reaction to the visual description (Task Modality). The framework automates all the mechanisms behind these two prompts. The framework enables the robot to address complex real-world scenarios by processing visual data, making informed decisions, and carrying out actions automatically. The framework comprises one generic vision module and two independent ROS nodes. As a test application, we used ROSGPT_Vision to develop CarMate, which monitors the driver's distraction on the roads and makes real-time vocal notifications to the driver. We showed how ROSGPT_Vision significantly reduced the development cost compared to traditional methods. We demonstrated how to improve the quality of the application by optimizing the prompting strategies, without delving into technical details. ROSGPT_Vision is shared with the community (link: to advance robotic research in this direction and to build more robotic frameworks that implement the PRM design pattern and enables controlling robots using only prompts.

3.Faster Optimization in S-Graphs Exploiting Hierarchy

Authors:Hriday Bavle, Jose Luis Sanchez-Lopez, Javier Civera, Holger Voos

Abstract: 3D scene graphs hierarchically represent the environment appropriately organizing different environmental entities in various layers. Our previous work on situational graphs extends the concept of 3D scene graph to SLAM by tightly coupling the robot poses with the scene graph entities, achieving state-of-the-art results. Though, one of the limitations of S-Graphs is scalability in really large environments due to the increased graph size over time, increasing the computational complexity. To overcome this limitation in this work we present an initial research of an improved version of S-Graphs exploiting the hierarchy to reduce the graph size by marginalizing redundant robot poses and their connections to the observations of the same structural entities. Firstly, we propose the generation and optimization of room-local graphs encompassing all graph entities within a room-like structure. These room-local graphs are used to compress the S-Graphs marginalizing the redundant robot keyframes within the given room. We then perform windowed local optimization of the compressed graph at regular time-distance intervals. A global optimization of the compressed graph is performed every time a loop closure is detected. We show similar accuracy compared to the baseline while showing a 39.81% reduction in the computation time with respect to the baseline.

4.Tackling the Curse of Dimensionality in Large-scale Multi-agent LTL Task Planning via Poset Product

Authors:Zesen Liu, Meng Guo, Zhongkui Li

Abstract: Linear Temporal Logic (LTL) formulas have been used to describe complex tasks for multi-agent systems, with both spatial and temporal constraints. However, since the planning complexity grows exponentially with the number of agents and the length of the task formula, existing applications are mostly limited to small artificial cases. To address this issue, a new planning algorithm is proposed for task formulas specified as sc-LTL formulas. It avoids two common bottlenecks in the model-checking-based planning methods, i.e., (i) the direct translation of the complete task formula to the associated B\"uchi automaton; and (ii) the synchronized product between the B\"uchi automaton and the transition models of all agents. In particular, each conjuncted sub-formula is first converted to the associated R-posets as an abstraction of the temporal dependencies among the subtasks. Then, an efficient algorithm is proposed to compute the product of these R-posets, which retains their dependencies and resolves potential conflicts. Furthermore, the proposed approach is applied to dynamic scenes where new tasks are generated online. It is capable of deriving the first valid plan with a polynomial time and memory complexity w.r.t. the system size and the formula length. Our method can plan for task formulas with a length of more than 60 and a system with more than 35 agents, while most existing methods fail at the formula length of 20. The proposed method is validated on large fleets of service robots in both simulation and hardware experiments.

5.Adaptive Graduated Non-Convexity for Pose Graph Optimization

Authors:Seungwon Choi, Wonseok Kang, Jiseong Chung, Jaehyun Kim, Tae-wan Kim

Abstract: We present a novel approach to robust pose graph optimization based on Graduated Non-Convexity (GNC). Unlike traditional GNC-based methods, the proposed approach employs an adaptive shape function using B-spline to optimize the shape of the robust kernel. This aims to reduce GNC iterations, boosting computational speed without compromising accuracy. When integrated with the open-source riSAM algorithm, the method demonstrates enhanced efficiency across diverse datasets. Accompanying open-source code aims to encourage further research in this area.

6.Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)

Authors:Haechan Mark Bon, Rongge Zhang, Ricardo de Azambuja, Giovanni Beltrame

Abstract: This work targets what we consider to be the foundational step for urban airborne robots, a safe landing. Our attention is directed toward what we deem the most crucial aspect of the safe landing perception stack: segmentation. We present a streamlined reactive UAV system that employs visual servoing by harnessing the capabilities of open vocabulary image segmentation. This approach can adapt to various scenarios with minimal adjustments, bypassing the necessity for extensive data accumulation for refining internal models, thanks to its open vocabulary methodology. Given the limitations imposed by local authorities, our primary focus centers on operations originating from altitudes of 100 meters. This choice is deliberate, as numerous preceding works have dealt with altitudes up to 30 meters, aligning with the capabilities of small stereo cameras. Consequently, we leave the remaining 20m to be navigated using conventional 3D path planning methods. Utilizing monocular cameras and image segmentation, our findings demonstrate the system's capability to successfully execute landing maneuvers at altitudes as low as 20 meters. However, this approach is vulnerable to intermittent and occasionally abrupt fluctuations in the segmentation between frames in a video stream. To address this challenge, we enhance the image segmentation output by introducing what we call a dynamic focus: a masking mechanism that self adjusts according to the current landing stage. This dynamic focus guides the control system to avoid regions beyond the drone's safety radius projected onto the ground, thus mitigating the problems with fluctuations. Through the implementation of this supplementary layer, our experiments have reached improvements in the landing success rate of almost tenfold when compared to global segmentation. All the source code is open source and available online (

7.Towards Autonomous Excavation Planning

Authors:Lorenzo Terenzi, Marco Hutter

Abstract: Excavation plans are crucial in construction projects, dictating the dirt disposal strategy and excavation sequence based on the final geometry and machinery available. While most construction processes rely heavily on coarse sequence planning and local execution planning driven by human expertise and intuition, fully automated planning tools are notably absent from the industry. This paper introduces a fully autonomous excavation planning system. Initially, the site is mapped, followed by user selection of the desired excavation geometry. The system then invokes a global planner to determine the sequence of poses for the excavator, ensuring complete site coverage. For each pose, a local excavation planner decides how to move the soil around the machine, and a digging planner subsequently dictates the sequence of digging trajectories to complete a patch. We showcased our system by autonomously excavating the largest pit documented so far, achieving an average digging cycle time of roughly 30 seconds, comparable to the one of a human operator.

8.A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles

Authors:Yusheng Wang, Yidong Lou, Weiwei Song, Bing Zhan, Feihuang Xia, Qigeng Duan

Abstract: Multi-modal sensor integration has become a crucial prerequisite for the real-world navigation systems. Recent studies have reported successful deployment of such system in many fields. However, it is still challenging for navigation tasks in mine scenes due to satellite signal dropouts, degraded perception, and observation degeneracy. To solve this problem, we propose a LiDAR-inertial odometry method in this paper, utilizing both Kalman filter and graph optimization. The front-end consists of multiple parallel running LiDAR-inertial odometries, where the laser points, IMU, and wheel odometer information are tightly fused in an error-state Kalman filter. Instead of the commonly used feature points, we employ surface elements for registration. The back-end construct a pose graph and jointly optimize the pose estimation results from inertial, LiDAR odometry, and global navigation satellite system (GNSS). Since the vehicle has a long operation time inside the tunnel, the largely accumulated drift may be not fully by the GNSS measurements. We hereby leverage a loop closure based re-initialization process to achieve full alignment. In addition, the system robustness is improved through handling data loss, stream consistency, and estimation error. The experimental results show that our system has a good tolerance to the long-period degeneracy with the cooperation different LiDARs and surfel registration, achieving meter-level accuracy even for tens of minutes running during GNSS dropouts.

9.Four years of multi-modal odometry and mapping on the rail vehicles

Authors:Yusheng Wang, Weiwei Song, Yi Zhang, Fei Huang, Zhiyong Tu, Ruoying Li, Shimin Zhang, Yidong Lou

Abstract: Precise, seamless, and efficient train localization as well as long-term railway environment monitoring is the essential property towards reliability, availability, maintainability, and safety (RAMS) engineering for railroad systems. Simultaneous localization and mapping (SLAM) is right at the core of solving the two problems concurrently. In this end, we propose a high-performance and versatile multi-modal framework in this paper, targeted for the odometry and mapping task for various rail vehicles. Our system is built atop an inertial-centric state estimator that tightly couples light detection and ranging (LiDAR), visual, optionally satellite navigation and map-based localization information with the convenience and extendibility of loosely coupled methods. The inertial sensors IMU and wheel encoder are treated as the primary sensor, which achieves the observations from subsystems to constrain the accelerometer and gyroscope biases. Compared to point-only LiDAR-inertial methods, our approach leverages more geometry information by introducing both track plane and electric power pillars into state estimation. The Visual-inertial subsystem also utilizes the environmental structure information by employing both lines and points. Besides, the method is capable of handling sensor failures by automatic reconfiguration bypassing failure modules. Our proposed method has been extensively tested in the long-during railway environments over four years, including general-speed, high-speed and metro, both passenger and freight traffic are investigated. Further, we aim to share, in an open way, the experience, problems, and successes of our group with the robotics community so that those that work in such environments can avoid these errors. In this view, we open source some of the datasets to benefit the research community.

10.Vision-Based Intelligent Robot Grasping Using Sparse Neural Network

Authors:Priya Shukla, Vandana Kushwaha, G C Nandi

Abstract: In the modern era of Deep Learning, network parameters play a vital role in models efficiency but it has its own limitations like extensive computations and memory requirements, which may not be suitable for real time intelligent robot grasping tasks. Current research focuses on how the model efficiency can be maintained by introducing sparsity but without compromising accuracy of the model in the robot grasping domain. More specifically, in this research two light-weighted neural networks have been introduced, namely Sparse-GRConvNet and Sparse-GINNet, which leverage sparsity in the robotic grasping domain for grasp pose generation by integrating the Edge-PopUp algorithm. This algorithm facilitates the identification of the top K% of edges by considering their respective score values. Both the Sparse-GRConvNet and Sparse-GINNet models are designed to generate high-quality grasp poses in real-time at every pixel location, enabling robots to effectively manipulate unfamiliar objects. We extensively trained our models using two benchmark datasets: Cornell Grasping Dataset (CGD) and Jacquard Grasping Dataset (JGD). Both Sparse-GRConvNet and Sparse-GINNet models outperform the current state-of-the-art methods in terms of performance, achieving an impressive accuracy of 97.75% with only 10% of the weight of GR-ConvNet and 50% of the weight of GI-NNet, respectively, on CGD. Additionally, Sparse-GRConvNet achieve an accuracy of 85.77% with 30% of the weight of GR-ConvNet and Sparse-GINNet achieve an accuracy of 81.11% with 10% of the weight of GI-NNet on JGD. To validate the performance of our proposed models, we conducted extensive experiments using the Anukul (Baxter) hardware cobot.

1.Communicating Robot's Intentions while Assisting Users via Augmented Reality

Authors:Chao Wang, Theodoros Stouraitis, Anna Belardinelli, Stephan Hasler, Michael Gienger

Abstract: This paper explores the challenges faced by assistive robots in effectively cooperating with humans, requiring them to anticipate human behavior, predict their actions' impact, and generate understandable robot actions. The study focuses on a use-case involving a user with limited mobility needing assistance with pouring a beverage, where tasks like unscrewing a cap or reaching for objects demand coordinated support from the robot. Yet, anticipating the robot's intentions can be challenging for the user, which can hinder effective collaboration. To address this issue, we propose an innovative solution that utilizes Augmented Reality (AR) to communicate the robot's intentions and expected movements to the user, fostering a seamless and intuitive interaction.

2.Doppler-aware Odometry from FMCW Scanning Radar

Authors:Fraser Rennie, David Williams, Paul Newman, Daniele De Martini

Abstract: This work explores Doppler information from a millimetre-Wave (mm-W) Frequency-Modulated Continuous-Wave (FMCW) scanning radar to make odometry estimation more robust and accurate. Firstly, doppler information is added to the scan masking process to enhance correlative scan matching. Secondly, we train a Neural Network (NN) for regressing forward velocity directly from a single radar scan; we fuse this estimate with the correlative scan matching estimate and show improved robustness to bad estimates caused by challenging environment geometries, e.g. narrow tunnels. We test our method with a novel custom dataset which is released with this work at

3.Reducing Object Detection Uncertainty from RGB and Thermal Data for UAV Outdoor Surveillance

Authors:Juan Sandino, Peter A. Caccetta, Conrad Sanderson, Frederic Maire, Felipe Gonzalez

Abstract: Recent advances in Unmanned Aerial Vehicles (UAVs) have resulted in their quick adoption for wide a range of civilian applications, including precision agriculture, biosecurity, disaster monitoring and surveillance. UAVs offer low-cost platforms with flexible hardware configurations, as well as an increasing number of autonomous capabilities, including take-off, landing, object tracking and obstacle avoidance. However, little attention has been paid to how UAVs deal with object detection uncertainties caused by false readings from vision-based detectors, data noise, vibrations, and occlusion. In most situations, the relevance and understanding of these detections are delegated to human operators, as many UAVs have limited cognition power to interact autonomously with the environment. This paper presents a framework for autonomous navigation under uncertainty in outdoor scenarios for small UAVs using a probabilistic-based motion planner. The framework is evaluated with real flight tests using a sub 2 kg quadrotor UAV and illustrated in victim finding Search and Rescue (SAR) case study in a forest/bushland. The navigation problem is modelled using a Partially Observable Markov Decision Process (POMDP), and solved in real time onboard the small UAV using Augmented Belief Trees (ABT) and the TAPIR toolkit. Results from experiments using colour and thermal imagery show that the proposed motion planner provides accurate victim localisation coordinates, as the UAV has the flexibility to interact with the environment and obtain clearer visualisations of any potential victims compared to the baseline motion planner. Incorporating this system allows optimised UAV surveillance operations by diminishing false positive readings from vision-based object detectors.

4.Dexterous Soft Hands Linearize Feedback-Control for In-Hand Manipulation

Authors:Adrian Sieler, Oliver Brock

Abstract: This paper presents a feedback-control framework for in-hand manipulation (IHM) with dexterous soft hands that enables the acquisition of manipulation skills in the real-world within minutes. We choose the deformation state of the soft hand as the control variable. To control for a desired deformation state, we use coarsely approximated Jacobians of the actuation-deformation dynamics. These Jacobian are obtained via exploratory actions. This is enabled by the self-stabilizing properties of compliant hands, which allow us to use linear feedback control in the presence of complex contact dynamics. To evaluate the effectiveness of our approach, we show the generalization capabilities for a learned manipulation skill to variations in object size by 100 %, 360 degree changes in palm inclination and to disabling up to 50 % of the involved actuators. In addition, complex manipulations can be obtained by sequencing such feedback-skills.

5.Toward Extending Concentric Tube Robot Kinematics for Large Clearance and Impulse Curvature

Authors:Zhouyu Zhang, Jia Shen, Junhyoung Ha, Yue Chen

Abstract: Concentric Tube Robots (CTRs) have been proposed to operate within the unstructured environment for minimally invasive surgeries. In this letter, we consider the operation scenario where the tubes travel inside the channels with a large clearance or large curvature, such as aortas or industrial pipes. Accurate kinematic modeling of CTRs is required for the development of advanced control and sensing algorithms. To this end, we extended the conventional CTR kinematics model to a more general case with large tube-to-tube clearance and large centerline curvature. Numerical simulations and experimental validations are conducted to compare our model with respect to the conventional CTR kinematic model. In the physical experiments, our proposed model achieved a tip position error of 1.53 mm in the 2D planer case and 4.36 mm in 3D case, outperforming the state-of-the-art model by 71% and 66%, respectively.

6.Structured World Models from Human Videos

Authors:Russell Mendonca, Shikhar Bahl, Deepak Pathak

Abstract: We tackle the problem of learning complex, general behaviors directly in the real world. We propose an approach for robots to efficiently learn manipulation skills using only a handful of real-world interaction trajectories from many different settings. Inspired by the success of learning from large-scale datasets in the fields of computer vision and natural language, our belief is that in order to efficiently learn, a robot must be able to leverage internet-scale, human video data. Humans interact with the world in many interesting ways, which can allow a robot to not only build an understanding of useful actions and affordances but also how these actions affect the world for manipulation. Our approach builds a structured, human-centric action space grounded in visual affordances learned from human videos. Further, we train a world model on human videos and fine-tune on a small amount of robot interaction data without any task supervision. We show that this approach of affordance-space world models enables different robots to learn various manipulation skills in complex settings, in under 30 minutes of interaction. Videos can be found at

1.Distributed Robust Learning-Based Backstepping Control Aided with Neurodynamics for Consensus Formation Tracking of Underwater Vessels

Authors:Tao Yan, Zhe Xu, Simon X. Yang

Abstract: This paper addresses distributed robust learning-based control for consensus formation tracking of multiple underwater vessels, in which the system parameters of the marine vessels are assumed to be entirely unknown and subject to the modeling mismatch, oceanic disturbances, and noises. Towards this end, graph theory is used to allow us to synthesize the distributed controller with a stability guarantee. Due to the fact that the parameter uncertainties only arise in the vessels' dynamic model, the backstepping control technique is then employed. Subsequently, to overcome the difficulties in handling time-varying and unknown systems, an online learning procedure is developed in the proposed distributed formation control protocol. Moreover, modeling errors, environmental disturbances, and measurement noises are considered and tackled by introducing a neurodynamics model in the controller design to obtain a robust solution. Then, the stability analysis of the overall closed-loop system under the proposed scheme is provided to ensure the robust adaptive performance at the theoretical level. Finally, extensive simulation experiments are conducted to further verify the efficacy of the presented distributed control protocol.

2.Multi-Level Compositional Reasoning for Interactive Instruction Following

Authors:Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi

Abstract: Robotic agents performing domestic chores by natural language directives are required to master the complex job of navigating environment and interacting with objects in the environments. The tasks given to the agents are often composite thus are challenging as completing them require to reason about multiple subtasks, e.g., bring a cup of coffee. To address the challenge, we propose to divide and conquer it by breaking the task into multiple subgoals and attend to them individually for better navigation and interaction. We call it Multi-level Compositional Reasoning Agent (MCR-Agent). Specifically, we learn a three-level action policy. At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller. At the middle level, we discriminatively control the agent's navigation by a master policy by alternating between a navigation policy and various independent interaction policies. Finally, at the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy. Our approach not only generates human interpretable subgoals but also achieves 2.03% absolute gain to comparable state of the arts in the efficiency metric (PLWSR in unseen set) without using rule-based planning or a semantic spatial memory.

3.Robust Quadrupedal Locomotion via Risk-Averse Policy Learning

Authors:Jiyuan Shi, Chenjia Bai, Haoran He, Lei Han, Dong Wang, Bin Zhao, Xiu Li, Xuelong Li

Abstract: The robustness of legged locomotion is crucial for quadrupedal robots in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion and various methods try to integrate privileged distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods are hard to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we consider a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of environments, and perform risk-averse policy learning by optimizing the worst-case scenarios via a risk distortion measure. Extensive experiments in both simulation environments and a real Aliengo robot demonstrate that our method is efficient in handling various external disturbances, and the resulting policy exhibits improved robustness in harsh and uncertain situations in legged locomotion. Videos are available at

4.Integrating Expert Guidance for Efficient Learning of Safe Overtaking in Autonomous Driving Using Deep Reinforcement Learning

Authors:Jinxiong Lu, Gokhan Alcan, Ville Kyrki

Abstract: Overtaking on two-lane roads is a great challenge for autonomous vehicles, as oncoming traffic appearing on the opposite lane may require the vehicle to change its decision and abort the overtaking. Deep reinforcement learning (DRL) has shown promise for difficult decision problems such as this, but it requires massive number of data, especially if the action space is continuous. This paper proposes to incorporate guidance from an expert system into DRL to increase its sample efficiency in the autonomous overtaking setting. The guidance system developed in this study is composed of constrained iterative LQR and PID controllers. The novelty lies in the incorporation of a fading guidance function, which gradually decreases the effect of the expert system, allowing the agent to initially learn an appropriate action swiftly and then improve beyond the performance of the expert system. This approach thus combines the strengths of traditional control engineering with the flexibility of learning systems, expanding the capabilities of the autonomous system. The proposed methodology for autonomous vehicle overtaking does not depend on a particular DRL algorithm and three state-of-the-art algorithms are used as baselines for evaluation. Simulation results show that incorporating expert system guidance improves state-of-the-art DRL algorithms greatly in both sample efficiency and driving safety.

5.Pose-Following with Dual Quaternions

Authors:Jon Arrizabalaga, Markus Ryll

Abstract: This work focuses on pose-following, a variant of path-following in which the goal is to steer the system's position and attitude along a path with a moving frame attached to it. Full body motion control, while accounting for the additional freedom to self-regulate the progress along the path, is an appealing trade-off. Towards this end, we extend the well-established dual quaternion-based pose-tracking method into a pose-following control law. Specifically, we derive the equations of motion for the full pose error between the geometric reference and the rigid body in the form of a dual quaternion and dual twist. Subsequently, we formulate an almost globally asymptotically stable control law. The global attractivity of the presented approach is validated in a spatial example, while its benefits over pose-tracking are showcased through a planar case-study.

6.3D Model-free Visual localization System from Essential Matrix under Local Planar Motion

Authors:Yanmei Jiao, Binxin Zhang, Peng Jiang, Rong Xiong, Yue Wang

Abstract: Visual localization plays a critical role in the functionality of low-cost autonomous mobile robots. Current state-of-the-art approaches to accurate visual localization are 3D scene-specific, requiring additional computational and storage resources to construct a 3D scene model when facing a new environment. An alternative approach of directly using a database of 2D images for visual localization offers more flexibility. However, such methods currently suffer from limited localization accuracy. In this paper, we propose a robust and accurate multiple checking-based 3D model-free visual localization system that addresses the aforementioned issues. The core idea is to model the local planar motion characteristic of general ground-moving robots into both essential matrix estimation and triangulation stages to obtain two minimal solutions. By embedding the proposed minimal solutions into the multiple checking scheme, the proposed 3D model-free visual localization framework demonstrates high accuracy and robustness in both simulation and real-world experiments.

7.Towards Human-Robot Collaboration with Parallel Robots by Kinetostatic Analysis, Impedance Control and Contact Detection

Authors:Aran Mohammad, Moritz Schappler, Tobias Ortmaier

Abstract: Parallel robots provide the potential to be leveraged for human-robot collaboration (HRC) due to low collision energies even at high speeds resulting from their reduced moving masses. However, the risk of unintended contact with the leg chains increases compared to the structure of serial robots. As a first step towards HRC, contact cases on the whole parallel robot structure are investigated and a disturbance observer based on generalized momenta and measurements of motor current is applied. In addition, a Kalman filter and a second-order sliding-mode observer based on generalized momenta are compared in terms of error and detection time. Gearless direct drives with low friction improve external force estimation and enable low impedance. The experimental validation is performed with two force-torque sensors and a kinetostatic model. This allows a new identification method of the motor torque constant of an assembled parallel robot to estimate external forces from the motor current and via a dynamics model. A Cartesian impedance control scheme for compliant robot-environmental dynamics with stiffness from 0.1-2N/mm and the force observation for low forces over the entire structure are validated. The observers are used for collisions and clamping at velocities of 0.4-0.9m/s for detection within 9-58ms and a reaction in the form of a zero-g mode.

8.Collision Isolation and Identification Using Proprioceptive Sensing for Parallel Robots to Enable Human-Robot Collaboration

Authors:Aran Mohammad, Moritz Schappler, Tobias Ortmaier

Abstract: Parallel robots (PRs) allow for higher speeds in human-robot collaboration due to their lower moving masses but are more prone to unintended contact. For a safe reaction, knowledge of the location and force of a collision is useful. A novel algorithm for collision isolation and identification with proprioceptive information for a real PR is the scope of this work. To classify the collided body, the effects of contact forces at the links and platform of the PR are analyzed using a kinetostatic projection. This insight enables the derivation of features from the line of action of the estimated external force. The significance of these features is confirmed in experiments for various load cases. A feedforward neural network (FNN) classifies the collided body based on these physically modeled features. Generalization with the FNN to 300k load cases on the whole robot structure in other joint angle configurations is successfully performed with a collision-body classification accuracy of 84% in the experiments. Platform collisions are isolated and identified with an explicit solution, while a particle filter estimates the location and force of a contact on a kinematic chain. Updating the particle filter with estimated external joint torques leads to an isolation error of less than 3cm and an identification error of 4N in a real-world experiment.

9.Safe Collision and Clamping Reaction for Parallel Robots During Human-Robot Collaboration

Authors:Aran Mohammad, Moritz Schappler, Tim-Lukas Habich, Tobias Ortmaier

Abstract: Parallel robots (PRs) offer the potential for safe human-robot collaboration because of their low moving masses. Due to the in-parallel kinematic chains, the risk of contact in the form of collisions and clamping at a chain increases. Ensuring safety is investigated in this work through various contact reactions on a real planar PR. External forces are estimated based on proprioceptive information and a dynamics model, which allows contact detection. Retraction along the direction of the estimated line of action provides an instantaneous response to limit the occurring contact forces within the experiment to 70N at a maximum velocity 0.4m/s. A reduction in the stiffness of a Cartesian impedance control is investigated as a further strategy. For clamping, a feedforward neural network (FNN) is trained and tested in different joint angle configurations to classify whether a collision or clamping occurs with an accuracy of 80%. A second FNN classifies the clamping kinematic chain to enable a subsequent kinematic projection of the clamping joint angle onto the rotational platform coordinates. In this way, a structure opening is performed in addition to the softer retraction movement. The reaction strategies are compared in real-world experiments at different velocities and controller stiffnesses to demonstrate their effectiveness. The results show that in all collision and clamping experiments the PR terminates the contact in less than 130ms.

10.Quantifying Uncertainties of Contact Classifications in a Human-Robot Collaboration with Parallel Robots

Authors:Aran Mohammad, Hendrik Muscheid, Moritz Schappler, Thomas Seel

Abstract: In human-robot collaboration, unintentional physical contacts occur in the form of collisions and clamping, which must be detected and classified separately for a reaction. If certain collision or clamping situations are misclassified, reactions might occur that make the true contact case more dangerous. This work analyzes data-driven modeling based on physically modeled features like estimated external forces for clamping and collision classification with a real parallel robot. The prediction reliability of a feedforward neural network is investigated. Quantification of the classification uncertainty enables the distinction between safe versus unreliable classifications and optimal reactions like a retraction movement for collisions, structure opening for the clamping joint, and a fallback reaction in the form of a zero-g mode. This hypothesis is tested with experimental data of clamping and collision cases by analyzing dangerous misclassifications and then reducing them by the proposed uncertainty quantification. Finally, it is investigated how the approach of this work influences correctly classified clamping and collision scenarios.

11.Towards a Modular Architecture for Science Factories

Authors:Rafael Vescovi, Tobias Ginsburg, Kyle Hippe, Doga Ozgulbas, Casey Stone, Abraham Stroka, Rory Butler, Ben Blaiszik, Tom Brettin, Kyle Chard, Mark Hereld, Arvind Ramanathan, Rick Stevens, Aikaterini Vriza, Jie Xu, Qingteng Zhang, Ian Foster

Abstract: Advances in robotic automation, high-performance computing (HPC), and artificial intelligence (AI) encourage us to conceive of science factories: large, general-purpose computation- and AI-enabled self-driving laboratories (SDLs) with the generality and scale needed both to tackle large discovery problems and to support thousands of scientists. Science factories require modular hardware and software that can be replicated for scale and (re)configured to support many applications. To this end, we propose a prototype modular science factory architecture in which reconfigurable modules encapsulating scientific instruments are linked with manipulators to form workcells, that can themselves be combined to form larger assemblages, and linked with distributed computing for simulation, AI model training and inference, and related tasks. Workflows that perform sets of actions on modules can be specified, and various applications, comprising workflows plus associated computational and data manipulation steps, can be run concurrently. We report on our experiences prototyping this architecture and applying it in experiments involving 15 different robotic apparatus, five applications (one in education, two in biology, two in materials), and a variety of workflows, across four laboratories. We describe the reuse of modules, workcells, and workflows in different applications, the migration of applications between workcells, and the use of digital twins, and suggest directions for future work aimed at yet more generality and scalability. Code and data are available at and in the Supplementary Information

12.DoCRL: Double Critic Deep Reinforcement Learning for Mapless Navigation of a Hybrid Aerial Underwater Vehicle with Medium Transition

Authors:Ricardo B. Grando, Junior C. de Jesus, Victor A. Kich, Alisson H. Kolling, Rodrigo S. Guerra, Paulo L. J. Drews-Jr

Abstract: Deep Reinforcement Learning (Deep-RL) techniques for motion control have been continuously used to deal with decision-making problems for a wide variety of robots. Previous works showed that Deep-RL can be applied to perform mapless navigation, including the medium transition of Hybrid Unmanned Aerial Underwater Vehicles (HUAUVs). These are robots that can operate in both air and water media, with future potential for rescue tasks in robotics. This paper presents new approaches based on the state-of-the-art Double Critic Actor-Critic algorithms to address the navigation and medium transition problems for a HUAUV. We show that double-critic Deep-RL with Recurrent Neural Networks using range data and relative localization solely improves the navigation performance of HUAUVs. Our DoCRL approaches achieved better navigation and transitioning capability, outperforming previous approaches.

13.High Aspect Ratio Multi-stage Ducted Electroaerodynamic Thrusters for Micro Air Vehicle Propulsion

Authors:C. Luke Nelson, Daniel S. Drew

Abstract: Electroaerodynamic propulsion, where force is produced through collisions between electrostatically accelerated ions and neutral air molecules, is an attractive alternative to propeller- and flapping wing-based methods for micro air vehicle (MAV) flight due to its silent and solid-state nature. One major barrier to adoption is its limited thrust efficiency at useful disk loading levels. Ducted actuators comprising multiple serially-integrated acceleration stages are a potential solution, allowing individual stages to operate at higher efficiency while maintaining a useful total thrust, and potentially improving efficiency through various aerodynamic and fluid dynamic mechanisms. In this work, we investigate the effects of duct and emitter electrode geometries on actuator performance, then show how a combination of increasing cross-sectional aspect ratio and serial integration of multiple stages can be used to produce overall thrust densities comparable to commercial propulsors. An optimized five-stage device attains a thrust density of about 18 N/m$^2$ at a thrust efficiency of about 2 mN/W, among the highest values ever measured at this scale. We further show how this type of thruster can be integrated under the wings of a MAV-scale fixed wing platform, pointing towards future use as a distributed propulsion system.

1.Nowhere to Go: Benchmarking Multi-robot Collaboration in Target Trapping Environment

Authors:Hao Zhang, Jiaming Chen, Jiyu Cheng, Yibin Li, Simon X. Yang, Wei Zhang

Abstract: Collaboration is one of the most important factors in multi-robot systems. Considering certain real-world applications and to further promote its development, we propose a new benchmark to evaluate multi-robot collaboration in Target Trapping Environment (T2E). In T2E, two kinds of robots (called captor robot and target robot) share the same space. The captors aim to catch the target collaboratively, while the target will try to escape from the trap. Both the trapping and escaping process can use the environment layout to help achieve the corresponding objective, which requires high collaboration between robots and the utilization of the environment. For the benchmark, we present and evaluate multiple learning-based baselines in T2E, and provide insights into regimes of multi-robot collaboration. We also make our benchmark publicly available and encourage researchers from related robotics disciplines to propose, evaluate, and compare their solutions in this benchmark. Our project is released at

2.Quantifying the biomimicry gap in biohybrid systems

Authors:Vaios Papaspyros, Guy Theraulaz, Clément Sire, Francesco Mondada

Abstract: Biohybrid systems in which robotic lures interact with animals have become compelling tools for probing and identifying the mechanisms underlying collective animal behavior. One key challenge lies in the transfer of social interaction models from simulations to reality, using robotics to validate the modeling hypotheses. This challenge arises in bridging what we term the "biomimicry gap", which is caused by imperfect robotic replicas, communication cues and physics constrains not incorporated in the simulations that may elicit unrealistic behavioral responses in animals. In this work, we used a biomimetic lure of a rummy-nose tetra fish (Hemigrammus rhodostomus) and a neural network (NN) model for generating biomimetic social interactions. Through experiments with a biohybrid pair comprising a fish and the robotic lure, a pair of real fish, and simulations of pairs of fish, we demonstrate that our biohybrid system generates high-fidelity social interactions mirroring those of genuine fish pairs. Our analyses highlight that: 1) the lure and NN maintain minimal deviation in real-world interactions compared to simulations and fish-only experiments, 2) our NN controls the robot efficiently in real-time, and 3) a comprehensive validation is crucial to bridge the biomimicry gap, ensuring realistic biohybrid systems.

3.A Mathematical Characterization of Minimally Sufficient Robot Brains

Authors:Basak Sakcak, Kalle G. Timperi, Vadim Weinstein, Steven M. LaValle

Abstract: This paper addresses the lower limits of encoding and processing the information acquired through interactions between an internal system (robot algorithms or software) and an external system (robot body and its environment) in terms of action and observation histories. Both are modeled as transition systems. We want to know the weakest internal system that is sufficient for achieving passive (filtering) and active (planning) tasks. We introduce the notion of an information transition system for the internal system which is a transition system over a space of information states that reflect a robot's or other observer's perspective based on limited sensing, memory, computation, and actuation. An information transition system is viewed as a filter and a policy or plan is viewed as a function that labels the states of this information transition system. Regardless of whether internal systems are obtained by learning algorithms, planning algorithms, or human insight, we want to know the limits of feasibility for given robot hardware and tasks. We establish, in a general setting, that minimal information transition systems exist up to reasonable equivalence assumptions, and are unique under some general conditions. We then apply the theory to generate new insights into several problems, including optimal sensor fusion/filtering, solving basic planning tasks, and finding minimal representations for modeling a system given input-output relations.

4.Efficient collision avoidance for autonomous vehicles in polygonal domains

Authors:Jiayu Fan, Nikolce Murgovski, Jun Liang

Abstract: This research focuses on trajectory planning problems for autonomous vehicles utilizing numerical optimal control techniques. The study reformulates the constrained optimization problem into a nonlinear programming problem, incorporating explicit collision avoidance constraints. We present three novel, exact formulations to describe collision constraints. The first formulation is derived from a proposition concerning the separation of a point and a convex set. We prove the separating proposition through De Morgan's laws. Then, leveraging the hyperplane separation theorem we propose two efficient reformulations. Compared with the existing dual formulations and the first formulation, they significantly reduce the number of auxiliary variables to be optimized and inequality constraints within the nonlinear programming problem. Finally, the efficacy of the proposed formulations is demonstrated in the context of typical autonomous parking scenarios compared with state of the art. For generality, we design three initial guesses to assess the computational effort required for convergence to solutions when using the different collision formulations. The results illustrate that the scheme employing De Morgan's laws performs equally well with those utilizing dual formulations, while the other two schemes based on hyperplane separation theorem exhibit the added benefit of requiring lower computational resources.

5.Recognizing Intent in Collaborative Manipulation

Authors:Zhanibek Rysbek, Ki Hwan Oh, Milos Zefran

Abstract: Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to match the performance of a human partner it needs to be able to take initiative and lead when appropriate. To achieve such human-like performance, the robot needs to have the ability to (1) determine the intent of the human, (2) clearly express its own intent, and (3) choose its actions so that the dyad reaches consensus. This work proposes a framework for recognizing human intent in collaborative manipulation tasks using force exchanges. Grounded in a dataset collected during a human study, we introduce a set of features that can be computed from the measured signals and report the results of a classifier trained on our collected human-human interaction data. Two metrics are used to evaluate the intent recognizer: overall accuracy and the ability to correctly identify transitions. The proposed recognizer shows robustness against the variations in the partner's actions and the confounding effects due to the variability in grasp forces and dynamic effects of walking. The results demonstrate that the proposed recognizer is well-suited for implementation in a physical interaction control scheme.

6.Versatile Multi-Contact Planning and Control for Legged Loco-Manipulation

Authors:Jean-Pierre Sleiman, Farbod Farshidian, Marco Hutter

Abstract: Loco-manipulation planning skills are pivotal for expanding the utility of robots in everyday environments. These skills can be assessed based on a system's ability to coordinate complex holistic movements and multiple contact interactions when solving different tasks. However, existing approaches have been merely able to shape such behaviors with hand-crafted state machines, densely engineered rewards, or pre-recorded expert demonstrations. Here, we propose a minimally-guided framework that automatically discovers whole-body trajectories jointly with contact schedules for solving general loco-manipulation tasks in pre-modeled environments. The key insight is that multi-modal problems of this nature can be formulated and treated within the context of integrated Task and Motion Planning (TAMP). An effective bilevel search strategy is achieved by incorporating domain-specific rules and adequately combining the strengths of different planning techniques: trajectory optimization and informed graph search coupled with sampling-based planning. We showcase emergent behaviors for a quadrupedal mobile manipulator exploiting both prehensile and non-prehensile interactions to perform real-world tasks such as opening/closing heavy dishwashers and traversing spring-loaded doors. These behaviors are also deployed on the real system using a two-layer whole-body tracking controller.

1.Optimal Kinematic Design of a Robotic Lizard using Four-Bar and Five-Bar Mechanisms

Authors:Rajashekhar V S, Debasish Ghose, Arockia Selvakumar Arockia Doss

Abstract: Designing a mechanism to mimic the motion of a common house gecko is the objective of this work. The body of the robot is designed using four five-bar mechanisms (2-RRRRR and 2-RRPRR) and the leg is designed using four four-bar mechanisms. The 2-RRRRR five-bar mechanisms form the head and tail of the robotic lizard. The 2-RRPRR five-bar mechanisms form the left and right sides of the body in the robotic lizard. The four five-bar mechanisms are actuated by only four rotary actuators. Of these, two actuators control the head movements and the other two control the tail movements. The RRPRR five-bar mechanism is controlled by one actuator from the head five-bar mechanism and the other by the tail five-bar mechanism. A tension spring connects each active link to a link in the four bar mechanism. When the robot is actuated, the head, tail and the body moves, and simultaneously each leg moves accordingly. This kind of actuation where the motion transfer occurs from body of the robot to the leg is the novelty in our design. The dimensional synthesis of the robotic lizard is done and presented. Then the forward and inverse kinematics of the mechanism, and configuration space singularities identification for the robot are presented. The gait exhibited by the gecko is studied and then simulated. A computer aided design of the robotic lizard is created and a prototype is made by 3D printing the parts. The prototype is controlled using Arduino UNO as a micro-controller. The experimental results are finally presented based on the gait analysis that was done earlier. The forward walking, and turning motion are done and snapshots are presented.

2.HyperSNN: A new efficient and robust deep learning model for resource constrained control applications

Authors:Zhanglu Yan, Shida Wang, Kaiwen Tang, Wong-Fai Wong

Abstract: In light of the increasing adoption of edge computing in areas such as intelligent furniture, robotics, and smart homes, this paper introduces HyperSNN, an innovative method for control tasks that uses spiking neural networks (SNNs) in combination with hyperdimensional computing. HyperSNN substitutes expensive 32-bit floating point multiplications with 8-bit integer additions, resulting in reduced energy consumption while enhancing robustness and potentially improving accuracy. Our model was tested on AI Gym benchmarks, including Cartpole, Acrobot, MountainCar, and Lunar Lander. HyperSNN achieves control accuracies that are on par with conventional machine learning methods but with only 1.36% to 9.96% of the energy expenditure. Furthermore, our experiments showed increased robustness when using HyperSNN. We believe that HyperSNN is especially suitable for interactive, mobile, and wearable devices, promoting energy-efficient and robust system design. Furthermore, it paves the way for the practical implementation of complex algorithms like model predictive control (MPC) in real-world industrial scenarios.

3.Detecting Olives with Synthetic or Real Data? Olive the Above

Authors:Yianni Karabatis, Xiaomin Lin, Nitin J. Sanket, Michail G. Lagoudakis, Yiannis Aloimonos

Abstract: Modern robotics has enabled the advancement in yield estimation for precision agriculture. However, when applied to the olive industry, the high variation of olive colors and their similarity to the background leaf canopy presents a challenge. Labeling several thousands of very dense olive grove images for segmentation is a labor-intensive task. This paper presents a novel approach to detecting olives without the need to manually label data. In this work, we present the world's first olive detection dataset comprised of synthetic and real olive tree images. This is accomplished by generating an auto-labeled photorealistic 3D model of an olive tree. Its geometry is then simplified for lightweight rendering purposes. In addition, experiments are conducted with a mix of synthetically generated and real images, yielding an improvement of up to 66% compared to when only using a small sample of real data. When access to real, human-labeled data is limited, a combination of mostly synthetic data and a small amount of real data can enhance olive detection.

4.Robust Autonomous Vehicle Pursuit without Expert Steering Labels

Authors:Jiaxin Pan, Changyao Zhou, Mariia Gladkova, Qadeer Khan, Daniel Cremers

Abstract: In this work, we present a learning method for lateral and longitudinal motion control of an ego-vehicle for vehicle pursuit. The car being controlled does not have a pre-defined route, rather it reactively adapts to follow a target vehicle while maintaining a safety distance. To train our model, we do not rely on steering labels recorded from an expert driver but effectively leverage a classical controller as an offline label generation tool. In addition, we account for the errors in the predicted control values, which can lead to a loss of tracking and catastrophic crashes of the controlled vehicle. To this end, we propose an effective data augmentation approach, which allows to train a network capable of handling different views of the target vehicle. During the pursuit, the target vehicle is firstly localized using a Convolutional Neural Network. The network takes a single RGB image along with cars' velocities and estimates the target vehicle's pose with respect to the ego-vehicle. This information is then fed to a Multi-Layer Perceptron, which regresses the control commands for the ego-vehicle, namely throttle and steering angle. We extensively validate our approach using the CARLA simulator on a wide range of terrains. Our method demonstrates real-time performance and robustness to different scenarios including unseen trajectories and high route completion. The project page containing code and multimedia can be publicly accessed here:

5.The Simplest Walking Robot: A bipedal robot with one actuator and two rigid bodies

Authors:James Kyle, Justin K. Yim, Kendall Hart, Sarah Bergbreiter, Aaron M. Johnson

Abstract: We present the design and experimental results of the first 1-DOF, hip-actuated bipedal robot. While passive dynamic walking is simple by nature, many existing bipeds inspired by this form of walking are complex in control, mechanical design, or both. Our design using only two rigid bodies connected by a single motor aims to enable exploration of walking at smaller sizes where more complex designs cannot be constructed. The walker, "Mugatu", is self-contained and autonomous, open-loop stable over a range of input parameters, able to stop and start from standing, and able to control its heading left and right. We analyze the mechanical design and distill down a set of design rules that enable these behaviors. Experimental evaluations measure speed, energy consumption, and steering.

6.Autoencoding a Soft Touch to Learn Grasping from On-land to Underwater

Authors:Ning Guo, Xudong Han, Xiaobo Liu, Shuqiao Zhong, Zhiyuan Zhou, Jian Lin, Jiansheng Dai, Fang Wan, Chaoyang Song

Abstract: Robots play a critical role as the physical agent of human operators in exploring the ocean. However, it remains challenging to grasp objects reliably while fully submerging under a highly pressurized aquatic environment with little visible light, mainly due to the fluidic interference on the tactile mechanics between the finger and object surfaces. This study investigates the transferability of grasping knowledge from on-land to underwater via a vision-based soft robotic finger that learns 6D forces and torques (FT) using a Supervised Variational Autoencoder (SVAE). A high-framerate camera captures the whole-body deformations while a soft robotic finger interacts with physical objects on-land and underwater. Results show that the trained SVAE model learned a series of latent representations of the soft mechanics transferrable from land to water, presenting a superior adaptation to the changing environments against commercial FT sensors. Soft, delicate, and reactive grasping enabled by tactile intelligence enhances the gripper's underwater interaction with improved reliability and robustness at a much-reduced cost, paving the path for learning-based intelligent grasping to support fundamental scientific discoveries in environmental and ocean research.

7.Proprioceptive Learning with Soft Polyhedral Networks

Authors:Xiaobo Liu, Xudong Han, Wei Hong, Fang Wan, Chaoyang Song

Abstract: Proprioception is the "sixth sense" that detects limb postures with motor neurons. It requires a natural integration between the musculoskeletal systems and sensory receptors, which is challenging among modern robots that aim for lightweight, adaptive, and sensitive designs at a low cost. Here, we present the Soft Polyhedral Network with an embedded vision for physical interactions, capable of adaptive kinesthesia and viscoelastic proprioception by learning kinetic features. This design enables passive adaptations to omni-directional interactions, visually captured by a miniature high-speed motion tracking system embedded inside for proprioceptive learning. The results show that the soft network can infer real-time 6D forces and torques with accuracies of 0.25/0.24/0.35 N and 0.025/0.034/0.006 Nm in dynamic interactions. We also incorporate viscoelasticity in proprioception during static adaptation by adding a creep and relaxation modifier to refine the predicted results. The proposed soft network combines simplicity in design, omni-adaptation, and proprioceptive sensing with high accuracy, making it a versatile solution for robotics at a low cost with more than 1 million use cycles for tasks such as sensitive and competitive grasping, and touch-based geometry reconstruction. This study offers new insights into vision-based proprioception for soft robots in adaptive grasping, soft manipulation, and human-robot interaction.

1.Extended Preintegration for Relative State Estimation of Leader-Follower Platform

Authors:Ruican Xia, Hailong Pei

Abstract: Relative state estimation using exteroceptive sensors suffers from limitations of the field of view (FOV) and false detection, that the proprioceptive sensor (IMU) data are usually engaged to compensate. Recently ego-motion constraint obtained by Inertial measurement unit (IMU) preintegration has been extensively used in simultaneous localization and mapping (SLAM) to alleviate the computation burden. This paper introduces an extended preintegration incorporating the IMU preintegration of two platforms to formulate the motion constraint of relative state. One merit of this analytic constraint is that it can be seamlessly integrated into the unified graph optimization framework to implement the relative state estimation in a high-performance real-time tracking thread, another point is a full smoother design with this precise constraint to optimize the 3D coordinate and refine the state for the refinement thread. We compare extensively in simulations the proposed algorithms with two existing approaches to confirm our outperformance. In the real virtual reality (VR) application design with the proposed estimator, we properly realize the visual tracking of the six degrees of freedom (6DoF) controller suitable for almost all scenarios, including the challenging environment with missing features, light mutation, dynamic scenes, etc. The demo video is at For the benefit of the community, we make the source code public.

2.Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

Authors:Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek Pohang, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabasa Gavin Cangan, Bernhard Schölkopf, Georg Martius

Abstract: Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore served as a bridge between the RL and robotics communities by allowing participants to experiment remotely with a real robot - as easily as in simulation. In the last years, offline reinforcement learning has matured into a promising paradigm for learning from pre-collected datasets, alleviating the reliance on expensive online interactions. We therefore asked the participants to learn two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets. An extensive software documentation and an initial stage based on a simulation of the real set-up made the competition particularly accessible. By giving each team plenty of access budget to evaluate their offline-learned policies on a cluster of seven identical real TriFinger platforms, we organized an exciting competition for machine learners and roboticists alike. In this work we state the rules of the competition, present the methods used by the winning teams and compare their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.

3.Hierarchical generative modelling for autonomous robots

Authors:Kai Yuan, Noor Sajid, Karl Friston, Zhibin Li

Abstract: Humans can produce complex whole-body motions when interacting with their surroundings, by planning, executing and combining individual limb movements. We investigated this fundamental aspect of motor control in the setting of autonomous robotic operations. We approach this problem by hierarchical generative modelling equipped with multi-level planning-for autonomous task completion-that mimics the deep temporal architecture of human motor control. Here, temporal depth refers to the nested time scales at which successive levels of a forward or generative model unfold, for example, delivering an object requires a global plan to contextualise the fast coordination of multiple local movements of limbs. This separation of temporal scales also motivates robotics and control. Specifically, to achieve versatile sensorimotor control, it is advantageous to hierarchically structure the planning and low-level motor control of individual limbs. We use numerical and physical simulation to conduct experiments and to establish the efficacy of this formulation. Using a hierarchical generative model, we show how a humanoid robot can autonomously complete a complex task that necessitates a holistic use of locomotion, manipulation, and grasping. Specifically, we demonstrate the ability of a humanoid robot that can retrieve and transport a box, open and walk through a door to reach the destination, approach and kick a football, while showing robust performance in presence of body damage and ground irregularities. Our findings demonstrated the effectiveness of using human-inspired motor control algorithms, and our method provides a viable hierarchical architecture for the autonomous completion of challenging goal-directed tasks.

4.Grasp Transfer based on Self-Aligning Implicit Representations of Local Surfaces

Authors:Ahmet Tekden, Marc Peter Deisenroth, Yasemin Bekiroglu

Abstract: Objects we interact with and manipulate often share similar parts, such as handles, that allow us to transfer our actions flexibly due to their shared functionality. This work addresses the problem of transferring a grasp experience or a demonstration to a novel object that shares shape similarities with objects the robot has previously encountered. Existing approaches for solving this problem are typically restricted to a specific object category or a parametric shape. Our approach, however, can transfer grasps associated with implicit models of local surfaces shared across object categories. Specifically, we employ a single expert grasp demonstration to learn an implicit local surface representation model from a small dataset of object meshes. At inference time, this model is used to transfer grasps to novel objects by identifying the most geometrically similar surfaces to the one on which the expert grasp is demonstrated. Our model is trained entirely in simulation and is evaluated on simulated and real-world objects that are not seen during training. Evaluations indicate that grasp transfer to unseen object categories using this approach can be successfully performed both in simulation and real-world experiments. The simulation results also show that the proposed approach leads to better spatial precision and grasp accuracy compared to a baseline approach.

5.The $10 Million ANA Avatar XPRIZE Competition Advanced Immersive Telepresence Systems

Authors:Sven Behnke, Julie A. Adams, David Locke

Abstract: The $10M ANA Avatar XPRIZE aimed to create avatar systems that can transport human presence to remote locations in real time. The participants of this multi-year competition developed robotic systems that allow operators to see, hear, and interact with a remote environment in a way that feels as if they are truly there. On the other hand, people in the remote environment were given the impression that the operator was present inside the avatar robot. At the competition finals, held in November 2022 in Long Beach, CA, USA, the avatar systems were evaluated on their support for remotely interacting with humans, exploring new environments, and employing specialized skills. This article describes the competition stages with tasks and evaluation procedures, reports the results, presents the winning teams' approaches, and discusses lessons learned.

1.RobotKube: Orchestrating Large-Scale Cooperative Multi-Robot Systems with Kubernetes and ROS

Authors:Bastian Lampe, Lennart Reiher, Lukas Zanger, Timo Woopen, Raphael van Kempen, Lutz Eckstein

Abstract: Modern cyber-physical systems (CPS) such as Cooperative Intelligent Transport Systems (C-ITS) are increasingly defined by the software which operates these systems. In practice, microservice architectures can be employed, which may consist of containerized microservices running in a cluster comprised of robots and supporting infrastructure. These microservices need to be orchestrated dynamically according to ever changing requirements posed at the system. Additionally, these systems are embedded in DevOps processes aiming at continually updating and upgrading both the capabilities of CPS components and of the system as a whole. In this paper, we present RobotKube, an approach to orchestrating containerized microservices for large-scale cooperative multi-robot CPS based on Kubernetes. We describe how to automate the orchestration of software across a CPS, and include the possibility to monitor and selectively store relevant accruing data. In this context, we present two main components of such a system: an event detector capable of, e.g., requesting the deployment of additional applications, and an application manager capable of automatically configuring the required changes in the Kubernetes cluster. By combining the widely adopted Kubernetes platform with the Robot Operating System (ROS), we enable the use of standard tools and practices for developing, deploying, scaling, and monitoring microservices in C-ITS. We demonstrate and evaluate RobotKube in an exemplary and reproducible use case that we make publicly available at .

2.RL-based Variable Horizon Model Predictive Control of Multi-Robot Systems using Versatile On-Demand Collision Avoidance

Authors:Shreyash Gupta, Abhinav Kumar, Niladri S. Tripathy, Suril V. Shah

Abstract: Multi-robot systems have become very popular in recent years because of their wide spectrum of applications, ranging from surveillance to cooperative payload transportation. Model Predictive Control (MPC) is a promising controller for multi-robot control because of its preview capability and ability to handle constraints easily. The performance of the MPC widely depends on many parameters, among which the prediction horizon is the major contributor. Increasing the prediction horizon beyond a limit drastically increases the computation cost. Tuning the value of the prediction horizon can be very time-consuming, and the tuning process must be repeated for every task. Moreover, instead of using a fixed horizon for an entire task, a better balance between performance and computation cost can be established if different prediction horizons can be employed for every robot at each time step. Further, for such variable prediction horizon MPC for multiple robots, on-demand collision avoidance is the key requirement. We propose Versatile On-demand Collision Avoidance (VODCA) strategy to comply with the variable horizon model predictive control. We also present a framework for learning the prediction horizon for the multi-robot system as a function of the states of the robots using the Soft Actor-Critic (SAC) RL algorithm. The results are illustrated and validated numerically for different multi-robot tasks.

3.Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases

Authors:Eugen Šlapak, Enric Pardo, Matúš Dopiriak, Taras Maksymyuk, Juraj Gazda

Abstract: The proliferation of technologies, such as extended reality (XR), has increased the demand for high-quality three-dimensional (3D) graphical representations. Industrial 3D applications encompass computer-aided design (CAD), finite element analysis (FEA), scanning, and robotics. However, current methods employed for industrial 3D representations suffer from high implementation costs and reliance on manual human input for accurate 3D modeling. To address these challenges, neural radiance fields (NeRFs) have emerged as a promising approach for learning 3D scene representations based on provided training 2D images. Despite a growing interest in NeRFs, their potential applications in various industrial subdomains are still unexplored. In this paper, we deliver a comprehensive examination of NeRF industrial applications while also providing direction for future research endeavors. We also present a series of proof-of-concept experiments that demonstrate the potential of NeRFs in the industrial domain. These experiments include NeRF-based video compression techniques and using NeRFs for 3D motion estimation in the context of collision avoidance. In the video compression experiment, our results show compression savings up to 48\% and 74\% for resolutions of 1920x1080 and 300x168, respectively. The motion estimation experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF) and achieved an average disparity map PSNR of 23 dB and an SSIM of 0.97. The code for our experiments is publicly available at .

4.Enhancing State Estimator for Autonomous Race Car : Leveraging Multi-modal System and Managing Computing Resources

Authors:Daegyu Lee, Hyunwoo Nam, Chanhoe Ryu, Sungwon Nah, Seongwoo Moon, D. Hyunchul Shim

Abstract: This paper introduces an innovative approach to enhance the state estimator for high-speed autonomous race cars, addressing challenges related to unreliable measurements, localization failures, and computing resource management. The proposed robust localization system utilizes a Bayesian-based probabilistic approach to evaluate multimodal measurements, ensuring the use of credible data for accurate and reliable localization, even in harsh racing conditions. To tackle potential localization failures during intense racing, we present a resilient navigation system. This system enables the race car to continue track-following by leveraging direct perception information in planning and execution, ensuring continuous performance despite localization disruptions. Efficient computing resource management is critical to avoid overload and system failure. We optimize computing resources using an efficient LiDAR-based state estimation method. Leveraging CUDA programming and GPU acceleration, we perform nearest points search and covariance computation efficiently, overcoming CPU bottlenecks. Real-world and simulation tests validate the system's performance and resilience. The proposed approach successfully recovers from failures, effectively preventing accidents and ensuring race car safety.

5.Auditory cueing strategy for stride length and cadence modification: a feasibility study with healthy adults

Authors:Tina LY Wu, Anna Murphy, Chao Chen, Dana Kulic

Abstract: People with Parkinson's Disease experience gait impairments that significantly impact their quality of life. Visual, auditory, and tactile cues can alleviate gait impairments, but they can become less effective due to the progressive nature of the disease and changes in people's motor capability. In this study, we develop a human-in-the-loop (HIL) framework that monitors two key gait parameters, stride length and cadence, and continuously learns a person-specific model of how the parameters change in response to the feedback. The model is then used in an optimization algorithm to improve the gait parameters. This feasibility study examines whether auditory cues can be used to influence stride length in people without gait impairments. The results demonstrate the benefits of the HIL framework in maintaining people's stride length in the presence of a secondary task.

6.Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents

Authors:Byeonghwi Kim, Jinyeon Kim, Yuyeong Kim, Cheolhong Min, Jonghyun Choi

Abstract: Accomplishing household tasks such as 'bringing a cup of water' requires planning step-by-step actions by maintaining knowledge about the spatial arrangement of objects and the consequences of previous actions. Perception models of the current embodied AI agents, however, often make mistakes due to a lack of such knowledge but rely on imperfect learning of imitating agents or an algorithmic planner without knowledge about the changed environment by the previous actions. To address the issue, we propose CPEM (Context-aware Planner and Environment-aware Memory) to incorporate the contextual information of previous actions for planning and maintaining spatial arrangement of objects with their states (e.g., if an object has been moved or not) in an environment to the perception model for improving both visual navigation and object interaction. We observe that CPEM achieves state-of-the-art task success performance in various metrics using a challenging interactive instruction following benchmark both in seen and unseen environments by large margins (up to +10.70% in unseen env.). CPEM with the templated actions, named ECLAIR, also won the 1st generalist language grounding agents challenge at Embodied AI Workshop in CVPR'23.

7.Efficient Real-time Smoke Filtration with 3D LiDAR for Search and Rescue with Autonomous Heterogeneous Robotic Systems

Authors:Alexander Kyuroson, Anton Koval, George Nikolakopoulos

Abstract: Search and Rescue (SAR) missions in harsh and unstructured Sub-Terranean (Sub-T) environments in the presence of aerosol particles have recently become the main focus in the field of robotics. Aerosol particles such as smoke and dust directly affect the performance of any mobile robotic platform due to their reliance on their onboard perception systems for autonomous navigation and localization in Global Navigation Satellite System (GNSS)-denied environments. Although obstacle avoidance and object detection algorithms are robust to the presence of noise to some degree, their performance directly relies on the quality of captured data by onboard sensors such as Light Detection And Ranging (LiDAR) and camera. Thus, this paper proposes a novel modular agnostic filtration pipeline based on intensity and spatial information such as local point density for removal of detected smoke particles from Point Cloud (PCL) prior to its utilization for collision detection. Furthermore, the efficacy of the proposed framework in the presence of smoke during multiple frontier exploration missions is investigated while the experimental results are presented to facilitate comparison with other methodologies and their computational impact. This provides valuable insight to the research community for better utilization of filtration schemes based on available computation resources while considering the safe autonomous navigation of mobile robots.

8.On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics

Authors:Connor Holmes, Frederike Dümbgen, Timothy D Barfoot

Abstract: In recent years, there has been remarkable progress in the development of so-called certifiable perception methods, which leverage semidefinite, convex relaxations to find global optima of perception problems in robotics. However, many of these relaxations rely on simplifying assumptions that facilitate the problem formulation, such as an isotropic measurement noise distribution. In this paper, we explore the tightness of the semidefinite relaxations of matrix-weighted (anisotropic) state-estimation problems and reveal the limitations lurking therein: matrix-weighted factors can cause convex relaxations to lose tightness. In particular, we show that the semidefinite relaxations of localization problems with matrix weights may be tight only for low noise levels. We empirically explore the factors that contribute to this loss of tightness and demonstrate that redundant constraints can be used to regain tightness, albeit at the expense of real-time performance. As a second technical contribution of this paper, we show that the state-of-the-art relaxation of scalar-weighted SLAM cannot be used when matrix weights are considered. We provide an alternate formulation and show that its SDP relaxation is not tight (even for very low noise levels) unless specific redundant constraints are used. We demonstrate the tightness of our formulations on both simulated and real-world data.

9.Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid

Authors:Alexander Kyuroson, Anton Koval, George Nikolakopoulos

Abstract: LiDAR is currently one of the most utilized sensors to effectively monitor the status of power lines and facilitate the inspection of remote power distribution networks and related infrastructures. To ensure the safe operation of the smart grid, various remote data acquisition strategies, such as Airborne Laser Scanning (ALS), Mobile Laser Scanning (MLS), and Terrestrial Laser Scanning (TSL) have been leveraged to allow continuous monitoring of regional power networks, which are typically surrounded by dense vegetation. In this article, an unsupervised Machine Learning (ML) framework is proposed, to detect, extract and analyze the characteristics of power lines of both high and low voltage, as well as the surrounding vegetation in a Power Line Corridor (PLC) solely from LiDAR data. Initially, the proposed approach eliminates the ground points from higher elevation points based on statistical analysis that applies density criteria and histogram thresholding. After denoising and transforming of the remaining candidate points by applying Principle Component Analysis (PCA) and Kd-tree, power line segmentation is achieved by utilizing a two-stage DBSCAN clustering to identify each power line individually. Finally, all high elevation points in the PLC are identified based on their distance to the newly segmented power lines. Conducted experiments illustrate that the proposed framework is an agnostic method that can efficiently detect the power lines and perform PLC-based hazard analysis.

1.Reachable Set-based Path Planning for Automated Vertical Parking System

Authors:In Hyuk Oh, Ju Won Seo, Jin Sung Kim, Chung Choo Chung

Abstract: This paper proposes a local path planning method with a reachable set for Automated vertical Parking Systems (APS). First, given a parking lot layout with a goal position, we define an intermediate pose for the APS to accomplish reverse parking with a single maneuver, i.e., without changing the gear shift. Then, we introduce a reachable set which is a set of points consisting of the grid points of all possible intermediate poses. Once the APS approaches the goal position, it must select an intermediate pose in the reachable set. A minimization problem was formulated and solved to choose the intermediate pose. We performed various scenarios with different parking lot conditions. We used the Hybrid-A* algorithm for the global path planning to move the vehicle from the starting pose to the intermediate pose and utilized clothoid-based local path planning to move from the intermediate pose to the goal pose. Additionally, we designed a controller to follow the generated path and validated its tracking performance. It was confirmed that the tracking error in the mean root square for the lateral position was bounded within 0.06m and for orientation within 0.01rad.

2.The Impact of Overall Optimization on Warehouse Automation

Authors:Hiroshi Yoshitake, Pieter Abbeel

Abstract: In this study, we propose a novel approach for investigating optimization performance by flexible robot coordination in automated warehouses with multi-agent reinforcement learning (MARL)-based control. Automated systems using robots are expected to achieve efficient operations compared with manual systems in terms of overall optimization performance. However, the impact of overall optimization on performance remains unclear in most automated systems due to a lack of suitable control methods. Thus, we proposed a centralized training-and-decentralized execution MARL framework as a practical overall optimization control method. In the proposed framework, we also proposed a single shared critic, trained with global states and rewards, applicable to a case in which heterogeneous agents make decisions asynchronously. Our proposed MARL framework was applied to the task selection of material handling equipment through automated order picking simulation, and its performance was evaluated to determine how far overall optimization outperforms partial optimization by comparing it with other MARL frameworks and rule-based control methods.

3.User Feedback and Sample Weighting for Ill-Conditioned Hand-Eye Calibration

Authors:Markus Horn, Thomas Wodtko, Michael Buchholz, Klaus Dietmayer

Abstract: Hand-eye calibration is an important and extensively researched method for calibrating rigidly coupled sensors, solely based on estimates of their motion. Due to the geometric structure of this problem, at least two motion estimates with non-parallel rotation axes are required for a unique solution. If the majority of rotation axes are almost parallel, the resulting optimization problem is ill-conditioned. In this paper, we propose an approach to automatically weight the motion samples of such an ill-conditioned optimization problem for improving the conditioning. The sample weights are chosen in relation to the local density of all available rotation axes. Furthermore, we present an approach for estimating the sensitivity and conditioning of the cost function, separated into the translation and the rotation part. This information can be employed as user feedback when recording the calibration data to prevent ill-conditioning in advance. We evaluate and compare our approach on artificially augmented data from the KITTI odometry dataset.

4.Towards a Causal Probabilistic Framework for Prediction, Action-Selection & Explanations for Robot Block-Stacking Tasks

Authors:Ricardo Cannizzaro, Jonathan Routley, Lars Kunze

Abstract: Uncertainties in the real world mean that is impossible for system designers to anticipate and explicitly design for all scenarios that a robot might encounter. Thus, robots designed like this are fragile and fail outside of highly-controlled environments. Causal models provide a principled framework to encode formal knowledge of the causal relationships that govern the robot's interaction with its environment, in addition to probabilistic representations of noise and uncertainty typically encountered by real-world robots. Combined with causal inference, these models permit an autonomous agent to understand, reason about, and explain its environment. In this work, we focus on the problem of a robot block-stacking task due to the fundamental perception and manipulation capabilities it demonstrates, required by many applications including warehouse logistics and domestic human support robotics. We propose a novel causal probabilistic framework to embed a physics simulation capability into a structural causal model to permit robots to perceive and assess the current state of a block-stacking task, reason about the next-best action from placement candidates, and generate post-hoc counterfactual explanations. We provide exemplar next-best action selection results and outline planned experimentation in simulated and real-world robot block-stacking tasks.

1.Multi-Visual-Inertial System: Analysis,Calibration and Estimation

Authors:Yulin Yang, Patrick Geneva, Guoquan Huang

Abstract: In this paper, we study state estimation of multi-visual-inertial systems (MVIS) and develop sensor fusion algorithms to optimally fuse an arbitrary number of asynchronous inertial measurement units (IMUs) or gyroscopes and global and(or) rolling shutter cameras. We are especially interested in the full calibration of the associated visual-inertial sensors, including the IMU or camera intrinsics and the IMU-IMU(or camera) spatiotemporal extrinsics as well as the image readout time of rolling-shutter cameras (if used). To this end, we develop a new analytic combined IMU integration with intrinsics-termed ACI3-to preintegrate IMU measurements, which is leveraged to fuse auxiliary IMUs and(or) gyroscopes alongside a base IMU. We model the multi-inertial measurements to include all the necessary inertial intrinsic and IMU-IMU spatiotemporal extrinsic parameters, while leveraging IMU-IMU rigid-body constraints to eliminate the necessity of auxiliary inertial poses and thus reducing computational complexity. By performing observability analysis of MVIS, we prove that the standard four unobservable directions remain - no matter how many inertial sensors are used, and also identify, for the first time, degenerate motions for IMU-IMU spatiotemporal extrinsics and auxiliary inertial intrinsics. In addition to the extensive simulations that validate our analysis and algorithms, we have built our own MVIS sensor rig and collected over 25 real-world datasets to experimentally verify the proposed calibration against the state-of-the-art calibration method such as Kalibr. We show that the proposed MVIS calibration is able to achieve competing accuracy with improved convergence and repeatability, which is open sourced to better benefit the community.

2.Visibility-Constrained Control of Multirotor via Reference Governor

Authors:Dabin Kim, Matthias Pezzutto, Luca Schenato, H. Jin Kim

Abstract: For safe vision-based control applications, perception-related constraints have to be satisfied in addition to other state constraints. In this paper, we deal with the problem where a multirotor equipped with a camera needs to maintain the visibility of a point of interest while tracking a reference given by a high-level planner. We devise a method based on reference governor that, differently from existing solutions, is able to enforce control-level visibility constraints with theoretically assured feasibility. To this end, we design a new type of reference governor for linear systems with polynomial constraints which is capable of handling time-varying references. The proposed solution is implemented online for the real-time multirotor control with visibility constraints and validated with simulations and an actual hardware experiment.

1.TRTM: Template-based Reconstruction and Target-oriented Manipulation of Crumpled Cloths

Authors:Wenbo Wang, Gen Li, Miguel Zamora, Stelian Coros

Abstract: Precisely reconstructing and manipulating crumpled cloths is challenging due to the high dimensionality of the cloth model, as well as the limited observation at self-occluded regions. We leverage the recent progress in the field of single-view human body reconstruction to template-based reconstruct the crumpled cloths from their top-view depth observations only, with our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstruction mesh explicitly indicates the positions and visibilities of the entire cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulations. Experiments demonstrate that our template-based reconstruction and target-oriented manipulation (TRTM) system can be applied to daily cloths with similar topologies as our template mesh, but have different shapes, sizes, patterns, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website:

1.Multi-level Map Construction for Dynamic Scenes

Authors:Xinggang Hu

Abstract: In dynamic scenes, both localization and mapping in visual SLAM face significant challenges. In recent years, numerous outstanding research works have proposed effective solutions for the localization problem. However, there has been a scarcity of excellent works focusing on constructing long-term consistent maps in dynamic scenes, which severely hampers map applications. To address this issue, we have designed a multi-level map construction system tailored for dynamic scenes. In this system, we employ multi-object tracking algorithms, DBSCAN clustering algorithm, and depth information to rectify the results of object detection, accurately extract static point clouds, and construct dense point cloud maps and octree maps. We propose a plane map construction algorithm specialized for dynamic scenes, involving the extraction, filtering, data association, and fusion optimization of planes in dynamic environments, thus creating a plane map. Additionally, we introduce an object map construction algorithm targeted at dynamic scenes, which includes object parameterization, data association, and update optimization. Extensive experiments on public datasets and real-world scenarios validate the accuracy of the multi-level maps constructed in this study and the robustness of the proposed algorithms. Furthermore, we demonstrate the practical application prospects of our algorithms by utilizing the constructed object maps for dynamic object tracking.

2.ChatSim: Underwater Simulation with Natural Language Prompting

Authors:Aadi Palnitkar, Rashmi Kapu, Xiaomin Lin, Cheng Liu, Nare Karapetyan, Yiannis Aloimonos

Abstract: Robots are becoming an essential part of many operations including marine exploration or environmental monitoring. However, the underwater environment presents many challenges, including high pressure, limited visibility, and harsh conditions that can damage equipment. Real-world experimentation can be expensive and difficult to execute. Therefore, it is essential to simulate the performance of underwater robots in comparable environments to ensure their optimal functionality within practical real-world contexts.OysterSim generates photo-realistic images and segmentation masks of objects in marine environments, providing valuable training data for underwater computer vision applications. By integrating ChatGPT into underwater simulations, users can convey their thoughts effortlessly and intuitively create desired underwater environments without intricate coding. \invis{Moreover, researchers can realize substantial time and cost savings by evaluating their algorithms across diverse underwater conditions in the simulation.} The objective of ChatSim is to integrate Large Language Models (LLM) with a simulation environment~(OysterSim), enabling direct control of the simulated environment via natural language input. This advancement can greatly enhance the capabilities of underwater simulation, with far-reaching benefits for marine exploration and broader scientific research endeavors.

1.Aggregating Single-wheeled Mobile Robots for Omnidirectional Movements

Authors:Meng Wang, Yao Su, Hang Li, Jiarui Li, Jixiang Liang, Hangxin Liu

Abstract: This paper presents a novel modular robot system that can self-reconfigure to achieve omnidirectional movements for collaborative object transportation. Each robotic module is equipped with a steerable omni-wheel for navigation and is shaped as a regular icositetragon with a permanent magnet installed on each corner for stable docking. After aggregating multiple modules and forming a structure that can cage a target object, we have developed an optimization-based method to compute the distribution of all wheels' heading directions, which enables efficient omnidirectional movements of the structure. By implementing a hierarchical controller on our prototyped system in both simulation and experiment, we validated the trajectory tracking performance of an individual module and a team of six modules in multiple navigation and collaborative object transportation settings. The results demonstrate that the proposed system can maintain a stable caging formation and achieve smooth transportation, indicating the effectiveness of our hardware and locomotion designs.

2.Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots

Authors:Yoshiki Obinata, Naoaki Kanazawa, Kento Kawaharazuka, Iori Yanokura, Soonhyo Kim, Kei Okada, Masayuki Inaba

Abstract: This paper describes a strategy for implementing a robotic system capable of performing General Purpose Service Robot (GPSR) tasks in robocup@home. The GPSR task is that a real robot hears a variety of commands in spoken language and executes a task in a daily life environment. To achieve the task, we integrate foundation models based inference system and a state machine task executable. The foundation models plan the task and detect objects with open vocabulary, and a state machine task executable manages each robot's actions. This system works stable, and we took first place in the RoboCup@home Japan Open 2022's GPSR with 130 points, more than 85 points ahead of the other teams.

3.Robots as AI Double Agents: Privacy in Motion Planning

Authors:Rahul Shome, Zachary Kingston, Lydia E. Kavraki

Abstract: Robotics and automation are poised to change the landscape of home and work in the near future. Robots are adept at deliberately moving, sensing, and interacting with their environments. The pervasive use of this technology promises societal and economic payoffs due to its capabilities - conversely, the capabilities of robots to move within and sense the world around them is susceptible to abuse. Robots, unlike typical sensors, are inherently autonomous, active, and deliberate. Such automated agents can become AI double agents liable to violate the privacy of coworkers, privileged spaces, and other stakeholders. In this work we highlight the understudied and inevitable threats to privacy that can be posed by the autonomous, deliberate motions and sensing of robots. We frame the problem within broader sociotechnological questions alongside a comprehensive review. The privacy-aware motion planning problem is formulated in terms of cost functions that can be modified to induce privacy-aware behavior - preserving, agnostic, or violating. Simulated case studies in manipulation and navigation, with altered cost functions, are used to demonstrate how privacy-violating threats can be easily injected, sometimes with only small changes in performance (solution path lengths). Such functionality is already widely available. This preliminary work is meant to lay the foundations for near-future, holistic, interdisciplinary investigations that can address questions surrounding privacy in intelligent robotic behaviors determined by planning algorithms.

4.Adaptive Patched Grid Mapping

Authors:Thomas Wodtko, Thomas Griebel, Michael Buchholz

Abstract: In this work, we propose a novel adaptive grid mapping approach, the Adaptive Patched Grid Map, which enables a situational aware grid based perception for autonomous vehicles. Its structure allows a flexible representation of the surrounding unstructured environment. By splitting types of information into separate layers less memory is allocated when data is unevenly or sporadically available. However, layers must be resampled during the fusion process to cope with dynamically changing cell sizes. Therefore, we propose a novel spatial cell fusion approach. Together with the proposed fusion framework, dynamically changing external requirements, such as cell resolution specifications and horizon targets, are considered. For our evaluation, real-world data were recorded from an autonomous vehicle driving through various traffic situations. Based on this, the memory efficiency is compared to other approaches, and fusion execution times are determined. The results confirm the adaptation to requirement changes and a significant memory usage reduction.

5.Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Authors:Quentin Rouxel LARSEN, Ruoshi Wen UCL, Zhibin Li UCL, Carlo Tiseo LARSEN, Jean-Baptiste Mouret LARSEN, Serena Ivaldi LARSEN

Abstract: This short paper outlines two recent works on multi-contact teleoperation and the development of the SEIKO (Sequential Equilibrium Inverse Kinematic Optimization) framework. SEIKO adapts commands from the operator in real-time and ensures that the reference configuration sent to the underlying controller is feasible. Additionally, an admittance scheme is used to implement physical interaction, which is then combined with the operator's command and retargeted. SEIKO has been applied in simulations on various robots, including humanoid and quadruped robots designed for loco-manipulation. Furthermore, SEIKO has been tested on real hardware for bimanual heavy object carrying tasks.

6.DNFOMP: Dynamic Neural Field Optimal Motion Planner for Navigation of Autonomous Robots in Cluttered Environment

Authors:Maksim Katerishich, Mikhail Kurenkov, Sausar Karaf, Artem Nenashev, Dzmitry Tsetserukou

Abstract: Motion planning in dynamically changing environments is one of the most complex challenges in autonomous driving. Safety is a crucial requirement, along with driving comfort and speed limits. While classical sampling-based, lattice-based, and optimization-based planning methods can generate smooth and short paths, they often do not consider the dynamics of the environment. Some techniques do consider it, but they rely on updating the environment on-the-go rather than explicitly accounting for the dynamics, which is not suitable for self-driving. To address this, we propose a novel method based on the Neural Field Optimal Motion Planner (NFOMP), which outperforms state-of-the-art approaches in terms of normalized curvature and the number of cusps. Our approach embeds previously known moving obstacles into the neural field collision model to account for the dynamics of the environment. We also introduce time profiling of the trajectory and non-linear velocity constraints by adding Lagrange multipliers to the trajectory loss function. We applied our method to solve the optimal motion planning problem in an urban environment using the driving simulator. An autonomous car drove the generated trajectories in three city scenarios while sharing the road with the obstacle vehicle. Our evaluation shows that the maximum acceleration the passenger can experience instantly is -7.5 m/s^2 and that 89.6% of the driving time is devoted to normal driving with accelerations below 3.5 m/s^2. The driving style is characterized by 46.0% and 31.4% of the driving time being devoted to the light rail transit style and the moderate driving style, respectively.

7.Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

Authors:Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong

Abstract: Visual pre-training with large-scale real-world data has made great progress in recent years, showing great potential in robot learning with pixel observations. However, the recipes of visual pre-training for robot manipulation tasks are yet to be built. In this paper, we thoroughly investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives: pre-training datasets, model architectures and training methods. Several significant experimental findings are provided that are beneficial for robot learning. Further, we propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning. Concretely, the former employs contrastive learning to acquire underlying patterns from large-scale unlabeled data, while the latter aims learning visual semantics and temporal dynamics. Extensive experiments on robot manipulations in various simulation environments and the real robot demonstrate the superiority of the proposed scheme. Videos and more details can be found on \url{}.

8.MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation

Authors:Taozheng Yang, Ya Jing, Hongtao Wu, Jiafeng Xu, Kuankuan Sima, Guangzeng Chen, Qie Sima, Tao Kong

Abstract: In this paper, we present a novel method for mobile manipulators to perform multiple contact-rich manipulation tasks. While learning-based methods have the potential to generate actions in an end-to-end manner, they often suffer from insufficient action accuracy and robustness against noise. On the other hand, classical control-based methods can enhance system robustness, but at the cost of extensive parameter tuning. To address these challenges, we present MOMA-Force, a visual-force imitation method that seamlessly combines representation learning for perception, imitation learning for complex motion generation, and admittance whole-body control for system robustness and controllability. MOMA-Force enables a mobile manipulator to learn multiple complex contact-rich tasks with high success rates and small contact forces. In a real household setting, our method outperforms baseline methods in terms of task success rates. Moreover, our method achieves smaller contact forces and smaller force variances compared to baseline methods without force imitation. Overall, we offer a promising approach for efficient and robust mobile manipulation in the real world. Videos and more details can be found on \url{}

9.Safe Multimodal Communication in Human-Robot Collaboration

Authors:Davide Ferrari, Andrea Pupa, Alberto Signoretti, Cristian Secchi

Abstract: The new industrial settings are characterized by the presence of human and robots that work in close proximity, cooperating in performing the required job. Such a collaboration, however, requires to pay attention to many aspects. Firstly, it is crucial to enable a communication between this two actors that is natural and efficient. Secondly, the robot behavior must always be compliant with the safety regulations, ensuring always a safe collaboration. In this paper, we propose a framework that enables multi-channel communication between humans and robots by leveraging multimodal fusion of voice and gesture commands while always respecting safety regulations. The framework is validated through a comparative experiment, demonstrating that, thanks to multimodal communication, the robot can extract valuable information for performing the required task and additionally, with the safety layer, the robot can scale its speed to ensure the operator's safety.

10.SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention

Authors:Efimia Panagiotaki, Daniele De Martini, Georgi Pramatarov, Matthew Gadd, Lars Kunze

Abstract: This paper proposes a GNN-based method for exploiting semantics and local geometry to guide the identification of reliable pointcloud registration candidates. Semantic and morphological features of the environment serve as key reference points for registration, enabling accurate lidar-based pose estimation. Our novel lightweight static graph structure informs our attention-based keypoint node aggregation GNN network by identifying semantic instance-based relationships, acting as inductive bias to significantly reduce the computational burden of pointcloud registration. By connecting candidate nodes and exploiting cross-graph attention, we identify confidence scores for all potential registration correspondences, estimating the displacement between pointcloud scans. Our pipeline enables introspective analysis of the model's performance by correlating it with the individual contributions of local structures in the environment, providing valuable insights into the system's behaviour. We test our method on the KITTI odometry dataset, achieving competitive accuracy compared to benchmark methods and a higher track smoothness while relying on significantly fewer network parameters.

11.System Identification and Control of Front-Steered Ackermann Vehicles through Differentiable Physics

Authors:Burak M. Gonultas, Pratik Mukherjee, O. Goktug Poyrazoglu, Volkan Isler

Abstract: In this paper, we address the problem of system identification and control of a front-steered vehicle which abides by the Ackermann geometry constraints. This problem arises naturally for on-road and off-road vehicles that require reliable system identification and basic feedback controllers for various applications such as lane keeping and way-point navigation. Traditional system identification requires expensive equipment and is time consuming. In this work we explore the use of differentiable physics for system identification and controller design and make the following contributions: i)We develop a differentiable physics simulator (DPS) to provide a method for the system identification of front-steered class of vehicles whose system parameters are learned using a gradient-based method; ii) We provide results for our gradient-based method that exhibit better sample efficiency in comparison to other gradient-free methods; iii) We validate the learned system parameters by implementing a feedback controller to demonstrate stable lane keeping performance on a real front-steered vehicle, the F1TENTH; iv) Further, we provide results exhibiting comparable lane keeping behavior for system parameters learned using our gradient-based method with lane keeping behavior of the actual system parameters of the F1TENTH.

12.State Estimation of Continuum Robots: A Nonlinear Constrained Moving Horizon Approach

Authors:Hend Abdelaziz, Ayman Nada, Hiroyuki Ishii, Haitham El-Hussieny

Abstract: Continuum robots, made from flexible materials with continuous backbones, have several advantages over traditional rigid robots. Some of them are the ability to navigate through narrow or confined spaces, adapt to irregular or changing environments, and perform tasks in proximity to humans. However, one of the challenges in using continuum robots is the difficulty in accurately estimating their state, such as their tip position and curvature. This is due to the complexity of their kinematics and the inherent uncertainty in their measurement and control. This paper proposes a moving horizon estimation (MHE) approach for estimating the robot's state, including its tip position and shape parameters. Our approach involves minimizing the error between measurement samples from an IMU attached to the robot's tip and the estimated state along the estimation horizon using an inline optimization problem. We demonstrate the effectiveness of our approach through simulation and experimental results. Our approach can potentially improve the accuracy and robustness of state estimation and control for continuum robots. It can be applied to various applications such as surgery, manufacturing, and inspection.

1.World-Model-Based Control for Industrial box-packing of Multiple Objects using NewtonianVAE

Authors:Yusuke Kato, Ryo Okumura, Tadahiro Taniguchi

Abstract: The process of industrial box-packing, which involves the accurate placement of multiple objects, requires high-accuracy positioning and sequential actions. When a robot is tasked with placing an object at a specific location with high accuracy, it is important not only to have information about the location of the object to be placed, but also the posture of the object grasped by the robotic hand. Often, industrial box-packing requires the sequential placement of identically shaped objects into a single box. The robot's action should be determined by the same learned model. In factories, new kinds of products often appear and there is a need for a model that can easily adapt to them. Therefore, it should be easy to collect data to train the model. In this study, we designed a robotic system to automate real-world industrial tasks, employing a vision-based learning control model. We propose in-hand-view-sensitive Newtonian variational autoencoder (ihVS-NVAE), which employs an RGB camera to obtain in-hand postures of objects. We demonstrate that our model, trained for a single object-placement task, can handle sequential tasks without additional training. To evaluate efficacy of the proposed model, we employed a real robot to perform sequential industrial box-packing of multiple objects. Results showed that the proposed model achieved a 100% success rate in industrial box-packing tasks, thereby outperforming the state-of-the-art and conventional approaches, underscoring its superior effectiveness and potential in industrial tasks.

2.Learning to Shape by Grinding: Cutting-surface-aware Model-based Reinforcement Learning

Authors:Takumi Hachimine, Jun Morimoto, Takamitsu Matsubara

Abstract: Object shaping by grinding is a crucial industrial process in which a rotating grinding belt removes material. Object-shape transition models are essential to achieving automation by robots; however, learning such a complex model that depends on process conditions is challenging because it requires a significant amount of data, and the irreversible nature of the removal process makes data collection expensive. This paper proposes a cutting-surface-aware Model-Based Reinforcement Learning (MBRL) method for robotic grinding. Our method employs a cutting-surface-aware model as the object's shape transition model, which in turn is composed of a geometric cutting model and a cutting-surface-deviation model, based on the assumption that the robot action can specify the cutting surface made by the tool. Furthermore, according to the grinding resistance theory, the cutting-surface-deviation model does not require raw shape information, making the model's dimensions smaller and easier to learn than a naive shape transition model directly mapping the shapes. Through evaluation and comparison by simulation and real robot experiments, we confirm that our MBRL method can achieve high data efficiency for learning object shaping by grinding and also provide generalization capability for initial and target shapes that differ from the training data.

3.ExploitFlow, cyber security exploitation routes for Game Theory and AI research in robotics

Authors:Víctor Mayoral-Vilches, Gelei Deng, Yi Liu, Martin Pinzger, Stefan Rass

Abstract: This paper addresses the prevalent lack of tools to facilitate and empower Game Theory and Artificial Intelligence (AI) research in cybersecurity. The primary contribution is the introduction of ExploitFlow (EF), an AI and Game Theory-driven modular library designed for cyber security exploitation. EF aims to automate attacks, combining exploits from various sources, and capturing system states post-action to reason about them and understand potential attack trees. The motivation behind EF is to bolster Game Theory and AI research in cybersecurity, with robotics as the initial focus. Results indicate that EF is effective for exploring machine learning in robot cybersecurity. An artificial agent powered by EF, using Reinforcement Learning, outperformed both brute-force and human expert approaches, laying the path for using ExploitFlow for further research. Nonetheless, we identified several limitations in EF-driven agents, including a propensity to overfit, the scarcity and production cost of datasets for generalization, and challenges in interpreting networking states across varied security settings. To leverage the strengths of ExploitFlow while addressing identified shortcomings, we present Malism, our vision for a comprehensive automated penetration testing framework with ExploitFlow at its core.

4.Automated Vehicle Platform with Connected Driving Capabilities

Authors:Oskars Teikmanis, Aleksandrs Levinskis, Andris Ivars Mackus, Artis Rušiņš, Amr Elkenawy, Marta Tropa, Modris Greitans

Abstract: Augmenting automated vehicles to wirelessly detect and respond to external events before they are detectable by onboard sensors is crucial for developing context-aware driving strategies. To this end, we present an automated vehicle platform, designed with connectivity, ease of use and modularity in mind, both in hardware and software. It is based on the Kia Soul EV with a modified version of the Open-Source Car Control (OSCC) drive-by-wire module, uses the open-source Robot Operating System (ROS and ROS 2) in its software architecture, and provides a straightforward solution for transitioning from simulations to real-world tests. We demonstrate the effectiveness of the platform through a synchronised driving test, where sensor data is exchanged wirelessly, and a model-predictive controller is used to actuate the automated vehicle.

5.Online Obstacle evasion with Space-Filling Curves

Authors:Ashay Wakode, Arpita Sinha

Abstract: The paper presents a strategy for robotic exploration problems using Space-Filling curves (SFC). The region of interest is first tessellated, and the tiles/cells are connected using some SFC. A robot follows the SFC to explore the entire area. However, there could be obstacles that block the systematic movement of the robot. We overcome this problem by providing an evading technique that avoids the blocked tiles while ensuring all the free ones are visited at least once. The proposed strategy is online, implying that prior knowledge of the obstacles is not mandatory. It works for all SFCs, but for the sake of demonstration, we use Hilbert curve. We present the completeness of the algorithm and discuss its desirable properties with examples. We also address the non-uniform coverage problem using our strategy.

6.Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints

Authors:Yasunori Toshimitsu, Benedek Forrai, Barnabas Gavin Cangan, Ulrich Steger, Manuel Knecht, Stefan Weirich, Robert K. Katzschmann

Abstract: Biomimetic, dexterous robotic hands have the potential to replicate much of the tasks that a human can do, and to achieve status as a general manipulation platform. Recent advances in reinforcement learning (RL) frameworks have achieved remarkable performance in quadrupedal locomotion and dexterous manipulation tasks. Combined with GPU-based highly parallelized simulations capable of simulating thousands of robots in parallel, RL-based controllers have become more scalable and approachable. However, in order to bring RL-trained policies to the real world, we require training frameworks that output policies that can work with physical actuators and sensors as well as a hardware platform that can be manufactured with accessible materials yet is robust enough to run interactive policies. This work introduces the biomimetic tendon-driven Faive Hand and its system architecture, which uses tendon-driven rolling contact joints to achieve a 3D printable, robust high-DoF hand design. We model each element of the hand and integrate it into a GPU simulation environment to train a policy with RL, and achieve zero-shot transfer of a dexterous in-hand sphere rotation skill to the physical robot hand.

7.Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration

Authors:Juan Del Aguila Ferrandis, João Moura, Sethu Vijayakumar

Abstract: Developing robot controllers capable of achieving dexterous nonprehensile manipulation, such as pushing an object on a table, is challenging. The underactuated and hybrid-dynamics nature of the problem, further complicated by the uncertainty resulting from the frictional interactions, requires sophisticated control behaviors. Reinforcement Learning (RL) is a powerful framework for developing such robot controllers. However, previous RL literature addressing the nonprehensile pushing task achieves low accuracy, non-smooth trajectories, and only simple motions, i.e. without rotation of the manipulated object. We conjecture that previously used unimodal exploration strategies fail to capture the inherent hybrid-dynamics of the task, arising from the different possible contact interaction modes between the robot and the object, such as sticking, sliding, and separation. In this work, we propose a multimodal exploration approach through categorical distributions, which enables us to train planar pushing RL policies for arbitrary starting and target object poses, i.e. positions and orientations, and with improved accuracy. We show that the learned policies are robust to external disturbances and observation noise, and scale to tasks with multiple pushers. Furthermore, we validate the transferability of the learned policies, trained entirely in simulation, to a physical robot hardware using the KUKA iiwa robot arm. See our supplemental video:

1.Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning

Authors:Yang Wenkai Ji Ruihang Zhang Yuxiang Lei Hao, Zhao Zijie

Abstract: This paper presents a Pre-Training Deep Reinforcement Learning(DRL) for avoidance navigation without map for mobile robots which map raw sensor data to control variable and navigate in an unknown environment. The efficient offline training strategy is proposed to speed up the inefficient random explorations in early stage and we also collect a universal dataset including expert experience for offline training, which is of some significance for other navigation training work. The pre-training and prioritized expert experience are proposed to reduce 80\% training time and has been verified to improve the 2 times reward of DRL. The advanced simulation gazebo with real physical modelling and dynamic equations reduce the gap between sim-to-real. We train our model a corridor environment, and evaluate the model in different environment getting the same effect. Compared to traditional method navigation, we can confirm the trained model can be directly applied into different scenarios and have the ability to no collision navigate. It was demonstrated that our DRL model have universal general capacity in different environment.

2.Uncertainty analysis for accurate ground truth trajectories with robotic total stations

Authors:Maxime Vaidis, William Dubois, Effie Daum, Damien LaRocque, François Pomerleau

Abstract: In the context of robotics, accurate ground truth positioning is essential for the development of Simultaneous Localization and Mapping (SLAM) and control algorithms. Robotic Total Stations (RTSs) provide accurate and precise reference positions in different types of outdoor environments, especially when compared to the limited accuracy of Global Navigation Satellite System (GNSS) in cluttered areas. Three RTSs give the possibility to obtain the six-Degrees Of Freedom (DOF) reference pose of a robotic platform. However, the uncertainty of every pose is rarely computed for trajectory evaluation. As evaluation algorithms are getting increasingly precise, it becomes crucial to take into account this uncertainty. We propose a method to compute this six-DOF uncertainty from the fusion of three RTSs based on Monte Carlo (MC) methods. This solution relies on point-to-point minimization to propagate the noise of RTSs on the pose of the robotic platform. Five main noise sources are identified to model this uncertainty: noise inherent to the instrument, tilt noise, atmospheric factors, time synchronization noise, and extrinsic calibration noise. Based on extensive experimental work, we compare the impact of each noise source on the prism uncertainty and the final estimated pose. Tested on more than 50 km of trajectories, our comparison highlighted the importance of the calibration noise and the measurement distance, which should be ideally under 75 m. Moreover, it has been noted that the uncertainty on the pose of the robot is not prominently affected by one particular noise source, compared to the others.

3.Mani-GPT: A Generative Model for Interactive Robotic Manipulation

Authors:Zhe Zhang, Wei Chaid, Jiankun Wang

Abstract: In real-world scenarios, human dialogues are multi-round and diverse. Furthermore, human instructions can be unclear and human responses are unrestricted. Interactive robots face difficulties in understanding human intents and generating suitable strategies for assisting individuals through manipulation. In this article, we propose Mani-GPT, a Generative Pre-trained Transformer (GPT) for interactive robotic manipulation. The proposed model has the ability to understand the environment through object information, understand human intent through dialogues, generate natural language responses to human input, and generate appropriate manipulation plans to assist the human. This makes the human-robot interaction more natural and humanized. In our experiment, Mani-GPT outperforms existing algorithms with an accuracy of 84.6% in intent recognition and decision-making for actions. Furthermore, it demonstrates satisfying performance in real-world dialogue tests with users, achieving an average response accuracy of 70%.

4.Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models

Authors:Joao Carvalho, An T. Le, Mark Baierl, Dorothea Koert, Jan Peters

Abstract: Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior works propose several ways on utilizing this prior to bootstrapping the motion planning problem. Either sampling the prior for initializations or using the prior distribution in a maximum-a-posterior formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. We then can sample directly from the posterior trajectory distribution conditioned on task goals, by leveraging the inverse denoising process of diffusion models. Furthermore, diffusion has been recently shown to effectively encode data multimodality in high-dimensional settings, which is particularly well-suited for large trajectory dataset. To demonstrate our method efficacy, we compare our proposed method - Motion Planning Diffusion - against several baselines in simulated planar robot and 7-dof robot arm manipulator environments. To assess the generalization capabilities of our method, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors to encode high-dimensional trajectory distributions of robot motions.

5.Active Acoustic Sensing for Robot Manipulation

Authors:Shihan Lu, Heather Culbertson

Abstract: Perception in robot manipulation has been actively explored with the goal of advancing and integrating vision and touch for global and local feature extraction. However, it is difficult to perceive certain object internal states, and the integration of visual and haptic perception is not compact and is easily biased. We propose to address these limitations by developing an active acoustic sensing method for robot manipulation. Active acoustic sensing relies on the resonant properties of the object, which are related to its material, shape, internal structure, and contact interactions with the gripper and environment. The sensor consists of a vibration actuator paired with a piezo-electric microphone. The actuator generates a waveform, and the microphone tracks the waveform's propagation and distortion as it travels through the object. This paper presents the sensing principles, hardware design, simulation development, and evaluation of physical and simulated sensory data under different conditions as a proof-of-concept. This work aims to provide fundamentals on a useful tool for downstream robot manipulation tasks using active acoustic sensing, such as object recognition, grasping point estimation, object pose estimation, and external contact formation detection.

6.Improving Wind Resistance Performance of Cascaded PID Controlled Quadcopters using Residual Reinforcement Learning

Authors:Yu Ishihara, Yuichi Hazama, Kousuke Suzuki, Jerry Jun Yokono, Kohtaro Sabe, Kenta Kawamoto

Abstract: Wind resistance control is an essential feature for quadcopters to maintain their position to avoid deviation from target position and prevent collisions with obstacles. Conventionally, cascaded PID controller is used for the control of quadcopters for its simplicity and ease of tuning its parameters. However, it is weak against wind disturbances and the quadcopter can easily deviate from target position. In this work, we propose a residual reinforcement learning based approach to build a wind resistance controller of a quadcopter. By learning only the residual that compensates the disturbance, we can continue using the cascaded PID controller as the base controller of the quadcopter but improve its performance against wind disturbances. To avoid unexpected crashes and destructions of quadcopters, our method does not require real hardware for data collection and training. The controller is trained only on a simulator and directly applied to the target hardware without extra finetuning process. We demonstrate the effectiveness of our approach through various experiments including an experiment in an outdoor scene with wind speed greater than 13 m/s. Despite its simplicity, our controller reduces the position deviation by approximately 50% compared to the quadcopter controlled with the conventional cascaded PID controller. Furthermore, trained controller is robust and preserves its performance even though the quadcopter's mass and propeller's lift coefficient is changed between 50% to 150% from original training time.

7.Towards a Safe Real-Time Motion Planning Framework for Autonomous Driving Systems: An MPPI Approach

Authors:Mehdi Testouri, Gamal Elghazaly, Raphael Frank

Abstract: Planning safe trajectories in Autonomous Driving Systems (ADS) is a complex problem to solve in real-time. The main challenge to solve this problem arises from the various conditions and constraints imposed by road geometry, semantics and traffic rules, as well as the presence of dynamic agents. Recently, Model Predictive Path Integral (MPPI) has shown to be an effective framework for optimal motion planning and control in robot navigation in unstructured and highly uncertain environments. In this paper, we formulate the motion planning problem in ADS as a nonlinear stochastic dynamic optimization problem that can be solved using an MPPI strategy. The main technical contribution of this work is a method to handle obstacles within the MPPI formulation safely. In this method, obstacles are approximated by circles that can be easily integrated into the MPPI cost formulation while considering safety margins. The proposed MPPI framework has been efficiently implemented in our autonomous vehicle and experimentally validated using three different primitive scenarios. Experimental results show that generated trajectories are safe, feasible and perfectly achieve the planning objective. The video results as well as the open-source implementation are available at:

8.Modelling and simulation of a commercially available dielectric elastomer actuator

Authors:Lukas Sohlbach, Hamza Hobbani, Chistopher Blase, Fernando Perez-Peña, Karsten Schmidt

Abstract: In order to fully harness the potential of dielectric elastomer actu-ators (DEAs) in soft robots, advanced control methods are need-ed. An important groundwork for this is the development of a control-oriented model that can adequately describe the underly-ing dynamics of a DEA. A common feature of existing models is that always custom-made DEAs were investigated. This makes the modelling process easier, as all specifications and the struc-ture of the actuator are well known. In the case of a commercial actuator, however, only the information from the manufacturer is available and must be checked or completed during the modelling process. The aim of this paper is to explore how a commercial stacked silicone-based DEA can be modelled and how complex the model should be to properly replicate the features of the actu-ator. The static description has demonstrated the suitability of Hooke's law. In the case of dynamic description, it is shown that no viscoelastic model is needed for control-oriented modelling. However, if all features of the DEA are considered, the general-ized Kelvin-Maxwell model with three Maxwell elements shows good results, stability and computational efficiency.

9.Joint Out-of-Distribution Detection and Uncertainty Estimation for Trajectory Predictio

Authors:Julian Wiederer, Julian Schmidt, Ulrich Kressel, Klaus Dietmayer, Vasileios Belagiannis

Abstract: Despite the significant research efforts on trajectory prediction for automated driving, limited work exists on assessing the prediction reliability. To address this limitation we propose an approach that covers two sources of error, namely novel situations with out-of-distribution (OOD) detection and the complexity in in-distribution (ID) situations with uncertainty estimation. We introduce two modules next to an encoder-decoder network for trajectory prediction. Firstly, a Gaussian mixture model learns the probability density function of the ID encoder features during training, and then it is used to detect the OOD samples in regions of the feature space with low likelihood. Secondly, an error regression network is applied to the encoder, which learns to estimate the trajectory prediction error in supervised training. During inference, the estimated prediction error is used as the uncertainty. In our experiments, the combination of both modules outperforms the prior work in OOD detection and uncertainty estimation, on the Shifts robust trajectory prediction dataset by $2.8 \%$ and $10.1 \%$, respectively. The code is publicly available.

10.NeuroSwarm: Multi-Agent Neural 3D Scene Reconstruction and Segmentation with UAV for Optimal Navigation of Quadruped Robot

Authors:Iana Zhura, Denis Davletshin, Nipun Dhananjaya Weerakkodi Mudalige, Aleksey Fedoseev, Robinroy Peter, Dzmitry Tsetserukou

Abstract: Quadruped robots have the distinct ability to adapt their body and step height to navigate through cluttered environments. Nonetheless, for these robots to utilize their full potential in real-world scenarios, they require awareness of their environment and obstacle geometry. We propose a novel multi-agent robotic system that incorporates cutting-edge technologies. The proposed solution features a 3D neural reconstruction algorithm that enables navigation of a quadruped robot in both static and semi-static environments. The prior areas of the environment are also segmented according to the quadruped robots' abilities to pass them. Moreover, we have developed an adaptive neural field optimal motion planner (ANFOMP) that considers both collision probability and obstacle height in 2D space.Our new navigation and mapping approach enables quadruped robots to adjust their height and behavior to navigate under arches and push through obstacles with smaller dimensions. The multi-agent mapping operation has proven to be highly accurate, with an obstacle reconstruction precision of 82%. Moreover, the quadruped robot can navigate with 3D obstacle information and the ANFOMP system, resulting in a 33.3% reduction in path length and a 70% reduction in navigation time.

11.A Compliant Robotic Leg Based on Fibre Jamming

Authors:Lois Liow, James Brett, Josh Pinskier, Lauren Hanson, Louis Tidswell, Navinda Kottege, David Howard

Abstract: Humans possess a remarkable ability to react to sudden and unpredictable perturbations through immediate mechanical responses, which harness the visco-elastic properties of muscles to perform auto-corrective movements to maintain balance. In this paper, we propose a novel design of a robotic leg inspired by this mechanism. We develop multi-material fibre jammed tendons, and demonstrate their use as passive compliant mechanisms to achieve variable joint stiffness and improve stability. Through numerical simulations and extensive experimentation, we demonstrate the ability for our system to achieve a wide range of potentially beneficial compliance regimes. We show the role and contribution of each tendon quantitatively by evaluating their individual force contribution in resisting rotational perturbations. We also perform walking experiments with programmed bioinspired gaits that varying the stiffness of the tendons throughout the gait cycle, demonstrating a stable and consistent behaviour. We show the potential of such systems when integrated into legged robots, where compliance and shock absorption can be provided entirely through the morphological properties of the leg.

12.Not All Actions Are Created Equal: Bayesian Optimal Experimental Design for Safe and Optimal Nonlinear System Identification

Authors:Parker Ewen, Gitesh Gunjal, Joey Wilson, Jinsun Liu, Challen Enninful Adu, Ram Vasudevan

Abstract: Uncertainty in state or model parameters is common in robotics and typically handled by acquiring system measurements that yield information about the uncertain quantities of interest. Inputs to a nonlinear dynamical system yield outcomes that produce varying amounts of information about the underlying uncertain parameters of the system. To maximize information gained with respect to these uncertain parameters we present a Bayesian approach to data collection for system identification called Bayesian Optimal Experimental Design (BOED). The formulation uses parameterized trajectories and cubature to compute maximally informative system trajectories which obtain as much information as possible about unknown system parameters while also ensuring safety under mild assumptions. The proposed method is applicable to non-linear and non-Gaussian systems and is applied to a high-fidelity vehicle model from the literature. It is shown the proposed approach requires orders of magnitude fewer samples compared to state-of-the-art BOED algorithms from the literature while simultaneously providing safety guarantees.

13.Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter

Authors:Luca Crupi, Elia Cereda, Alessandro Giusti, Daniele Palossi

Abstract: Nano-quadcopters are versatile platforms attracting the interest of both academia and industry. Their tiny form factor, i.e., $\,$10 cm diameter, makes them particularly useful in narrow scenarios and harmless in human proximity. However, these advantages come at the price of ultra-constrained onboard computational and sensorial resources for autonomous operations. This work addresses the task of estimating human pose aboard nano-drones by fusing depth and images in a novel CNN exclusively trained in simulation yet capable of robust predictions in the real world. We extend a commercial off-the-shelf (COTS) Crazyflie nano-drone -- equipped with a 320$\times$240 px camera and an ultra-low-power System-on-Chip -- with a novel multi-zone (8$\times$8) depth sensor. We design and compare different deep-learning models that fuse depth and image inputs. Our models are trained exclusively on simulated data for both inputs, and transfer well to the real world: field testing shows an improvement of 58% and 51% of our depth+camera system w.r.t. a camera-only State-of-the-Art baseline on the horizontal and angular mean pose errors, respectively. Our prototype is based on COTS components, which facilitates reproducibility and adoption of this novel class of systems.

1.Reward Shaping for Building Trustworthy Robots in Sequential Human-Robot Interaction

Authors:Yaohui Guo, X. Jessie Yang, Cong Shi

Abstract: Trust-aware human-robot interaction (HRI) has received increasing research attention, as trust has been shown to be a crucial factor for effective HRI. Research in trust-aware HRI discovered a dilemma -- maximizing task rewards often leads to decreased human trust, while maximizing human trust would compromise task performance. In this work, we address this dilemma by formulating the HRI process as a two-player Markov game and utilizing the reward-shaping technique to improve human trust while limiting performance loss. Specifically, we show that when the shaping reward is potential-based, the performance loss can be bounded by the potential functions evaluated at the final states of the Markov game. We apply the proposed framework to the experience-based trust model, resulting in a linear program that can be efficiently solved and deployed in real-world applications. We evaluate the proposed framework in a simulation scenario where a human-robot team performs a search-and-rescue mission. The results demonstrate that the proposed framework successfully modifies the robot's optimal policy, enabling it to increase human trust at a minimal task performance cost.

2.Height Change Feature Based Free Space Detection

Authors:Steven Schreck, Hannes Reichert, Manuel Hetzel, Konrad Doll, Bernhard Sick

Abstract: In the context of autonomous forklifts, ensuring non-collision during travel, pick, and place operations is crucial. To accomplish this, the forklift must be able to detect and locate areas of free space and potential obstacles in its environment. However, this is particularly challenging in highly dynamic environments, such as factory sites and production halls, due to numerous industrial trucks and workers moving throughout the area. In this paper, we present a novel method for free space detection, which consists of the following steps. We introduce a novel technique for surface normal estimation relying on spherical projected LiDAR data. Subsequently, we employ the estimated surface normals to detect free space. The presented method is a heuristic approach that does not require labeling and can ensure real-time application due to high processing speed. The effectiveness of the proposed method is demonstrated through its application to a real-world dataset obtained on a factory site both indoors and outdoors, and its evaluation on the Semantic KITTI dataset [2]. We achieved a mean Intersection over Union (mIoU) score of 50.90 % on the benchmark dataset, with a processing speed of 105 Hz. In addition, we evaluated our approach on our factory site dataset. Our method achieved a mIoU score of 63.30 % at 54 Hz

3.Grasp Stability Assessment Through Attention-Guided Cross-Modality Fusion and Transfer Learning

Authors:Zhuangzhuang Zhang, Zhenning Zhou, Haili Wang, Zhinan Zhang, Huang Huang, Qixin Cao

Abstract: Extensive research has been conducted on assessing grasp stability, a crucial prerequisite for achieving optimal grasping strategies, including the minimum force grasping policy. However, existing works employ basic feature-level fusion techniques to combine visual and tactile modalities, resulting in the inadequate utilization of complementary information and the inability to model interactions between unimodal features. This work proposes an attention-guided cross-modality fusion architecture to comprehensively integrate visual and tactile features. This model mainly comprises convolutional neural networks (CNNs), self-attention, and cross-attention mechanisms. In addition, most existing methods collect datasets from real-world systems, which is time-consuming and high-cost, and the datasets collected are comparatively limited in size. This work establishes a robotic grasping system through physics simulation to collect a multimodal dataset. To address the sim-to-real transfer gap, we propose a migration strategy encompassing domain randomization and domain adaptation techniques. The experimental results demonstrate that the proposed fusion framework achieves markedly enhanced prediction performance (approximately 10%) compared to other baselines. Moreover, our findings suggest that the trained model can be reliably transferred to real robotic systems, indicating its potential to address real-world challenges.

4.Push to know! -- Visuo-Tactile based Active Object Parameter Inference with Dual Differentiable Filtering

Authors:Anirvan Dutta, Etienne Burdet, Mohsen Kaboli

Abstract: For robotic systems to interact with objects in dynamic environments, it is essential to perceive the physical properties of the objects such as shape, friction coefficient, mass, center of mass, and inertia. This not only eases selecting manipulation action but also ensures the task is performed as desired. However, estimating the physical properties of especially novel objects is a challenging problem, using either vision or tactile sensing. In this work, we propose a novel framework to estimate key object parameters using non-prehensile manipulation using vision and tactile sensing. Our proposed active dual differentiable filtering (ADDF) approach as part of our framework learns the object-robot interaction during non-prehensile object push to infer the object's parameters. Our proposed method enables the robotic system to employ vision and tactile information to interactively explore a novel object via non-prehensile object push. The novel proposed N-step active formulation within the differentiable filtering facilitates efficient learning of the object-robot interaction model and during inference by selecting the next best exploratory push actions (where to push? and how to push?). We extensively evaluated our framework in simulation and real-robotic scenarios, yielding superior performance to the state-of-the-art baseline.

5.Ethical Decision-making for Autonomous Driving based on LSTM Trajectory Prediction Network

Authors:Wen Wei, Jiankun Wang

Abstract: The development of autonomous vehicles has brought a great impact and changes to the transportation industry, offering numerous benefits in terms of safety and efficiency. However, one of the key challenges that autonomous driving faces is how to make ethical decisions in complex situations. To address this issue, in this article, a novel trajectory prediction method is proposed to achieve ethical decision-making for autonomous driving. Ethical considerations are integrated into the decision-making process of autonomous vehicles by quantifying the utility principle and incorporating them into mathematical formulas. Furthermore, trajectory prediction is optimized using LSTM network with an attention module, resulting in improved accuracy and reliability in trajectory planning and selection. Through extensive simulation experiments, we demonstrate the effectiveness of the proposed method in making ethical decisions and selecting optimal trajectories.

6.A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness

Authors:Alessandro Zanardi, Andrea Censi, Margherita Atzei, Luigi Di Lillo, Emilio Frazzoli

Abstract: Autonomous Vehicles (AVs) have the potential to provide numerous societal benefits, such as decreased road accidents and increased overall transportation efficiency. However, quantifying the risk associated with AVs is challenging due to the lack of historical data and the rapidly evolving technology. This paper presents a data-driven framework for comparing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We introduce the concept of counterfactual safety margin, which represents the minimum deviation from normal behavior that could lead to a collision. This concept helps to find the most critical scenarios but also to assess the frequency and severity of risk of AVs. We show that the proposed methodology is applicable even when the AV's behavioral policy is unknown -- through worst- and best-case analyses -- making the method useful also to external third-party risk assessors. Our experimental results demonstrate the correlation between the safety margin, the driving policy quality, and the ODD shedding light on the relative risk associated with different AV providers. This work contributes to AV safety assessment and aids in addressing legislative and insurance concerns surrounding this emerging technology.

7.Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making

Authors:Stanislav Kikot

Abstract: In this paper we show how rule-based decision making can be combined with traditional motion planning techniques to achieve human-like behavior of a self-driving vehicle in complex traffic situations. We give and discuss examples of decision rules in autonomous driving. We draw on these examples to illustrate that developing techniques for spatial awareness of robots is an exciting activity which deserves more attention from spatial reasoning community that it had received so far.

8.Optimization-Based Motion Planning for Autonomous Agricultural Vehicles Turning in Constrained Headlands

Authors:Chen Peng, Peng Wei, Zhenghao Fei, Yuankai Zhu, Stavros G. Vougioukas

Abstract: Headland maneuvering is a crucial aspect of unmanned field operations for autonomous agricultural vehicles (AAVs). While motion planning for headland turning in open fields has been extensively studied and integrated into commercial auto-guidance systems, the existing methods primarily address scenarios with ample headland space and thus may not work in more constrained headland geometries. Commercial orchards often contain narrow and irregularly shaped headlands, which may include static obstacles,rendering the task of planning a smooth and collision-free turning trajectory difficult. To address this challenge, we propose an optimization-based motion planning algorithm for headland turning under geometrical constraints imposed by field geometry and obstacles.

9.Virtual Reality Based Robot Teleoperation via Human-Scene Interaction

Authors:Lingxiao Meng, Jiangshan Liu, Wei Chai, Jiankun Wang, Max Q. -H. Meng

Abstract: Robot teleoperation gains great success in various situations, including chemical pollution rescue, disaster relief, and long-distance manipulation. In this article, we propose a virtual reality (VR) based robot teleoperation system to achieve more efficient and natural interaction with humans in different scenes. A user-friendly VR interface is designed to help users interact with a desktop scene using their hands efficiently and intuitively. To improve user experience and reduce workload, we simulate the process in the physics engine to help build a preview of the scene after manipulation in the virtual scene before execution. We conduct experiments with different users and compare our system with a direct control method across several teleoperation tasks. The user study demonstrates that the proposed system enables users to perform operations more instinctively with a lighter mental workload. Users can perform pick-and-place and object-stacking tasks in a considerably short time, even for beginners. Our code is available at

1.Informative Path Planning of Autonomous Vehicle for Parking Occupancy Estimation

Authors:Yunze Hu, Jiaao Chen, Kangjie Zhou, Han Gao, Yutong Li, Chang Liu

Abstract: Parking occupancy estimation holds significant potential in facilitating parking resource management and mitigating traffic congestion. Existing approaches employ robotic systems to detect the occupancy status of individual parking spaces and primarily focus on enhancing detection accuracy through perception pipelines. However, these methods often overlook the crucial aspect of robot path planning, which can hinder the accurate estimation of the entire parking area. In light of these limitations, we introduce the problem of informative path planning for parking occupancy estimation using autonomous vehicles and formulate it as a Partially Observable Markov Decision Process (POMDP) task. Then, we develop an occupancy state transition model and introduce a Bayes filter to estimate occupancy based on noisy sensor measurements. Subsequently, we propose the Monte Carlo Bayes Filter Tree, a computationally efficient algorithm that leverages progressive widening to generate informative paths. We demonstrate that the proposed approach outperforms the benchmark methods in diverse simulation environments, effectively striking a balance between optimality and computational efficiency.

2.Advancing Frame-Dropping in Multi-Object Tracking-by-Detection Systems Through Event-Based Detection Triggering

Authors:Matti Henning, Michael Buchholz, Klaus Dietmayer

Abstract: With rising computational requirements modern automated vehicles (AVs) often consider trade-offs between energy consumption and perception performance, potentially jeopardizing their safe operation. Frame-dropping in tracking-by-detection perception systems presents a promising approach, although late traffic participant detection might be induced. In this paper, we extend our previous work on frame-dropping in tracking-by-detection perception systems. We introduce an additional event-based triggering mechanism using camera object detections to increase both the system's efficiency, as well as its safety. Evaluating both single and multi-modal tracking methods we show that late object detections are mitigated while the potential for reduced energy consumption is significantly increased, reaching nearly 60 Watt per reduced point in HOTA score.

3.Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning

Authors:Yun Chen, Jiaping Xiao

Abstract: Collaborative heterogeneous robot systems can greatly improve the efficiency of target search and navigation tasks. In this paper, we design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments. The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms. During the training process, if two robots are trained simultaneously, the rewards related to their collaboration may not be properly obtained. Hence, we introduce a multi-stage reinforcement learning framework and a curiosity module to encourage agents to explore unvisited environments. Experiments in simulation environments show that our framework can train the heterogeneous robot system to achieve the search and navigation with unknown target locations while existing baselines may not, and accelerate the training speed.

4.Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches

Authors:Pia Hanfeld, Khaled Wahba, Marina M. -C. Höhne, Michael Bussmann, Wolfgang Hönig

Abstract: Autonomous flying robots, such as multirotors, often rely on deep learning models that makes predictions based on a camera image, e.g. for pose estimation. These models can predict surprising results if applied to input images outside the training domain. This fault can be exploited by adversarial attacks, for example, by computing small images, so-called adversarial patches, that can be placed in the environment to manipulate the neural network's prediction. We introduce flying adversarial patches, where multiple images are mounted on at least one other flying robot and therefore can be placed anywhere in the field of view of a victim multirotor. By introducing the attacker robots, the system is extended to an adversarial multi-robot system. For an effective attack, we compare three methods that simultaneously optimize multiple adversarial patches and their position in the input image. We show that our methods scale well with the number of adversarial patches. Moreover, we demonstrate physical flights with two robots, where we employ a novel attack policy that uses the computed adversarial patches to kidnap a robot that was supposed to follow a human.

5.DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

Authors:Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li

Abstract: End-to-end autonomous driving aims to build a fully differentiable system that takes raw sensor data as inputs and directly outputs the planned trajectory or control signals of the ego vehicle. State-of-the-art methods usually follow the `Teacher-Student' paradigm. The Teacher model uses privileged information (ground-truth states of surrounding agents and map elements) to learn the driving strategy. The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model. By eliminating the noise of the perception part during planning learning, state-of-the-art works could achieve better performance with significantly less data compared to those coupled ones. However, under the current Teacher-Student paradigm, the student model still needs to learn a planning head from scratch, which could be challenging due to the redundant and noisy nature of raw sensor inputs and the casual confusion issue of behavior cloning. In this work, we aim to explore the possibility of directly adopting the strong teacher model to conduct planning while letting the student model focus more on the perception part. We find that even equipped with a SOTA perception model, directly letting the student model learn the required inputs of the teacher model leads to poor driving performance, which comes from the large distribution gap between predicted privileged inputs and the ground-truth. To this end, we propose DriveAdapter, which employs adapters with the feature alignment objective function between the student (perception) and teacher (planning) modules. Additionally, since the pure learning-based teacher model itself is imperfect and occasionally breaks safety rules, we propose a method of action-guided feature learning with a mask for those imperfect teacher features to further inject the priors of hand-crafted rules into the learning process.

6.DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes

Authors:Philipp Blättner, Johannes Brand, Gerhard Neumann, Ngo Anh Vien

Abstract: Robotic grasping is a fundamental skill required for object manipulation in robotics. Multi-fingered robotic hands, which mimic the structure of the human hand, can potentially perform complex object manipulations. Nevertheless, current techniques for multi-fingered robotic grasping frequently predict only a single grasp for each inference time, limiting their versatility and efficiency. This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with two main contributions to address this challenge. Firstly, a novel neural grasp planner is proposed, which predicts a new grasp representation to enable versatile and dense grasp predictions. Secondly, a scene creation and label mapping method is developed for dense labeling of multi-fingered robotic hands, which allows a dense association of ground truth grasps. The proposed approach is evaluated through simulation studies and compared to existing approaches. The results demonstrate the effectiveness of the proposed approach in predicting versatile and dense grasps, and in advancing the field of robotic grasping.

7.UVIO: An UWB-Aided Visual-Inertial Odometry Framework with Bias-Compensated Anchors Initialization

Authors:Giulio Delama, Farhad Shamsfakhr, Stephan Weiss, Daniele Fontanelli, Alessandro Fornasier

Abstract: This paper introduces UVIO, a multi-sensor framework that leverages Ultra Wide Band (UWB) technology and Visual-Inertial Odometry (VIO) to provide robust and low-drift localization. In order to include range measurements in state estimation, the position of the UWB anchors must be known. This study proposes a multi-step initialization procedure to map multiple unknown anchors by an Unmanned Aerial Vehicle (UAV), in a fully autonomous fashion. To address the limitations of initializing UWB anchors via a random trajectory, this paper uses the Geometric Dilution of Precision (GDOP) as a measure of optimality in anchor position estimation, to compute a set of optimal waypoints and synthesize a trajectory that minimizes the mapping uncertainty. After the initialization is complete, the range measurements from multiple anchors, including measurement biases, are tightly integrated into the VIO system. While in range of the initialized anchors, the VIO drift in position and heading is eliminated. The effectiveness of UVIO and our initialization procedure has been validated through a series of simulations and real-world experiments.

8.Understanding URDF: A Dataset and Analysis

Authors:Daniella Tola, Peter Corke

Abstract: As the complexity of robot systems increases, it becomes more effective to simulate them before deployment. To do this, a model of the robot's kinematics or dynamics is required, and the most commonly used format is the Unified Robot Description Format (URDF). This article presents, to our knowledge, the first dataset of URDF files from various industrial and research organizations, with metadata describing each robot, its type, manufacturer, and the source of the model. The dataset contains 322 URDF files of which 195 are unique robot models, meaning the excess URDFs are either of a robot that is multiply defined across sources or URDF variants of the same robot. We analyze the files in the dataset, where we, among other things, provide information on how they were generated, which mesh file types are most commonly used, and compare models of multiply defined robots. The intention of this article is to build a foundation of knowledge on URDF and how it is used based on publicly available URDF files. Publishing the dataset, analysis, and the scripts and tools used enables others using, researching or developing URDFs to easily access this data and use it in their own work.

9.AOSoar: Autonomous Orographic Soaring of a Micro Air Vehicle

Authors:Sunyou Hwang, Bart D. W. Remes, Guido C. H. E. de Croon

Abstract: Utilizing wind hovering techniques of soaring birds can save energy expenditure and improve the flight endurance of micro air vehicles (MAVs). Here, we present a novel method for fully autonomous orographic soaring without a priori knowledge of the wind field. Specifically, we devise an Incremental Nonlinear Dynamic Inversion (INDI) controller with control allocation, adapting it for autonomous soaring. This allows for both soaring and the use of the throttle if necessary, without changing any gain or parameter during the flight. Furthermore, we propose a simulated-annealing-based optimization method to search for soaring positions. This enables for the first time an MAV to autonomously find a feasible soaring position while minimizing throttle usage and other control efforts. Autonomous orographic soaring was performed in the wind tunnel. The wind speed and incline of a ramp were changed during the soaring flight. The MAV was able to perform autonomous orographic soaring for flight times of up to 30 minutes. The mean throttle usage was only 0.25% for the entire soaring flight, whereas normal powered flight requires 38%. Also, it was shown that the MAV can find a new soaring spot when the wind field changes during the flight.

10.Enhancing Sample Efficiency and Uncertainty Compensation in Learning-based Model Predictive Control for Aerial Robots

Authors:Kong Yao Chee, Thales C. Silva, M. Ani Hsieh, George J. Pappas

Abstract: The recent increase in data availability and reliability has led to a surge in the development of learning-based model predictive control (MPC) frameworks for robot systems. Despite attaining substantial performance improvements over their non-learning counterparts, many of these frameworks rely on an offline learning procedure to synthesize a dynamics model. This implies that uncertainties encountered by the robot during deployment are not accounted for in the learning process. On the other hand, learning-based MPC methods that learn dynamics models online are computationally expensive and often require a significant amount of data. To alleviate these shortcomings, we propose a novel learning-enhanced MPC framework that incorporates components from $\mathcal{L}_1$ adaptive control into learning-based MPC. This integration enables the accurate compensation of both matched and unmatched uncertainties in a sample-efficient way, enhancing the control performance during deployment. In our proposed framework, we present two variants and apply them to the control of a quadrotor system. Through simulations and physical experiments, we demonstrate that the proposed framework not only allows the synthesis of an accurate dynamics model on-the-fly, but also significantly improves the closed-loop control performance under a wide range of spatio-temporal uncertainties.

11.Sliding Touch-based Exploration for Modeling Unknown Object Shape with Multi-fingered Hands

Authors:Yiting Chen, Ahmet Ercan Tekden, Marc Peter Deisenroth, Yasemin Bekiroglu

Abstract: Efficient and accurate 3D object shape reconstruction contributes significantly to the success of a robot's physical interaction with its environment. Acquiring accurate shape information about unknown objects is challenging, especially in unstructured environments, e.g. the vision sensors may only be able to provide a partial view. To address this issue, tactile sensors could be employed to extract local surface information for more robust unknown object shape estimation. In this paper, we propose a novel approach for efficient unknown 3D object shape exploration and reconstruction using a multi-fingered hand equipped with tactile sensors and a depth camera only providing a partial view. We present a multi-finger sliding touch strategy for efficient shape exploration using a Bayesian Optimization approach and a single-leader-multi-follower strategy for multi-finger smooth local surface perception. We evaluate our proposed method by estimating the 3D shape of objects from the YCB and OCRTOC datasets based on simulation and real robot experiments. The proposed approach yields successful reconstruction results relying on only a few continuous sliding touches. Experimental results demonstrate that our method is able to model unknown objects in an efficient and accurate way.

12.Epistemic Planning for Heterogeneous Robotic Systems

Authors:Lauren Bramblett, Nicola Bezzo

Abstract: In applications such as search and rescue or disaster relief, heterogeneous multi-robot systems (MRS) can provide significant advantages for complex objectives that require a suite of capabilities. However, within these application spaces, communication is often unreliable, causing inefficiencies or outright failures to arise in most MRS algorithms. Many researchers tackle this problem by requiring all robots to either maintain communication using proximity constraints or assuming that all robots will execute a predetermined plan over long periods of disconnection. The latter method allows for higher levels of efficiency in a MRS, but failures and environmental uncertainties can have cascading effects across the system, especially when a mission objective is complex or time-sensitive. To solve this, we propose an epistemic planning framework that allows robots to reason about the system state, leverage heterogeneous system makeups, and optimize information dissemination to disconnected neighbors. Dynamic epistemic logic formalizes the propagation of belief states, and epistemic task allocation and gossip is accomplished via a mixed integer program using the belief states for utility predictions and planning. The proposed framework is validated using simulations and experiments with heterogeneous vehicles.

13.VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes

Authors:Yuhao Lu, Yixuan Fan, Beixing Deng, Fangfu Liu, Yali Li, Shengjin Wang

Abstract: Robotic grasping faces new challenges in human-robot-interaction scenarios. We consider the task that the robot grasps a target object designated by human's language directives. The robot not only needs to locate a target based on vision-and-language information, but also needs to predict the reasonable grasp pose candidate at various views and postures. In this work, we propose a novel interactive grasp policy, named Visual-Lingual-Grasp (VL-Grasp), to grasp the target specified by human language. First, we build a new challenging visual grounding dataset to provide functional training data for robotic interactive perception in indoor environments. Second, we propose a 6-Dof interactive grasp policy combined with visual grounding and 6-Dof grasp pose detection to extend the universality of interactive grasping. Third, we design a grasp pose filter module to enhance the performance of the policy. Experiments demonstrate the effectiveness and extendibility of the VL-Grasp in real world. The VL-Grasp achieves a success rate of 72.5\% in different indoor scenes. The code and dataset is available at

1.Part-level Scene Reconstruction Affords Robot Interaction

Authors:Zeyu Zhang, Lexing Zhang, Zaijin Wang, Ziyuan Jiao, Muzhi Han, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

Abstract: Existing methods for reconstructing interactive scenes primarily focus on replacing reconstructed objects with CAD models retrieved from a limited database, resulting in significant discrepancies between the reconstructed and observed scenes. To address this issue, our work introduces a part-level reconstruction approach that reassembles objects using primitive shapes. This enables us to precisely replicate the observed physical scenes and simulate robot interactions with both rigid and articulated objects. By segmenting reconstructed objects into semantic parts and aligning primitive shapes to these parts, we assemble them as CAD models while estimating kinematic relations, including parent-child contact relations, joint types, and parameters. Specifically, we derive the optimal primitive alignment by solving a series of optimization problems, and estimate kinematic relations based on part semantics and geometry. Our experiments demonstrate that part-level scene reconstruction outperforms object-level reconstruction by accurately capturing finer details and improving precision. These reconstructed part-level interactive scenes provide valuable kinematic information for various robotic applications; we showcase the feasibility of certifying mobile manipulation planning in these interactive scenes before executing tasks in the physical world.

2.Model-free Grasping with Multi-Suction Cup Grippers for Robotic Bin Picking

Authors:Philipp Schillinger, Miroslav Gabriel, Alexander Kuss, Hanna Ziesche, Ngo Anh Vien

Abstract: This paper presents a novel method for model-free prediction of grasp poses for suction grippers with multiple suction cups. Our approach is agnostic to the design of the gripper and does not require gripper-specific training data. In particular, we propose a two-step approach, where first, a neural network predicts pixel-wise grasp quality for an input image to indicate areas that are generally graspable. Second, an optimization step determines the optimal gripper selection and corresponding grasp poses based on configured gripper layouts and activation schemes. In addition, we introduce a method for automated labeling for supervised training of the grasp quality network. Experimental evaluations on a real-world industrial application with bin picking scenes of varying difficulty demonstrate the effectiveness of our method.

3.Learning Generalizable Tool Use with Non-rigid Grasp-pose Registration

Authors:Malte Mosbach, Sven Behnke

Abstract: Tool use, a hallmark feature of human intelligence, remains a challenging problem in robotics due the complex contacts and high-dimensional action space. In this work, we present a novel method to enable reinforcement learning of tool use behaviors. Our approach provides a scalable way to learn the operation of tools in a new category using only a single demonstration. To this end, we propose a new method for generalizing grasping configurations of multi-fingered robotic hands to novel objects. This is used to guide the policy search via favorable initializations and a shaped reward signal. The learned policies solve complex tool use tasks and generalize to unseen tools at test time. Visualizations and videos of the trained policies are available at

4.Value-Informed Skill Chaining for Policy Learning of Long-Horizon Tasks with Surgical Robot

Authors:Tao Huang, Kai Chen, Wang Wei, Jianan Li, Yonghao Long, Qi Dou

Abstract: Reinforcement learning is still struggling with solving long-horizon surgical robot tasks which involve multiple steps over an extended duration of time due to the policy exploration challenge. Recent methods try to tackle this problem by skill chaining, in which the long-horizon task is decomposed into multiple subtasks for easing the exploration burden and subtask policies are temporally connected to complete the whole long-horizon task. However, smoothly connecting all subtask policies is difficult for surgical robot scenarios. Not all states are equally suitable for connecting two adjacent subtasks. An undesired terminate state of the previous subtask would make the current subtask policy unstable and result in a failed execution. In this work, we introduce value-informed skill chaining (ViSkill), a novel reinforcement learning framework for long-horizon surgical robot tasks. The core idea is to distinguish which terminal state is suitable for starting all the following subtask policies. To achieve this target, we introduce a state value function that estimates the expected success probability of the entire task given a state. Based on this value function, a chaining policy is learned to instruct subtask policies to terminate at the state with the highest value so that all subsequent policies are more likely to be connected for accomplishing the task. We demonstrate the effectiveness of our method on three complex surgical robot tasks from SurRoL, a comprehensive surgical simulation platform, achieving high task success rates and execution efficiency. Code is available at $\href{}{\text{}}$.

5.Human Preferences and Robot Constraints Aware Shared Control for Smooth Follower Motion Execution

Authors:Qibin Chen, Yaonan Zhu, Kay Hansel, Tadayoshi Aoyama, Yasuhisa Hasegawa

Abstract: With the continuous advancement of robot teleoperation technology, shared control is used to reduce the physical and mental load of the operator in teleoperation system. This paper proposes an alternating shared control framework for object grasping that considers both operator's preferences through their manual manipulation and the constraints of the follower robot. The switching between manual mode and automatic mode enables the operator to intervene the task according to their wishes. The generation of the grasping pose takes into account the current state of the operator's hand pose, as well as the manipulability of the robot. The object grasping experiment indicates that the use of the proposed grasping pose selection strategy leads to smoother follower movements when switching from manual mode to automatic mode.

6.Extraction of Road Users' Behavior From Realistic Data According to Assumptions in Safety-Related Models for Automated Driving Systems

Authors:Novel Certad, Sebastian Tschernuth, Cristina Olaverri-Monreal

Abstract: In this work, we utilized the methodology outlined in the IEEE Standard 2846-2022 for "Assumptions in Safety-Related Models for Automated Driving Systems" to extract information on the behavior of other road users in driving scenarios. This method includes defining high-level scenarios, determining kinematic characteristics, evaluating safety relevance, and making assumptions on reasonably predictable behaviors. The assumptions were expressed as kinematic bounds. The numerical values for these bounds were extracted using Python scripts to process realistic data from the UniD dataset. The resulting information enables Automated Driving Systems designers to specify the parameters and limits of a road user's state in a specific scenario. This information can be utilized to establish starting conditions for testing a vehicle that is equipped with an Automated Driving System in simulations or on actual roads.

7.An Overconstrained Vertical Darboux Mechanism

Authors:Johannes Siegele, Martin Pfurner

Abstract: In this article, we will construct an overconstrained closed-loop linkage consisting of four revolute and one cylindrical joint. It is obtained by factorization of a prescribed vertical Darboux motion. We will investigate the kinematic behaviour of the obtained mechanism, which turns out to have multiple operation modes. Under certain conditions on the design parameters, two of the operation modes will correspond to vertical Darboux motions. It turns out, that for these design parameters, there also exists a second assembly mode.

8.Poly-MOT: A Polyhedral Framework For 3D Multi-Object Tracking

Authors:Xiaoyu Li, Tao Xie, Dedong Liu, Jinghan Gao, Kun Dai, Zhiqiang Jiang, Lijun Zhao, Ke Wang

Abstract: 3D Multi-object tracking (MOT) empowers mobile robots to accomplish well-informed motion planning and navigation tasks by providing motion trajectories of surrounding objects. However, existing 3D MOT methods typically employ a single similarity metric and physical model to perform data association and state estimation for all objects. With large-scale modern datasets and real scenes, there are a variety of object categories that commonly exhibit distinctive geometric properties and motion patterns. In this way, such distinctions would enable various object categories to behave differently under the same standard, resulting in erroneous matches between trajectories and detections, and jeopardizing the reliability of downstream tasks (navigation, etc.). Towards this end, we propose Poly-MOT, an efficient 3D MOT method based on the Tracking-By-Detection framework that enables the tracker to choose the most appropriate tracking criteria for each object category. Specifically, Poly-MOT leverages different motion models for various object categories to characterize distinct types of motion accurately. We also introduce the constraint of the rigid structure of objects into a specific motion model to accurately describe the highly nonlinear motion of the object. Additionally, we introduce a two-stage data association strategy to ensure that objects can find the optimal similarity metric from three custom metrics for their categories and reduce missing matches. On the NuScenes dataset, our proposed method achieves state-of-the-art performance with 75.4\% AMOTA. The code is available at

9.End-to-End Reinforcement Learning for Torque Based Variable Height Hopping

Authors:Raghav Soni, Daniel Harnack, Hauke Isermann, Sotaro Fushimi, Shivesh Kumar, Frank Kirchner

Abstract: Legged locomotion is arguably the most suited and versatile mode to deal with natural or unstructured terrains. Intensive research into dynamic walking and running controllers has recently yielded great advances, both in the optimal control and reinforcement learning (RL) literature. Hopping is a challenging dynamic task involving a flight phase and has the potential to increase the traversability of legged robots. Model based control for hopping typically relies on accurate detection of different jump phases, such as lift-off or touch down, and using different controllers for each phase. In this paper, we present a end-to-end RL based torque controller that learns to implicitly detect the relevant jump phases, removing the need to provide manual heuristics for state detection. We also extend a method for simulation to reality transfer of the learned controller to contact rich dynamic tasks, resulting in successful deployment on the robot after training without parameter tuning.

10.Bi-Level Image-Guided Ergodic Exploration with Applications to Planetary Rovers

Authors:Elena Wittemyer, Ian Abraham

Abstract: We present a method for image-guided exploration for mobile robotic systems. Our approach extends ergodic exploration methods, a recent exploration approach that prioritizes complete coverage of a space, with the use of a learned image classifier that automatically detects objects and updates an information map to guide further exploration and localization of objects. Additionally, to improve outcomes of the information collected by our robot's visual sensor, we present a decomposition of the ergodic optimization problem as bi-level coarse and fine solvers, which act respectively on the robot's body and the robot's visual sensor. Our approach is applied to geological survey and localization of rock formations for Mars rovers, with real images from Mars rovers used to train the image classifier. Results demonstrate 1) improved localization of rock formations compared to naive approaches while 2) minimizing the path length of the exploration through the bi-level exploration.

11.Learning whom to trust in navigation: dynamically switching between classical and neural planning

Authors:Sombit Dey, Assem Sadek, Gianluca Monaci, Boris Chidlovskii, Christian Wolf

Abstract: Navigation of terrestrial robots is typically addressed either with localization and mapping (SLAM) followed by classical planning on the dynamically created maps, or by machine learning (ML), often through end-to-end training with reinforcement learning (RL) or imitation learning (IL). Recently, modular designs have achieved promising results, and hybrid algorithms that combine ML with classical planning have been proposed. Existing methods implement these combinations with hand-crafted functions, which cannot fully exploit the complementary nature of the policies and the complex regularities between scene structure and planning performance. Our work builds on the hypothesis that the strengths and weaknesses of neural planners and classical planners follow some regularities, which can be learned from training data, in particular from interactions. This is grounded on the assumption that, both, trained planners and the mapping algorithms underlying classical planning are subject to failure cases depending on the semantics of the scene and that this dependence is learnable: for instance, certain areas, objects or scene structures can be reconstructed easier than others. We propose a hierarchical method composed of a high-level planner dynamically switching between a classical and a neural planner. We fully train all neural policies in simulation and evaluate the method in both simulation and real experiments with a LoCoBot robot, showing significant gains in performance, in particular in the real environment. We also qualitatively conjecture on the nature of data regularities exploited by the high-level planner.

12.Multi Agent Navigation in Unconstrained Environments using a Centralized Attention based Graphical Neural Network Controller

Authors:Yining Ma, Qadeer Khan, Daniel Cremers

Abstract: In this work, we propose a learning based neural model that provides both the longitudinal and lateral control commands to simultaneously navigate multiple vehicles. The goal is to ensure that each vehicle reaches a desired target state without colliding with any other vehicle or obstacle in an unconstrained environment. The model utilizes an attention based Graphical Neural Network paradigm that takes into consideration the state of all the surrounding vehicles to make an informed decision. This allows each vehicle to smoothly reach its destination while also evading collision with the other agents. The data and corresponding labels for training such a network is obtained using an optimization based procedure. Experimental results demonstrates that our model is powerful enough to generalize even to situations with more vehicles than in the training data. Our method also outperforms comparable graphical neural network architectures. Project page which includes the code and supplementary information can be found at

13.Deep Reinforcement Learning of Dexterous Pre-grasp Manipulation for Human-like Functional Categorical Grasping

Authors:Dmytro Pavlichenko, Sven Behnke

Abstract: Many objects such as tools and household items can be used only if grasped in a very specific way - grasped functionally. Often, a direct functional grasp is not possible, though. We propose a method for learning a dexterous pre-grasp manipulation policy to achieve human-like functional grasps using deep reinforcement learning. We introduce a dense multi-component reward function that enables learning a single policy, capable of dexterous pre-grasp manipulation of novel instances of several known object categories with an anthropomorphic hand. The policy is learned purely by means of reinforcement learning from scratch, without any expert demonstrations, and implicitly learns to reposition and reorient objects of complex shapes to achieve given functional grasps. Learning is done on a single GPU in less than three hours.

14.Recovery Policies for Safe Exploration of Lunar Permanently Shadowed Regions by a Solar-Powered Rover

Authors:Olivier Lamarre, Shantanu Malhotra, Jonathan Kelly

Abstract: The success of a multi-kilometre drive by a solar-powered rover at the lunar south pole depends upon careful planning in space and time due to highly dynamic solar illumination conditions. An additional challenge is that real-world robots may be subject to random faults that can temporarily delay long-range traverses. The majority of existing global spatiotemporal planners assume a deterministic rover-environment model and do not account for random faults. In this paper, we consider a random fault profile with a known, average spatial fault rate. We introduce a methodology to compute recovery policies that maximize the probability of survival of a solar-powered rover from different start states. A recovery policy defines a set of recourse actions to reach a location with sufficient battery energy remaining, given the local solar illumination conditions. We solve a stochastic reach-avoid problem using dynamic programming to find such optimal recovery policies. Our focus, in part, is on the implications of state space discretization, which is often required in practical implementations. We propose a modified dynamic programming algorithm that conservatively accounts for approximation errors. To demonstrate the benefits of our approach, we compare against existing methods in scenarios where a solar-powered rover seeks to safely exit from permanently shadowed regions in the Cabeus area at the lunar south pole. We also highlight the relevance of our methodology for mission formulation and trade safety analysis by empirically comparing different rover mobility models in simulated recovery drives from the LCROSS crash region.

15.Congestion Analysis for the DARPA OFFSET CCAST Swarm

Authors:Robert Brown, Julie A. Adams

Abstract: The Defense Advanced Research Projects Agency (DARPA) OFFensive Swarm-Enabled Tactics program's goal of launching 250 unmanned aerial and ground vehicles from a limited sized launch zone was a daunting challenge. The swarm's aerial vehicles were primarily multirotor platforms, which can efficiently be launched en masse. Each field exercise expected the deployment of an even larger swarm. While the launch zone's spatial area increased with each field exercise, the relative space for each vehicle was not necessarily increased, considering the increasing size of the swarm and the vehicles' associated GPS error; however, safe mission deployment and execution were expected. At the same time, achieving the mission goals required maximizing efficiency of the swarm's performance by reducing congestion that blocked vehicles from completing tactic assignments. Congestion analysis conducted before the final field exercise focused on adjusting various constraints to optimize the swarm's deployment without reducing safety. During the field exercise, data was collected that permitted analyzing the number and durations of individual vehicle blockages' impact on the resulting congestion. After the field exercise, additional analyses used the mission plan to validate the use of simulation for analyzing congestion.

16.Uncertainty-aware Gaussian Mixture Model for UWB Time Difference of Arrival Localization in Cluttered Environments

Authors:Wenda Zhao, Abhishek Goudar, Mingliang Tang, Xinyuan Qiao, Angela P. Schoellig

Abstract: Ultra-wideband (UWB) time difference of arrival(TDOA)-based localization has emerged as a low-cost and scalable indoor positioning solution. However, in cluttered environments, the performance of UWB TDOA-based localization deteriorates due to the biased and non-Gaussian noise distributions induced by obstacles. In this work, we present a bi-level optimization-based joint localization and noise model learning algorithm to address this problem. In particular, we use a Gaussian mixture model (GMM) to approximate the measurement noise distribution. We explicitly incorporate the estimated state's uncertainty into the GMM noise model learning, referred to as uncertainty-aware GMM, to improve both noise modeling and localization performance. We first evaluate the GMM noise model learning and localization performance in numerous simulation scenarios. We then demonstrate the effectiveness of our algorithm in extensive real-world experiments using two different cluttered environments. We show that our algorithm provides accurate position estimates with low-cost UWB sensors, no prior knowledge about the obstacles in the space, and a significant amount of UWB radios occluded.

17.Data-Based MHE for Agile Quadrotor Flight

Authors:Wonoo Choo, Erkan Kayacan

Abstract: This paper develops a data-based moving horizon estimation (MHE) method for agile quadrotors. Accurate state estimation of the system is paramount for precise trajectory control for agile quadrotors; however, the high level of aerodynamic forces experienced by the quadrotors during high-speed flights make this task extremely challenging. These complex turbulent effects are difficult to model and the unmodelled dynamics introduce inaccuracies in the state estimation. In this work, we propose a method to model these aerodynamic effects using Gaussian Processes which we integrate into the MHE to achieve efficient and accurate state estimation with minimal computational burden. Through extensive simulation and experimental studies, this method has demonstrated significant improvement in state estimation performance displaying superior robustness to poor state measurements.

18.Discovering Adaptable Symbolic Algorithms from Scratch

Authors:Stephen Kelly, Daniel S. Park, Xingyou Song, Mitchell McIntire, Pranav Nashikkar, Ritam Guha, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti, Jie Tan, Esteban Real

Abstract: Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaption policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.

1.Robust Visual Sim-to-Real Transfer for Robotic Manipulation

Authors:Ricardo Garcia, Robin Strudel, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid

Abstract: Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR). While previous work mainly evaluates DR for disembodied tasks, such as pose estimation and object detection, here we systematically explore visual domain randomization methods and benchmark them on a rich set of challenging robotic manipulation tasks. In particular, we propose an off-line proxy task of cube localization to select DR parameters for texture randomization, lighting randomization, variations of object colors and camera parameters. Notably, we demonstrate that DR parameters have similar impact on our off-line proxy task and on-line policies. We, hence, use off-line optimized DR parameters to train visuomotor policies in simulation and directly apply such policies to a real robot. Our approach achieves 93% success rate on average when tested on a diverse set of challenging manipulation tasks. Moreover, we evaluate the robustness of policies to visual variations in real scenes and show that our simulator-trained policies outperform policies learned using real but limited data. Code, simulation environment, real robot datasets and trained models are available at

2.Learning Compliant Stiffness by Impedance Control-Aware Task Segmentation and Multi-objective Bayesian Optimization with Priors

Authors:Masashi Okada, Mayumi Komatsu, Ryo Okumura, Tadahiro Taniguchi

Abstract: Rather than traditional position control, impedance control is preferred to ensure the safe operation of industrial robots programmed from demonstrations. However, variable stiffness learning studies have focused on task performance rather than safety (or compliance). Thus, this paper proposes a novel stiffness learning method to satisfy both task performance and compliance requirements. The proposed method optimizes the task and compliance objectives (T/C objectives) simultaneously via multi-objective Bayesian optimization. We define the stiffness search space by segmenting a demonstration into task phases, each with constant responsible stiffness. The segmentation is performed by identifying impedance control-aware switching linear dynamics (IC-SLD) from the demonstration. We also utilize the stiffness obtained by proposed IC-SLD as priors for efficient optimization. Experiments on simulated tasks and a real robot demonstrate that IC-SLD-based segmentation and the use of priors improve the optimization efficiency compared to existing baseline methods.

3.Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review

Authors:Nicole Robinson, Brendan Tidd, Dylan Campbell, Dana Kulić, Peter Corke

Abstract: Robotic vision for human-robot interaction and collaboration is a critical process for robots to collect and interpret detailed information related to human actions, goals, and preferences, enabling robots to provide more useful services to people. This survey and systematic review presents a comprehensive analysis on robotic vision in human-robot interaction and collaboration over the last 10 years. From a detailed search of 3850 articles, systematic extraction and evaluation was used to identify and explore 310 papers in depth. These papers described robots with some level of autonomy using robotic vision for locomotion, manipulation and/or visual communication to collaborate or interact with people. This paper provides an in-depth analysis of current trends, common domains, methods and procedures, technical processes, data sets and models, experimental testing, sample populations, performance metrics and future challenges. This manuscript found that robotic vision was often used in action and gesture recognition, robot movement in human spaces, object handover and collaborative actions, social communication and learning from demonstration. Few high-impact and novel techniques from the computer vision field had been translated into human-robot interaction and collaboration. Overall, notable advancements have been made on how to develop and deploy robots to assist people.

4.On the Design of Region-Avoiding Metrics for Collision-Safe Motion Generation on Riemannian Manifolds

Authors:Holger Klein, Noémie Jaquier, Andre Meixner, Tamim Asfour

Abstract: The generation of energy-efficient and dynamic-aware robot motions that satisfy constraints such as joint limits, self-collisions, and collisions with the environment remains a challenge. In this context, Riemannian geometry offers promising solutions by identifying robot motions with geodesics on the so-called configuration space manifold. While this manifold naturally considers the intrinsic robot dynamics, constraints such as joint limits, self-collisions, and collisions with the environment remain overlooked. In this paper, we propose a modification of the Riemannian metric of the configuration space manifold allowing for the generation of robot motions as geodesics that efficiently avoid given regions. We introduce a class of Riemannian metrics based on barrier functions that guarantee strict region avoidance by systematically generating accelerations away from no-go regions in joint and task space. We evaluate the proposed Riemannian metric to generate energy-efficient, dynamic-aware, and collision-free motions of a humanoid robot as geodesics and sequences thereof.

5.We are all Individuals: The Role of Robot Personality and Human Traits in Trustworthy Interaction

Authors:Mei Yii Lim, José David Aguas Lopes, David A. Robb, Bruce W. Wilson, Meriam Moujahid, Emanuele De Pellegrin, Helen Hastie

Abstract: As robots take on roles in our society, it is important that their appearance, behaviour and personality are appropriate for the job they are given and are perceived favourably by the people with whom they interact. Here, we provide an extensive quantitative and qualitative study exploring robot personality but, importantly, with respect to individual human traits. Firstly, we show that we can accurately portray personality in a social robot, in terms of extroversion-introversion using vocal cues and linguistic features. Secondly, through garnering preferences and trust ratings for these different robot personalities, we establish that, for a Robo-Barista, an extrovert robot is preferred and trusted more than an introvert robot, regardless of the subject's own personality. Thirdly, we find that individual attitudes and predispositions towards robots do impact trust in the Robo-Baristas, and are therefore important considerations in addition to robot personality, roles and interaction context when designing any human-robot interaction study.

6.Learning to Open Doors with an Aerial Manipulator

Authors:Eugenio Cuniato, Ismail Geles, Weixuan Zhang, Olov Andersson, Marco Tognon, Roland Siegwart

Abstract: The field of aerial manipulation has seen rapid advances, transitioning from push-and-slide tasks to interaction with articulated objects. So far, when more complex actions are performed, the motion trajectory is usually handcrafted or a result of online optimization methods like Model Predictive Control (MPC) or Model Predictive Path Integral (MPPI) control. However, these methods rely on heuristics or model simplifications to efficiently run on onboard hardware, producing results in acceptable amounts of time. Moreover, they can be sensitive to disturbances and differences between the real environment and its simulated counterpart. In this work, we propose a Reinforcement Learning (RL) approach to learn motion behaviors for a manipulation task while producing policies that are robust to disturbances and modeling errors. Specifically, we train a policy to perform a door-opening task with an Omnidirectional Micro Aerial Vehicle (OMAV). The policy is trained in a physics simulator and experiments are presented both in simulation and running onboard the real platform, investigating the simulation to real world transfer. We compare our method against a state-of-the-art MPPI solution, showing a considerable increase in robustness and speed.

7.High-speed electrical connector assembly by structured compliance in a finray-effect gripper

Authors:Richard Hartisch, Kevin Haninger

Abstract: Fine assembly tasks such as electrical connector insertion have tight tolerances and sensitive components, requiring compensation of alignment errors while applying sufficient force in the insertion direction, ideally at high speeds and while grasping a range of components. Vision, tactile, or force sensors can compensate alignment errors, but have limited bandwidth, limiting the safe assembly speed. Passive compliance such as silicone-based fingers can reduce collision forces and grasp a range of components, but often cannot provide the accuracy or assembly forces required. To support high-speed mechanical search and self-aligning insertion, this paper proposes monolithic additively manufactured fingers which realize a moderate, structured compliance directly proximal to the gripped object. The geometry of finray-effect fingers are adapted to add form-closure features and realize a directionally-dependent stiffness at the fingertip, with a high stiffness to apply insertion forces and lower transverse stiffness to support alignment. Design parameters and mechanical properties of the fingers are investigated with FEM and empirical studies, analyzing the stiffness, maximum load, and viscoelastic effects. The fingers realize a remote center of compliance, which is shown to depend on the rib angle, and a directional stiffness ratio of $14-36$. The fingers are applied to a plug insertion task, realizing a tolerance window of $7.5$ mm and approach speeds of $1.3$ m/s.

8.Estimating Properties of Solid Particles Inside Container Using Touch Sensing

Authors:Xiaofeng Guo, Hung-Jui Huang, Wenzhen Yuan

Abstract: Solid particles, such as rice and coffee beans, are commonly stored in containers and are ubiquitous in our daily lives. Understanding those particles' properties could help us make later decisions or perform later manipulation tasks such as pouring. Humans typically interact with the containers to get an understanding of the particles inside them, but it is still a challenge for robots to achieve that. This work utilizes tactile sensing to estimate multiple properties of solid particles enclosed in the container, specifically, content mass, content volume, particle size, and particle shape. We design a sequence of robot actions to interact with the container. Based on physical understanding, we extract static force/torque value from the F/T sensor, vibration-related features and topple-related features from the newly designed high-speed GelSight tactile sensor to estimate those four particle properties. We test our method on $37$ very different daily particles, including powder, rice, beans, tablets, etc. Experiments show that our approach is able to estimate content mass with an error of $1.8$ g, content volume with an error of $6.1$ ml, particle size with an error of $1.1$ mm, and achieves an accuracy of $75.6$% for particle shape estimation. In addition, our method can generalize to unseen particles with unknown volumes. By estimating these particle properties, our method can help robots to better perceive the granular media and help with different manipulation tasks in daily life and industry.

1.Borinot: an open thrust-torque-controlled robot for research on agile aerial-contact motion

Authors:Josep Martí-Saumell, Hugo Duarte, Patrick Grosch, Juan Andrade-Cetto, Angel Santamaria-Navarro, Joan Solà

Abstract: This paper introduces Borinot, an open-source aerial robotic platform designed to conduct research on hybrid agile locomotion and manipulation using flight and contacts. This platform features an agile and powerful hexarotor that can be outfitted with torque-actuated limbs of diverse architecture, allowing for whole-body dynamic control. As a result, Borinot can perform agile tasks such as aggressive or acrobatic maneuvers with the participation of the whole-body dynamics. The limbs attached to Borinot can be utilized in various ways; during contact, they can be used as legs to create contact-based locomotion, or as arms to manipulate objects. In free flight, they can be used as tails to contribute to dynamics, mimicking the movements of many animals. This allows for any hybridization of these dynamic modes, making Borinot an ideal open-source platform for research on hybrid aerial-contact agile motion. To demonstrate the key capabilities of Borinot in terms of agility with hybrid motion modes, we have fitted a planar 2DoF limb and implemented a whole-body torque-level model-predictive-control. The result is a capable and adaptable platform that, we believe, opens up new avenues of research in the field of agile robotics. Interesting links\footnote{Documentation: \url{}}\footnote{Video: \url{}}.

2.Singularity Distance Computations of 3-RPR Manipulators Using Intrinsic Metrics

Authors:Aditya Kapilavai, Georg Nawratil

Abstract: We present an efficient algorithm for computing the closest singular configuration to each non-singular pose of a 3-RPR planar manipulator performing a 1-parametric motion. By considering a 3-RPR manipulator as a planar framework, one can use methods from rigidity theory to compute the singularity distance with respect to an intrinsic metric. There are different design options as the platform/base can be seen as a triangular plate or as a pin-jointed triangular bar structure. Moreover, we also allow the additional possibility of pinning down the base/platform triangle to the fixed/moving system thus it cannot be deformed. For the resulting nine interpretations, we compute the corresponding intrinsic metrics based on the total elastic strain energy density of the framework using the physical concept of Green-Lagrange strain. The global optimization problem of finding the closest singular configuration with respect to these metrics is solved by using tools from numerical algebraic geometry. The proposed algorithm is demonstrated based on an example.

3.Robust Task-Space Quadratic Programming for Kinematic-Controlled Robots

Authors:Mohamed Djeha, Pierre Gergondet, Abderrahmane Kheddar

Abstract: Task-space quadratic programming (QP) is an elegant approach for controlling robots subject to constraints. Yet, in the case of kinematic-controlled (i.e., high-gains position or velocity) robots, closed-loop QP control scheme can be prone to instability depending on how the gains related to the tasks or the constraints are chosen. In this paper, we address such instability shortcomings. First, we highlight the non-robustness of the closed-loop system against non-modeled dynamics, such as those relative to joint-dynamics, flexibilities, external perturbations, etc. Then, we propose a robust QP control formulation based on high-level integral feedback terms in the task-space including the constraints. The proposed method is formally proved to ensure closed-loop robust stability and is intended to be applied to any kinematic-controlled robots under practical assumptions. We assess our approach through experiments on a fixed-base robot performing stable fast motions, and a floating-base humanoid robot robustly reacting to perturbations to keep its balance.

4.Fast Convex Visual Foothold Adaptation for Quadrupedal Locomotion

Authors:Shafeef Omar, Lorenzo Amatucci, Giulio Turrisi, Victor Barasuol, Claudio Semini

Abstract: This extended abstract provides a short introduction on our recently developed perception-based controller for quadrupedal locomotion. Compared to our previous approach based on Visual Foothold Adaptation (VFA) and Model Predictive Control (MPC), our new framework combines a fast approximation of the safe foothold regions based on Neural Network regression, followed by a convex decomposition routine in order to generate safe landing areas where the controller can freely optimize the footholds location. The aforementioned framework, which combines prediction, convex decomposition, and MPC solution, is tested in simulation on our 140kg hydraulic quadruped robot (HyQReal).

5.Disturbance Preview for Nonlinear Model Predictive Trajectory Tracking of Underwater Vehicles in Wave Dominated Environments

Authors:Kyle L. Walker, Francesco Giorgio-Serchi

Abstract: Operating in the near-vicinity of marine energy devices poses significant challenges to the control of underwater vehicles, predominantly due to the presence of large magnitude wave disturbances causing hazardous state perturbations. Approaches to tackle this problem have varied, but one promising solution is to adopt predictive control methods. Given the predictable nature of ocean waves, the potential exists to incorporate disturbance estimations directly within the plant model; this requires inclusion of a wave predictor to provide online preview information. To this end, this paper presents a Nonlinear Model Predictive Controller with an integrated Deterministic Sea Wave Predictor for trajectory tracking of underwater vehicles. State information is obtained through an Extended Kalman Filter, forming a complete closed-loop strategy and facilitating online wave load estimations. The strategy is compared to a similar feed-forward disturbance mitigation scheme, showing mean performance improvements of 51% in positional error and 44.5% in attitude error. The preliminary results presented here provide strong evidence of the proposed method's high potential to effectively mitigate disturbances, facilitating accurate tracking performance even in the presence of high wave loading.

1.Formal Verification of Robotic Contact Tasks via Reachability Analysis

Authors:Chencheng Tang, Matthias Althoff

Abstract: Verifying the correct behavior of robots in contact tasks is challenging due to model uncertainties associated with contacts. Standard methods for testing often fall short since all (uncountable many) solutions cannot be obtained. Instead, we propose to formally and efficiently verify robot behaviors in contact tasks using reachability analysis, which enables checking all the reachable states against user-provided specifications. To this end, we extend the state of the art in reachability analysis for hybrid (mixed discrete and continuous) dynamics subject to discrete-time input trajectories. In particular, we present a novel and scalable guard intersection approach to reliably compute the complex behavior caused by contacts. We model robots subject to contacts as hybrid automata in which crucial time delays are included. The usefulness of our approach is demonstrated by verifying safe human-robot interaction in the presence of constrained collisions, which was out of reach for existing methods.

2.METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation

Authors:Junwon Seo, Taekyung Kim, Seongyong Ahn, Kiho Kwak

Abstract: Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences. To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.

3.Research on Inertial Navigation Technology of Unmanned Aerial Vehicles with Integrated Reinforcement Learning Algorithm

Authors:Longcheng Guo

Abstract: We first define appropriate state representation and action space, and then design an adjustment mechanism based on the actions selected by the intelligent agent. The adjustment mechanism outputs the next state and reward value of the agent. Additionally, the adjustment mechanism calculates the error between the adjusted state and the unadjusted state. Furthermore, the intelligent agent stores the acquired experience samples containing states and reward values in a buffer and replays the experiences during each iteration to learn the dynamic characteristics of the environment. We name the improved algorithm as the DQM algorithm. Experimental results demonstrate that the intelligent agent using our proposed algorithm effectively reduces the accumulated errors of inertial navigation in dynamic environments. Although our research provides a basis for achieving autonomous navigation of unmanned aerial vehicles, there is still room for significant optimization. Further research can include testing unmanned aerial vehicles in simulated environments, testing unmanned aerial vehicles in real-world environments, optimizing the design of reward functions, improving the algorithm workflow to enhance convergence speed and performance, and enhancing the algorithm's generalization ability.

4.Active Robot Vision for Distant Object Change Detection: A Lightweight Training Simulator Inspired by Multi-Armed Bandits

Authors:Kouki Terashima, Kanji Tanaka, Ryogo Yamamoto, Jonathan Tay Yu Liang

Abstract: In ground-view object change detection, the recently emerging map-less navigation has great potential as a means of navigating a robot to distantly detected objects and identifying their changing states (appear/disappear/no-change) with high resolution imagery. However, the brute-force naive action strategy of navigating to every distant object requires huge sense/plan/action costs proportional to the number of objects. In this work, we study this new problem of ``Which distant objects should be prioritized for map-less navigation?" and in order to speed up the R{\&}D cycle, propose a highly-simplified approach that is easy to implement and easy to extend. In our approach, a new layer called map-based navigation is added on top of the map-less navigation, which constitutes a hierarchical planner. First, a dataset consisting of $N$ view sequences is acquired by a real robot via map-less navigation. Then, an environment simulator was built to simulate a simple action planning problem: ``Which view sequence should the robot select next?". Then, a solver was built inspired by the analogy to the multi-armed bandit problem: ``Which arm should the player select next?". Finally, the effectiveness of the proposed framework was verified using the semantically non-trivial scenario ``sofa as bookshelf".

5.Reinforced Potential Field for Multi-Robot Motion Planning in Cluttered Environments

Authors:Dengyu Zhang, Xinyu Zhang, Zheng Zhang, Bo Zhu, Qingrui Zhang

Abstract: Motion planning is challenging for multiple robots in cluttered environments without communication, especially in view of real-time efficiency, motion safety, distributed computation, and trajectory optimality, etc. In this paper, a reinforced potential field method is developed for distributed multi-robot motion planning, which is a synthesized design of reinforcement learning and artificial potential fields. An observation embedding with a self-attention mechanism is presented to model the robot-robot and robot-environment interactions. A soft wall-following rule is developed to improve the trajectory smoothness. Our method belongs to reactive planning, but environment properties are implicitly encoded. The total amount of robots in our method can be scaled up to any number. The performance improvement over a vanilla APF and RL method has been demonstrated via numerical simulations. Experiments are also performed using quadrotors to further illustrate the competence of our method.

6.Multi-IMU Proprioceptive State Estimator for Humanoid Robots

Authors:Fabio Elnecave Xavier, Guillaume Burger, Marine Pétriaux, Jean-Emmanuel Deschaud, François Goulette

Abstract: Algorithms for state estimation of humanoid robots usually assume that the feet remain flat and in a constant position while in contact with the ground. However, this hypothesis is easily violated while walking, especially for human-like gaits with heel-toe motion. This reduces the time during which the contact assumption can be used, or requires higher variances to account for errors. In this paper, we present a novel state estimator based on the extended Kalman filter that can properly handle any contact configuration. We consider multiple inertial measurement units (IMUs) distributed throughout the robot's structure, including on both feet, which are used to track multiple bodies of the robot. This multi-IMU instrumentation setup also has the advantage of allowing the deformations in the robot's structure to be estimated, improving the kinematic model used in the filter. The proposed approach is validated experimentally on the exoskeleton Atalante and is shown to present low drift, performing better than similar single-IMU filters. The obtained trajectory estimates are accurate enough to construct elevation maps that have little distortion with respect to the ground truth.

7.MorphoLander: Reinforcement Learning Based Landing of a Group of Drones on the Adaptive Morphogenetic UAV

Authors:Sausar Karaf, Aleksey Fedoseev, Mikhail Martynov, Zhanibek Darush, Aleksei Shcherbak, Dzmitry Tsetserukou

Abstract: This paper focuses on a novel robotic system MorphoLander representing heterogeneous swarm of drones for exploring rough terrain environments. The morphogenetic leader drone is capable of landing on uneven terrain, traversing it, and maintaining horizontal position to deploy smaller drones for extensive area exploration. After completing their tasks, these drones return and land back on the landing pads of MorphoGear. The reinforcement learning algorithm was developed for a precise landing of drones on the leader robot that either remains static during their mission or relocates to the new position. Several experiments were conducted to evaluate the performance of the developed landing algorithm under both even and uneven terrain conditions. The experiments revealed that the proposed system results in high landing accuracy of 0.5 cm when landing on the leader drone under even terrain conditions and 2.35 cm under uneven terrain conditions. MorphoLander has the potential to significantly enhance the efficiency of the industrial inspections, seismic surveys, and rescue missions in highly cluttered and unstructured environments.

8.Towards Continuous Time Finite Horizon LQR Control in SE(3)

Authors:Shivesh Kumar, Andreas Mueller, Patrick Wensing, Frank Kirchner

Abstract: The control of free-floating robots requires dealing with several challenges. The motion of such robots evolves on a continuous manifold described by the Special Euclidean Group of dimension 3, known as SE(3). Methods from finite horizon Linear Quadratic Regulators (LQR) control have gained recent traction in the robotics community. However, such approaches are inherently solving an unconstrained optimization problem and hence are unable to respect the manifold constraints imposed by the group structure of SE(3). This may lead to small errors, singularity problems and double cover issues depending on the choice of coordinates to model the floating base motion. In this paper, we propose the use of canonical exponential coordinates of SE(3) and the associated Exponential map along with its differentials to embed this structure in the theory of finite horizon LQR controllers.

9.Soft Air Pocket Force Sensors for Large Scale Flexible Robots

Authors:Michael R. Mitchell, Ciera McFarland, Margaret M. Coad

Abstract: Flexible robots have advantages over rigid robots in their ability to conform physically to their environment and to form a wide variety of shapes. Sensing the force applied by or to flexible robots is useful for both navigation and manipulation tasks, but it is challenging due to the need for the sensors to withstand the robots' shape change without encumbering their functionality. Also, for robots with long or large bodies, the number of sensors required to cover the entire surface area of the robot body can be prohibitive due to high cost and complexity. We present a novel soft air pocket force sensor that is highly flexible, lightweight, relatively inexpensive, and easily scalable to various sizes. Our sensor produces a change in internal pressure that is linear with the applied force. We present results of experimental testing of how uncontrollable factors (contact location and contact area) and controllable factors (initial internal pressure, thickness, size, and number of interior seals) affect the sensitivity. We demonstrate our sensor applied to a vine robot-a soft inflatable robot that "grows" from the tip via eversion-and we show that the robot can successfully grow and steer towards an object with which it senses contact.

10.Evolving Multi-Objective Neural Network Controllers for Robot Swarms

Authors:Karl Mason, Sabine Hauert

Abstract: Many swarm robotics tasks consist of multiple conflicting objectives. This research proposes a multi-objective evolutionary neural network approach to developing controllers for swarms of robots. The swarm robot controllers are trained in a low-fidelity Python simulator and then tested in a high-fidelity simulated environment using Webots. Simulations are then conducted to test the scalability of the evolved multi-objective robot controllers to environments with a larger number of robots. The results presented demonstrate that the proposed approach can effectively control each of the robots. The robot swarm exhibits different behaviours as the weighting for each objective is adjusted. The results also confirm that multi-objective neural network controllers evolved in a low-fidelity simulator can be transferred to high-fidelity simulated environments and that the controllers can scale to environments with a larger number of robots without further retraining needed.

11.CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor

Authors:Alexandros Filotheou

Abstract: Navigation of a mobile robot is conditioned on the knowledge of its pose. In observer-based localisation configurations its initial pose may not be knowable in advance, leading to the need of its estimation. Solutions to the problem of global localisation are either robust against noise and environment arbitrariness but require motion and time, which may (need to) be economised on, or require minimal estimation time but assume environmental structure, may be sensitive to noise, and demand preprocessing and tuning. This article proposes a method that retains the strengths and avoids the weaknesses of the two approaches. The method leverages properties of the Cumulative Absolute Error per Ray metric with respect to the errors of pose estimates of a 2D LIDAR sensor, and utilises scan--to--map-scan matching for fine(r) pose approximations. A large number of tests, in real and simulated conditions, involving disparate environments and sensor properties, illustrate that the proposed method outperforms state-of-the-art methods of both classes of solutions in terms of pose discovery rate and execution time. The source code is available for download.

12.Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing

Authors:Max Yang, Yijiong Lin, Alex Church, John Lloyd, Dandan Zhang, David A. W. Barton, Nathan F. Lepora

Abstract: Object pushing presents a key non-prehensile manipulation problem that is illustrative of more complex robotic manipulation tasks. While deep reinforcement learning (RL) methods have demonstrated impressive learning capabilities using visual input, a lack of tactile sensing limits their capability for fine and reliable control during manipulation. Here we propose a deep RL approach to object pushing using tactile sensing without visual input, namely tactile pushing. We present a goal-conditioned formulation that allows both model-free and model-based RL to obtain accurate policies for pushing an object to a goal. To achieve real-world performance, we adopt a sim-to-real approach. Our results demonstrate that it is possible to train on a single object and a limited sample of goals to produce precise and reliable policies that can generalize to a variety of unseen objects and pushing scenarios without domain randomization. We experiment with the trained agents in harsh pushing conditions, and show that with significantly more training samples, a model-free policy can outperform a model-based planner, generating shorter and more reliable pushing trajectories despite large disturbances. The simplicity of our training environment and effective real-world performance highlights the value of rich tactile information for fine manipulation. Code and videos are available at

13.LiDAR-based drone navigation with reinforcement learning

Authors:Pawel Miera, Hubert Szolc, Tomasz Kryjak

Abstract: Reinforcement learning is of increasing importance in the field of robot control and simulation plays a~key role in this process. In the unmanned aerial vehicles (UAVs, drones), there is also an increase in the number of published scientific papers involving this approach. In this work, an autonomous drone control system was prepared to fly forward (according to its coordinates system) and pass the trees encountered in the forest based on the data from a rotating LiDAR sensor. The Proximal Policy Optimization (PPO) algorithm, an example of reinforcement learning (RL), was used to prepare it. A custom simulator in the Python language was developed for this purpose. The Gazebo environment, integrated with the Robot Operating System (ROS), was also used to test the resulting control algorithm. Finally, the prepared solution was implemented in the Nvidia Jetson Nano eGPU and verified in the real tests scenarios. During them, the drone successfully completed the set task and was able to repeatably avoid trees and fly through the forest.

14.Waypoint-Based Imitation Learning for Robotic Manipulation

Authors:Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn

Abstract: While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module to decompose a demonstration into a minimal set of waypoints which when interpolated linearly can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10. Videos and code are available at

1.Towards Sim2Real Transfer of Autonomy Algorithms using AutoDRIVE Ecosystem

Authors:Chinmay Vilas Samak, Tanmay Vilas Samak, Venkat Krovi

Abstract: The engineering community currently encounters significant challenges in the development of intelligent transportation algorithms that can be transferred from simulation to reality with minimal effort. This can be achieved by robustifying the algorithms using domain adaptation methods and/or by adopting cutting-edge tools that help support this objective seamlessly. This work presents AutoDRIVE, an openly accessible digital twin ecosystem designed to facilitate synergistic development, simulation and deployment of cyber-physical solutions pertaining to autonomous driving technology; and focuses on bridging the autonomy-oriented simulation-to-reality (sim2real) gap using the proposed ecosystem. In this paper, we extensively explore the modeling and simulation aspects of the ecosystem and substantiate its efficacy by demonstrating the successful transition of two candidate autonomy algorithms from simulation to reality to help support our claims: (i) autonomous parking using probabilistic robotics approach; (ii) behavioral cloning using deep imitation learning. The outcomes of these case studies further strengthen the credibility of AutoDRIVE as an invaluable tool for advancing the state-of-the-art in autonomous driving technology.

2.Learning Autonomous Ultrasound via Latent Task Representation and Robotic Skills Adaptation

Authors:Xutian Deng, Junnan Jiang, Wen Cheng, Miao Li

Abstract: As medical ultrasound is becoming a prevailing examination approach nowadays, robotic ultrasound systems can facilitate the scanning process and prevent professional sonographers from repetitive and tedious work. Despite the recent progress, it is still a challenge to enable robots to autonomously accomplish the ultrasound examination, which is largely due to the lack of a proper task representation method, and also an adaptation approach to generalize learned skills across different patients. To solve these problems, we propose the latent task representation and the robotic skills adaptation for autonomous ultrasound in this paper. During the offline stage, the multimodal ultrasound skills are merged and encapsulated into a low-dimensional probability model through a fully self-supervised framework, which takes clinically demonstrated ultrasound images, probe orientations, and contact forces into account. During the online stage, the probability model will select and evaluate the optimal prediction. For unstable singularities, the adaptive optimizer fine-tunes them to near and stable predictions in high-confidence regions. Experimental results show that the proposed approach can generate complex ultrasound strategies for diverse populations and achieve significantly better quantitative results than our previous method.

3.A behavioural transformer for effective collaboration between a robot and a non-stationary human

Authors:Ruaridh Mon-Williams, Theodoros Stouraitis, Sethu Vijayakumar

Abstract: A key challenge in human-robot collaboration is the non-stationarity created by humans due to changes in their behaviour. This alters environmental transitions and hinders human-robot collaboration. We propose a principled meta-learning framework to explore how robots could better predict human behaviour, and thereby deal with issues of non-stationarity. On the basis of this framework, we developed Behaviour-Transform (BeTrans). BeTrans is a conditional transformer that enables a robot agent to adapt quickly to new human agents with non-stationary behaviours, due to its notable performance with sequential data. We trained BeTrans on simulated human agents with different systematic biases in collaborative settings. We used an original customisable environment to show that BeTrans effectively collaborates with simulated human agents and adapts faster to non-stationary simulated human agents than SOTA techniques.

4.Preliminary Design of the Dragonfly Navigation Filter

Authors:Ben Schilling, Timothy G. McGee, Ryan Mitch, Ryan Watson

Abstract: Dragonfly is scheduled to begin exploring Titan by 2034 using a series of multi-kilometer surface flights. This paper outlines the preliminary design of the navigation filter for the Dragonfly Mobility subsystem. The software architecture and filter formulation for lidar, visual odometry, pressure sensors, and redundant IMUs are described in detail. Special discussion is given to developments to achieve multi-kilometer surface flights, including optimizing sequential image baselines, modeling correlating image processing errors, and an efficient approximation to the Simultaneous Localization and Mapping (SLAM) problem.

5.A Soft Robotic Gripper with Active Palm for In-Hand Object Reorientation

Authors:Thomas Mack, Ketao Zhang, Kaspar Althoefer

Abstract: The human hand has an inherent ability to manipulate and re-orientate objects without external assistance. As a consequence, we are able to operate tools and perform an array of actions using just one hand, without having to continuously re-grasp objects. Emulating this functionality in robotic end-effectors remains a key area of study with efforts being made to create advanced control systems that could be used to operate complex manipulators. In this paper, a three fingered soft gripper with an active rotary palm is presented as a simpler, alternative method of performing in-hand rotations. The gripper, complete with its pneumatic suction cup to prevent object slippage, was tested and found to be able to effectively grasp and rotate a variety of objects both quickly and precisely.

6.A Comprehensive Review of Recent Research Trends on UAVs

Authors:Kaled Telli, Okba Kraa, Yassine Himeur, Abdelmalik Ouamane, Mohamed Boumehraz, Shadi Atalla, Wathiq Mansoor

Abstract: The growing interest in unmanned aerial vehicles (UAVs) from both scientific and industrial sectors has attracted a wave of new researchers and substantial investments in this expansive field. However, due to the wide range of topics and subdomains within UAV research, newcomers may find themselves overwhelmed by the numerous options available. It is therefore crucial for those involved in UAV research to recognize its interdisciplinary nature and its connections with other disciplines. This paper presents a comprehensive overview of the UAV field, highlighting recent trends and advancements. Drawing on recent literature reviews and surveys, the review begins by classifying UAVs based on their flight characteristics. It then provides an overview of current research trends in UAVs, utilizing data from the Scopus database to quantify the number of scientific documents associated with each research direction and their interconnections. The paper also explores potential areas for further development in UAVs, including communication, artificial intelligence, remote sensing, miniaturization, swarming and cooperative control, and transformability. Additionally, it discusses the development of aircraft control, commonly used control techniques, and appropriate control algorithms in UAV research. Furthermore, the paper addresses the general hardware and software architecture of UAVs, their applications, and the key issues associated with them. It also provides an overview of current open-source software and hardware projects in the UAV field. By presenting a comprehensive view of the UAV field, this paper aims to enhance understanding of this rapidly evolving and highly interdisciplinary area of research.

1.BonnBot-I: A Precise Weed Management and Crop Monitoring Platform

Authors:Alireza Ahmadi, Michael Halstead, Chris McCool

Abstract: Cultivation and weeding are two of the primary tasks performed by farmers today. A recent challenge for weeding is the desire to reduce herbicide and pesticide treatments while maintaining crop quality and quantity. In this paper we introduce BonnBot-I a precise weed management platform which can also performs field monitoring. Driven by crop monitoring approaches which can accurately locate and classify plants (weed and crop) we further improve their performance by fusing the platform available GNSS and wheel odometry. This improves tracking accuracy of our crop monitoring approach from a normalized average error of 8.3% to 3.5%, evaluated on a new publicly available corn dataset. We also present a novel arrangement of weeding tools mounted on linear actuators evaluated in simulated environments. We replicate weed distributions from a real field, using the results from our monitoring approach, and show the validity of our work-space division techniques which require significantly less movement (a 50% reduction) to achieve similar results. Overall, BonnBot-I is a significant step forward in precise weed management with a novel method of selectively spraying and controlling weeds in an arable field

2.Multi-Shooting Differential Dynamic Programming for Hybrid Systems using Analytical Derivatives

Authors:Shubham Singh, Ryan P. Russell, Patrick M. Wensing

Abstract: Differential Dynamic Programming (DDP) is a popular technique used to generate motion for dynamic-legged robots in the recent past. However, in most cases, only the first-order partial derivatives of the underlying dynamics are used, resulting in the iLQR approach. Neglecting the second-order terms often slows down the convergence rate compared to full DDP. Multi-Shooting is another popular technique to improve robustness, especially if the dynamics are highly non-linear. In this work, we consider Multi-Shooting DDP for trajectory optimization of a bounding gait for a simplified quadruped model. As the main contribution, we develop Second-Order analytical partial derivatives of the rigid-body contact dynamics, extending our previous results for fixed/floating base models with multi-DoF joints. Finally, we show the benefits of a novel Quasi-Newton method for approximating second-order derivatives of the dynamics, leading to order-of-magnitude speedups in the convergence compared to the full DDP method.

3.SafeSteps: Learning Safer Footstep Planning Policies for Legged Robots via Model-Based Priors

Authors:Shafeef Omar, Lorenzo Amatucci, Victor Barasuol, Giulio Turrisi, Claudio Semini

Abstract: We present a footstep planning policy for quadrupedal locomotion that is able to directly take into consideration a-priori safety information in its decisions. At its core, a learning process analyzes terrain patches, classifying each landing location by its kinematic feasibility, shin collision, and terrain roughness. This information is then encoded into a small vector representation and passed as an additional state to the footstep planning policy, which furthermore proposes only safe footstep location by applying a masked variant of the Proximal Policy Optimization (PPO) algorithm. The performance of the proposed approach is shown by comparative simulations on an electric quadruped robot walking in different rough terrain scenarios. We show that violations of the above safety conditions are greatly reduced both during training and the successive deployment of the policy, resulting in an inherently safer footstep planner. Furthermore, we show how, as a byproduct, fewer reward terms are needed to shape the behavior of the policy, which in return is able to achieve both better final performances and sample efficiency

4.DawnIK: Decentralized Collision-Aware Inverse Kinematics Solver for Heterogeneous Multi-Arm Systems

Authors:Salih Marangoz, Rohit Menon, Nils Dengler, Maren Bennewitz

Abstract: Although inverse kinematics of serial manipulators is a well studied problem, challenges still exist in finding smooth feasible solutions that are also collision aware. Furthermore, with collaborative and service robots gaining traction, different robotic systems have to work in close proximity. This means that the current inverse kinematics approaches have to not only avoid collisions with themselves but also collisions with other robot arms. Therefore, we present a novel approach to compute inverse kinematics for serial manipulators that take into account different constraints while trying to reach a desired end-effector position and/or orientation that avoids collisions with themselves and other arms. Unlike other constraint based approaches, we neither perform expensive inverse Jacobian computations nor do we require arms with redundant degrees of freedom. Instead, we formulate different constraints as weighted cost functions to be optimized by a non-linear optimization solver. Our approach is superior to the state-of-the-art CollisionIK in terms of collision avoidance in the presence of multiple arms in confined spaces with no detected collisions at all in all the experimental scenarios. When the probability of collision is low, our approach shows better performance at trajectory tracking as well. Additionally, our approach is capable of simultaneous yet decentralized control of multiple arms for trajectory tracking in intersecting workspace without any collisions.

5.GNSS-stereo-inertial SLAM for arable farming

Authors:Javier Cremona, Javier Civera, Ernesto Kofman, Taihú Pire

Abstract: The accelerating pace in the automation of agricultural tasks demands highly accurate and robust localization systems for field robots. Simultaneous Localization and Mapping (SLAM) methods inevitably accumulate drift on exploratory trajectories and primarily rely on place revisiting and loop closing to keep a bounded global localization error. Loop closure techniques are significantly challenging in agricultural fields, as the local visual appearance of different views is very similar and might change easily due to weather effects. A suitable alternative in practice is to employ global sensor positioning systems jointly with the rest of the robot sensors. In this paper we propose and implement the fusion of global navigation satellite system (GNSS), stereo views, and inertial measurements for localization purposes. Specifically, we incorporate, in a tightly coupled manner, GNSS measurements into the stereo-inertial ORB-SLAM3 pipeline. We thoroughly evaluate our implementation in the sequences of the Rosario data set, recorded by an autonomous robot in soybean fields, and our own in-house data. Our data includes measurements from a conventional GNSS, rarely included in evaluations of state-of-the-art approaches. We characterize the performance of GNSS-stereo-inertial SLAM in this application case, reporting pose error reductions between 10% and 30% compared to visual-inertial and loosely coupled GNSS-stereo-inertial baselines. In addition to such analysis, we also release the code of our implementation as open source.

6.Authoring and Operating Humanoid Behaviors On the Fly using Coactive Design Principles

Authors:Duncan Calvert, Dexton Anderson, Tomasz Bialek, Stephen McCrory, Luigi Penco, Jerry Pratt, Robert Griffin

Abstract: Humanoid robots have the potential to perform useful tasks in a world built for humans. However, communicating intention and teaming with a humanoid robot is a multi-faceted and complex problem. In this paper, we tackle the problems associated with quickly and interactively authoring new robot behavior that works on real hardware. We bring the powerful concepts of Affordance Templates and Coactive Design methodology to this problem to attempt to solve and explain it. In our approach we use interactive stance and hand pose goals along with other types of actions to author humanoid robot behavior on the fly. We then describe how our operator interface works to author behaviors on the fly and provide interdependence analysis charts for task approach and door opening. We present timings from real robot performances for traversing a push door and doing a pick and place task on our Nadia humanoid robot.

1.Direct and inverse modeling of soft robots by learning a condensed FEM model

Authors:Etienne Ménager, Tanguy Navez, Olivier Goury, Christian Duriez

Abstract: The Finite Element Method (FEM) is a powerful modeling tool for predicting the behavior of soft robots. However, its use for control can be difficult for non-specialists of numerical computation: it requires an optimization of the computation to make it real-time. In this paper, we propose a learning-based approach to obtain a compact but sufficiently rich mechanical representation. Our choice is based on nonlinear compliance data in the actuator/effector space provided by a condensation of the FEM model. We demonstrate that this compact model can be learned with a reasonable amount of data and, at the same time, be very efficient in terms of modeling, since we can deduce the direct and inverse kinematics of the robot. We also show how to couple some models learned individually in particular on an example of a gripper composed of two soft fingers. Other results are shown by comparing the inverse model derived from the full FEM model and the one from the compact learned version. This work opens new perspectives, namely for the embedded control of soft robots, but also for their design. These perspectives are also discussed in the paper.

2.BatMobility: Towards Flying Without Seeing for Autonomous Drones

Authors:Emerson Sie, Zikun Liu, Deepak Vasisht

Abstract: Unmanned aerial vehicles (UAVs) rely on optical sensors such as cameras and lidar for autonomous operation. However, such optical sensors are error-prone in bad lighting, inclement weather conditions including fog and smoke, and around textureless or transparent surfaces. In this paper, we ask: is it possible to fly UAVs without relying on optical sensors, i.e., can UAVs fly without seeing? We present BatMobility, a lightweight mmWave radar-only perception system for UAVs that eliminates the need for optical sensors. BatMobility enables two core functionalities for UAVs -- radio flow estimation (a novel FMCW radar-based alternative for optical flow based on surface-parallel doppler shift) and radar-based collision avoidance. We build BatMobility using commodity sensors and deploy it as a real-time system on a small off-the-shelf quadcopter running an unmodified flight controller. Our evaluation shows that BatMobility achieves comparable or better performance than commercial-grade optical sensors across a wide range of scenarios.

3.Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots

Authors:Mihir Kulkarni, Huan Nguyen, Kostas Alexis

Abstract: This paper contributes a novel and modularized learning-based method for aerial robots navigating cluttered environments containing hard-to-perceive thin obstacles without assuming access to a map or the full pose estimation of the robot. The proposed solution builds upon a semantically-enhanced Variational Autoencoder that is trained with both real-world and simulated depth images to compress the input data, while preserving semantically-labeled thin obstacles and handling invalid pixels in the depth sensor's output. This compressed representation, in addition to the robot's partial state involving its linear/angular velocities and its attitude are then utilized to train an uncertainty-aware 3D Collision Prediction Network in simulation to predict collision scores for candidate action sequences in a predefined motion primitives library. A set of simulation and experimental studies in cluttered environments with various sizes and types of obstacles, including multiple hard-to-perceive thin objects, were conducted to evaluate the performance of the proposed method and compare against an end-to-end trained baseline. The results demonstrate the benefits of the proposed semantically-enhanced deep collision prediction for learning-based autonomous navigation.

4.Solving Pallet loading Problem with Real-World Constraints

Authors:Marko Švaco, Filip Šuligoj, Bojan Šekoranja, Josip Vidaković, Pietro Kristović

Abstract: Efficient cargo packing and transport unit stacking play a vital role in enhancing logistics efficiency and reducing costs in the field of logistics. This article focuses on the challenging problem of loading transport units onto pallets, which belongs to the class of NP-hard problems. We propose a novel method for solving the pallet loading problem using a branch and bound algorithm, where there is a loading order of transport units. The derived algorithm considers only a heuristically favourable subset of possible positions of the transport units, which has a positive effect on computability. Furthermore, it is ensured that the pallet configuration meets real-world constraints, such as the stability of the position of transport units under the influence of transport inertial forces and gravity.

5.CycleIK: Neuro-inspired Inverse Kinematics

Authors:Jan-Gerrit Habekost, Erik Strahl, Philipp Allgeuer, Matthias Kerzel, Stefan Wermter

Abstract: The paper introduces CycleIK, a neuro-robotic approach that wraps two novel neuro-inspired methods for the inverse kinematics (IK) task, a Generative Adversarial Network (GAN), and a Multi-Layer Perceptron architecture. These methods can be used in a standalone fashion, but we also show how embedding these into a hybrid neuro-genetic IK pipeline allows for further optimization via sequential least-squares programming (SLSQP) or a genetic algorithm (GA). The models are trained and tested on dense datasets that were collected from random robot configurations of the new Neuro-Inspired COLlaborator (NICOL), a semi-humanoid robot with two redundant 8-DoF manipulators. We utilize the weighted multi-objective function from the state-of-the-art BioIK method to support the training process and our hybrid neuro-genetic architecture. We show that the neural models can compete with state-of-the-art IK approaches, which allows for deployment directly to robotic hardware. Additionally, it is shown that the incorporation of the genetic algorithm improves the precision while simultaneously reducing the overall runtime.

6.Control- & Task-Aware Optimal Design of Actuation System for Legged Robots using Binary Integer Linear Programming

Authors:Youngwoo Sim, Guillermo Colin, Joao Ramos

Abstract: Athletic robots demand a whole-body actuation system design that utilizes motors up to the boundaries of their performance. However, creating such robots poses challenges of integrating design principles and reasoning of practical design choices. This paper presents a design framework that guides designers to find optimal design choices to create an actuation system that can rapidly generate torques and velocities required to achieve a given set of tasks, by minimizing inertia and leveraging cooperation between actuators. The framework serves as an interactive tool for designers who are in charge of providing design rules and candidate components such as motors, reduction mechanism, and coupling mechanisms between actuators and joints. A binary integer linear optimization explores design combinations to find optimal components that can achieve a set of tasks. The framework is demonstrated with 200 optimal design studies of a biped with 5-degree-of-freedom (DoF) legs, focusing on the effect of achieving multiple tasks (walking, lifting), constraining the mass budget of all motors in the system and the use of coupling mechanisms. The result provides a comprehensive view of how design choices and rules affect reflected inertia, copper loss of motors, and force capability of optimal actuation systems.

7.A Benchmarking Study on Vision-Based Grasp Synthesis Algorithms

Authors:Bharath K Rameshbabu, Sumukh S Balakrishna, Berk Calli

Abstract: In this paper, we present a benchmarking study of vision-based grasp synthesis algorithms, each with distinct approaches, and provide a comparative analysis of their performance under different experimental conditions. In particular, we compare two machine-learning-based and two analytical algorithms to determine their strengths and weaknesses in different scenarios. In addition, we provide an open-source benchmarking tool developed from state-of-the-art benchmarking procedures and protocols to systematically evaluate different grasp synthesis algorithms. Our findings offer insights into the performance of the evaluated algorithms, which can aid in selecting the most appropriate algorithm for different scenarios.

8.Online Monocular Lane Mapping Using Catmull-Rom Spline

Authors:Zhijian Qiao, Zehuan Yu, Huan Yin, Shaojie Shen

Abstract: In this study, we introduce an online monocular lane mapping approach that solely relies on a single camera and odometry for generating spline-based maps. Our proposed technique models the lane association process as an assignment issue utilizing a bipartite graph, and assigns weights to the edges by incorporating Chamfer distance, pose uncertainty, and lateral sequence consistency. Furthermore, we meticulously design control point initialization, spline parameterization, and optimization to progressively create, expand, and refine splines. In contrast to prior research that assessed performance using self-constructed datasets, our experiments are conducted on the openly accessible OpenLane dataset. The experimental outcomes reveal that our suggested approach enhances lane association and odometry precision, as well as overall lane map quality. We have open-sourced our code1 for this project.

9.3D Skeletonization of Complex Grapevines for Robotic Pruning

Authors:Eric Schneider, Sushanth Jayanth, Abhisesh Silwal, George Kantor

Abstract: Robotic pruning of dormant grapevines is an area of active research in order to promote vine balance and grape quality, but so far robotic efforts have largely focused on planar, simplified vines not representative of commercial vineyards. This paper aims to advance the robotic perception capabilities necessary for pruning in denser and more complex vine structures by extending plant skeletonization techniques. The proposed pipeline generates skeletal grapevine models that have lower reprojection error and higher connectivity than baseline algorithms. We also show how 3D and skeletal information enables prediction accuracy of pruning weight for dense vines surpassing prior work, where pruning weight is an important vine metric influencing pruning site selection.

10.GP-Frontier for Local Mapless Navigation

Authors:Mahmoud Ali, Lantao Liu

Abstract: We propose a new frontier concept called the Gaussian Process Frontier (GP-Frontier) that can be used to locally navigate a robot towards a goal without building a map. The GP-Frontier is built on the uncertainty assessment of an efficient variant of sparse Gaussian Process. Based only on local ranging sensing measurement, the GP-Frontier can be used for navigation in both known and unknown environments. The proposed method is validated through intensive evaluations, and the results show that the GP-Frontier can navigate the robot in a safe and persistent way, i.e., the robot moves in the most open space (thus reducing the risk of collision) without relying on a map or a path planner.

1.Exploiting Structure for Optimal Multi-Agent Bayesian Decentralized Estimation

Authors:Christopher Funk, Ofer Dagan, Benjamin Noack, Nisar R. Ahmed

Abstract: A key challenge in Bayesian decentralized data fusion is the `rumor propagation' or `double counting' phenomenon, where previously sent data circulates back to its sender. It is often addressed by approximate methods like covariance intersection (CI) which takes a weighted average of the estimates to compute the bound. The problem is that this bound is not tight, i.e. the estimate is often over-conservative. In this paper, we show that by exploiting the probabilistic independence structure in multi-agent decentralized fusion problems a tighter bound can be found using (i) an expansion to the CI algorithm that uses multiple (non-monolithic) weighting factors instead of one (monolithic) factor in the original CI and (ii) a general optimization scheme that is able to compute optimal bounds and fully exploit an arbitrary dependency structure. We compare our methods and show that on a simple problem, they converge to the same solution. We then test our new non-monolithic CI algorithm on a large-scale target tracking simulation and show that it achieves a tighter bound and a more accurate estimate compared to the original monolithic CI.

2.GPRL: Gaussian Processes-Based Relative Localization for Multi-Robot Systems

Authors:Ehsan Latif, Ramviyas Parasuraman

Abstract: Relative localization is crucial for multi-robot systems to perform cooperative tasks, especially in GPS-denied environments. Current techniques for multi-robot relative localization rely on expensive or short-range sensors such as cameras and LIDARs. As a result, these algorithms face challenges such as high computational complexity, dependencies on well-structured environments, etc. To overcome these limitations, we propose a new distributed approach to perform relative localization using a Gaussian Processes map of the Radio Signal Strength Indicator (RSSI) values from a single wireless Access Point (AP) to which the robots are connected. Our approach, Gaussian Processes-based Relative Localization (GPRL), combines two pillars. First, the robots locate the AP w.r.t. their local reference frames using novel hierarchical inferencing that significantly reduces computational complexity. Secondly, the robots obtain relative positions of neighbor robots with an AP-oriented vector transformation. The approach readily applies to resource-constrained devices and relies only on the ubiquitously-available RSSI measurement. We extensively validate the performance of the two pillars of the proposed GRPL in Robotarium simulations. We also demonstrate the applicability of GPRL through a multi-robot rendezvous task with a team of three real-world robots. The results demonstrate that GPRL outperformed state-of-the-art approaches regarding accuracy, computation, and real-time performance.

3.Bridging Intelligence and Instinct: A New Control Paradigm for Autonomous Robots

Authors:Shimian Zhang

Abstract: As the advent of artificial general intelligence (AGI) progresses at a breathtaking pace, the application of large language models (LLMs) as AI Agents in robotics remains in its nascent stage. A significant concern that hampers the seamless integration of these AI Agents into robotics is the unpredictability of the content they generate, a phenomena known as ``hallucination''. Drawing inspiration from biological neural systems, we propose a novel, layered architecture for autonomous robotics, bridging AI agent intelligence and robot instinct. In this context, we define Robot Instinct as the innate or learned set of responses and priorities in an autonomous robotic system that ensures survival-essential tasks, such as safety assurance and obstacle avoidance, are carried out in a timely and effective manner. This paradigm harmoniously combines the intelligence of LLMs with the instinct of robotic behaviors, contributing to a more safe and versatile autonomous robotic system. As a case study, we illustrate this paradigm within the context of a mobile robot, demonstrating its potential to significantly enhance autonomous robotics and enabling a future where robots can operate independently and safely across diverse environments.

4.Modeling and analysis of pHRI with Differential Game Theory

Authors:Paolo Franceschi, Manuel Beschi, Nicola Pedrocchi, Anna Valente

Abstract: Applications involving humans and robots working together are spreading nowadays. Alongside, modeling and control techniques that allow physical Human-Robot Interaction (pHRI) are widely investigated. To better understand its potential application in pHRI, this work investigates the Cooperative Differential Game Theory modeling of pHRI in a cooperative reaching task, specifically for reference tracking. The proposed controller based on Collaborative Game Theory is deeply analyzed and compared in simulations with two other techniques, Linear Quadratic Regulator (LQR) and Non-Cooperative Game-Theoretic Controller. The set of simulations shows how different tuning of control parameters affects the system response and control efforts of both the players for the three controllers, suggesting the use of Cooperative GT in the case the robot should assist the human, while Non-Cooperative GT represents a better choice in the case the robot should lead the action. Finally, preliminary tests with a trained human are performed to extract useful information on the real applicability and limitations of the proposed method.

5.Predicting human motion intention for pHRI assistive control

Authors:Paolo Franceschi, Fabio Bertini, Francesco Braghin, Loris Roveda, Nicola Pedrocchi, Manuel Beschi

Abstract: This work addresses human intention identification during physical Human-Robot Interaction (pHRI) tasks to include this information in an assistive controller. To this purpose, human intention is defined as the desired trajectory that the human wants to follow over a finite rolling prediction horizon so that the robot can assist in pursuing it. This work investigates a Recurrent Neural Network (RNN), specifically, Long-Short Term Memory (LSTM) cascaded with a Fully Connected layer. In particular, we propose an iterative training procedure to adapt the model. Such an iterative procedure is powerful in reducing the prediction error. Still, it has the drawback that it is time-consuming and does not generalize to different users or different co-manipulated objects. To overcome this issue, Transfer Learning (TL) adapts the pre-trained model to new trajectories, users, and co-manipulated objects by freezing the LSTM layer and fine-tuning the last FC layer, which makes the procedure faster. Experiments show that the iterative procedure adapts the model and reduces prediction error. Experiments also show that TL adapts to different users and to the co-manipulation of a large object. Finally, to check the utility of adopting the proposed method, we compare the proposed controller enhanced by the intention prediction with the other two standard controllers of pHRI.

6.A Hybrid Adaptive Controller for Soft Robot Interchangeability

Authors:Zixi Chen, Xuyang Ren, Matteo Bernabei, Vanessa Mainardi, Gastone Ciuti, Cesare Stefanini

Abstract: Soft robots have been leveraged in considerable areas like surgery, rehabilitation, and bionics due to their softness, flexibility, and safety. However, it is challenging to produce two same soft robots even with the same mold and manufacturing process owing to the complexity of soft materials. Meanwhile, widespread usage of a system requires the ability to fabricate replaceable components, which is interchangeability. Due to the necessity of this property, a hybrid adaptive controller is introduced to achieve interchangeability from the perspective of control approaches. This method utilizes an offline trained recurrent neural network controller to cope with the nonlinear and delayed response from soft robots. Furthermore, an online optimizing kinematics controller is applied to decrease the error caused by the above neural network controller. Soft pneumatic robots with different deformation properties but the same mold have been included for validation experiments. In the experiments, the systems with different actuation configurations and the different robots follow the desired trajectory with errors of 0.040 and 0.030 compared with the working space length, respectively. Such an adaptive controller also shows good performance on different control frequencies and desired velocities. This controller endows soft robots with the potential for wide application, and future work may include different offline and online controllers. A weight parameter adjusting strategy may also be proposed in the future.

7.Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

Authors:Zhifeng Qian, Mingyu You, Hongjun Zhou, Xuanhui Xu, Bin He

Abstract: Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals to learn a set of skills. Despite the excellent works proposed in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackled this problem by leveraging planning algorithms to plan intermediate subgoals to augment GCRL. Their methods need two crucial requirements: (i) a state representation space to search valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, they struggle to scale to high-dimensional state space due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both affect the efficiency and performance of planning and policy learning. In the paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) is proposed to learn compact representations which disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate our REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.

8.Control Input Inference of Mobile Agents under Unknown Objective

Authors:Chendi Qu, Jianping He, Xiaoming Duan, Shukun Wu

Abstract: Trajectory and control secrecy is an important issue in robotics security. This paper proposes a novel algorithm for the control input inference of a mobile agent without knowing its control objective. Specifically, the algorithm first estimates the target state by applying external perturbations. Then we identify the objective function based on the inverse optimal control, providing the well-posedness proof and the identifiability analysis. Next, we obtain the optimal estimate of the control horizon using binary search. Finally, the agent's control optimization problem is reconstructed and solved to predict its input. Simulation illustrates the efficiency and the performance of the algorithm.

9.A Survey on Dialogue Management in Human-Robot Interaction

Authors:Merle M. Reimann, Florian A. Kunneman, Catharine Oertel, Koen V. Hindriks

Abstract: As social robots see increasing deployment within the general public, improving the interaction with those robots is essential. Spoken language offers an intuitive interface for the human-robot interaction (HRI), with dialogue management (DM) being a key component in those interactive systems. Yet, to overcome current challenges and manage smooth, informative and engaging interaction a more structural approach to combining HRI and DM is needed. In this systematic review, we analyse the current use of DM in HRI and focus on the type of dialogue manager used, its capabilities, evaluation methods and the challenges specific to DM in HRI. We identify the challenges and current scientific frontier related to the DM approach, interaction domain, robot appearance, physical situatedness and multimodality.

10.Soft-tissue Driven Craniomaxillofacial Surgical Planning

Authors:Xi Fang, Daeseung Kim, Xuanang Xu, Tianshu Kuang, Nathan Lampen, Jungwook Lee, Hannah H. Deng, Jaime Gateno, Michael A. K. Liebschner, James J. Xia, Pingkun Yan

Abstract: In CMF surgery, the planning of bony movement to achieve a desired facial outcome is a challenging task. Current bone driven approaches focus on normalizing the bone with the expectation that the facial appearance will be corrected accordingly. However, due to the complex non-linear relationship between bony structure and facial soft-tissue, such bone-driven methods are insufficient to correct facial deformities. Despite efforts to simulate facial changes resulting from bony movement, surgical planning still relies on iterative revisions and educated guesses. To address these issues, we propose a soft-tissue driven framework that can automatically create and verify surgical plans. Our framework consists of a bony planner network that estimates the bony movements required to achieve the desired facial outcome and a facial simulator network that can simulate the possible facial changes resulting from the estimated bony movement plans. By combining these two models, we can verify and determine the final bony movement required for planning. The proposed framework was evaluated using a clinical dataset, and our experimental results demonstrate that the soft-tissue driven approach greatly improves the accuracy and efficacy of surgical planning when compared to the conventional bone-driven approach.

1.ProNav: Proprioceptive Traversability Estimation for Autonomous Legged Robot Navigation in Outdoor Environments

Authors:Mohamed Elnoor, Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Dinesh Manocha

Abstract: We propose a novel method, ProNav, which uses proprioceptive signals for traversability estimation in challenging outdoor terrains for autonomous legged robot navigation. Our approach uses sensor data from a legged robot's joint encoders, force, and current sensors to measure the joint positions, forces, and current consumption respectively to accurately assess a terrain's stability, resistance to the robot's motion, risk of entrapment, and crash. Based on these factors, we compute the appropriate robot trajectories and gait to maximize stability and minimize energy consumption. Our approach can also be used to predict imminent crashes in challenging terrains and execute behaviors to preemptively avoid them. We integrate ProNav with a method to navigate dense vegetation and demonstrate our method's benefits in real-world terrains with dense bushes, high granularity, negative obstacles, etc. Our method shows an improvement up to 50% in terms of success rate and up to 35% in terms of energy efficiency.

2.Online Continual Learning for Robust Indoor Object Recognition

Authors:Umberto Michieli, Mete Ozay

Abstract: Vision systems mounted on home robots need to interact with unseen classes in changing environments. Robots have limited computational resources, labelled data and storage capability. These requirements pose some unique challenges: models should adapt without forgetting past knowledge in a data- and parameter-efficient way. We characterize the problem as few-shot (FS) online continual learning (OCL), where robotic agents learn from a non-repeated stream of few-shot data updating only a few model parameters. Additionally, such models experience variable conditions at test time, where objects may appear in different poses (e.g., horizontal or vertical) and environments (e.g., day or night). To improve robustness of CL agents, we propose RobOCLe, which; 1) constructs an enriched feature space computing high order statistical moments from the embedded features of samples; and 2) computes similarity between high order statistics of the samples on the enriched feature space, and predicts their class labels. We evaluate robustness of CL models to train/test augmentations in various cases. We show that different moments allow RobOCLe to capture different properties of deformations, providing higher robustness with no decrease of inference speed.

3.Nonlinear Model Predictive Control with Obstacle Avoidance Constraints for Autonomous Navigation in a Canal Environment

Authors:Changyu Lee, Dongha Chung, Jonghwi Kim, Jinwhan Kim

Abstract: In this paper, we describe the development process of autonomous navigation capabilities of a small cruise boat operating in a canal environment and present the results of a field experiment conducted in the Pohang Canal, South Korea. Nonlinear model predictive control (NMPC) was used for the online trajectory planning and tracking control of the cruise boat in a narrow passage in the canal. To consider the nonlinear characteristics of boat dynamics, system identification was performed using experimental data from various test maneuvers, such as acceleration-deceleration and zigzag trials. To efficiently represent the obstacle structures in the canal environment, we parameterized the canal walls as line segments with point cloud data, captured by an onboard LiDAR sensor, and considered them as constraints for obstacle avoidance. The proposed method was implemented in a single NMPC layer, and its real-world performance was verified through experimental runs in the Pohang Canal.

4.Agricultural Robotic System: The Automation of Detection and Speech Control

Authors:Yang Wenkai, Ji Ruihang, Yue Yiran, Gu Zhonghan, Shu Wanyang, Sam Ge Shuzhi

Abstract: Agriculture industries often face challenges in manual tasks such as planting, harvesting, fertilizing, and detection, which can be time consuming and prone to errors. The "Agricultural Robotic System" project addresses these issues through a modular design that integrates advanced visual, speech recognition, and robotic technologies. This system is comprised of separate but interconnected modules for vision detection and speech recognition, creating a flexible and adaptable solution. The vision detection module uses computer vision techniques, trained on YOLOv5 and deployed on the Jetson Nano in TensorRT format, to accurately detect and identify different items. A robotic arm module then precisely controls the picking up of seedlings or seeds, and arranges them in specific locations. The speech recognition module enhances intelligent human robot interaction, allowing for efficient and intuitive control of the system. This modular approach improves the efficiency and accuracy of agricultural tasks, demonstrating the potential of robotics in the agricultural industry.

5.A Shared Control Approach Based on First-Order Dynamical Systems and Closed-Loop Variable Stiffness Control

Authors:Haotian Xue, Youssef Michel, Dongheui Lee

Abstract: In this paper, we present a novel learning-based shared control framework. This framework deploys first-order Dynamical Systems (DS) as motion generators providing the desired reference motion, and a Variable Stiffness Dynamical Systems (VSDS) \cite{chen2021closed} for haptic guidance. We show how to shape several features of our controller in order to achieve authority allocation, local motion refinement, in addition to the inherent ability of the controller to automatically synchronize with the human state during joint task execution. We validate our approach in a teleoperated task scenario, where we also showcase the ability of our framework to deal with situations that require updating task knowledge due to possible changes in the task scenario, or changes in the environment. Finally, we conduct a user study to compare the performance of our VSDS controller for guidance generation to two state-of-the-art controllers in a target reaching task. The result shows that our VSDS controller has the highest successful rate of task execution among all conditions. Besides, our VSDS controller helps reduce the execution time and task load significantly, and was selected as the most favorable controller by participants.

6.XSkill: Cross Embodiment Skill Discovery

Authors:Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song

Abstract: Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the big embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using conditional diffusion policy, and finally, 3) composes the learned skill to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework. The performance of XSkill is best understood from the anonymous website:

7.Optimizing the extended Fourier Mellin Transformation Algorithm

Authors:Wenqing Jiang, Chengqian Li, Jinyue Cao, Sören Schwertfeger

Abstract: With the increasing application of robots, stable and efficient Visual Odometry (VO) algorithms are becoming more and more important. Based on the Fourier Mellin Transformation (FMT) algorithm, the extended Fourier Mellin Transformation (eFMT) is an image registration approach that can be applied to downward-looking cameras, for example on aerial and underwater vehicles. eFMT extends FMT to multi-depth scenes and thus more application scenarios. It is a visual odometry method which estimates the pose transformation between three overlapping images. On this basis, we develop an optimized eFMT algorithm that improves certain aspects of the method and combines it with back-end optimization for the small loop of three consecutive frames. For this we investigate the extraction of uncertainty information from the eFMT registration, the related objective function and the graph-based optimization. Finally, we design a series of experiments to investigate the properties of this approach and compare it with other VO and SLAM (Simultaneous Localization and Mapping) algorithms. The results show the superior accuracy and speed of our o-eFMT approach, which is published as open source.

8.RobôCIn Small Size League Extended Team Description Paper for RoboCup 2023

Authors:Aline Lima de Oliveira, Cauê Addae da Silva Gomes, Cecília Virginia Santos da Silva, Charles Matheus de Sousa Alves, Danilo Andrade Martins de Souza, Driele Pires Ferreira Araújo Xavier, Edgleyson Pereira da Silva, Felipe Bezerra Martins, Lucas Henrique Cavalcanti Santos, Lucas Dias Maciel, Matheus Paixão Gumercindo dos Santos, Matheus Lafayette Vasconcelos, Matheus Vinícius Teotonio do Nascimento Andrade, João Guilherme Oliveira Carvalho de Melo, João Pedro Souza Pereira de Moura, José Ronald da Silva, José Victor Silva Cruz, Pedro Henrique Santana de Morais, Pedro Paulo Salman de Oliveira, Riei Joaquim Matos Rodrigues, Roberto Costa Fernandes, Ryan Vinicius Santos Morais, Tamara Mayara Ramos Teobaldo, Washington Igor dos Santos Silva, Edna Natividade Silva Barros

Abstract: Rob\^oCIn has participated in RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-times Latin-American champion. This paper presents our improvements to defend the Small Size League (SSL) division B title in RoboCup 2023 in Bordeaux, France. This paper aims to share some of the academic research that our team developed over the past year. Our team has successfully published 2 articles related to SSL at two high-impact conferences: the 25th RoboCup International Symposium and the 19th IEEE Latin American Robotics Symposium (LARS 2022). Over the last year, we have been continuously migrating from our past codebase to Unification. We will describe the new architecture implemented and some points of software and AI refactoring. In addition, we discuss the process of integrating machined components into the mechanical system, our development for participating in the vision blackout challenge last year and what we are preparing for this year.

9.BERRY: Bit Error Robustness for Energy-Efficient Reinforcement Learning-Based Autonomous Systems

Authors:Zishen Wan, Nandhini Chandramoorthy, Karthik Swaminathan, Pin-Yu Chen, Vijay Janapa Reddi, Arijit Raychowdhury

Abstract: Autonomous systems, such as Unmanned Aerial Vehicles (UAVs), are expected to run complex reinforcement learning (RL) models to execute fully autonomous position-navigation-time tasks within stringent onboard weight and power constraints. We observe that reducing onboard operating voltage can benefit the energy efficiency of both the computation and flight mission, however, it can also result in on-chip bit failures that are detrimental to mission safety and performance. To this end, we propose BERRY, a robust learning framework to improve bit error robustness and energy efficiency for RL-enabled autonomous systems. BERRY supports robust learning, both offline and on-board the UAV, and for the first time, demonstrates the practicality of robust low-voltage operation on UAVs that leads to high energy savings in both compute-level operation and system-level quality-of-flight. We perform extensive experiments on 72 autonomous navigation scenarios and demonstrate that BERRY generalizes well across environments, UAVs, autonomy policies, operating voltages and fault patterns, and consistently improves robustness, efficiency and mission performance, achieving up to 15.62% reduction in flight energy, 18.51% increase in the number of successful missions, and 3.43x processing energy reduction.

10.Object-centric Representations for Interactive Online Learning with Non-Parametric Methods

Authors:Nikhil U. Shinde, Jacob Johnson, Sylvia Herbert, Michael C. Yip

Abstract: Large offline learning-based models have enabled robots to successfully interact with objects for a wide variety of tasks. However, these models rely on fairly consistent structured environments. For more unstructured environments, an online learning component is necessary to gather and estimate information about objects in the environment in order to successfully interact with them. Unfortunately, online learning methods like Bayesian non-parametric models struggle with changes in the environment, which is often the desired outcome of interaction-based tasks. We propose using an object-centric representation for interactive online learning. This representation is generated by transforming the robot's actions into the object's coordinate frame. We demonstrate how switching to this task-relevant space improves our ability to reason with the training data collected online, enabling scalable online learning of robot-object interactions. We showcase our method by successfully navigating a manipulator arm through an environment with multiple unknown objects without violating interaction-based constraints.

11.Scientific Exploration of Challenging Planetary Analog Environments with a Team of Legged Robots

Authors:Philip Arm, Gabriel Waibel, Jan Preisig, Turcan Tuna, Ruyi Zhou, Valentin Bickel, Gabriela Ligeza, Takahiro Miki, Florian Kehl, Hendrik Kolvenbach, Marco Hutter

Abstract: The interest in exploring planetary bodies for scientific investigation and in-situ resource utilization is ever-rising. Yet, many sites of interest are inaccessible to state-of-the-art planetary exploration robots because of the robots' inability to traverse steep slopes, unstructured terrain, and loose soil. Additionally, current single-robot approaches only allow a limited exploration speed and a single set of skills. Here, we present a team of legged robots with complementary skills for exploration missions in challenging planetary analog environments. We equipped the robots with an efficient locomotion controller, a mapping pipeline for online and post-mission visualization, instance segmentation to highlight scientific targets, and scientific instruments for remote and in-situ investigation. Furthermore, we integrated a robotic arm on one of the robots to enable high-precision measurements. Legged robots can swiftly navigate representative terrains, such as granular slopes beyond 25 degrees, loose soil, and unstructured terrain, highlighting their advantages compared to wheeled rover systems. We successfully verified the approach in analog deployments at the BeyondGravity ExoMars rover testbed, in a quarry in Switzerland, and at the Space Resources Challenge in Luxembourg. Our results show that a team of legged robots with advanced locomotion, perception, and measurement skills, as well as task-level autonomy, can conduct successful, effective missions in a short time. Our approach enables the scientific exploration of planetary target sites that are currently out of human and robotic reach.

12.Eversion Robots for Mapping Radiation in Pipes

Authors:Thomas Mack, Mohammed Al-Dubooni, Kaspar Althoefer

Abstract: A system and testing rig were designed and built to simulate the use of an eversion robot equipped with a radiation sensor to characterise an irradiated pipe prior to decommissioning. The magnets were used as dummy radiation sources which were detected by a hall effect sensor mounted in the interior of the robot. The robot successfully navigated a simple structure with sharp 45{\deg} and 90{\deg} swept bends as well as constrictions that were used to model partial blockages.

13.Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

Authors:Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim

Abstract: The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning the reward functions. Well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping with PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.

14.Contact-aware Shaping and Maintenance of Deformable Linear Objects With Fixtures

Authors:Kejia Chen, Zhenshan Bing, Fan Wu, Yuan Meng, Andre Kraft, Sami Haddadin, Alois Knoll

Abstract: Studying the manipulation of deformable linear objects has significant practical applications in industry, including car manufacturing, textile production, and electronics automation. However, deformable linear object manipulation poses a significant challenge in developing planning and control algorithms, due to the precise and continuous control required to effectively manipulate the deformable nature of these objects. In this paper, we propose a new framework to control and maintain the shape of deformable linear objects with two robot manipulators utilizing environmental contacts. The framework is composed of a shape planning algorithm which automatically generates appropriate positions to place fixtures, and an object-centered skill engine which includes task and motion planning to control the motion and force of both robots based on the object status. The status of the deformable linear object is estimated online utilizing visual as well as force information. The framework manages to handle a cable routing task in real-world experiments with two Panda robots and especially achieves contact-aware and flexible clip fixing with challenging fixtures.

15.Robust Driving Policy Learning with Guided Meta Reinforcement Learning

Authors:Kanghoon Lee, Jiachen Li, David Isele, Jinkyoo Park, Kikuo Fujimura, Mykel J. Kochenderfer

Abstract: Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.

1.3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving

Authors:Qipeng Li, Yuan Zhuang, Yiwen Chen, Jianzhu Huai, Miao Li, Tianbing Ma, Yufei Tang, Xinlian Liang

Abstract: For the SLAM system in robotics and autonomous driving, the accuracy of front-end odometry and back-end loop-closure detection determine the whole intelligent system performance. But the LiDAR-SLAM could be disturbed by current scene moving objects, resulting in drift errors and even loop-closure failure. Thus, the ability to detect and segment moving objects is essential for high-precision positioning and building a consistent map. In this paper, we address the problem of moving object segmentation from 3D LiDAR scans to improve the odometry and loop-closure accuracy of SLAM. We propose a novel 3D Sequential Moving-Object-Segmentation (3D-SeqMOS) method that can accurately segment the scene into moving and static objects, such as moving and static cars. Different from the existing projected-image method, we process the raw 3D point cloud and build a 3D convolution neural network for MOS task. In addition, to make full use of the spatio-temporal information of point cloud, we propose a point cloud residual mechanism using the spatial features of current scan and the temporal features of previous residual scans. Besides, we build a complete SLAM framework to verify the effectiveness and accuracy of 3D-SeqMOS. Experiments on SemanticKITTI dataset show that our proposed 3D-SeqMOS method can effectively detect moving objects and improve the accuracy of LiDAR odometry and loop-closure detection. The test results show our 3D-SeqMOS outperforms the state-of-the-art method by 12.4%. We extend the proposed method to the SemanticKITTI: Moving Object Segmentation competition and achieve the 2nd in the leaderboard, showing its effectiveness.

2.Implementation and Evaluation of Networked Model Predictive Control System on Universal Robot

Authors:Mahsa Noroozi, Kai Wang

Abstract: Networked control systems are closed-loop feedback control systems containing system components that may be distributed geographically in different locations and interconnected via a communication network such as the Internet. The quality of network communication is a crucial factor that significantly affects the performance of remote control. This is due to the fact that network uncertainties can occur in the transmission of packets in the forward and backward channels of the system. The two most significant among these uncertainties are network time delay and packet loss. To overcome these challenges, the networked predictive control system has been proposed to provide improved performance and robustness using predictive controllers and compensation strategies. In particular, the model predictive control method is well-suited as an advanced approach compared to conventional methods. In this paper, a networked model predictive control system consisting of a model predictive control method and compensation strategies is implemented to control and stabilize a robot arm as a physical system. In particular, this work aims to analyze the performance of the system under the influence of network time delay and packet loss. Using appropriate performance and robustness metrics, an in-depth investigation of the impacts of these network uncertainties is performed. Furthermore, the forward and backward channels of the network are examined in detail in this study.

3.Sampling-based Model Predictive Control Leveraging Parallelizable Physics Simulations

Authors:Corrado Pezzato, Chadi Salmi, Max Spahn, Elia Trevisan, Javier Alonso-Mora, Carlos Hernandez Corbato

Abstract: We present a method for sampling-based model predictive control that makes use of a generic physics simulator as the dynamical model. In particular, we propose a Model Predictive Path Integral controller (MPPI), that uses the GPU-parallelizable IsaacGym simulator to compute the forward dynamics of a problem. By doing so, we eliminate the need for manual encoding of robot dynamics and interactions among objects and allow one to effortlessly solve complex navigation and contact-rich tasks. Since no explicit dynamic modeling is required, the method is easily extendable to different objects and robots. We demonstrate the effectiveness of this method in several simulated and real-world settings, among which mobile navigation with collision avoidance, non-prehensile manipulation, and whole-body control for high-dimensional configuration spaces. This method is a powerful and accessible tool to solve a large variety of contact-rich motion planning tasks.

4.Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model

Authors:Suresh Guttikonda, Jan Achterhold, Haolong Li, Joschka Boedecker, Joerg Stueckler

Abstract: In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.

5.Patrolling Grids with a Bit of Memory

Authors:Michael Amir, Dmitry Rabinovich, Alfred M. Bruckstein

Abstract: We study the following problem in elementary robotics: can a mobile agent with $b$ bits of memory, which is able to sense only locations at Manhattan distance $V$ or less from itself, patrol a $d$-dimensional grid graph? We show that it is impossible to patrol some grid graphs with $0$ bits of memory, regardless of $V$, and give an exact characterization of those grid graphs that can be patrolled with $0$ bits of memory and visibility range $V$. On the other hand, we show that, surprisingly, an algorithm exists using $1$ bit of memory and $V=1$ that patrols any $d$-dimensional grid graph.

6.Task Space Control of Hydraulic Construction Machines using Reinforcement Learning

Authors:Hyung Joo Lee, Sigrid Brell-Cokcan

Abstract: Teleoperation is vital in the construction industry, allowing safe machine manipulation from a distance. However, controlling machines at a joint level requires extensive training due to their complex degrees of freedom. Task space control offers intuitive maneuvering, but precise control often requires dynamic models, posing challenges for hydraulic machines. To address this, we use a data-driven actuator model to capture machine dynamics in real-world operations. By integrating this model into simulation and reinforcement learning, an optimal control policy for task space control is obtained. Experiments with Brokk 170 validate the framework, comparing it to a well-known Jacobian-based approach.

7.Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization

Authors:Yajia Zhang, Hongyi Sun, Ruizhi Chai, Daike Kang, Shan Li, Liyun Li

Abstract: Vehicle trajectory planning is a key component for an autonomous driving system. A practical system not only requires the component to compute a feasible trajectory, but also a comfortable one given certain comfort metrics. Nevertheless, computation efficiency is critical for the system to be deployed as a commercial product. In this paper, we present a novel trajectory planning algorithm based on nonlinear optimization. The algorithm computes a kinematically feasible and comfort-optimal trajectory that achieves collision avoidance with static obstacles. Furthermore, the algorithm is time efficient. It generates an 6-second trajectory within 10 milliseconds on an Intel i7 machine or 20 milliseconds on an Nvidia Drive Orin platform.

1.Adaptive Compliant Robot Control with Failure Recovery for Object Press-Fitting

Authors:Ekansh Sharma, Christoph Henke, Alex Mitrevski, Paul G. Plöger

Abstract: Loading of shipping containers for dairy products often includes a press-fit task, which involves manually stacking milk cartons in a container without using pallets or packaging. Automating this task with a mobile manipulator can reduce worker strain, and also enhance the efficiency and safety of the container loading process. This paper proposes an approach called Adaptive Compliant Control with Integrated Failure Recovery (ACCIFR), which enables a mobile manipulator to reliably perform the press-fit task. We base the approach on a demonstration learning-based compliant control framework, such that we integrate a monitoring and failure recovery mechanism for successful task execution. Concretely, we monitor the execution through distance and force feedback, detect collisions while the robot is performing the press-fit task, and use wrench measurements to classify the direction of collision; this information informs the subsequent recovery process. We evaluate the method on a miniature container setup, considering variations in the (i) starting position of the end effector, (ii) goal configuration, and (iii) object grasping position. The results demonstrate that the proposed approach outperforms the baseline demonstration-based learning framework regarding adaptability to environmental variations and the ability to recover from collision failures, making it a promising solution for practical press-fit applications.

2.GHACPP: Genetic-based Human-Aware Coverage Path Planning Algorithm for Autonomous Disinfection Robot

Authors:Stepan Perminov, Ivan Kalinov, Dzmitry Tsetserukou

Abstract: Numerous mobile robots with mounted Ultraviolet-C (UV-C) lamps were developed recently, yet they cannot work in the same space as humans without irradiating them by UV-C. This paper proposes a novel modular and scalable Human-Aware Genetic-based Coverage Path Planning algorithm (GHACPP), that aims to solve the problem of disinfecting of unknown environments by UV-C irradiation and preventing human eyes and skin from being harmed. The proposed genetic-based algorithm alternates between the stages of exploring a new area, generating parts of the resulting disinfection trajectory, called mini-trajectories, and updating the current state around the robot. The system performance in effectiveness and human safety is validated and compared with one of the latest state-of-the-art online coverage path planning algorithms called SimExCoverage-STC. The experimental results confirmed both the high level of safety for humans and the efficiency of the developed algorithm in terms of decrease of path length (by 37.1%), number (39.5%) and size (35.2%) of turns, and time (7.6%) to complete the disinfection task, with a small loss in the percentage of area covered (0.6%), in comparison with the state-of-the-art approach.

3.Building Volumetric Beliefs for Dynamic Environments Exploiting Map-Based Moving Object Segmentation

Authors:Benedikt Mersch, Tiziano Guadagnino, Xieyuanli Chen, Ignacio Vizzo, Jens Behley, Cyrill Stachniss

Abstract: Mobile robots that navigate in unknown environments need to be constantly aware of the dynamic objects in their surroundings for mapping, localization, and planning. It is key to reason about moving objects in the current observation and at the same time to also update the internal model of the static world to ensure safety. In this paper, we address the problem of jointly estimating moving objects in the current 3D LiDAR scan and a local map of the environment. We use sparse 4D convolutions to extract spatio-temporal features from scan and local map and segment all 3D points into moving and non-moving ones. Additionally, we propose to fuse these predictions in a probabilistic representation of the dynamic environment using a Bayes filter. This volumetric belief models, which parts of the environment can be occupied by moving objects. Our experiments show that our approach outperforms existing moving object segmentation baselines and even generalizes to different types of LiDAR sensors. We demonstrate that our volumetric belief fusion can increase the precision and recall of moving object segmentation and even retrieve previously missed moving objects in an online mapping scenario.

4.Human Emergency Detection during Autonomous Hospital Transports

Authors:Andreas Zachariae, Julia Widera, Frederik Plahl, Björn Hein, Christian Wurll

Abstract: Human transports in hospitals are labor-intensive and primarily performed in beds to save time. This transfer method does not promote the mobility or autonomy of the patient. To relieve the caregivers from this time-consuming task, a mobile robot is developed to autonomously transport humans around the hospital. It provides different transfer modes including walking and sitting in a wheelchair. The problem that this paper focuses on is to detect emergencies and ensure the well-being of the patient during the transport. For this purpose, the patient is tracked and monitored with a camera system. OpenPose is used for Human Pose Estimation and a trained classifier for emergency detection. We collected and published a dataset of 18,000 images in lab and hospital environments. It differs from related work because we have a moving robot with different transfer modes in a highly dynamic environment with multiple people in the scene using only RGB-D data. To improve the critical recall metric, we apply threshold moving and a time delay. We compare different models with an AutoML approach. This paper shows that emergencies while walking are best detected by a SVM with a recall of 95.8% on single frames. In the case of sitting transport, the best model achieves a recall of 62.2%. The contribution is to establish a baseline on this new dataset and to provide a proof of concept for the human emergency detection in this use case.

5.ArUcoGlide: a Novel Wearable Robot for Position Tracking and Haptic Feedback to Increase Safety During Human-Robot Interaction

Authors:Ali Alabbas, Miguel Altamirano Cabrera, Oussama Alyounes, Dzmitry Tsetserukou

Abstract: The current capabilities of robotic systems make human collaboration necessary to accomplish complex tasks effectively. In this work, we are introducing a framework to ensure safety in a human-robot collaborative environment. The system is composed of a wearable 2-DOF robot, a low-cost and easy-to-install tracking system, and a collision avoidance algorithm based on the Artificial Potential Field (APF). The wearable robot is designed to hold a fiducial marker and maintain its visibility to the tracking system, which, in turn, localizes the user's hand with good accuracy and low latency and provides haptic feedback to the user. The system is designed to enhance the performance of collaborative tasks while ensuring user safety. Three experiments were carried out to evaluate the performance of the proposed system. The first one evaluated the accuracy of the tracking system. The second experiment analyzed human-robot behavior during an imminent collision. The third experiment evaluated the system in a collaborative activity in a shared working environment. The results show that the implementation of the introduced system reduces the operation time by 16% and increases the average distance between the user's hand and the robot by 5 cm.

6.Clarifying the Half Full or Half Empty Question: Multimodal Container Classification

Authors:Josua Spisak, Matthias Kerzel, Stefan Wermter

Abstract: Multimodal integration is a key component of allowing robots to perceive the world. Multimodality comes with multiple challenges that have to be considered, such as how to integrate and fuse the data. In this paper, we compare different possibilities of fusing visual, tactile and proprioceptive data. The data is directly recorded on the NICOL robot in an experimental setup in which the robot has to classify containers and their content. Due to the different nature of the containers, the use of the modalities can wildly differ between the classes. We demonstrate the superiority of multimodal solutions in this use case and evaluate three fusion strategies that integrate the data at different time steps. We find that the accuracy of the best fusion strategy is 15% higher than the best strategy using only one singular sense.

7.Occupancy Grid Mapping without Ray-Casting for High-resolution LiDAR Sensors

Authors:Yixi Cai, Fanze Kong, Yunfan Ren, Fangcheng Zhu, Jiarong Lin, Fu Zhang

Abstract: Occupancy mapping is a fundamental component of robotic systems to reason about the unknown and known regions of the environment. This article presents an efficient occupancy mapping framework for high-resolution LiDAR sensors, termed D-Map. The framework introduces three main novelties to address the computational efficiency challenges of occupancy mapping. Firstly, we use a depth image to determine the occupancy state of regions instead of the traditional ray-casting method. Secondly, we introduce an efficient on-tree update strategy on a tree-based map structure. These two techniques avoid redundant visits to small cells, significantly reducing the number of cells to be updated. Thirdly, we remove known cells from the map at each update by leveraging the low false alarm rate of LiDAR sensors. This approach not only enhances our framework's update efficiency by reducing map size but also endows it with an interesting decremental property, which we have named D-Map. To support our design, we provide theoretical analyses of the accuracy of the depth image projection and time complexity of occupancy updates. Furthermore, we conduct extensive benchmark experiments on various LiDAR sensors in both public and private datasets. Our framework demonstrates superior efficiency in comparison with other state-of-the-art methods while maintaining comparable mapping accuracy and high memory efficiency. We demonstrate two real-world applications of D-Map for real-time occupancy mapping on a handle device and an aerial platform carrying a high-resolution LiDAR. In addition, we open-source the implementation of D-Map on GitHub to benefit society:

8.Congestion and Scalability in Robot Swarms: a Study on Collective Decision Making

Authors:Karthik Soma, Vivek Shankar Vardharajan, Heiko Hamann, Giovanni Beltrame

Abstract: One of the most important promises of decentralized systems is scalability, which is often assumed to be present in robot swarm systems without being contested. Simple limitations, such as movement congestion and communication conflicts, can drastically affect scalability. In this work, we study the effects of congestion in a binary collective decision-making task. We evaluate the impact of two types of congestion (communication and movement) when using three different techniques for the task: Honey Bee inspired, Stigmergy based, and Division of Labor. We deploy up to 150 robots in a physics-based simulator performing a sampling mission in an arena with variable levels of robot density, applying the three techniques. Our results suggest that applying Division of Labor coupled with versioned local communication helps to scale the system by minimizing congestion.

9.A Nested U-Structure for Instrument Segmentation in Robotic Surgery

Authors:Yanjie Xia, Shaochen Wang, Zhen Kan

Abstract: Robot-assisted surgery has made great progress with the development of medical imaging and robotics technology. Medical scene understanding can greatly improve surgical performance while the semantic segmentation of the robotic instrument is a key enabling technology for robot-assisted surgery. However, how to locate an instrument's position and estimate their pose in complex surgical environments is still a challenging fundamental problem. In this paper, pixel-wise instrument segmentation is investigated. The contributions of the paper are twofold: 1) We proposed a two-level nested U-structure model, which is an encoder-decoder architecture with skip-connections and each layer of the network structure adopts a U-structure instead of a simple superposition of convolutional layers. The model can capture more context information from multiple scales and better fuse the local and global information to achieve high-quality segmentation. 2) Experiments have been conducted to qualitatively and quantitatively show the performance of our approach on three segmentation tasks: the binary segmentation, the parts segmentation, and the type segmentation, respectively.

10.Uncertainty-Aware Acoustic Localization and Mapping for Underwater Robots

Authors:Jingyu Song, Onur Bagoren, Katherine A. Skinner

Abstract: For underwater vehicles, robotic applications have the added difficulty of operating in highly unstructured and dynamic environments. Environmental effects impact not only the dynamics and controls of the robot but also the perception and sensing modalities. Acoustic sensors, which inherently use mechanically vibrated signals for measuring range or velocity, are particularly prone to the effects that such dynamic environments induce. This paper presents an uncertainty-aware localization and mapping framework that accounts for induced disturbances in acoustic sensing modalities for underwater robots operating near the surface in dynamic wave conditions. For the state estimation task, the uncertainty is accounted for as the added noise caused by the environmental disturbance. The mapping method uses an adaptive kernel-based method to propagate measurement and pose uncertainty into an occupancy map. Experiments are carried out in a wave tank environment to perform qualitative and quantitative evaluations of the proposed method. More details about this project can be found at

11.A Study in Zucker: Insights on Human-Robot Interactions

Authors:Alex Day, Ioannis Karamouzas

Abstract: In recent years there has been a large focus on how robots can operate in human populated environments. In this paper, we focus on interactions between humans and small indoor robots and introduce a new human-robot interaction (HRI) dataset. The analysis of the recorded experiments shows that anticipatory and non-reactive robot controllers impose similar constraints to humans' safety and efficiency. Additionally, we found that current state-of-the-art models for human trajectory prediction can adequately extend to indoor HRI settings. Finally, we show that humans respond differently in shared and homogeneous environments when collisions are imminent, since interacting with small differential drives can only cause a finite level of social discomfort as compared to human-human interactions. The dataset used in this analysis is available at:

12.Robotic Exploration for Mapping

Authors:Akanimoh Adeleye

Abstract: Robotic Exploration has evolved rapidly in the past two decades as new and more complex techniques have been created to explore unknown regions efficiently. Exciting advancements in exploration, autonomous navigation, and sensor technology have created opportunities for robots to be utilized in new environments and for new objectives ranging from mapping of abandon mines and deep oceans to the efficient creation of indoor models for navigation and search. In this paper we present and discuss a number of examples in research literature of these recent advancements, specifically focusing on robotic exploration algorithms for unmanned vehicles.

1.Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Authors:Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao

Abstract: In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner and analyze its ability to reason, interpret, and memorize when facing complex scenarios. We argue that traditional optimization-based and modular autonomous driving (AD) systems face inherent performance limitations when dealing with long-tail corner cases. To address this problem, we propose that an ideal AD system should drive like a human, accumulating experience through continuous driving and using common sense to solve problems. To achieve this goal, we identify three key abilities necessary for an AD system: reasoning, interpretation, and memorization. We demonstrate the feasibility of employing an LLM in driving scenarios by building a closed-loop system to showcase its comprehension and environment-interaction abilities. Our extensive experiments show that the LLM exhibits the impressive ability to reason and solve long-tailed cases, providing valuable insights for the development of human-like autonomous driving. The related code are available at .

2.Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks

Authors:Ryosuke Korekata, Motonari Kambara, Yu Yoshida, Shintaro Ishikawa, Yosuke Kawasaki, Masaki Takahashi, Komei Sugiura

Abstract: This paper describes a domestic service robot (DSR) that fetches everyday objects and carries them to specified destinations according to free-form natural language instructions. Given an instruction such as "Move the bottle on the left side of the plate to the empty chair," the DSR is expected to identify the bottle and the chair from multiple candidates in the environment and carry the target object to the destination. Most of the existing multimodal language understanding methods are impractical in terms of computational complexity because they require inferences for all combinations of target object candidates and destination candidates. We propose Switching Head-Tail Funnel UNITER, which solves the task by predicting the target object and the destination individually using a single model. Our method is validated on a newly-built dataset consisting of object manipulation instructions and semi photo-realistic images captured in a standard Embodied AI simulator. The results show that our method outperforms the baseline method in terms of language comprehension accuracy. Furthermore, we conduct physical experiments in which a DSR delivers standardized everyday objects in a standardized domestic environment as requested by instructions with referring expressions. The experimental results show that the object grasping and placing actions are achieved with success rates of more than 90%.

3.A Dynamic Points Removal Benchmark in Point Cloud Maps

Authors:Qingwen Zhang, Daniel Duberg, Ruoyu Geng, Mingkai Jia, Lujia Wang, Patric Jensfelt

Abstract: In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect their performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we propose an easy-to-extend unified benchmarking framework for evaluating techniques for removing dynamic points in maps. It includes refactored state-of-art methods and novel metrics to analyze the limitations of these approaches. This enables researchers to dive deep into the underlying reasons behind these limitations. The benchmark makes use of several datasets with different sensor types. All the code and datasets related to our study are publicly available for further development and utilization.

4.Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment

Authors:Kenji Leong

Abstract: Active Simultaneous Localisation and Mapping (SLAM) is a critical problem in autonomous robotics, enabling robots to navigate to new regions while building an accurate model of their surroundings. Visual SLAM is a popular technique that uses virtual elements to enhance the experience. However, existing frontier-based exploration strategies can lead to a non-optimal path in scenarios where there are multiple frontiers with similar distance. This issue can impact the efficiency and accuracy of Visual SLAM, which is crucial for a wide range of robotic applications, such as search and rescue, exploration, and mapping. To address this issue, this research combines both an existing Visual-Graph SLAM known as ExploreORB with reinforcement learning. The proposed algorithm allows the robot to learn and optimize exploration routes through a reward-based system to create an accurate map of the environment with proper frontier selection. Frontier-based exploration is used to detect unexplored areas, while reinforcement learning optimizes the robot's movement by assigning rewards for optimal frontier points. Graph SLAM is then used to integrate the robot's sensory data and build an accurate map of the environment. The proposed algorithm aims to improve the efficiency and accuracy of ExploreORB by optimizing the exploration process of frontiers to build a more accurate map. To evaluate the effectiveness of the proposed approach, experiments will be conducted in various virtual environments using Gazebo, a robot simulation software. Results of these experiments will be compared with existing methods to demonstrate the potential of the proposed approach as an optimal solution for SLAM in autonomous robotics.

5.Distributed Planning for Rigid Robot Formations using Consensus on the Transformation of a Base Configuration

Authors:Jeppe Heini Mikkelsen, Matteo Fumagalli

Abstract: This paper presents a novel planning method that achieves navigation of multi-robot formations in cluttered environments, while maintaining the formation throughout the robots motion. The method utilises a decentralised approach to find feasible formation parameters that guarantees formation constraints for rigid formations. The method proves to be computationally efficient, making it relevant for reactive planning and control of multi-robot systems formation. The method has been tested in a simulation environment to prove feasibility and run-time efficiency.

6.SDF-Pack: Towards Compact Bin Packing with Signed-Distance-Field Minimization

Authors:Jia-Hui Pan, Ka-Hei Hui, Xiaojie Gao, Shize Zhu, Yun-Hui Liu, Pheng-Ann Heng, Chi-Wing Fu

Abstract: Robotic bin packing is very challenging, especially when considering practical needs such as object variety and packing compactness. This paper presents SDF-Pack, a new approach based on signed distance field (SDF) to model the geometric condition of objects in a container and compute the object placement locations and packing orders for achieving a more compact bin packing. Our method adopts a truncated SDF representation to localize the computation, and based on it, we formulate the SDF minimization heuristic to find optimized placements to compactly pack objects with the existing ones. To further improve space utilization, if the packing sequence is controllable, our method can suggest which object to be packed next. Experimental results on a large variety of everyday objects show that our method can consistently achieve higher packing compactness over 1,000 packing cases, enabling us to pack more objects into the container, compared with the existing heuristics under various packing settings.

7.Learn from Incomplete Tactile Data: Tactile Representation Learning with Masked Autoencoders

Authors:Guanqun Cao, Jiaqi Jiang, Danushka Bollegala, Shan Luo

Abstract: The missing signal caused by the objects being occluded or an unstable sensor is a common challenge during data collection. Such missing signals will adversely affect the results obtained from the data, and this issue is observed more frequently in robotic tactile perception. In tactile perception, due to the limited working space and the dynamic environment, the contact between the tactile sensor and the object is frequently insufficient and unstable, which causes the partial loss of signals, thus leading to incomplete tactile data. The tactile data will therefore contain fewer tactile cues with low information density. In this paper, we propose a tactile representation learning method, named TacMAE, based on Masked Autoencoder to address the problem of incomplete tactile data in tactile perception. In our framework, a portion of the tactile image is masked out to simulate the missing contact region. By reconstructing the missing signals in the tactile image, the trained model can achieve a high-level understanding of surface geometry and tactile properties from limited tactile cues. The experimental results of tactile texture recognition show that our proposed TacMAE can achieve a high recognition accuracy of 71.4% in the zero-shot transfer and 85.8% after fine-tuning, which are 15.2% and 8.2% higher than the results without using masked modeling. The extensive experiments on YCB objects demonstrate the knowledge transferability of our proposed method and the potential to improve efficiency in tactile exploration.

8.SGGNet$^2$: Speech-Scene Graph Grounding Network for Speech-guided Navigation

Authors:Dohyun Kim, Yeseung Kim, Jaehwi Jang, Minjae Song, Woojin Choi, Daehyung Park

Abstract: The spoken language serves as an accessible and efficient interface, enabling non-experts and disabled users to interact with complex assistant robots. However, accurately grounding language utterances gives a significant challenge due to the acoustic variability in speakers' voices and environmental noise. In this work, we propose a novel speech-scene graph grounding network (SGGNet$^2$) that robustly grounds spoken utterances by leveraging the acoustic similarity between correctly recognized and misrecognized words obtained from automatic speech recognition (ASR) systems. To incorporate the acoustic similarity, we extend our previous grounding model, the scene-graph-based grounding network (SGGNet), with the ASR model from NVIDIA NeMo. We accomplish this by feeding the latent vector of speech pronunciations into the BERT-based grounding network within SGGNet. We evaluate the effectiveness of using latent vectors of speech commands in grounding through qualitative and quantitative studies. We also demonstrate the capability of SGGNet$^2$ in a speech-based navigation task using a real quadruped robot, RBQ-3, from Rainbow Robotics.

1.FF-LINS: A Consistent Frame-to-Frame Solid-State-LiDAR-Inertial State Estimator

Authors:Hailiang Tang, Tisheng Zhang, Xiaoji Niu, Liqiang Wang, Linfu Wei, Jingnan Liu

Abstract: Most of the existing LiDAR-inertial navigation systems are based on frame-to-map registrations, leading to inconsistency in state estimation. The newest solid-state LiDAR with a non-repetitive scanning pattern makes it possible to achieve a consistent LiDAR-inertial estimator by employing a frame-to-frame data association. In this letter, we propose a robust and consistent frame-to-frame LiDAR-inertial navigation system (FF-LINS) for solid-state LiDARs. With the INS-centric LiDAR frame processing, the keyframe point-cloud map is built using the accumulated point clouds to construct the frame-to-frame data association. The LiDAR frame-to-frame and the inertial measurement unit (IMU) preintegration measurements are tightly integrated using the factor graph optimization, with online calibration of the LiDAR-IMU extrinsic and time-delay parameters. The experiments on the public and private datasets demonstrate that the proposed FF-LINS achieves superior accuracy and robustness than the state-of-the-art systems. Besides, the LiDAR-IMU extrinsic and time-delay parameters are estimated effectively, and the online calibration notably improves the pose accuracy. The proposed FF-LINS and the employed datasets are open-sourced on GitHub (

2.DeepIPCv2: LiDAR-powered Robust Environmental Perception and Navigational Control for Autonomous Vehicle

Authors:Oskar Natan, Jun Miura

Abstract: We present DeepIPCv2, an autonomous driving model that perceives the environment using a LiDAR sensor for more robust drivability, especially when driving under poor illumination conditions. DeepIPCv2 takes a set of LiDAR point clouds for its main perception input. As point clouds are not affected by illumination changes, they can provide a clear observation of the surroundings no matter what the condition is. This results in a better scene understanding and stable features provided by the perception module to support the controller module in estimating navigational control properly. To evaluate its performance, we conduct several tests by deploying the model to predict a set of driving records and perform real automated driving under three different conditions. We also conduct ablation and comparative studies with some recent models to justify its performance. Based on the experimental results, DeepIPCv2 shows a robust performance by achieving the best drivability in all conditions. Codes are available at

3.Aeolus Ocean -- A simulation environment for the autonomous COLREG-compliant navigation of Unmanned Surface Vehicles using Deep Reinforcement Learning and Maritime Object Detection

Authors:Andrew Alexander Vekinis, Stavros Perantonis

Abstract: Heading towards navigational autonomy in unmanned surface vehicles (USVs) in the maritime sector can fundamentally lead towards safer waters as well as reduced operating costs, while also providing a range of exciting new capabilities for oceanic research, exploration and monitoring. However, achieving such a goal is challenging. USV control systems must, safely and reliably, be able to adhere to the international regulations for preventing collisions at sea (COLREGs) in encounters with other vessels as they navigate to a given waypoint while being affected by realistic weather conditions, either during the day or at night. To deal with the multitude of possible scenarios, it is critical to have a virtual environment that is able to replicate the realistic operating conditions USVs will encounter, before they can be implemented in the real world. Such "digital twins" form the foundations upon which Deep Reinforcement Learning (DRL) and Computer Vision (CV) algorithms can be used to develop and guide USV control systems. In this paper we describe the novel development of a COLREG-compliant DRL-based collision avoidant navigational system with CV-based awareness in a realistic ocean simulation environment. The performance of the trained autonomous Agents resulting from this approach is evaluated in several successful navigations to set waypoints in both open sea and coastal encounters with other vessels. A binary executable version of the simulator with trained agents is available at

4.Robotic surface exploration with vision and tactile sensing for cracks detection and characterisation

Authors:Francesca Palermo, Bukeikhan Omarali, Changae Oh, Kaspar Althoefer, Ildar Farkhatdinov

Abstract: This paper presents a novel algorithm for crack localisation and detection based on visual and tactile analysis via fibre-optics. A finger-shaped sensor based on fibre-optics is employed for the data acquisition to collect data for the analysis and the experiments. To detect the possible locations of cracks a camera is used to scan an environment while running an object detection algorithm. Once the crack is detected, a fully-connected graph is created from a skeletonised version of the crack. A minimum spanning tree is then employed for calculating the shortest path to explore the crack which is then used to develop the motion planner for the robotic manipulator. The motion planner divides the crack into multiple nodes which are then explored individually. Then, the manipulator starts the exploration and performs the tactile data classification to confirm if there is indeed a crack in that location or just a false positive from the vision algorithm. If a crack is detected, also the length, width, orientation and number of branches are calculated. This is repeated until all the nodes of the crack are explored. In order to validate the complete algorithm, various experiments are performed: comparison of exploration of cracks through full scan and motion planning algorithm, implementation of frequency-based features for crack classification and geometry analysis using a combination of vision and tactile data. From the results of the experiments, it is shown that the proposed algorithm is able to detect cracks and improve the results obtained from vision to correctly classify cracks and their geometry with minimal cost thanks to the motion planning algorithm.

5.Spatio-Temporal Calibration for Omni-Directional Vehicle-Mounted

Authors:Xiao Li, Yi Zhou, Ruibin Guo, Xin Peng, Zongtan Zhou, Huimin Lu

Abstract: We present a solution to the problem of spatio-temporal calibration for event cameras mounted on an onmi-directional vehicle. Different from traditional methods that typically determine the camera's pose with respect to the vehicle's body frame using alignment of trajectories, our approach leverages the kinematic correlation of two sets of linear velocity estimates from event data and wheel odometers, respectively. The overall calibration task consists of estimating the underlying temporal offset between the two heterogeneous sensors, and furthermore, recovering the extrinsic rotation that defines the linear relationship between the two sets of velocity estimates. The first sub-problem is formulated as an optimization one, which looks for the optimal temporal offset that maximizes a correlation measurement invariant to arbitrary linear transformation. Once the temporal offset is compensated, the extrinsic rotation can be worked out with an iterative closed-form solver that incrementally registers associated linear velocity estimates. The proposed algorithm is proved effective on both synthetic data and real data, outperforming traditional methods based on alignment of trajectories.

6.Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening

Authors:Vincent Schorp, Will Panitch, Kaushik Shivakumar, Vainavi Viswanath, Justin Kerr, Yahav Avigal, Danyal M Fer, Lionel Ott, Ken Goldberg

Abstract: Accurate 3D sensing of suturing thread is a challenging problem in automated surgical suturing because of the high state-space complexity, thinness and deformability of the thread, and possibility of occlusion by the grippers and tissue. In this work we present a method for tracking surgical thread in 3D which is robust to occlusions and complex thread configurations, and apply it to autonomously perform the surgical suture "tail-shortening" task: pulling thread through tissue until a desired "tail" length remains exposed. The method utilizes a learned 2D surgical thread detection network to segment suturing thread in RGB images. It then identifies the thread path in 2D and reconstructs the thread in 3D as a NURBS spline by triangulating the detections from two stereo cameras. Once a 3D thread model is initialized, the method tracks the thread across subsequent frames. Experiments suggest the method achieves a 1.33 pixel average reprojection error on challenging single-frame 3D thread reconstructions, and an 0.84 pixel average reprojection error on two tracking sequences. On the tail-shortening task, it accomplishes a 90% success rate across 20 trials. Supplemental materials are available at .

7.Embodied Lifelong Learning for Task and Motion Planning

Authors:Jorge A. Mendez, Leslie Pack Kaelbling, Tomás Lozano-Pérez

Abstract: A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.

8.Automatic Routing System for Intelligent Warehouses

Authors:Kelen C. T. Vivaldini, Jorge P. M. Galdames, Thales B. Pasqual, Rafael M. Sobral, Roberto C. Araújo, Marcelo Becker, Glauco A. P. Caurin

Abstract: Automation of logistic processes is essential to improve productivity and reduce costs. In this context, intelligent warehouses are becoming a key to logistic systems thanks to their ability of optimizing transportation tasks and, consequently, reducing costs. This paper initially presents briefly routing systems applied on intelligent warehouses. Then, we present the approach used to develop our router system. This router system is able to solve traffic jams and collisions, generate conflict-free and optimized paths before sending the final paths to the robotic forklifts. It also verifies the progress of all tasks. When a problem occurs, the router system can change the task priorities, routes, etc. in order to avoid new conflicts. In the routing simulations, each vehicle executes its tasks starting from a predefined initial pose, moving to the desired position. Our algorithm is based on Dijkstra's shortest path and the time window approaches and it was implemented in C language. Computer simulation tests were used to validate the algorithm efficiency under different working conditions. Several simulations were carried out using the Player/Stage Simulator to test the algorithms. Thanks to the simulations, we could solve many faults and refine the algorithms before embedding them in real robots.

9.BittyBuzz: A Swarm Robotics Runtime for Tiny Systems

Authors:Ulrich Dah-Achinanon, Emir Khaled Belhaddad, Guillaume Ricard, Giovanni Beltrame

Abstract: Swarm robotics is an emerging field of research which is increasingly attracting attention thanks to the advances in robotics and its potential applications. However, despite the enthusiasm surrounding this area of research, software development for swarm robotics is still a tedious task. That fact is partly due to the lack of dedicated solutions, in particular for low-cost systems to be produced in large numbers and that can have important resource constraints. To address this issue, we introduce BittyBuzz, a novel runtime platform: it allows Buzz, a domain-specific language, to run on microcontrollers while maintaining dynamic memory management. BittyBuzz is designed to fit a flash memory as small as 32 kB (with usable space for scripts) and work with as little as 2 kB of RAM. In this work, we introduce the BittyBuzz implementation, its differences from the original Buzz virtual machine, and its advantages for swarm robotics systems. We show that BittyBuzz is successfully integrated with three robotic platforms with minimal memory footprint and conduct experiments to show computation performance of BittyBuzz. Results show that BittyBuzz can be effectively used to implement common swarm behaviors on microcontroller-based systems.

10.DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding

Authors:Shuijing Liu, Aamir Hasan, Kaiwen Hong, Runxuan Wang, Peixin Chang, Zachary Mizrachi, Justin Lin, D. Livingston McPherson, Wendy A. Rogers, Katherine Driggs-Campbell

Abstract: Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.

1.BiRP: Learning Robot Generalized Bimanual Coordination using Relative Parameterization Method on Human Demonstration

Authors:Junjia Liu, Hengyi Sim, Chenzui Li, Fei Chen

Abstract: Human bimanual manipulation can perform more complex tasks than a simple combination of two single arms, which is credited to the spatio-temporal coordination between the arms. However, the description of bimanual coordination is still an open topic in robotics. This makes it difficult to give an explainable coordination paradigm, let alone applied to robotics. In this work, we divide the main bimanual tasks in human daily activities into two types: leader-follower and synergistic coordination. Then we propose a relative parameterization method to learn these types of coordination from human demonstration. It represents coordination as Gaussian mixture models from bimanual demonstration to describe the change in the importance of coordination throughout the motions by probability. The learned coordinated representation can be generalized to new task parameters while ensuring spatio-temporal coordination. We demonstrate the method using synthetic motions and human demonstration data and deploy it to a humanoid robot to perform a generalized bimanual coordination motion. We believe that this easy-to-use bimanual learning from demonstration (LfD) method has the potential to be used as a data augmentation plugin for robot large manipulation model training. The corresponding codes are open-sourced in

2.GRAINS: Proximity Sensing of Objects in Granular Materials

Authors:Zeqing Zhang, Ruixing Jia, Youcan Yan, Ruihua Han, Shijie Lin, Qian Jiang, Liangjun Zhang, Jia Pan

Abstract: Proximity sensing detects an object's presence without contact. However, research has rarely explored proximity sensing in granular materials (GM) due to GM's lack of visual and complex properties. In this paper, we propose a granular-material-embedded autonomous proximity sensing system (GRAINS) based on three granular phenomena (fluidization, jamming, and failure wedge zone). GRAINS can automatically sense buried objects beneath GM in real-time manner (at least ~20 hertz) and perceive them 0.5 ~ 7 centimeters ahead in different granules without the use of vision or touch. We introduce a new spiral trajectory for the probe raking in GM, combining linear and circular motions, inspired by a common granular fluidization technique. Based on the observation of force-raising when granular jamming occurs in the failure wedge zone in front of the probe during its raking, we employ Gaussian process regression to constantly learn and predict the force patterns and detect the force anomaly resulting from granular jamming to identify the proximity sensing of buried objects. Finally, we apply GRAINS to a Bayesian-optimization-algorithm-guided exploration strategy to successfully localize underground objects and outline their distribution using proximity sensing without contact or digging. This work offers a simple yet reliable method with potential for safe operation in building habitation infrastructure on an alien planet without human intervention.

3.Prototypical Contrastive Transfer Learning for Multimodal Language Understanding

Authors:Seitaro Otsuki, Shintaro Ishikawa, Komei Sugiura

Abstract: Although domestic service robots are expected to assist individuals who require support, they cannot currently interact smoothly with people through natural language. For example, given the instruction "Bring me a bottle from the kitchen," it is difficult for such robots to specify the bottle in an indoor environment. Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework. In this study, we propose a novel transfer learning approach for multimodal language understanding called Prototypical Contrastive Transfer Learning (PCTL), which uses a new contrastive loss called Dual ProtoNCE. We introduce PCTL to the task of identifying target objects in domestic environments according to free-form natural language instructions. To validate PCTL, we built new real-world and simulation datasets. Our experiment demonstrated that PCTL outperformed existing methods. Specifically, PCTL achieved an accuracy of 78.1%, whereas simple fine-tuning achieved an accuracy of 73.4%.

4.Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

Authors:Moo Jin Kim, Jiajun Wu, Chelsea Finn

Abstract: Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does not need to employ any explicit domain adaptation method, as we leverage the partial observability of eye-in-hand cameras as well as a simple fixed image masking scheme. On a suite of eight real-world tasks involving both 3-DoF and 6-DoF robot arm control, our method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to both new environment configurations and new tasks that are unseen in the robot demonstration data. See video results at .

5.GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Authors:Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin, Byoung-Tak Zhang

Abstract: Language-Guided Robotic Manipulation (LGRM) is a challenging task as it requires a robot to understand human instructions to manipulate everyday objects. Recent approaches in LGRM rely on pre-trained Visual Grounding (VG) models to detect objects without adapting to manipulation environments. This results in a performance drop due to a substantial domain gap between the pre-training and real-world data. A straightforward solution is to collect additional training data, but the cost of human-annotation is extortionate. In this paper, we propose Grounding Vision to Ceaselessly Created Instructions (GVCCI), a lifelong learning framework for LGRM, which continuously learns VG without human supervision. GVCCI iteratively generates synthetic instruction via object detection and trains the VG model with the generated data. We validate our framework in offline and online settings across diverse environments on different VG models. Experimental results show that accumulating synthetic data from GVCCI leads to a steady improvement in VG by up to 56.7% and improves resultant LGRM by up to 29.4%. Furthermore, the qualitative analysis shows that the unadapted VG model often fails to find correct objects due to a strong bias learned from the pre-training data. Finally, we introduce a novel VG dataset for LGRM, consisting of nearly 252k triplets of image-object-instruction from diverse manipulation environments.

6.VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

Authors:Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei

Abstract: Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website:

7.Reactive and human-in-the-loop planning and control of multi-robot systems under LTL specifications in dynamic environments

Authors:Pian Yu, Gianmarco Fedeli, Dimos V. Dimarogonas

Abstract: This paper investigates the planning and control problems for multi-robot systems under linear temporal logic (LTL) specifications. In contrast to most of existing literature, which presumes a static and known environment, our study focuses on dynamic environments that can have unknown moving obstacles like humans walking through. Depending on whether local communication is allowed between robots, we consider two different online re-planning approaches. When local communication is allowed, we propose a local trajectory generation algorithm for each robot to resolve conflicts that are detected on-line. In the other case, i.e., no communication is allowed, we develop a model predictive controller to reactively avoid potential collisions. In both cases, task satisfaction is guaranteed whenever it is feasible. In addition, we consider the human-in-the-loop scenario where humans may additionally take control of one or multiple robots. We design a mixed initiative controller for each robot to prevent unsafe human behaviors while guarantee the LTL satisfaction. Using our previous developed ROS software package, several experiments are conducted to demonstrate the effectiveness and the applicability of the proposed strategies.

8.Agilicious: Open-Source and Open-Hardware Agile Quadrotor for Vision-Based Flight

Authors:Philipp Foehn, Elia Kaufmann, Angel Romero, Robert Penicka, Sihao Sun, Leonard Bauersfeld, Thomas Laengle, Giovanni Cioffi, Yunlong Song, Antonio Loquercio, Davide Scaramuzza

Abstract: Autonomous, agile quadrotor flight raises fundamental challenges for robotics research in terms of perception, planning, learning, and control. A versatile and standardized platform is needed to accelerate research and let practitioners focus on the core problems. To this end, we present Agilicious, a co-designed hardware and software framework tailored to autonomous, agile quadrotor flight. It is completely open-source and open-hardware and supports both model-based and neural-network--based controllers. Also, it provides high thrust-to-weight and torque-to-inertia ratios for agility, onboard vision sensors, GPU-accelerated compute hardware for real-time perception and neural-network inference, a real-time flight controller, and a versatile software stack. In contrast to existing frameworks, Agilicious offers a unique combination of flexible software stack and high-performance hardware. We compare Agilicious with prior works and demonstrate it on different agile tasks, using both model-based and neural-network--based controllers. Our demonstrators include trajectory tracking at up to 5g and 70 km/h in a motion-capture system, and vision-based acrobatic flight and obstacle avoidance in both structured and unstructured environments using solely onboard perception. Finally, we demonstrate its use for hardware-in-the-loop simulation in virtual-reality environments. Thanks to its versatility, we believe that Agilicious supports the next generation of scientific and industrial quadrotor research.

9.Air Bumper: A Collision Detection and Reaction Framework for Autonomous MAV Navigation

Authors:Ruoyu Wang, Zixuan Guo, Yizhou Chen, Xinyi Wang, Ben M. Chen

Abstract: Autonomous navigation in unknown environments with obstacles remains challenging for micro aerial vehicles (MAVs) due to their limited onboard computing and sensing resources. Although various collision avoidance methods have been developed, it is still possible for drones to collide with unobserved obstacles due to unpredictable disturbances, sensor limitations, and control uncertainty. Instead of completely avoiding collisions, this article proposes Air Bumper, a collision detection and reaction framework, for fully autonomous flight in 3D environments to improve the safety of drones. Our framework only utilizes the onboard inertial measurement unit (IMU) to detect and estimate collisions. We further design a collision recovery control for rapid recovery and collision-aware mapping to integrate collision information into general LiDAR-based sensing and planning frameworks. Our simulation and experimental results show that the quadrotor can rapidly detect, estimate, and recover from collisions with obstacles in 3D space and continue the flight smoothly with the help of the collision-aware map.

10.Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation

Authors:Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada

Abstract: Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real-world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.

11.SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning

Authors:Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, Niko Suenderhauf

Abstract: Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors, 36 rooms and 140 objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, and natural language instruction for a mobile manipulator robot to execute.

12.Diffusion Based Multi-Agent Adversarial Tracking

Authors:Sean Ye, Manisha Natarajan, Zixuan Wu, Matthew Gombolay

Abstract: Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target's location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety. This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations by leveraging past sparse state information. To assess the effectiveness of this approach, we evaluate predictions on single-target and multi-target pursuit environments, employing Monte-Carlo sampling of the diffusion model to estimate the probability associated with each generated trajectory. We propose a novel cross-attention based diffusion model that utilizes constraint-based sampling to generate multimodal track hypotheses. Our single-target model surpasses the performance of all baseline methods on Average Displacement Error (ADE) for predictions across all time horizons.

13.Connected Dependability Cage Approach for Safe Automated Driving

Authors:Adina Aniculaesei, Iqra Aslam, Daniel Bamal, Felix Helsch, Andreas Vorwald, Meng Zhang, Andreas Rausch

Abstract: Automated driving systems can be helpful in a wide range of societal challenges, e.g., mobility-on-demand and transportation logistics for last-mile delivery, by aiding the vehicle driver or taking over the responsibility for the dynamic driving task partially or completely. Ensuring the safety of automated driving systems is no trivial task, even more so for those systems of SAE Level 3 or above. To achieve this, mechanisms are needed that can continuously monitor the system's operating conditions, also denoted as the system's operational design domain. This paper presents a safety concept for automated driving systems which uses a combination of onboard runtime monitoring via connected dependability cage and off-board runtime monitoring via a remote command control center, to continuously monitor the system's ODD. On one side, the connected dependability cage fulfills a double functionality: (1) to monitor continuously the operational design domain of the automated driving system, and (2) to transfer the responsibility in a smooth and safe manner between the automated driving system and the off-board remote safety driver, who is present in the remote command control center. On the other side, the remote command control center enables the remote safety driver the monitoring and takeover of the vehicle's control. We evaluate our safety concept for automated driving systems in a lab environment and on a test field track and report on results and lessons learned.

14.Cosserat-Rod Based Dynamic Modeling of Soft Slender Robot Interacting with Environment

Authors:Lingxiao Xun, Gang Zheng, Alexandre Kruszewski

Abstract: Soft slender robots have attracted more and more research attentions in these years due to their continuity and compliance natures. However, mechanics modeling for soft robots interacting with environment is still an academic challenge because of the non-linearity of deformation and the non-smooth property of the contacts. In this work, starting from a piece-wise local strain field assumption, we propose a nonlinear dynamic model for soft robot via Cosserat rod theory using Newtonian mechanics which handles the frictional contact with environment and transfer them into the nonlinear complementary constraint (NCP) formulation. Moreover, we smooth both the contact and friction constraints in order to convert the inequality equations of NCP to the smooth equality equations. The proposed model allows us to compute the dynamic deformation and frictional contact force under common optimization framework in real time when the soft slender robot interacts with other rigid or soft bodies. In the end, the corresponding experiments are carried out which valid our proposed dynamic model.

15.A Comparative Analysis Between the Additive and the Multiplicative Extended Kalman Filter for Satellite Attitude Determination

Authors:Hamza A. Hassan, William Tolstrup, Johanes P. Suriana, Ibrahim D. Kiziloklu

Abstract: The general consensus is that the Multiplicative Extended Kalman Filter (MEKF) is superior to the Additive Extended Kalman Filter (AEKF) based on a wealth of theoretical evidence. This paper deals with a practical comparison between the two filters in simulation with the goal of verifying if the previous theoretical foundations are true. The AEKF and MEKF are two variants of the Extended Kalman Filter that differ in their approach to linearizing the system dynamics. The AEKF uses an additive correction term to update the state estimate, while the MEKF uses a multiplicative correction term. The two also differ in the state of which they use. The AEKF uses the quaternion as its state while the MEKF uses the Gibbs vector as its state. The results show that the MEKF consistently outperforms the AEKF in terms of estimation accuracy with lower uncertainty. The AEKF is more computationally efficient, but the difference is so low that it is almost negligible and it has no effect on a real-time application. Overall, the results suggest that the MEKF is a better choise for satellite attitude estimation due to its superior estimation accuracy and lower uncertainty, which agrees with the statements from previous work

1.Forward Dynamics Estimation from Data-Driven Inverse Dynamics Learning

Authors:Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Daniel Nikovski, Diego Romeres

Abstract: In this paper, we propose to estimate the forward dynamics equations of mechanical systems by learning a model of the inverse dynamics and estimating individual dynamics components from it. We revisit the classical formulation of rigid body dynamics in order to extrapolate the physical dynamical components, such as inertial and gravitational components, from an inverse dynamics model. After estimating the dynamical components, the forward dynamics can be computed in closed form as a function of the learned inverse dynamics. We tested the proposed method with several machine learning models based on Gaussian Process Regression and compared them with the standard approach of learning the forward dynamics directly. Results on two simulated robotic manipulators, a PANDA Franka Emika and a UR10, show the effectiveness of the proposed method in learning the forward dynamics, both in terms of accuracy as well as in opening the possibility of using more structured~models.

2.Deep Probabilistic Movement Primitives with a Bayesian Aggregator

Authors:Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto

Abstract: Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Previous works have proposed neural network-based motor primitive models, having demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, there has not been a single unified deep motor primitive's model proposed that is capable of all previous operations, limiting neural motor primitive's potential applications. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows a more sound context conditioning and blending. Our results demonstrate our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining operations of linear movement primitives provide.

3.MinkSORT: A 3D deep feature extractor using sparse convolutions to improve 3D multi-object tracking in greenhouse tomato plants

Authors:David Rapado-Rincon, Eldert J. van Henten, Gert Kootstra

Abstract: The agro-food industry is turning to robots to address the challenge of labour shortage. However, agro-food environments pose difficulties for robots due to high variation and occlusions. In the presence of these challenges, accurate world models, with information about object location, shape, and properties, are crucial for robots to perform tasks accurately. Building such models is challenging due to the complex and unique nature of agro-food environments, and errors in the model can lead to task execution issues. In this paper, we propose MinkSORT, a novel method for generating tracking features using a 3D sparse convolutional network in a deepSORT-like approach to improve the accuracy of world models in agro-food environments. We evaluated our feature extractor network using real-world data collected in a tomato greenhouse, which significantly improved the performance of our baseline model that tracks tomato positions in 3D using a Kalman filter and Mahalanobis distance. Our deep learning feature extractor improved the HOTA from 42.8% to 44.77%, the association accuracy from 32.55% to 35.55%, and the MOTA from 57.63% to 58.81%. We also evaluated different contrastive loss functions for training our deep learning feature extractor and demonstrated that our approach leads to improved performance in terms of three separate precision and recall detection outcomes. Our method improves world model accuracy, enabling robots to perform tasks such as harvesting and plant maintenance with greater efficiency and accuracy, which is essential for meeting the growing demand for food in a sustainable manner.

4.Energy Efficient Personalized Hand-Gesture Recognition with Neuromorphic Computing

Authors:Muhammad Aitsam, Alessandro Di Nuovo

Abstract: Hand gestures are a form of non-verbal communication that is used in social interaction and it is therefore required for more natural human-robot interaction. Neuromorphic (brain-inspired) computing offers a low-power solution for Spiking neural networks (SNNs) that can be used for the classification and recognition of gestures. This article introduces the preliminary results of a novel methodology for training spiking convolutional neural networks for hand-gesture recognition so that a humanoid robot with integrated neuromorphic hardware will be able to personalise the interaction with a user according to the shown hand gesture. It also describes other approaches that could improve the overall performance of the model.

5.Pegasus Simulator: An Isaac Sim Framework for Multiple Aerial Vehicles Simulation

Authors:Marcelo Jacinto, João Pinto, Jay Patrikar, John Keller, Rita Cunha, Sebastian Scherer, António Pascoal

Abstract: Developing and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extension that enables real-time simulation of multiple multirotor vehicles in photo-realistic environments, while providing out-of-the-box integration with the widely adopted PX4-Autopilot and ROS2 through its modular implementation and intuitive graphical user interface. To demonstrate some of its capabilities, a nonlinear controller was implemented and simulation results for two drones performing aggressive flight maneuvers are presented. Code and documentation for this framework are also provided as supplementary material.

6.A Mixed Reality System for Interaction\\with Heterogeneous Robotic Systems

Authors:Valeria Villani, Beatrice Capelli, Lorenzo Sabattini

Abstract: The growing spread of robots for service and industrial purposes calls for versatile, intuitive and portable interaction approaches. In particular, in industrial environments, operators should be able to interact with robots in a fast, effective, and possibly effortless manner. To this end, reality enhancement techniques have been used to achieve efficient management and simplify interactions, in particular in manufacturing and logistics processes. Building upon this, in this paper we propose a system based on mixed reality that allows a ubiquitous interface for heterogeneous robotic systems in dynamic scenarios, where users are involved in different tasks and need to interact with different robots. By means of mixed reality, users can interact with a robot through manipulation of its virtual replica, which is always colocated with the user and is extracted when interaction is needed. The system has been tested in a simulated intralogistics setting, where different robots are present and require sporadic intervention by human operators, who are involved in other tasks. In our setting we consider the presence of drones and AGVs with different levels of autonomy, calling for different user interventions. The proposed approach has been validated in virtual reality, considering quantitative and qualitative assessment of performance and user's feedback.

7.Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Authors:Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang

Abstract: Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of large amount of interactive feedback. This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To avoid unstable scores given by human negatively impact the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method on robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores, while requiring less feedback compared to pairwise preference learning methods. The source codes are publicly available at

1.Recent Advancements in End-to-End Autonomous Driving using Deep Learning: A Survey

Authors:Pranav Singh Chib, Pravendra Singh

Abstract: End-to-End driving is a promising paradigm as it circumvents the drawbacks associated with modular systems, such as their overwhelming complexity and propensity for error propagation. Autonomous driving transcends conventional traffic patterns by proactively recognizing critical events in advance, ensuring passengers' safety and providing them with comfortable transportation, particularly in highly stochastic and variable traffic settings. This paper presents a comprehensive review of the End-to-End autonomous driving stack. It provides a taxonomy of automated driving tasks wherein neural networks have been employed in an End-to-End manner, encompassing the entire driving process from perception to control, while addressing key challenges encountered in real-world applications. Recent developments in End-to-End autonomous driving are analyzed, and research is categorized based on underlying principles, methodologies, and core functionality. These categories encompass sensorial input, main and auxiliary output, learning approaches ranging from imitation to reinforcement learning, and model evaluation techniques. The survey incorporates a detailed discussion of the explainability and safety aspects. Furthermore, it assesses the state-of-the-art, identifies challenges, and explores future possibilities. We maintained the latest advancements and their corresponding open-source implementations at

2.A Versatile Door Opening System with Mobile Manipulator through Adaptive Position-Force Control and Reinforcement Learning

Authors:Gyuree Kang, Hyunki Seong, Daegyu Lee, D. Hyunchul Shim

Abstract: The ability of robots to navigate through doors is crucial for their effective operation in indoor environments. Consequently, extensive research has been conducted to develop robots capable of opening specific doors. However, the diverse combinations of door handles and opening directions necessitate a more versatile door opening system for robots to successfully operate in real-world environments. In this paper, we propose a mobile manipulator system that can autonomously open various doors without prior knowledge. By using convolutional neural networks, point cloud extraction techniques, and external force measurements during exploratory motion, we obtained information regarding handle types, poses, and door characteristics. Through two different approaches, adaptive position-force control and deep reinforcement learning, we successfully opened doors without precise trajectory or excessive external force. The adaptive position-force control method involves moving the end-effector in the direction of the door opening while responding compliantly to external forces, ensuring safety and manipulator workspace. Meanwhile, the deep reinforcement learning policy minimizes applied forces and eliminates unnecessary movements, enabling stable operation across doors with different poses and widths. The RL-based approach outperforms the adaptive position-force control method in terms of compensating for external forces, ensuring smooth motion, and achieving efficient speed. It reduces the maximum force required by 3.27 times and improves motion smoothness by 1.82 times. However, the non-learning-based adaptive position-force control method demonstrates more versatility in opening a wider range of doors, encompassing revolute doors with four distinct opening directions and varying widths.

3.PSO-Based Optimal Coverage Path Planning for Surface Defect Inspection of 3C Components with a Robotic Line Scanner

Authors:Hongpeng Chen, Shengzeng Huo, Muhammad Muddassir, Hoi-Yin Lee, Anqing Duan, Pai Zheng, Hongsheng Pan, David Navarro-Alarcon

Abstract: The automatic inspection of surface defects is an important task for quality control in the computers, communications, and consumer electronics (3C) industry. Conventional devices for defect inspection (viz. line-scan sensors) have a limited field of view, thus, a robot-aided defect inspection system needs to scan the object from multiple viewpoints. Optimally selecting the robot's viewpoints and planning a path is regarded as coverage path planning (CPP), a problem that enables inspecting the object's complete surface while reducing the scanning time and avoiding misdetection of defects. However, the development of CPP strategies for robotic line scanners has not been sufficiently studied by researchers. To fill this gap in the literature, in this paper, we present a new approach for robotic line scanners to detect surface defects of 3C free-form objects automatically. Our proposed solution consists of generating a local path by a new hybrid region segmentation method and an adaptive planning algorithm to ensure the coverage of the complete object surface. An optimization method for the global path sequence is developed to maximize the scanning efficiency. To verify our proposed methodology, we conduct detailed simulation-based and experimental studies on various free-form workpieces, and compare its performance with a state-of-the-art solution. The reported results demonstrate the feasibility and effectiveness of our approach.

4.Enabling Faster Locomotion of Planetary Rovers with a Mechanically-Hybrid Suspension

Authors:David Rodríguez-Martínez, Kentaro Uno, Kenta Sawa, Masahiro Uda, Gen Kudo, Gustavo Hernan Diaz, Ayumi Umemura, Shreya Santra, Kazuya Yoshida

Abstract: The exploration of the lunar poles and the collection of samples from the martian surface are characterized by shorter time windows demanding increased autonomy and speeds. Autonomous mobile robots must intrinsically cope with a wider range of disturbances. Faster off-road navigation has been explored for terrestrial applications but the combined effects of increased speeds and reduced gravity fields are yet to be fully studied. In this paper, we design and demonstrate a novel fully passive suspension design for wheeled planetary robots, which couples a high-range passive rocker with elastic in-wheel coil-over shock absorbers. The design was initially conceived and verified in a reduced-gravity (1.625 m/s$^2$) simulated environment, where three different passive suspension configurations were evaluated against a set of challenges--climbing steep slopes and surmounting unexpected obstacles like rocks and outcrops--and later prototyped and validated in a series of field tests. The proposed mechanically-hybrid suspension proves to mitigate more effectively the negative effects (high-frequency/high-amplitude vibrations and impact loads) of faster locomotion (>1 m/s) over unstructured terrains under varied gravity fields. This lowers the demand on navigation and control systems, impacting the efficiency of exploration missions in the years to come.

5.AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System

Authors:Yuzhe Qin, Wei Yang, Binghao Huang, Karl Van Wyk, Hao Su, Xiaolong Wang, Yu-Wei Chao, Dietor Fox

Abstract: Vision-based teleoperation offers the possibility to endow robots with human-level intelligence to physically interact with the environment, while only requiring low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deploy environment, which scales poorly as the pool of the robot models expands and the variety of the operating environment increases. In this paper, we propose AnyTeleop, a unified and general teleoperation system to support multiple different arms, hands, realities, and camera configurations within a single system. Although being designed to provide great flexibility to the choice of simulators and real hardware, our system can still achieve great performance. For real-world experiments, AnyTeleop can outperform a previous system that was designed for a specific robot hardware with a higher success rate, using the same robot. For teleoperation in simulation, AnyTeleop leads to better imitation learning performance, compared with a previous system that is particularly designed for that simulator. Project page:

6.Learning Fine Pinch-Grasp Skills using Tactile Sensing from Real Demonstration Data

Authors:Xiaofeng Mao, Yucheng Xu, Ruoshi Wen, Mohammadreza Kasaei, Wanming Yu, Efi Psomopoulou, Nathan F. Lepora, Zhibin Li

Abstract: This work develops a data-efficient learning from demonstration framework which exploits the use of rich tactile sensing and achieves fine dexterous bimanual manipulation. Specifically, we formulated a convolutional autoencoder network that can effectively extract and encode high-dimensional tactile information. Further, we developed a behaviour cloning network that can learn human-like sensorimotor skills demonstrated directly on the robot hardware in the task space by fusing both proprioceptive and tactile feedback. Our comparison study with the baseline method revealed the effectiveness of the contact information, which enabled successful extraction and replication of the demonstrated motor skills. Extensive experiments on real dual-arm robots demonstrated the robustness and effectiveness of the fine pinch grasp policy directly learned from one-shot demonstration, including grasping of the same object with different initial poses, generalizing to ten unseen new objects, robust and firm grasping against external pushes, as well as contact-aware and reactive re-grasping in case of dropping objects under very large perturbations. Moreover, the saliency map method is employed to describe the weight distribution across various modalities during pinch grasping. The video is available online at: \href{}{}.

7.Toward optimal placement of spatial sensors

Authors:Mingyu Kim, Harun Yetkin, Daniel J. Stilwell, Jorge Jimenez, Saurav Shrestha, Nina Stark

Abstract: This paper addresses the challenges of optimally placing a finite number of sensors to detect Poisson-distributed targets in a bounded domain. We seek to rigorously account for uncertainty in the target arrival model throughout the problem. Sensor locations are selected to maximize the probability that no targets are missed. While this objective function is well-suited to applications where failure to detect targets is highly undesirable, it does not lead to a computationally efficient optimization problem. We propose an approximation of the objective function that is non-negative, submodular, and monotone and for which greedy selection of sensor locations works well. We also characterize the gap between the desired objective function and our approximation. For numerical illustrations, we consider the case of the detection of ship traffic using sensors mounted on the seafloor.

8.Optimal Robot Path Planning In a Collaborative Human-Robot Team with Intermittent Human Availability

Authors:Abhinav Dahiya, Stephen L. Smith

Abstract: This paper presents a solution for the problem of optimal planning for a robot in a collaborative human-robot team, where the human supervisor is intermittently available to assist the robot in completing tasks more quickly. Specifically, we address the challenge of computing the fastest path between two configurations in an environment with time constraints on how long the robot can wait for assistance. To solve this problem, we propose a novel approach that utilizes the concepts of budget and critical departure times, which enables us to obtain optimal solutions while scaling to larger problem instances than existing methods. We demonstrate the effectiveness of our approach by comparing it with several baseline algorithms on a city road network and analyzing the quality of the solutions obtained. Our work contributes to the field of robot planning by addressing the critical issue of incorporating human assistance and environmental restrictions, which has significant implications for real-world applications.

9.RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

Authors:Zhao Mandi, Shreeya Jain, Shuran Song

Abstract: We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach -- it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility -- in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website for videos and code.

10.Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Authors:Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox

Abstract: We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos:

1.Optimized Path Planning for USVs under Ocean Currents

Authors:Behzad Akbari, Ya-Jun Pan, Shiwei Liu, Tianye Wang

Abstract: The proposed work focuses on the path planning for Unmanned Surface Vehicles (USVs) in the ocean enviroment, taking into account various spatiotemporal factors such as ocean currents and other energy consumption factors. The paper proposes the use of Gaussian Process Motion Planning (GPMP2), a Bayesian optimization method that has shown promising results in continuous and nonlinear path planning algorithms. The proposed work improves GPMP2 by incorporating a new spatiotemporal factor for tracking and predicting ocean currents using a spatiotemporal Bayesian inference. The algorithm is applied to the USV path planning and is shown to optimize for smoothness, obstacle avoidance, and ocean currents in a challenging environment. The work is relevant for practical applications in ocean scenarios where an optimal path planning for USVs is essential for minimizing costs and optimizing performance.

1.Touch, press and stroke: a soft capacitive sensor skin

Authors:Mirza S. Sarwar, Ryusuke Ishizaki, Kieran Morton, Claire Preston, Tan Nguyen, Xu Fan, Bertille Dupont, Leanna Hogarth, Takahide Yoshiike, Shahriar Mirabbasi, John D. W. Madden

Abstract: Soft sensors that can discriminate shear and normal force could help provide machines the fine control desirable for safe and effective physical interactions with people. A capacitive sensor is made for this purpose, composed of patterned elastomer and containing both fixed and sliding pillars that allow the sensor to deform and buckle, much like skin itself. The sensor differentiates between simultaneously applied pressure and shear. In addition, finger proximity is detectable up to 15 mm, with a pressure and shear sensitivity of 1 kPa and a displacement resolution of 50 $\mu$m. The operation is demonstrated on a simple gripper holding a cup. The combination of features and the straightforward fabrication method make this sensor a candidate for implementation as a sensing skin for humanoid robotics applications.

2.Incremental Nonlinear Dynamic Inversion based Optical Flow Control for Flying Robots: An Efficient Data-driven Approach

Authors:Hann Woei Ho, Ye Zhou

Abstract: This paper presents a novel approach for optical flow control of Micro Air Vehicles (MAVs). The task is challenging due to the nonlinearity of optical flow observables. Our proposed Incremental Nonlinear Dynamic Inversion (INDI) control scheme incorporates an efficient data-driven method to address the nonlinearity. It directly estimates the inverse of the time-varying control effectiveness in real-time, eliminating the need for the constant assumption and avoiding high computation in traditional INDI. This approach effectively handles fast-changing system dynamics commonly encountered in optical flow control, particularly height-dependent changes. We demonstrate the robustness and efficiency of the proposed control scheme in numerical simulations and also real-world flight tests: multiple landings of an MAV on a static and flat surface with various tracking setpoints, hovering and landings on moving and undulating surfaces. Despite being challenged with the presence of noisy optical flow estimates and the lateral and vertical movement of the landing surfaces, the MAV is able to successfully track or land on the surface with an exponential decay of both height and vertical velocity at almost the same time, as desired.

1.Traversability Analysis for Autonomous Driving in Complex Environment: A LiDAR-based Terrain Modeling Approach

Authors:Hanzhang Xue, Hao Fu, Liang Xiao, Yiming Fan, Dawei Zhao, Bin Dai

Abstract: For autonomous driving, traversability analysis is one of the most basic and essential tasks. In this paper, we propose a novel LiDAR-based terrain modeling approach, which could output stable, complete and accurate terrain models and traversability analysis results. As terrain is an inherent property of the environment that does not change with different view angles, our approach adopts a multi-frame information fusion strategy for terrain modeling. Specifically, a normal distributions transform mapping approach is adopted to accurately model the terrain by fusing information from consecutive LiDAR frames. Then the spatial-temporal Bayesian generalized kernel inference and bilateral filtering are utilized to promote the stability and completeness of the results while simultaneously retaining the sharp terrain edges. Based on the terrain modeling results, the traversability of each region is obtained by performing geometric connectivity analysis between neighboring terrain regions. Experimental results show that the proposed method could run in real-time and outperforms state-of-the-art approaches.

2.Multi Object Tracking for Predictive Collision Avoidance

Authors:Bruk Gebregziabher

Abstract: The safe and efficient operation of Autonomous Mobile Robots (AMRs) in complex environments, such as manufacturing, logistics, and agriculture, necessitates accurate multi-object tracking and predictive collision avoidance. This paper presents algorithms and techniques for addressing these challenges using Lidar sensor data, emphasizing ensemble Kalman filter. The developed predictive collision avoidance algorithm employs the data provided by lidar sensors to track multiple objects and predict their velocities and future positions, enabling the AMR to navigate safely and effectively. A modification to the dynamic windowing approach is introduced to enhance the performance of the collision avoidance system. The overall system architecture encompasses object detection, multi-object tracking, and predictive collision avoidance control. The experimental results, obtained from both simulation and real-world data, demonstrate the effectiveness of the proposed methods in various scenarios, which lays the foundation for future research on global planners, other controllers, and the integration of additional sensors. This thesis contributes to the ongoing development of safe and efficient autonomous systems in complex and dynamic environments.

3.RBDCore: Robot Rigid Body Dynamics Accelerator with Multifunctional Pipelines

Authors:Yuxin Yang, Xiaoming Chen, Yinhe Han

Abstract: Rigid body dynamics is a key technology in the robotics field. In trajectory optimization and model predictive control algorithms, there are usually a large number of rigid body dynamics computing tasks. Using CPUs to process these tasks consumes a lot of time, which will affect the real-time performance of robots. To this end, we propose a multifunctional robot rigid body dynamics accelerator, named RBDCore, to address the performance bottleneck. By analyzing different functions commonly used in robot dynamics calculations, we summarize their reuse relationship and optimize them according to the hardware. Based on this, RBDCore can fully reuse common hardware modules when processing different computing tasks. By dynamically switching the dataflow path, RBDCore can accelerate various dynamics functions without reconfiguring the hardware. We design Structure-Adaptive Pipelines for RBDCore, which can greatly improve the throughput of the accelerator. Robots with different structures and parameters can be optimized specifically. Compared with the state-of-the-art CPU, GPU dynamics libraries and FPGA accelerator, RBDCore can significantly improve the performance.

4.Planning and Control for a Dynamic Morphing-Wing UAV Using a Vortex Particle Model

Authors:Gino Perrotta, Luca Scheuer, Yocheved Kopel, Max Basescu, Adam Polevoy, Kevin Wolfe, Joseph Moore

Abstract: Achieving precise, highly-dynamic maneuvers with Unmanned Aerial Vehicles (UAVs) is a major challenge due to the complexity of the associated aerodynamics. In particular, unsteady effects -- as might be experienced in post-stall regimes or during sudden vehicle morphing -- can have an adverse impact on the performance of modern flight control systems. In this paper, we present a vortex particle model and associated model-based controller capable of reasoning about the unsteady aerodynamics during aggressive maneuvers. We evaluate our approach in hardware on a morphing-wing UAV executing post-stall perching maneuvers. Our results show that the use of the unsteady aerodynamics model improves performance during both fixed-wing and dynamic-wing perching, while the use of wing-morphing planned with quasi-steady aerodynamics results in reduced performance. While the focus of this paper is a pre-computed control policy, we believe that, with sufficient computational resources, our approach could enable online planning in the future.

5.Floating-base manipulation on zero-perturbation manifolds

Authors:Brian A. Bittner, Jason Reid, Kevin C. Wolfe

Abstract: To achieve high-dexterity motion planning on floating-base systems, the base dynamics induced by arm motions must be treated carefully. In general, it is a significant challenge to establish a fixed-base frame during tasking due to forces and torques on the base that arise directly from arm motions (e.g. arm drag in low Reynolds environments and arm momentum in high Reynolds environments). While thrusters can in theory be used to regulate the vehicle pose, it is often insufficient to establish a stable pose for precise tasking, whether the cause be due to underactuation, modeling inaccuracy, suboptimal control parameters, or insufficient power. We propose a solution that asks the thrusters to do less high bandwidth perturbation correction by planning arm motions that induce zero perturbation on the base. We are able to cast our motion planner as a nonholonomic rapidly-exploring random tree (RRT) by representing the floating-base dynamics as pfaffian constraints on joint velocity. These constraints guide the manipulators to move on zero-perturbation manifolds (which inhabit a subspace of the tangent space of the internal configuration space). To invoke this representation (termed a \textit{perturbation map}) we assume the body velocity (perturbation) of the base to be a joint-defined linear mapping of joint velocity and describe situations where this assumption is realistic (including underwater, aerial, and orbital environments). The core insight of this work is that when perturbation of the floating-base has affine structure with respect to joint velocity, it provides the system a class of kinematic reduction that permits the use of sample-based motion planners (specifically a nonholonomic RRT). We show that this allows rapid, exploration-geared motion planning for high degree of freedom systems in obstacle rich environments, even on floating-base systems with nontrivial dynamics.

6.3D Multi-Robot Exploration with a Two-Level Coordination Strategy and Prioritization

Authors:Luigi Freda, Tiago Novo, David Portugal, Rui P. Rocha

Abstract: This work presents a 3D multi-robot exploration framework for a team of UGVs moving on uneven terrains. The framework was designed by casting the two-level coordination strategy presented in [1] into the context of multi-robot exploration. The resulting distributed exploration technique minimizes and explicitly manages the occurrence of conflicts and interferences in the robot team. Each robot selects where to scan next by using a receding horizon next-best-view approach [2]. A sampling-based tree is directly expanded on segmented traversable regions of the terrain 3D map to generate the candidate next viewpoints. During the exploration, users can assign locations with higher priorities on-demand to steer the robot exploration toward areas of interest. The proposed framework can be also used to perform coverage tasks in the case a map of the environment is a priori provided as input. An open-source implementation is available online.

7.FOCUS: Object-Centric World Models for Robotics Manipulation

Authors:Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt

Abstract: Understanding the world in terms of objects and the possible interplays with them is an important cognition ability, especially in robotics manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, which specifically captures entities and relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus that stems from the object-centric representation, FOCUS can be deployed on robotics manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could be adopted in real-world settings.

8.Robotic Sonographer: Autonomous Robotic Ultrasound using Domain Expertise in Bayesian Optimization

Authors:Deepak Raina, SH Chandrashekhara, Richard Voyles, Juan Wachs, Subir Kumar Saha

Abstract: Ultrasound is a vital imaging modality utilized for a variety of diagnostic and interventional procedures. However, an expert sonographer is required to make accurate maneuvers of the probe over the human body while making sense of the ultrasound images for diagnostic purposes. This procedure requires a substantial amount of training and up to a few years of experience. In this paper, we propose an autonomous robotic ultrasound system that uses Bayesian Optimization (BO) in combination with the domain expertise to predict and effectively scan the regions where diagnostic quality ultrasound images can be acquired. The quality map, which is a distribution of image quality in a scanning region, is estimated using Gaussian process in BO. This relies on a prior quality map modeled using expert's demonstration of the high-quality probing maneuvers. The ultrasound image quality feedback is provided to BO, which is estimated using a deep convolution neural network model. This model was previously trained on database of images labelled for diagnostic quality by expert radiologists. Experiments on three different urinary bladder phantoms validated that the proposed autonomous ultrasound system can acquire ultrasound images for diagnostic purposes with a probing position and force accuracy of 98.7% and 97.8%, respectively.

9.Stair Climbing using the Angular Momentum Linear Inverted Pendulum Model and Model Predictive Control

Authors:Oluwami Dosunmu-Ogunbi, Aayushi Shrivastava, Grant Gibson, Jessy W Grizzle

Abstract: A new control paradigm using angular momentum and foot placement as state variables in the linear inverted pendulum model has expanded the realm of possibilities for the control of bipedal robots. This new paradigm, known as the ALIP model, has shown effectiveness in cases where a robot's center of mass height can be assumed to be constant or near constant as well as in cases where there are no non-kinematic restrictions on foot placement. Walking up and down stairs violates both of these assumptions, where center of mass height varies significantly within a step and the geometry of the stairs restrict the effectiveness of foot placement. In this paper, we explore a variation of the ALIP model that allows the length of the virtual pendulum formed by the robot's stance foot and center of mass to follow smooth trajectories during a step. We couple this model with a control strategy constructed from a novel combination of virtual constraint-based control and a model predictive control algorithm to stabilize a stair climbing gait that does not soley rely on foot placement. Simulations on a 20-degree of freedom model of the Cassie biped in the SimMechanics simulation environment show that the controller is able to achieve periodic gait.

10.Mainline Automatic Train Horn and Brake Performance Metric

Authors:Rustam Tagiew

Abstract: This paper argues for the introduction of a mainline rail-oriented performance metric for driver-replacing on-board perception systems. Perception at the head of a train is divided into several subfunctions. This article presents a preliminary submetric for the obstacle detection subfunction. To the best of the author's knowledge, no other such proposal for obstacle detection exists. A set of submetrics for the subfunctions should facilitate the comparison of perception systems among each other and guide the measurement of human driver performance. It should also be useful for a standardized prediction of the number of accidents for a given perception system in a given operational design domain. In particular, for the proposal of the obstacle detection submetric, the professional readership is invited to provide their feedback and quantitative information to the author. The analysis results of the feedback will be published separately later.

11.Active Class Selection for Few-Shot Class-Incremental Learning

Authors:Christopher McClurg, Ali Ayub, Harsh Tyagi, Sarah M. Rajtmajer, Alan R. Wagner

Abstract: For real-world applications, robots will need to continually learn in their environments through limited interactions with their users. Toward this, previous works in few-shot class incremental learning (FSCIL) and active class selection (ACS) have achieved promising results but were tested in constrained setups. Therefore, in this paper, we combine ideas from FSCIL and ACS to develop a novel framework that can allow an autonomous agent to continually learn new objects by asking its users to label only a few of the most informative objects in the environment. To this end, we build on a state-of-the-art (SOTA) FSCIL model and extend it with techniques from ACS literature. We term this model Few-shot Incremental Active class SeleCtiOn (FIASco). We further integrate a potential field-based navigation technique with our model to develop a complete framework that can allow an agent to process and reason on its sensory data through the FIASco model, navigate towards the most informative object in the environment, gather data about the object through its sensors and incrementally update the FIASco model. Experimental results on a simulated agent and a real robot show the significance of our approach for long-term real-world robotics applications.

12.A Robust Open-source Tendon-driven Robot Arm for Learning Control of Dynamic Motions

Authors:Simon Guist, Jan Schneider, Hao Ma, Vincent Berenz, Julian Martus, Felix Grüninger, Michael Mühlebach, Jonathan Fiene, Bernhard Schölkopf, Dieter Büchler

Abstract: A long-lasting goal of robotics research is to operate robots safely, while achieving high performance which often involves fast motions. Traditional motor-driven systems frequently struggle to balance these competing demands. Addressing this trade-off is crucial for advancing fields such as manufacturing and healthcare, where seamless collaboration between robots and humans is essential. We introduce a four degree-of-freedom (DoF) tendon-driven robot arm, powered by pneumatic artificial muscles (PAMs), to tackle this challenge. Our new design features low friction, passive compliance, and inherent impact resilience, enabling rapid, precise, high-force, and safe interactions during dynamic tasks. In addition to fostering safer human-robot collaboration, the inherent safety properties are particularly beneficial for reinforcement learning, where the robot's ability to explore dynamic motions without causing self-damage is crucial. We validate our robotic arm through various experiments, including long-term dynamic motions, impact resilience tests, and assessments of its ease of control. On a challenging dynamic table tennis task, we further demonstrate our robot's capabilities in rapid and precise movements. By showcasing our new design's potential, we aim to inspire further research on robotic systems that balance high performance and safety in diverse tasks. Our open-source hardware design, software, and a large dataset of diverse robot motions can be found at

13.Only Pick Once -- Multi-Object Picking Algorithms for Picking Exact Number of Objects Efficiently

Authors:Zihe Ye, Yu Sun

Abstract: Picking up multiple objects at once is a grasping skill that makes a human worker efficient in many domains. This paper presents a system to pick a requested number of objects by only picking once (OPO). The proposed Only-Pick-Once System (OPOS) contains several graph-based algorithms that convert the layout of objects into a graph, cluster nodes in the graph, rank and select candidate clusters based on their topology. OPOS also has a multi-object picking predictor based on a convolutional neural network for estimating how many objects would be picked up with a given gripper location and orientation. This paper presents four evaluation metrics and three protocols to evaluate the proposed OPOS. The results show OPOS has very high success rates for two and three objects when only picking once. Using OPOS can significantly outperform two to three times single object picking in terms of efficiency. The results also show OPOS can generalize to unseen size and shape objects.

14.SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially Observable Multi-Agent Path Finding

Authors:Qiushi Lin, Hang Ma

Abstract: Multi-Agent Path Finding (MAPF) is a crucial component for many large-scale robotic systems, where agents must plan their collision-free paths to their given goal positions. Recently, multi-agent reinforcement learning has been introduced to solve the partially observable variant of MAPF by learning a decentralized single-agent policy in a centralized fashion based on each agent's partial observation. However, existing learning-based methods are ineffective in achieving complex multi-agent cooperation, especially in congested environments, due to the non-stationarity of this setting. To tackle this challenge, we propose a multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA), which employs novel heuristic-based attention mechanisms for both the actors and critics to encourage cooperation among agents. SACHA learns a neural network for each agent to selectively pay attention to the shortest path heuristic guidance from multiple agents within its field of view, thereby allowing for more scalable learning of cooperation. SACHA also extends the existing multi-agent actor-critic framework by introducing a novel critic centered on each agent to approximate $Q$-values. Compared to existing methods that use a fully observable critic, our agent-centered multi-agent actor-critic method results in more impartial credit assignment and better generalizability of the learned policy to MAPF instances with varying numbers of agents and types of environments. We also implement SACHA(C), which embeds a communication module in the agent's policy network to enable information exchange among agents. We evaluate both SACHA and SACHA(C) on a variety of MAPF instances and demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.

1.Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Authors:Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, Stewart Worrall

Abstract: Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic where the ego vehicle must have reliable object detection to avoid collision while its field of view is severely reduced due to the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspective thanks to the presence at multiple locations of connected agents to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach where the Bird-Eye View images of point clouds are exchanged so that the bandwidth consumption is lower than communicating point clouds as in early collaboration, and the detection performance is higher than late collaboration, which fuses agents' output, thanks to a deeper interaction among connected agents. While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressor/ decompressor, and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98\% of the performance of an early-collaboration method, while only consuming the equivalent bandwidth of a late-collaboration method.

1.Surgical fine-tuning for Grape Bunch Segmentation under Visual Domain Shifts

Authors:Agnese Chiatti, Riccardo Bertoglio, Nico Catalano, Matteo Gatti, Matteo Matteucci

Abstract: Mobile robots will play a crucial role in the transition towards sustainable agriculture. To autonomously and effectively monitor the state of plants, robots ought to be equipped with visual perception capabilities that are robust to the rapid changes that characterise agricultural settings. In this paper, we focus on the challenging task of segmenting grape bunches from images collected by mobile robots in vineyards. In this context, we present the first study that applies surgical fine-tuning to instance segmentation tasks. We show how selectively tuning only specific model layers can support the adaptation of pre-trained Deep Learning models to newly-collected grape images that introduce visual domain shifts, while also substantially reducing the number of tuned parameters.

2.Advantages of Multimodal versus Verbal-Only Robot-to-Human Communication with an Anthropomorphic Robotic Mock Driver

Authors:Tim Schreiter, Lucas Morillo-Mendez, Ravi T. Chadalavada, Andrey Rudenko, Erik Billing, Martin Magnusson, Kai O. Arras, Achim J. Lilienthal

Abstract: Robots are increasingly used in shared environments with humans, making effective communication a necessity for successful human-robot interaction. In our work, we study a crucial component: active communication of robot intent. Here, we present an anthropomorphic solution where a humanoid robot communicates the intent of its host robot acting as an "Anthropomorphic Robotic Mock Driver" (ARMoD). We evaluate this approach in two experiments in which participants work alongside a mobile robot on various tasks, while the ARMoD communicates a need for human attention, when required, or gives instructions to collaborate on a joint task. The experiments feature two interaction styles of the ARMoD: a verbal-only mode using only speech and a multimodal mode, additionally including robotic gaze and pointing gestures to support communication and register intent in space. Our results show that the multimodal interaction style, including head movements and eye gaze as well as pointing gestures, leads to more natural fixation behavior. Participants naturally identified and fixated longer on the areas relevant for intent communication, and reacted faster to instructions in collaborative tasks. Our research further indicates that the ARMoD intent communication improves engagement and social interaction with mobile robots in workplace settings.

3.Perch a quadrotor on planes by the ceiling effect

Authors:Yuying Zou, Haotian Li, Yunfan Ren, Wei Xu, Yihang Li, Yixi Cai, Shenji Zhou, Fu Zhang

Abstract: Perching is a promising solution for a small unmanned aerial vehicle (UAV) to save energy and extend operation time. This paper proposes a quadrotor that can perch on planar structures using the ceiling effect. Compared with the existing work, this perching method does not require any claws, hooks, or adhesive pads, leading to a simpler system design. This method does not limit the perching by surface angle or material either. The design of the quadrotor that only uses its propeller guards for surface contact is presented in this paper. We also discussed the automatic perching strategy including trajectory generation and power management. Experiments are conducted to verify that the approach is practical and the UAV can perch on planes with different angles. Energy consumption in the perching state is assessed, showing that more than 30% of power can be saved. Meanwhile, the quadrotor exhibits improved stability while perching compared to when it is hovering.

4.A Biomimetic Fingerprint for Robotic Tactile Sensing

Authors:Oscar Alberto Juiña Quilachamín, Nicolás Navarro-Guerrero

Abstract: Tactile sensors have been developed since the early '70s and have greatly improved, but there are still no widely adopted solutions. Various technologies, such as capacitive, piezoelectric, piezoresistive, optical, and magnetic, are used in haptic sensing. However, most sensors are not mechanically robust for many applications and cannot cope well with curved or sizeable surfaces. Aiming to address this problem, we present a 3D-printed fingerprint pattern to enhance the body-borne vibration signal for dynamic tactile feedback. The 3D-printed fingerprint patterns were designed and tested for an RH8D Adult size Robot Hand. The patterns significantly increased the signal's power to over 11 times the baseline. A public haptic dataset including 52 objects of several materials was created using the best fingerprint pattern and material.

5.A Data-Driven Approach to Geometric Modeling of Systems with Low-Bandwidth Actuator Dynamics

Authors:Siming Deng, Junning Liu, Bibekananda Datta, Aishwarya Pantula, David H. Gracias, Thao D. Nguyen, Brian A. Bittner, Noah J. Cowan

Abstract: It is challenging to perform identification on soft robots due to their underactuated, high dimensional dynamics. In this work, we present a data-driven modeling framework, based on geometric mechanics (also known as gauge theory), that can be applied to systems with low-bandwidth actuation of the shape space. By exploiting temporal asymmetries in actuator dynamics, our approach enables the design of robots that can be driven by a single control input. We present a method for constructing a series connected model comprising actuator and locomotor dynamics based on data points from stochastically perturbed, repeated behaviors around the observed limit cycle. We demonstrate our methods on a real-world example of a soft crawler made by stimuli-responsive hydrogels that locomotes on merely one cycling control signal by utilizing its geometric and temporal asymmetry. For systems with first-order, low-pass actuator dynamics, such as swelling-driven actuators used in hydrogel crawlers, we show that first order Taylor approximations can well capture the dynamics of the system shape as well as its movements. Finally, we propose an approach of numerically optimizing control signals by iteratively refining models and optimizing the input waveform.

6.Scenario-Based Motion Planning with Bounded Probability of Collision

Authors:Oscar de Groot, Laura Ferranti, Dariu Gavrila, Javier Alonso-Mora

Abstract: Robots will increasingly operate near humans that introduce uncertainties in the motion planning problem due to their complex nature. Typically, chance constraints are introduced in the planner to optimize performance while guaranteeing probabilistic safety. However, existing methods do not consider the actual probability of collision for the planned trajectory, but rather its marginalization, that is, the independent collision probabilities for each planning step and/or dynamic obstacle, resulting in conservative trajectories. To address this issue, we introduce a novel real-time capable method termed Safe Horizon MPC, that explicitly constrains the joint probability of collision with all obstacles over the duration of the motion plan. This is achieved by reformulating the chance-constrained planning problem using scenario optimization and predictive control. Our method is less conservative than state-of-the-art approaches, applicable to arbitrary probability distributions of the obstacles' trajectories, computationally tractable and scalable. We demonstrate our proposed approach using a mobile robot and an autonomous vehicle in an environment shared with humans.

7.Assessment of the Utilization of Quadruped Robots in Pharmaceutical Research and Development Laboratories

Authors:Brian Parkinson, Ádám Wolf, Péter Galambos, Károly Széll

Abstract: Drug development is becoming more and more complex and resource-intensive. To reduce the costs and the time-to-market, the pharmaceutical industry employs cutting-edge automation solutions. Supportive robotics technologies, such as stationary and mobile manipulators, exist in various laboratory settings. However, they still lack the mobility and dexterity to navigate and operate in human-centered environments. We evaluate the feasibility of quadruped robots for the specific use case of remote inspection, utilizing the out-of-the-box capabilities of Boston Dynamics' Spot platform. We also provide an outlook on the newest technological advancements and the future applications these are anticipated to enable.

8.Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization

Authors:Federico Rollo, Gennaro Raiola, Andrea Zunino, Nikolaos Tsagarakis, Arash Ajoudani

Abstract: Geometric navigation is nowadays a well-established field of robotics and the research focus is shifting towards higher-level scene understanding, such as Semantic Mapping. When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localising objects within a map, which is under construction (SLAM) or already built. To further explore this direction, we propose a framework that can autonomously detect and localize predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar). The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing, while 85% and 80% of the objects were mapped using the single RGBD camera or RGB + lidar setup respectively. The comparison with single-sensor (camera or lidar) experiments is performed to show that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.

9.Soft Gripping: Specifying for Trustworthiness

Authors:Dhaminda B. Abeywickrama, Nguyen Hao Le, Greg Chance, Peter D. Winter, Arianna Manzini, Alix J. Partridge, Jonathan Ives, John Downer, Graham Deacon, Jonathan Rossiter, Kerstin Eder, Shane Windsor

Abstract: Soft robotics is an emerging technology in which engineers create flexible devices for use in a variety of applications. In order to advance the wide adoption of soft robots, ensuring their trustworthiness is essential; if soft robots are not trusted, they will not be used to their full potential. In order to demonstrate trustworthiness, a specification needs to be formulated to define what is trustworthy. However, even for soft robotic grippers, which is one of the most mature areas in soft robotics, the soft robotics community has so far given very little attention to formulating specifications. In this work, we discuss the importance of developing specifications during development of soft robotic systems, and present an extensive example specification for a soft gripper for pick-and-place tasks for grocery items. The proposed specification covers both functional and non-functional requirements, such as reliability, safety, adaptability, predictability, ethics, and regulations. We also highlight the need to promote verifiability as a first-class objective in the design of a soft gripper.

10.Social Impressions of the NAO Robot and its Impact on Physiology

Authors:Ruchik Mishra, Karla Conn Welch

Abstract: The social applications of robots possess intrinsic challenges with respect to social paradigms and heterogeneity of different groups. These challenges can be in the form of social acceptability, anthropomorphism, likeability, past experiences with robots etc. In this paper, we have considered a group of neurotypical adults to describe how different voices and motion types of the NAO robot can have effect on the perceived safety, anthropomorphism, likeability, animacy, and perceived intelligence of the robot. In addition, prior robot experience has also been taken into consideration to perform this analysis using a one-way Analysis of Variance (ANOVA). Further, we also demonstrate that these different modalities instigate different physiological responses in the person. This classification has been done using two different deep learning approaches, 1) Convolutional Neural Network (CNN), and 2) Gramian Angular Fields on the Blood Volume Pulse (BVP) data recorded. Both of these approaches achieve better than chance accuracy 25% for a 4 class classification.

11.Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach

Authors:Iman Sharifi, Mustafa Yildirim, Saber Fallah

Abstract: The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in real-world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL) that combines the strengths of DRL (learning from experience) and symbolic first-order logics knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.

12.Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly

Authors:Jianxiang Feng, Matan Atad, Ismael Rodríguez, Maximilian Durner, Stephan Günnemann, Rudolph Triebel

Abstract: Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.

13.Dynamic Mobile Manipulation via Whole-Body Bilateral Teleoperation of a Wheeled Humanoid

Authors:Amartya Purushottam, Yeongtae Jung, Christopher Xu, Joao Ramos

Abstract: Humanoid robots have the potential to help human workers by realizing physically demanding manipulation tasks such as moving large boxes within warehouses. We define such tasks as Dynamic Mobile Manipulation (DMM). This paper presents a framework for DMM via whole-body teleoperation, built upon three key contributions: Firstly, a teleoperation framework employing a Human Machine Interface (HMI) and a bi-wheeled humanoid, SATYRR, is proposed. Secondly, the study introduces a dynamic locomotion mapping, utilizing human-robot reduced order models, and a kinematic retargeting strategy for manipulation tasks. Additionally, the paper discusses the role of whole-body haptic feedback for wheeled humanoid control. Finally, the system's effectiveness and mappings for DMM are validated through locomanipulation experiments and heavy box pushing tasks. Here we show two forms of DMM: grasping a target moving at an average speed of 0.4 m/s, and pushing boxes weighing up to 105\% of the robot's weight. By simultaneously adjusting their pitch and using their arms, the pilot adjusts the robot pose to apply larger contact forces and move a heavy box at a constant velocity of 0.2 m/s.

14.Efficient Determination of Safety Requirements for Perception Systems

Authors:Sydney M. Katz, Anthony L. Corso, Esen Yel, Mykel J. Kochenderfer

Abstract: Perception systems operate as a subcomponent of the general autonomy stack, and perception system designers often need to optimize performance characteristics while maintaining safety with respect to the overall closed-loop system. For this reason, it is useful to distill high-level safety requirements into component-level requirements on the perception system. In this work, we focus on efficiently determining sets of safe perception system performance characteristics given a black-box simulator of the fully-integrated, closed-loop system. We combine the advantages of common black-box estimation techniques such as Gaussian processes and threshold bandits to develop a new estimation method, which we call smoothing bandits. We demonstrate our method on a vision-based aircraft collision avoidance problem and show improvements in terms of both accuracy and efficiency over the Gaussian process and threshold bandit baselines.

15.Multi-Predictor Fusion: Combining Learning-based and Rule-based Trajectory Predictors

Authors:Sushant Veer, Apoorva Sharma, Marco Pavone

Abstract: Trajectory prediction modules are key enablers for safe and efficient planning of autonomous vehicles (AVs), particularly in highly interactive traffic scenarios. Recently, learning-based trajectory predictors have experienced considerable success in providing state-of-the-art performance due to their ability to learn multimodal behaviors of other agents from data. In this paper, we present an algorithm called multi-predictor fusion (MPF) that augments the performance of learning-based predictors by imbuing them with motion planners that are tasked with satisfying logic-based rules. MPF probabilistically combines learning- and rule-based predictors by mixing trajectories from both standalone predictors in accordance with a belief distribution that reflects the online performance of each predictor. In our results, we show that MPF outperforms the two standalone predictors on various metrics and delivers the most consistent performance.

1.Decentralized Motor Skill Learning for Complex Robotic Systems

Authors:Yanjiang Guo, Zheyuan Jiang, Yen-Jen Wang, Jingyue Gao, Jianyu Chen

Abstract: Reinforcement learning (RL) has achieved remarkable success in complex robotic systems (eg. quadruped locomotion). In previous works, the RL-based controller was typically implemented as a single neural network with concatenated observation input. However, the corresponding learned policy is highly task-specific. Since all motors are controlled in a centralized way, out-of-distribution local observations can impact global motors through the single coupled neural network policy. In contrast, animals and humans can control their limbs separately. Inspired by this biological phenomenon, we propose a Decentralized motor skill (DEMOS) learning algorithm to automatically discover motor groups that can be decoupled from each other while preserving essential connections and then learn a decentralized motor control policy. Our method improves the robustness and generalization of the policy without sacrificing performance. Experiments on quadruped and humanoid robots demonstrate that the learned policy is robust against local motor malfunctions and can be transferred to new tasks.

2.Micromanipulation in Surgery: Autonomous Needle Insertion Inside the Eye for Targeted Drug Delivery

Authors:Ji Woong Kim, Peiyao Zhang, Peter Gehlbach, Iulian Iordachita, Marin Kobilarov

Abstract: We consider a micromanipulation problem in eye surgery, specifically retinal vein cannulation (RVC). RVC involves inserting a microneedle into a retinal vein for the purpose of targeted drug delivery. The procedure requires accurately guiding a needle to a target vein and inserting it while avoiding damage to the surrounding tissues. RVC can be considered similar to the reach or push task studied in robotics manipulation, but with additional constraints related to precision and safety while interacting with soft tissues. Prior works have mainly focused developing robotic hardware and sensors to enhance the surgeons' accuracy, leaving the automation of RVC largely unexplored. In this paper, we present the first autonomous strategy for RVC while relying on a minimal setup: a robotic arm, a needle, and monocular images. Our system exclusively relies on monocular vision to achieve precise navigation, gentle placement on the target vein, and safe insertion without causing tissue damage. Throughout the procedure, we employ machine learning for perception and to identify key surgical events such as needle-vein contact and vein punctures. Detecting these events guides our task and motion planning framework, which generates safe trajectories using model predictive control to complete the procedure. We validate our system through 24 successful autonomous trials on 4 cadaveric pig eyes. We show that our system can navigate to target veins within 22 micrometers of XY accuracy and under 35 seconds, and consistently puncture the target vein without causing tissue damage. Preliminary comparison to a human demonstrates the superior accuracy and reliability of our system.

3.Modeling and parametric optimization of 3D tendon-sheath actuator system for upper limb soft exosuit

Authors:Amit Yadav, Nitesh Kumar, Shaurya Surana, Lalan Kumar, Suriya Prakash Muthukrishnan, Shubhendu Bhasin, Sitikantha Roy

Abstract: This paper presents an analysis of parametric characterization of a motor driven tendon-sheath actuator system for use in upper limb augmentation for applications such as rehabilitation, therapy, and industrial automation. The double tendon sheath system, which uses two sets of cables (agonist and antagonist side) guided through a sheath, is considered to produce smooth and natural-looking movements of the arm. The exoskeleton is equipped with a single motor capable of controlling both the flexion and extension motions. One of the key challenges in the implementation of a double tendon sheath system is the possibility of slack in the tendon, which can impact the overall performance of the system. To address this issue, a robust mathematical model is developed and a comprehensive parametric study is carried out to determine the most effective strategies for overcoming the problem of slack and improving the transmission. The study suggests that incorporating a series spring into the system's tendon leads to a universally applicable design, eliminating the need for individual customization. The results also show that the slack in the tendon can be effectively controlled by changing the pretension, spring constant, and size and geometry of spool mounted on the axle of motor.

4.LIO-GVM: an Accurate, Tightly-Coupled Lidar-Inertial Odometry with Gaussian Voxel Map

Authors:Xingyu Ji, Shenghai Yuan, Pengyu Yin, Lihua Xie

Abstract: This letter presents an accurate and robust Lidar Inertial Odometry framework. We fuse LiDAR scans with IMU data using a tightly-coupled iterative error state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distributions, a new residual metric is proposed for the filter-based Lidar inertial odometry, which demonstrates an improvement from merely quantifying distance to incorporating variance disparity, further enriching the comprehensiveness and accuracy of the residual metric. Due to the strategic design of the residual metric, we propose a simple yet effective voxel-solely mapping scheme, which only necessities the maintenance of one centroid and one covariance matrix for each voxel. Experiments on different datasets demonstrate the robustness and accuracy of our framework for various data inputs and environments. To the benefit of the robotics society, we open source the code at

5.Human-like Decision-making at Unsignalized Intersection using Social Value Orientation

Authors:Yan Tong, Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Yikang Li

Abstract: With the commercial application of automated vehicles (AVs), the sharing of roads between AVs and human-driven vehicles (HVs) becomes a common occurrence in the future. While research has focused on improving the safety and reliability of autonomous driving, it's also crucial to consider collaboration between AVs and HVs. Human-like interaction is a required capability for AVs, especially at common unsignalized intersections, as human drivers of HVs expect to maintain their driving habits for inter-vehicle interactions. This paper uses the social value orientation (SVO) in the decision-making of vehicles to describe the social interaction among multiple vehicles. Specifically, we define the quantitative calculation of the conflict-involved SVO at unsignalized intersections to enhance decision-making based on the reinforcement learning method. We use naturalistic driving scenarios with highly interactive motions for performance evaluation of the proposed method. Experimental results show that SVO is more effective in characterizing inter-vehicle interactions than conventional motion state parameters like velocity, and the proposed method can accurately reproduce naturalistic driving trajectories compared to behavior cloning.

6.Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization

Authors:Stephen Hausler, Sourav Garg, Punarjay Chakravarty, Shubham Shrivastava, Ankit Vora, Michael Milford

Abstract: Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle's motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from $0.25m$ to $5m$, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately $35\%$ of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.

7.Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle

Authors:Václav Pritzl, Matouš Vrba, Petr Štěpán, Martin Saska

Abstract: A novel relative localization approach for cooperative guidance of a micro-scale Unmanned Aerial Vehicle (UAV) fusing Visual-Inertial Odometry (VIO) with Light Detection and Ranging (LiDAR) is proposed in this paper. LiDAR-based localization is accurate and robust to challenging environmental conditions, but 3D LiDARs are relatively heavy and require large UAV platforms. Visual cameras are cheap and lightweight. However, visual-based self-localization methods exhibit lower accuracy and can suffer from significant drift with respect to the global reference frame. We focus on cooperative navigation in a heterogeneous team of a primary LiDAR-equipped UAV and secondary camera-equipped UAV. We propose a novel cooperative approach combining LiDAR relative localization data with VIO output on board the primary UAV to obtain an accurate pose of the secondary UAV. The pose estimate is used to guide the secondary UAV along trajectories defined in the primary UAV reference frame. The experimental evaluation has shown the superior accuracy of our method to the raw VIO output and demonstrated its capability to guide the secondary UAV along desired trajectories while mitigating VIO drift.

8.Role of single particle motility statistics on efficiency of targeted delivery of micro-robot swarms

Authors:Akshatha Jagadish, Manoj Varma

Abstract: The study of dynamics of single active particles plays an important role in the development of artificial or hybrid micro-systems for bio-medical and other applications at micro-scale. Here, we utilize the results of these studies to better understand their implications for the specific application of drug delivery. We analyze the variations in the capture efficiency for different types of motion dynamics without inter-particle interactions and compare the results. We also discuss the reasons for the same and describe the specific parameters that affect the capture efficiency, which in turn helps in both hardware and control design of a micro-bot swarm system for drug delivery.

9.Unscented Optimal Control for 3D Coverage Planning with an Autonomous UAV Agent

Authors:Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou

Abstract: We propose a novel probabilistically robust controller for the guidance of an unmanned aerial vehicle (UAV) in coverage planning missions, which can simultaneously optimize both the UAV's motion, and camera control inputs for the 3D coverage of a given object of interest. Specifically, the coverage planning problem is formulated in this work as an optimal control problem with logical constraints to enable the UAV agent to jointly: a) select a series of discrete camera field-of-view states which satisfy a set of coverage constraints, and b) optimize its motion control inputs according to a specified mission objective. We show how this hybrid optimal control problem can be solved with standard optimization tools by converting the logical expressions in the constraints into equality/inequality constraints involving only continuous variables. Finally, probabilistic robustness is achieved by integrating the unscented transformation to the proposed controller, thus enabling the design of robust open-loop coverage plans which take into account the future posterior distribution of the UAV's state inside the planning horizon.

10.Navigation of micro-robot swarms for targeted delivery using reinforcement learning

Authors:Akshatha Jagadish, Manoj Varma

Abstract: Micro robotics is quickly emerging to be a promising technological solution to many medical treatments with focus on targeted drug delivery. They are effective when working in swarms whose individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optimization (RPO) to navigate a swarm of 4, 9 and 16 microswimmers under hydrodynamic effects, controlled by their orientation, towards a circular absorbing target. We look at both PPO and RPO performances with limited state information scenarios and also test their robustness for random target location and size. We use curriculum learning to improve upon the performance and demonstrate the same in learning to navigate a swarm of 25 swimmers and steering the swarm to exemplify the manoeuvring capabilities of the RL model.

11.Mixed Integer Programming for Time-Optimal Multi-Robot Coverage Path Planning with Efficient Heuristics

Authors:Jingtao Tang, Hang Ma

Abstract: We investigate time-optimal Multi-Robot Coverage Path Planning (MCPP) for both unweighted and weighted terrains, which aims to minimize the coverage time, defined as the maximum travel time of all robots. Specifically, we focus on a reduction from MCPP to Rooted Min-Max Tree Cover (RMMTC). For the first time, we propose a Mixed Integer Programming (MIP) model to optimally solve RMMTC, resulting in an MCPP solution with a coverage time that is provably at most four times the optimal. Moreover, we propose two suboptimal yet effective heuristics that reduce the number of variables in the MIP model, thus improving its efficiency for large-scale MCPP instances. We show that both heuristics result in reduced-size MIP models that remain complete (i.e., guarantee to find a solution if one exists) for all RMMTC instances. Additionally, we explore the use of model optimization warm-startup to further improve the efficiency of both the original MIP model and the reduced-size MIP models. We validate the effectiveness of our MIP-based MCPP planner through experiments that compare it with two state-of-the-art MCPP planners on various instances, demonstrating a reduction in the coverage time by an average of 42.42% and 39.16% over them, respectively.

12.Projection-based first-order constrained optimization solver for robotics

Authors:Hakan Girgin, Tobias Löw, Teng Xue, Sylvain Calinon

Abstract: Robot programming tools ranging from inverse kinematics (IK) to model predictive control (MPC) are most often described as constrained optimization problems. Even though there are currently many commercially-available second-order solvers, robotics literature recently focused on efficient implementations and improvements over these solvers for real-time robotic applications. However, most often, these implementations stay problem-specific and are not easy to access or implement, or do not exploit the geometric aspect of the robotics problems. In this work, we propose to solve these problems using a fast, easy-to-implement first-order method that fully exploits the geometric constraints via Euclidean projections, called Augmented Lagrangian Spectral Projected Gradient Descent (ALSPG). We show that 1. using projections instead of full constraints and gradients improves the performance of the solver and 2. ALSPG stays competitive to the standard second-order methods such as iLQR in the unconstrained case. We showcase these results with IK and motion planning problems on simulated examples and with an MPC problem on a 7-axis manipulator experiment.

13.An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning

Authors:Keisuke Sugiura, Hiroki Matsutani

Abstract: Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address this, we propose P3Net (PointNet-based Path Planning Networks), a lightweight deep-learning-based method for 2D/3D path planning, and design an IP core (P3NetCore) targeting FPGA SoCs (Xilinx ZCU104). P3Net improves the algorithm and model architecture of the recently-proposed MPNet. P3Net employs an encoder with a PointNet backbone and a lightweight planning network in order to extract robust point cloud features and sample path points from a promising region. P3NetCore is comprised of the fully-pipelined point cloud encoder, batched bidirectional path planner, and parallel collision checker, to cover most part of the algorithm. On the 2D (3D) datasets, P3Net with the IP core runs 24.54-149.57x and 6.19-115.25x (10.03-59.47x and 3.38-28.76x) faster than ARM Cortex CPU and Nvidia Jetson while only consuming 0.255W (0.809W), and is up to 1049.42x (133.84x) power-efficient than the workstation. P3Net improves the success rate by up to 28.2% and plans a near-optimal path, leading to a significantly better tradeoff between computation and solution quality than MPNet and the state-of-the-art sampling-based methods.

14.Evaluation of the Benefits of Zero Velocity Update in Decentralized EKF-Based Cooperative Localization Algorithms for GNSS-Denied Multi-Robot Systems

Authors:Cagri Kilic, Eduardo Gutierrez, Jason N. Gross

Abstract: This paper proposes the cooperative use of zero velocity update (ZU) in a decentralized extended Kalman filter (DEKF) based localization algorithm for multi-robot systems. The filter utilizes inertial measurement unit (IMU), ultra-wideband (UWB), and odometry velocity measurements to improve the localization performance of the system in the presence of a GNSS-denied environment. The contribution of this work is to evaluate the benefits of using ZU in a DEKF-based localization algorithm. The algorithm is tested with real hardware in a video motion capture facility and a Robot Operating System (ROS) based simulation environment for unmanned ground vehicles (UGV). Both simulation and real-world experiments are performed to show the effectiveness of using ZU in one robot to reinstate the localization of other robots in a multi-robot system. Experimental results from GNSS-denied simulation and real-world environments show that using ZU with simple heuristics in the DEKF significantly improves the 3D localization accuracy.

15.The Bridge between Xsens Motion-Capture and Robot Operating System (ROS): Enabling Robots with Online 3D Human Motion Tracking

Authors:Mattia Leonori Human-Robot Interfaces and Interaction, Istituto Italiano di Tecnologia, Marta Lorenzini Human-Robot Interfaces and Interaction, Istituto Italiano di Tecnologia, Luca Fortini Human-Robot Interfaces and Interaction, Istituto Italiano di Tecnologia, Juan M. Gandarias Human-Robot Interfaces and Interaction, Istituto Italiano di Tecnologia, Arash Ajoudani Human-Robot Interfaces and Interaction, Istituto Italiano di Tecnologia

Abstract: This document introduces the bridge between the leading inertial motion-capture systems for 3D human tracking and the most used robotics software framework. 3D kinematic data provided by Xsens are translated into ROS messages to make them usable by robots and a Unified Robotics Description Format (URDF) model of the human kinematics is generated, which can be run and displayed in ROS 3D visualizer, RViz. The code to implement the to-ROS-bridge is a ROS package called xsens_mvn_ros and is available on GitHub at The main documentation can be found at

16.Zespol: A Lightweight Environment for Training Swarming Agents

Authors:Shay Snyder George Mason University, Kevin Zhu George Mason University, Ricardo Vega George Mason University, Cameron Nowzari George Mason University, Maryam Parsa George Mason University

Abstract: Agent-based modeling (ABM) and simulation have emerged as important tools for studying emergent behaviors, especially in the context of swarming algorithms for robotic systems. Despite significant research in this area, there is a lack of standardized simulation environments, which hinders the development and deployment of real-world robotic swarms. To address this issue, we present Zespol, a modular, Python-based simulation environment that enables the development and testing of multi-agent control algorithms. Zespol provides a flexible and extensible sandbox for initial research, with the potential for scaling to real-world applications. We provide a topological overview of the system and detailed descriptions of its plug-and-play elements. We demonstrate the fidelity of Zespol in simulated and real-word robotics by replicating existing works highlighting the simulation to real gap with the milling behavior. We plan to leverage Zespol's plug-and-play feature for neuromorphic computing in swarming scenarios, which involves using the modules in Zespol to simulate the behavior of neurons and their connections as synapses. This will enable optimizing and studying the emergent behavior of swarm systems in complex environments. Our goal is to gain a better understanding of the interplay between environmental factors and neural-like computations in swarming systems.

17.Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation

Authors:Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki

Abstract: 3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLbench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLbench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site:

18.Learning Evacuee Models from Robot-Guided Emergency Evacuation Experiments

Authors:Mollik Nayyar, Ghanghoon Paik, Zhenyuan Yuan, Tongjia Zheng, Minghui Zhu, Hai Lin, Alan R. Wagner

Abstract: Recent research has examined the possibility of using robots to guide evacuees to safe exits during emergencies. Yet, there are many factors that can impact a person's decision to follow a robot. Being able to model how an evacuee follows an emergency robot guide could be crucial for designing robots that effectively guide evacuees during an emergency. This paper presents a method for developing realistic and predictive human evacuee models from physical human evacuation experiments. The paper analyzes the behavior of 14 human subjects during physical robot-guided evacuation. We then use the video data to create evacuee motion models that predict the person's future positions during the emergency. Finally, we validate the resulting models by running a k-fold cross-validation on the data collected during physical human subject experiments. We also present performance results of the model using data from a similar simulated emergency evacuation experiment demonstrating that these models can serve as a tool to predict evacuee behavior in novel evacuation simulations.

19.Statler: State-Maintaining Language Models for Embodied Reasoning

Authors:Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter

Abstract: Large language models (LLMs) provide a promising tool that enable robots to perform complex robot reasoning tasks. However, the limited context window of contemporary LLMs makes reasoning over long time horizons difficult. Embodied tasks such as those that one might expect a household robot to perform typically require that the planner consider information acquired a long time ago (e.g., properties of the many objects that the robot previously encountered in the environment). Attempts to capture the world state using an LLM's implicit internal representation is complicated by the paucity of task- and environment-relevant information available in a robot's action history, while methods that rely on the ability to convey information via the prompt to the LLM are subject to its limited context window. In this paper, we propose Statler, a framework that endows LLMs with an explicit representation of the world state as a form of ``memory'' that is maintained over time. Integral to Statler is its use of two instances of general LLMs -- a world-model reader and a world-model writer -- that interface with and maintain the world state. By providing access to this world state ``memory'', Statler improves the ability of existing LLMs to reason over longer time horizons without the constraint of context length. We evaluate the effectiveness of our approach on three simulated table-top manipulation domains and a real robot domain, and show that it improves the state-of-the-art in LLM-based robot reasoning. Project website:

20.GIRA: Gaussian Mixture Models for Inference and Robot Autonomy

Authors:Kshitij Goel, Wennie Tabib

Abstract: Large-scale deployments of robot teams are challenged by the need to share high-resolution perceptual information over low-bandwidth communication channels. Individual size, weight, and power constrained robots rely on environment models to assess navigability and safely traverse unstructured and complex environments. State of the art perception frameworks construct these models via multiple disparate pipelines that reuse the same underlying sensor data, which leads to increased computation, redundancy, and complexity. To bridge this gap, this paper introduces GIRA -- an open-source framework for compact, high-resolution environment modeling using Gaussian mixture models (GMMs). GIRA provides fundamental robotics capabilities such as high-fidelity reconstruction, pose estimation, and occupancy modeling in a single continuous representation.

21.Parallel Self-assembly for a Multi-USV System on Water Surface with Obstacles

Authors:Lianxin Zhang, Yihan Huang, Zhongzhong Cao, Yang Jiao, Huihuan Qian

Abstract: Parallel self-assembly is an efficient approach to accelerate the assembly process for modular robots. However, these approaches cannot accommodate complicated environments with obstacles, which restricts their applications. This paper considers the surrounding stationary obstacles and proposes a parallel self-assembly planning algorithm named SAPOA. With this algorithm, modular robots can avoid immovable obstacles when performing docking actions, which adapts the parallel self-assembly process to complex scenes. To validate the efficiency and scalability, we have designed 25 distinct grid maps with different obstacle configurations to simulate the algorithm. From the results compared to the existing parallel self-assembly algorithms, our algorithm shows a significantly higher success rate, which is more than 80%. For verification in real-world applications, a multi-agent hardware testbed system is developed. The algorithm is successfully deployed on four omnidirectional unmanned surface vehicles, CuBoats. The navigation strategy that translates the discrete planner, SAPOA, to the continuous controller on the CuBoats is presented. The algorithm's feasibility and flexibility were demonstrated through successful self-assembly experiments on 5 maps with varying obstacle configurations.

22.Collapse of Straight Soft Growing Inflated Beam Robots Under Their Own Weight

Authors:Ciera McFarland, Margaret M. Coad

Abstract: Soft, growing inflated beam robots, also known as everting vine robots, have previously been shown to navigate confined spaces with ease. Less is known about their ability to navigate three-dimensional open spaces where they have the potential to collapse under their own weight as they attempt to move through a space. Previous work has studied collapse of inflated beams and vine robots due to purely transverse or purely axial external loads. Here, we extend previous models to predict the length at which straight vine robots will collapse under their own weight at arbitrary launch angle relative to gravity, inflated diameter, and internal pressure. Our model successfully predicts the general trends of collapse behavior of straight vine robots. We find that collapse length increases non-linearly with the robot's launch angle magnitude, linearly with the robot's diameter, and with the square root of the robot's internal pressure. We also demonstrate the use of our model to determine the robot parameters required to grow a vine robot across a gap in the floor. This work forms the foundation of an approach for modeling the collapse of vine robots and inflated beams in arbitrary shapes.

23.A Personalized Household Assistive Robot that Learns and Creates New Breakfast Options through Human-Robot Interaction

Authors:Ali Ayub, Chrystopher L. Nehaniv, Kerstin Dautenhahn

Abstract: For robots to assist users with household tasks, they must first learn about the tasks from the users. Further, performing the same task every day, in the same way, can become boring for the robot's user(s), therefore, assistive robots must find creative ways to perform tasks in the household. In this paper, we present a cognitive architecture for a household assistive robot that can learn personalized breakfast options from its users and then use the learned knowledge to set up a table for breakfast. The architecture can also use the learned knowledge to create new breakfast options over a longer period of time. The proposed cognitive architecture combines state-of-the-art perceptual learning algorithms, computational implementation of cognitive models of memory encoding and learning, a task planner for picking and placing objects in the household, a graphical user interface (GUI) to interact with the user and a novel approach for creating new breakfast options using the learned knowledge. The architecture is integrated with the Fetch mobile manipulator robot and validated, as a proof-of-concept system evaluation in a large indoor environment with multiple kitchen objects. Experimental results demonstrate the effectiveness of our architecture to learn personalized breakfast options from the user and generate new breakfast options never learned by the robot.

24.Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

Authors:Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

Abstract: Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But getting large amounts of labeled data, i.e. data that contains demonstrations of tasks labeled with the language instruction, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsight with its final state as the goal. In this work, we contribute a method that taps into joint image- and goal- conditioned policies with language using only a small amount of language data. Prior work has made progress on this using vision-language models or by jointly training language-goal-conditioned policies, but so far neither method has scaled effectively to real-world robot tasks without significant human annotation. Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to. We then train a policy on this embedding: the policy benefits from all the unlabeled data, but the aligned embedding provides an interface for language to steer the policy. We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside of the labeled data. Videos and code for our approach can be found on our website: .

25.How Do Human Users Teach a Continual Learning Robot in Repeated Interactions?

Authors:Ali Ayub, Jainish Mehta, Zachary De Francesco, Patrick Holthaus, Kerstin Dautenhahn, Chrystopher L. Nehaniv

Abstract: Continual learning (CL) has emerged as an important avenue of research in recent years, at the intersection of Machine Learning (ML) and Human-Robot Interaction (HRI), to allow robots to continually learn in their environments over long-term interactions with humans. Most research in continual learning, however, has been robot-centered to develop continual learning algorithms that can quickly learn new information on static datasets. In this paper, we take a human-centered approach to continual learning, to understand how humans teach continual learning robots over the long term and if there are variations in their teaching styles. We conducted an in-person study with 40 participants that interacted with a continual learning robot in 200 sessions. In this between-participant study, we used two different CL models deployed on a Fetch mobile manipulator robot. An extensive qualitative and quantitative analysis of the data collected in the study shows that there is significant variation among the teaching styles of individual users indicating the need for personalized adaptation to their distinct teaching styles. The results also show that although there is a difference in the teaching styles between expert and non-expert users, the style does not have an effect on the performance of the continual learning robot. Finally, our analysis shows that the constrained experimental setups that have been widely used to test most continual learning techniques are not adequate, as real users interact with and teach continual learning robots in a variety of ways. Our code is available at

26.RObotic MAnipulation Network (ROMAN) $\unicode{x2013}$ Hybrid Hierarchical Learning for Solving Complex Sequential Tasks

Authors:Eleftherios Triantafyllidis, Fernando Acero, Zhaocheng Liu, Zhibin Li

Abstract: Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.

27.Vision-based Oxy-fuel Torch Control for Robotic Metal Cutting

Authors:James Akl, Yash Patil, Chinmay Todankar, Berk Calli

Abstract: The automation of key processes in metal cutting would substantially benefit many industries such as manufacturing and metal recycling. We present a vision-based control scheme for automated metal cutting with oxy-fuel torches, an established cutting medium in industry. The system consists of a robot equipped with a cutting torch and an eye-in-hand camera observing the scene behind a tinted visor. We develop a vision-based control algorithm to servo the torch's motion by visually observing its effects on the metal surface. As such, the vision system processes the metal surface's heat pool and computes its associated features, specifically pool convexity and intensity, which are then used for control. The operating conditions of the control problem are defined within which the stability is proven. In addition, metal cutting experiments are performed using a physical 1-DOF robot and oxy-fuel cutting equipment. Our results demonstrate the successful cutting of metal plates across three different plate thicknesses, relying purely on visual information without a priori knowledge of the thicknesses.

28.Modeling, Characterization, and Control of Bacteria-inspired Bi-flagellated Mechanism with Tumbling

Authors:Zhuonan Hao, Sangmin Lim, M. Khalid Jawed

Abstract: Multi-flagellated bacteria utilize the hydrodynamic interaction between their filamentary tails, known as flagella, to swim and change their swimming direction in low Reynolds number flow. This interaction, referred to as bundling and tumbling, is often overlooked in simplified hydrodynamic models such as Resistive Force Theories (RFT). However, for the development of efficient and steerable robots inspired by bacteria, it becomes crucial to exploit this interaction. In this paper, we present the construction of a macroscopic bio-inspired robot featuring two rigid flagella arranged as right-handed helices, along with a cylindrical head. By rotating the flagella in opposite directions, the robot's body can reorient itself through repeatable and controllable tumbling. To accurately model this bi-flagellated mechanism in low Reynolds flow, we employ a coupling of rigid body dynamics and the method of Regularized Stokeslet Segments (RSS). Unlike RFT, RSS takes into account the hydrodynamic interaction between distant filamentary structures. Furthermore, we delve into the exploration of the parameter space to optimize the propulsion and torque of the system. To achieve the desired reorientation of the robot, we propose a tumble control scheme that involves modulating the rotation direction and speed of the two flagella. By implementing this scheme, the robot can effectively reorient itself to attain the desired attitude. Notably, the overall scheme boasts a simplified design and control as it only requires two control inputs. With our macroscopic framework serving as a foundation, we envision the eventual miniaturization of this technology to construct mobile and controllable micro-scale bacterial robots.

1.Introspective Perception for Mobile Robots

Authors:Sadegh Rabiee, Joydeep Biswas

Abstract: Perception algorithms that provide estimates of their uncertainty are crucial to the development of autonomous robots that can operate in challenging and uncontrolled environments. Such perception algorithms provide the means for having risk-aware robots that reason about the probability of successfully completing a task when planning. There exist perception algorithms that come with models of their uncertainty; however, these models are often developed with assumptions, such as perfect data associations, that do not hold in the real world. Hence the resultant estimated uncertainty is a weak lower bound. To tackle this problem we present introspective perception - a novel approach for predicting accurate estimates of the uncertainty of perception algorithms deployed on mobile robots. By exploiting sensing redundancy and consistency constraints naturally present in the data collected by a mobile robot, introspective perception learns an empirical model of the error distribution of perception algorithms in the deployment environment and in an autonomously supervised manner. In this paper, we present the general theory of introspective perception and demonstrate successful implementations for two different perception tasks. We provide empirical results on challenging real-robot data for introspective stereo depth estimation and introspective visual simultaneous localization and mapping and show that they learn to predict their uncertainty with high accuracy and leverage this information to significantly reduce state estimation errors for an autonomous mobile robot.

2.Dynamic-Resolution Model Learning for Object Pile Manipulation

Authors:Yixuan Wang, Yunzhu Li, Katherine Driggs-Campbell, Li Fei-Fei, Jiajun Wu

Abstract: Dynamics models learned from visual observations have shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is what scene representation to use. Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks. In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs) that allows continuous selection of the abstraction level. During test time, the agent can adaptively determine the optimal resolution at each model-predictive control (MPC) step. We evaluate our method in object pile manipulation, a task we commonly encounter in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations both in the simulation and the real world, we show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles made with various instances like coffee beans, almonds, corn, etc.

3.Principles and Guidelines for Evaluating Social Robot Navigation Algorithms

Authors:Anthony Francis Logical Robotics, Claudia Perez-D'Arpino NVIDIA, Chengshu Li Stanford, Fei Xia Google, Alexandre Alahi EPFL, Rachid Alami LAAS-CNRS, Universite de Toulouse, Aniket Bera Purdue, Abhijat Biswas CMU, Joydeep Biswas UT Austin, Rohan Chandra UT Austin, Hao-Tien Lewis Chiang Google, Michael Everett Northeastern, Sehoon Ha Georgia Tech, Justin Hart UT Austin, Jonathan P. How MIT, Haresh Karnan UT Austin, Tsang-Wei Edward Lee Google, Luis J. Manso Aston, Reuth Mirksy Bar Ilan, Soeren Pirk Adobe, Phani Teja Singamaneni LAAS-CNRS, Universite de Toulouse, Peter Stone UT Austin Sony AI, Ada V. Taylor CMU, Peter Trautman Honda, Nathan Tsoi Yale, Marynel Vazquez Yale, Xuesu Xiao GMU, Peng Xu Google, Naoki Yokoyama Georgia Tech, Alexander Toshev Apple, Roberto Martin-Martin UT Austin

Abstract: A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.

4.TacMMs: Tactile Mobile Manipulators for Warehouse Automation

Authors:Zhuochao He, Xuyang Zhang, Simon Jones, Sabine Hauert, Dandan Zhang, Nathan F. Lepora

Abstract: Multi-robot platforms are playing an increasingly important role in warehouse automation for efficient goods transport. This paper proposes a novel customization of a multi-robot system, called Tactile Mobile Manipulators (TacMMs). Each TacMM integrates a soft optical tactile sensor and a mobile robot with a load-lifting mechanism, enabling cooperative transportation in tasks requiring coordinated physical interaction. More specifically, we mount the TacTip (biomimetic optical tactile sensor) on the Distributed Organisation and Transport System (DOTS) mobile robot. The tactile information then helps the mobile robots adjust the relative robot-object pose, thereby increasing the efficiency of load-lifting tasks. This study compares the performance of using two TacMMs with tactile perception with traditional vision-based pose adjustment for load-lifting. The results show that the average success rate of the TacMMs (66%) is improved over a purely visual-based method (34%), with a larger improvement when the mass of the load was non-uniformly distributed. Although this initial study considers two TacMMs, we expect the benefits of tactile perception to extend to multiple mobile robots. Website:

5.A Survey on Datasets for Decision-making of Autonomous Vehicle

Authors:Yuning Wang, Zeyu Han, Yining Xing, Shaobing Xu, Jianqiang Wang

Abstract: Autonomous vehicles (AV) are expected to reshape future transportation systems, and decision-making is one of the critical modules toward high-level automated driving. To overcome those complicated scenarios that rule-based methods could not cope with well, data-driven decision-making approaches have aroused more and more focus. The datasets to be used in developing data-driven methods dramatically influences the performance of decision-making, hence it is necessary to have a comprehensive insight into the existing datasets. From the aspects of collection sources, driving data can be divided into vehicle, environment, and driver related data. This study compares the state-of-the-art datasets of these three categories and summarizes their features including sensors used, annotation, and driving scenarios. Based on the characteristics of the datasets, this survey also concludes the potential applications of datasets on various aspects of AV decision-making, assisting researchers to find appropriate ones to support their own research. The future trends of AV dataset development are summarized.

6.ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch

Authors:Zhengrong Xue, Han Zhang, Jingwen Cheng, Zhengmao He, Yuanchen Ju, Changyi Lin, Gu Zhang, Huazhe Xu

Abstract: We present ArrayBot, a distributed manipulation system consisting of a $16 \times 16$ array of vertically sliding pillars integrated with tactile sensors, which can simultaneously support, perceive, and manipulate the tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Surprisingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also transfer to the physical robot without any domain randomization. Leveraging the deployed policy, we present abundant real-world manipulation tasks, illustrating the vast potential of RL on ArrayBot for distributed manipulation.

7.Whole-Body Exploration with a Manipulator Using Heat Equation

Authors:Cem Bilaloglu, Tobias Löw, Sylvain Calinon

Abstract: This paper presents a whole-body robot control method for exploring and probing a given region of interest. The ergodic control formalism behind such an exploration behavior consists of matching the time-averaged statistics of a robot trajectory with the spatial statistics of the target distribution. Most existing ergodic control approaches assume the robots/sensors as individual point agents moving in space. We introduce an approach exploiting multiple kinematically constrained agents on the whole-body of a robotic manipulator, where a consensus among the agents is found for generating control actions. To do so, we exploit an existing ergodic control formulation called heat equation-driven area coverage (HEDAC), c