arXiv daily: Artificial Intelligence

arXiv daily: Artificial Intelligence (cs.AI)

1.Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions

Authors:Matthias Unverzagt

Abstract: Improving the decision-making capabilities of agents is a key challenge on the road to artificial intelligence. To improve the planning skills needed to make good decisions, MuZero's agent combines prediction by a network model and planning by a tree search using the predictions. MuZero's learning process can fail when predictions are poor but planning requires them. We use this as an impetus to get the agent to explore parts of the decision tree in the environment that it otherwise would not explore. The agent achieves this, first by normal planning to come up with an improved policy. Second, it randomly deviates from this policy at the beginning of each training episode. And third, it switches back to the improved policy at a random time step to experience the rewards from the environment associated with the improved policy, which is the basis for learning the correct value expectation. The simple board game Tic-Tac-Toe is used to illustrate how this approach can improve the agent's decision-making ability. The source code, written entirely in Java, is available at

2.Rigorous Runtime Analysis of MOEA/D for Solving Multi-Objective Minimum Weight Base Problems

Authors:Anh Viet Do, Aneta Neumann, Frank Neumann, Andrew M. Sutton

Abstract: We study the multi-objective minimum weight base problem, an abstraction of classical NP-hard combinatorial problems such as the multi-objective minimum spanning tree problem. We prove some important properties of the convex hull of the non-dominated front, such as its approximation quality and an upper bound on the number of extreme points. Using these properties, we give the first run-time analysis of the MOEA/D algorithm for this problem, an evolutionary algorithm that effectively optimizes by decomposing the objectives into single-objective components. We show that the MOEA/D, given an appropriate decomposition setting, finds all extreme points within expected fixed-parameter polynomial time in the oracle model, the parameter being the number of objectives. Experiments are conducted on random bi-objective minimum spanning tree instances, and the results agree with our theoretical findings. Furthermore, compared with a previously studied evolutionary algorithm for the problem GSEMO, MOEA/D finds all extreme points much faster across all instances.

3.I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models

Authors:Max Reuter, William Schulze

Abstract: Since the release of OpenAI's ChatGPT, generative language models have attracted extensive public attention. The increased usage has highlighted generative models' broad utility, but also revealed several forms of embedded bias. Some is induced by the pre-training corpus; but additional bias specific to generative models arises from the use of subjective fine-tuning to avoid generating harmful content. Fine-tuning bias may come from individual engineers and company policies, and affects which prompts the model chooses to refuse. In this experiment, we characterize ChatGPT's refusal behavior using a black-box attack. We first query ChatGPT with a variety of offensive and benign prompts (n=1,730), then manually label each response as compliance or refusal. Manual examination of responses reveals that refusal is not cleanly binary, and lies on a continuum; as such, we map several different kinds of responses to a binary of compliance or refusal. The small manually-labeled dataset is used to train a refusal classifier, which achieves an accuracy of 92%. Second, we use this refusal classifier to bootstrap a larger (n=10,000) dataset adapted from the Quora Insincere Questions dataset. With this machine-labeled data, we train a prompt classifier to predict whether ChatGPT will refuse a given question, without seeing ChatGPT's response. This prompt classifier achieves 76% accuracy on a test set of manually labeled questions (n=1,009). We examine our classifiers and the prompt n-grams that are most predictive of either compliance or refusal. Datasets and code are available at

4.A Belief Model for Conflicting and Uncertain Evidence -- Connecting Dempster-Shafer Theory and the Topology of Evidence

Authors:Daira Pinto Prieto, Ronald de Haan, Aybüke Özgün

Abstract: One problem to solve in the context of information fusion, decision-making, and other artificial intelligence challenges is to compute justified beliefs based on evidence. In real-life examples, this evidence may be inconsistent, incomplete, or uncertain, making the problem of evidence fusion highly non-trivial. In this paper, we propose a new model for measuring degrees of beliefs based on possibly inconsistent, incomplete, and uncertain evidence, by combining tools from Dempster-Shafer Theory and Topological Models of Evidence. Our belief model is more general than the aforementioned approaches in two important ways: (1) it can reproduce them when appropriate constraints are imposed, and, more notably, (2) it is flexible enough to compute beliefs according to various standards that represent agents' evidential demands. The latter novelty allows the users of our model to employ it to compute an agent's (possibly) distinct degrees of belief, based on the same evidence, in situations when, e.g, the agent prioritizes avoiding false negatives and when it prioritizes avoiding false positives. Finally, we show that computing degrees of belief with this model is #P-complete in general.

5.Scalable Concept Extraction in Industry 4.0

Authors:Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, Sebastian Trimpe

Abstract: The industry 4.0 is leveraging digital technologies and machine learning techniques to connect and optimize manufacturing processes. Central to this idea is the ability to transform raw data into human understandable knowledge for reliable data-driven decision-making. Convolutional Neural Networks (CNNs) have been instrumental in processing image data, yet, their ``black box'' nature complicates the understanding of their prediction process. In this context, recent advances in the field of eXplainable Artificial Intelligence (XAI) have proposed the extraction and localization of concepts, or which visual cues intervene on the prediction process of CNNs. This paper tackles the application of concept extraction (CE) methods to industry 4.0 scenarios. To this end, we modify a recently developed technique, ``Extracting Concepts with Local Aggregated Descriptors'' (ECLAD), improving its scalability. Specifically, we propose a novel procedure for calculating concept importance, utilizing a wrapper function designed for CNNs. This process is aimed at decreasing the number of times each image needs to be evaluated. Subsequently, we demonstrate the potential of CE methods, by applying them in three industrial use cases. We selected three representative use cases in the context of quality control for material design (tailored textiles), manufacturing (carbon fiber reinforcement), and maintenance (photovoltaic module inspection). In these examples, CE was able to successfully extract and locate concepts directly related to each task. This is, the visual cues related to each concept, coincided with what human experts would use to perform the task themselves, even when the visual cues were entangled between multiple classes. Through empirical results, we show that CE can be applied for understanding CNNs in an industrial context, giving useful insights that can relate to domain knowledge.

6.An Approach to Solving the Abstraction and Reasoning Corpus (ARC) Challenge

Authors:Tan John Chong Min

Abstract: We utilise the power of Large Language Models (LLMs), in particular GPT4, to be prompt engineered into performing an arbitrary task. Here, we give the model some human priors via text, along with some typical procedures for solving the ARC tasks, and ask it to generate the i) broad description of the input-output relation, ii) detailed steps of the input-output mapping, iii) use the detailed steps to perform manipulation on the test input and derive the test output. The current GPT3.5/GPT4 prompt solves 2 out of 4 tested small ARC challenges (those with small grids of 8x8 and below). With tweaks to the prompt to make it more specific for the use case, it can solve more. We posit that when scaled to a multi-agent system with usage of past memory and equipped with an image interpretation tool via Visual Question Answering, we may actually be able to solve the majority of the ARC challenge

7.The Creative Frontier of Generative AI: Managing the Novelty-Usefulness Tradeoff

Authors:Anirban Mukherjee, Hannah Chang

Abstract: In this paper, drawing inspiration from the human creativity literature, we explore the optimal balance between novelty and usefulness in generative Artificial Intelligence (AI) systems. We posit that overemphasizing either aspect can lead to limitations such as hallucinations and memorization. Hallucinations, characterized by AI responses containing random inaccuracies or falsehoods, emerge when models prioritize novelty over usefulness. Memorization, where AI models reproduce content from their training data, results from an excessive focus on usefulness, potentially limiting creativity. To address these challenges, we propose a framework that includes domain-specific analysis, data and transfer learning, user preferences and customization, custom evaluation metrics, and collaboration mechanisms. Our approach aims to generate content that is both novel and useful within specific domains, while considering the unique requirements of various contexts.

8.Enabling Efficient Interaction between an Algorithm Agent and an LLM: A Reinforcement Learning Approach

Authors:Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu

Abstract: Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets. Recent studies have demonstrated that LLMs can assist an algorithm agent in solving complex sequential decision making tasks in embodied environments by providing high-level instructions. However, interacting with LLMs can be time-consuming, as in many practical scenarios, they require a significant amount of storage space that can only be deployed on remote cloud server nodes. Additionally, using commercial LLMs can be costly since they may charge based on usage frequency. In this paper, we explore how to enable efficient and cost-effective interactions between the agent and an LLM. We propose a reinforcement learning based mediator model that determines when it is necessary to consult LLMs for high-level instructions to accomplish a target task. Experiments on 4 MiniGrid environments that entail planning sub-goals demonstrate that our method can learn to solve target tasks with only a few necessary interactions with an LLM, significantly reducing interaction costs in testing environments, compared with baseline methods. Experimental results also suggest that by learning a mediator model to interact with the LLM, the agent's performance becomes more robust against both exploratory and stochastic environments.

9.BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs

Authors:Daniel Daza, Dimitrios Alivanistos, Payal Mitra, Thom Pijnenburg, Michael Cochez, Paul Groth

Abstract: Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences, or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We propose a modular framework for learning embeddings in KGs with entity attributes, that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction, and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. We find settings involving low degree entities, which make up for a substantial amount of the set of entities in the KG, where our method outperforms the baselines. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. Our implementation is available at .

10.Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE

Authors:Nicolas Hubert, Heiko Paulheim, Pierre Monnin, Armelle Brun, Davy Monticolo

Abstract: Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years. These models learn a vector representation of knowledge graph entities and relations, a.k.a. knowledge graph embeddings (KGEs). Learning versatile KGEs is desirable as it makes them useful for a broad range of tasks. However, KGEMs are usually trained for a specific task, which makes their embeddings task-dependent. In parallel, the widespread assumption that KGEMs actually create a semantic representation of the underlying entities and relations (e.g., project similar entities closer than dissimilar ones) has been challenged. In this work, we design heuristics for generating protographs -- small, modified versions of a KG that leverage schema-based information. The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics. Extensive experiments on various evaluation benchmarks demonstrate the soundness of this approach, which we call Modular and Agnostic SCHema-based Integration of protograph Embeddings (MASCHInE). In particular, MASCHInE helps produce more versatile KGEs that yield substantially better performance for entity clustering and node classification tasks. For link prediction, using MASCHInE has little impact on rank-based performance but increases the number of semantically valid predictions.

11.Description Logics with Abstraction and Refinement

Authors:Carsten Lutz, Lukas Schulze

Abstract: Ontologies often require knowledge representation on multiple levels of abstraction, but description logics (DLs) are not well-equipped for supporting this. We propose an extension of DLs in which abstraction levels are first-class citizens and which provides explicit operators for the abstraction and refinement of concepts and roles across multiple abstraction levels, based on conjunctive queries. We prove that reasoning in the resulting family of DLs is decidable while several seemingly harmless variations turn out to be undecidable. We also pinpoint the precise complexity of our logics and several relevant fragments.

12.Newly Formed Cities: an AI Curation

Authors:Dario Negueruela del Castillo, Ludovica Schaerf, Pepe Ballesteros, Iacopo Neri, Valentine Bernasconi

Abstract: Art curatorial processes are characterized by the presentation of a collection of artworks in a knowledgeable way. Machine processes are characterized by their capacity to manage and analyze large amounts of data. This paper envisages machine curation and audience interaction as a means to explore the implications of contemporary AI models for the curatorial world. This project was developed for the occasion of the 2023 Helsinki Art Biennial, entitled New Directions May Emerge. We use the Helsinki Art Museum (HAM) collection to re-imagine the city of Helsinki through the lens of machine perception. We use visual-textual models to place artworks currently hosted inside the museum in outdoor public spaces of the city, assigning fictional coordinates based on similarity scores. Synthetic 360{\deg} art panoramas are generated using diffusion-based models to propose a machinic visual style guided by the artworks. The result of this project will be virtually presented as a web-based installation, where such a re-contextualization allows the navigation of an alternative version of the city while exploring its artistic heritage. Finally, we discuss our contributions to machine curation and the ethical implications that such a process entails. The web-based installation is available at this link:

13.AI-Supported Assessment of Load Safety

Authors:Julius Schöning, Niklas Kruse

Abstract: Load safety assessment and compliance is an essential step in the corporate process of every logistics service provider. In 2020, a total of 11,371 police checks of trucks were carried out, during which 9.6% (1091) violations against the load safety regulations were detected. For a logistic service provider, every load safety violation results in height fines and damage to reputation. An assessment of load safety supported by artificial intelligence (AI) will reduce the risk of accidents by unsecured loads and fines during safety assessments. This work shows how photos of the load, taken by the truck driver or the loadmaster after the loading process, can be used to assess load safety. By a trained two-stage artificial neural network (ANN), these photos are classified into three different classes I) cargo loaded safely, II) cargo loaded unsafely, and III) unusable image. By applying several architectures of convolutional neural networks (CNN), it can be shown that it is possible to distinguish between unusable and usable images for cargo safety assessment. This distinction is quite crucial since the truck driver and the loadmaster sometimes provide photos without the essential image features like the case structure of the truck and the whole cargo. A human operator or another ANN will then assess the load safety within the second stage.

14.Remarks on Utility in Repeated Bets

Authors:Nimrod Megiddo

Abstract: The use of von Neumann -- Morgenstern utility is examined in the context of multiple choices between lotteries. Different conclusions are reached if the choices are simultaneous or sequential. It is demonstrated that utility cannot be additive.

15.Considering Human Factors in Risk Maps for Robust and Foresighted Driver Warning

Authors:Tim Puphal, Ryohei Hirano, Malte Probst, Raphael Wenzel, Akihito Kimata

Abstract: Driver support systems that include human states in the support process is an active research field. Many recent approaches allow, for example, to sense the driver's drowsiness or awareness of the driving situation. However, so far, this rich information has not been utilized much for improving the effectiveness of support systems. In this paper, we therefore propose a warning system that uses human states in the form of driver errors and can warn users in some cases of upcoming risks several seconds earlier than the state of the art systems not considering human factors. The system consists of a behavior planner Risk Maps which directly changes its prediction of the surrounding driving situation based on the sensed driver errors. By checking if this driver's behavior plan is objectively safe, a more robust and foresighted driver warning is achieved. In different simulations of a dynamic lane change and intersection scenarios, we show how the driver's behavior plan can become unsafe, given the estimate of driver errors, and experimentally validate the advantages of considering human factors.

16.Embracing Background Knowledge in the Analysis of Actual Causality: An Answer Set Programming Approach

Authors:Michael Gelfond, Jorge Fandinno, Evgenii Balai

Abstract: This paper presents a rich knowledge representation language aimed at formalizing causal knowledge. This language is used for accurately and directly formalizing common benchmark examples from the literature of actual causality. A definition of cause is presented and used to analyze the actual causes of changes with respect to sequences of actions representing those examples.

17.ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory

Authors:Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Zhao, Hang Zhao

Abstract: Large language models (LLMs) with memory are computationally universal. However, mainstream LLMs are not taking full advantage of memory, and the designs are heavily influenced by biological brains. Due to their approximate nature and proneness to the accumulation of errors, conventional neural memory mechanisms cannot support LLMs to simulate complex reasoning. In this paper, we seek inspiration from modern computer architectures to augment LLMs with symbolic memory for complex multi-hop reasoning. Such a symbolic memory framework is instantiated as an LLM and a set of SQL databases, where the LLM generates SQL instructions to manipulate the SQL databases. We validate the effectiveness of the proposed memory framework on a synthetic dataset requiring complex reasoning. The project website is available at .

1.A Novel Multi-Agent Deep RL Approach for Traffic Signal Control

Authors:Shijie Wang, Shangbo Wang

Abstract: As travel demand increases and urban traffic condition becomes more complicated, applying multi-agent deep reinforcement learning (MARL) to traffic signal control becomes one of the hot topics. The rise of Reinforcement Learning (RL) has opened up opportunities for solving Adaptive Traffic Signal Control (ATSC) in complex urban traffic networks, and deep neural networks have further enhanced their ability to handle complex data. Traditional research in traffic signal control is based on the centralized Reinforcement Learning technique. However, in a large-scale road network, centralized RL is infeasible because of an exponential growth of joint state-action space. In this paper, we propose a Friend-Deep Q-network (Friend-DQN) approach for multiple traffic signal control in urban networks, which is based on an agent-cooperation scheme. In particular, the cooperation between multiple agents can reduce the state-action space and thus speed up the convergence. We use SUMO (Simulation of Urban Transport) platform to evaluate the performance of Friend-DQN model, and show its feasibility and superiority over other existing methods.

2.Efficient GPT Model Pre-training using Tensor Train Matrix Representation

Authors:Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko

Abstract: Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the number of the parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix~(TTM) structure. Finally, we customize forward and backward operations through the TTM-based layer for simplicity and the stableness of further training. % The resulting GPT-2-based model stores up to 40% fewer parameters, showing the perplexity comparable to the original model. On the downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model. The proposed tensorized layers could be used to efficiently pre-training other Transformer models.

3.Interpretable Multimodal Emotion Recognition using Facial Features and Physiological Signals

Authors:Puneet Kumar, Xiaobai Li

Abstract: This paper aims to demonstrate the importance and feasibility of fusing multimodal information for emotion recognition. It introduces a multimodal framework for emotion understanding by fusing the information from visual facial features and rPPG signals extracted from the input videos. An interpretability technique based on permutation feature importance analysis has also been implemented to compute the contributions of rPPG and visual modalities toward classifying a given input video into a particular emotion class. The experiments on IEMOCAP dataset demonstrate that the emotion classification performance improves by combining the complementary information from multiple modalities.

4.Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs

Authors:Alejandro Peña, Aythami Morales, Julian Fierrez, Ignacio Serna, Javier Ortega-Garcia, Iñigo Puente, Jorge Cordova, Gonzalo Cordova

Abstract: The analysis of public affairs documents is crucial for citizens as it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, participate in public discourse, and hold representatives accountable. This is crucial, and sometimes a matter of life or death, for companies whose operation depend on certain regulations. Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in such documents. In this work, we analyze the performance of LLMs in classifying public affairs documents. As a natural multi-label task, the classification of these documents presents important challenges. In this work, we use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments assess the performance of 4 different Spanish LLMs to classify up to 30 different topics in the data in different configurations. The results shows that LLMs can be of great use to process domain-specific documents, such as those in the domain of public affairs.

5.Action-Evolution Petri Nets: a Framework for Modeling and Solving Dynamic Task Assignment Problems

Authors:Riccardo Lo Bianco, Remco Dijkman, Wim Nuijten, Willem van Jaarsveld

Abstract: Dynamic task assignment involves assigning arriving tasks to a limited number of resources in order to minimize the overall cost of the assignments. To achieve optimal task assignment, it is necessary to model the assignment problem first. While there exist separate formalisms, specifically Markov Decision Processes and (Colored) Petri Nets, to model, execute, and solve different aspects of the problem, there is no integrated modeling technique. To address this gap, this paper proposes Action-Evolution Petri Nets (A-E PN) as a framework for modeling and solving dynamic task assignment problems. A-E PN provides a unified modeling technique that can represent all elements of dynamic task assignment problems. Moreover, A-E PN models are executable, which means they can be used to learn close-to-optimal assignment policies through Reinforcement Learning (RL) without additional modeling effort. To evaluate the framework, we define a taxonomy of archetypical assignment problems. We show for three cases that A-E PN can be used to learn close-to-optimal assignment policies. Our results suggest that A-E PN can be used to model and solve a broad range of dynamic task assignment problems.

6.Adversarial Ink: Componentwise Backward Error Attacks on Deep Learning

Authors:Lucas Beerens, Desmond J. Higham

Abstract: Deep neural networks are capable of state-of-the-art performance in many classification tasks. However, they are known to be vulnerable to adversarial attacks -- small perturbations to the input that lead to a change in classification. We address this issue from the perspective of backward error and condition number, concepts that have proved useful in numerical analysis. To do this, we build on the work of Beuzeville et al. (2021). In particular, we develop a new class of attack algorithms that use componentwise relative perturbations. Such attacks are highly relevant in the case of handwritten documents or printed texts where, for example, the classification of signatures, postcodes, dates or numerical quantities may be altered by changing only the ink consistency and not the background. This makes the perturbed images look natural to the naked eye. Such ``adversarial ink'' attacks therefore reveal a weakness that can have a serious impact on safety and security. We illustrate the new attacks on real data and contrast them with existing algorithms. We also study the use of a componentwise condition number to quantify vulnerability.

7.The Chai Platform's AI Safety Framework

Authors:Xiaoding Lu, Aleksey Korshuk, Zongyi Liu, William Beauchamp

Abstract: Chai empowers users to create and interact with customized chatbots, offering unique and engaging experiences. Despite the exciting prospects, the work recognizes the inherent challenges of a commitment to modern safety standards. Therefore, this paper presents the integrated AI safety principles into Chai to prioritize user safety, data protection, and ethical technology use. The paper specifically explores the multidimensional domain of AI safety research, demonstrating its application in Chai's conversational chatbot platform. It presents Chai's AI safety principles, informed by well-established AI research centres and adapted for chat AI. This work proposes the following safety framework: Content Safeguarding; Stability and Robustness; and Operational Transparency and Traceability. The subsequent implementation of these principles is outlined, followed by an experimental analysis of Chai's AI safety framework's real-world impact. We emphasise the significance of conscientious application of AI safety principles and robust safety measures. The successful implementation of the safe AI framework in Chai indicates the practicality of mitigating potential risks for responsible and ethical use of AI technologies. The ultimate vision is a transformative AI tool fostering progress and innovation while prioritizing user safety and ethical standards.

8.Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination

Authors:Yang Li, Shao Zhang, Jichen Sun, Wenhao Zhang, Yali Du, Ying Wen, Xinbing Wang, Wei Pan

Abstract: Achieving coordination between humans and artificial intelligence in scenarios involving previously unencountered humans remains a substantial obstacle within Zero-Shot Human-AI Coordination, which aims to develop AI agents capable of efficiently working alongside previously unknown human teammates. Traditional algorithms have aimed to collaborate with humans by optimizing fixed objectives within a population, fostering diversity in strategies and behaviors. However, these techniques may lead to learning loss and an inability to cooperate with specific strategies within the population, a phenomenon named cooperative incompatibility. To mitigate this issue, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy. We put forth a practical algorithm incorporating insights from game theory and graph theory, e.g., Shapley Value and Centrality. We also show that COLE could effectively overcome the cooperative incompatibility from theoretical and empirical analysis. Subsequently, we created an online Overcooked human-AI experiment platform, the COLE platform, which enables easy customization of questionnaires, model weights, and other aspects. Utilizing the COLE platform, we enlist 130 participants for human experiments. Our findings reveal a preference for our approach over state-of-the-art methods using a variety of subjective metrics. Moreover, objective experimental outcomes in the Overcooked game environment indicate that our method surpasses existing ones when coordinating with previously unencountered AI agents and the human proxy model. Our code and demo are publicly available at

9.From Robustness to Explainability and Back Again

Authors:Xuanxiang Huang, Joao Marques-Silva

Abstract: In contrast with ad-hoc methods for eXplainable Artificial Intelligence (XAI), formal explainability offers important guarantees of rigor. However, formal explainability is hindered by poor scalability for some families of classifiers, the most significant being neural networks. As a result, there are concerns as to whether formal explainability might serve to complement other approaches in delivering trustworthy AI. This paper addresses the limitation of scalability of formal explainability, and proposes novel algorithms for computing formal explanations. The novel algorithm computes explanations by answering instead a number of robustness queries, and such that the number of such queries is at most linear on the number of features. Consequently, the proposed algorithm establishes a direct relationship between the practical complexity of formal explainability and that of robustness. More importantly, the paper generalizes the definition of formal explanation, thereby allowing the use of robustness tools that are based on different distance norms, and also by reasoning in terms of some target degree of robustness. The experiments validate the practical efficiency of the proposed approach.

10.Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs

Authors:Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, Vikash K. Mansinghka

Abstract: Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models, and replace standard decoding with sequential Monte Carlo inference. For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks, including infilling, generation under syntactic constraints, and prompt intersection. To facilitate experimentation with SMC steering, we present a probabilistic programming library, LLaMPPL (, for concisely specifying new generation tasks as language model probabilistic programs, and automating steering of LLaMA-family Transformers.

11.InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

Authors:Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou

Abstract: Large language models~(LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at

12.DeepGraphDMD: Interpretable Spatio-Temporal Decomposition of Non-linear Functional Brain Network Dynamics

Authors:Md Asadullah Turja, Martin Styner, Guorong Wu

Abstract: Functional brain dynamics is supported by parallel and overlapping functional network modes that are associated with specific neural circuits. Decomposing these network modes from fMRI data and finding their temporal characteristics is challenging due to their time-varying nature and the non-linearity of the functional dynamics. Dynamic Mode Decomposition (DMD) algorithms have been quite popular for solving this decomposition problem in recent years. In this work, we apply GraphDMD -- an extension of the DMD for network data -- to extract the dynamic network modes and their temporal characteristics from the fMRI time series in an interpretable manner. GraphDMD, however, regards the underlying system as a linear dynamical system that is sub-optimal for extracting the network modes from non-linear functional data. In this work, we develop a generalized version of the GraphDMD algorithm -- DeepGraphDMD -- applicable to arbitrary non-linear graph dynamical systems. DeepGraphDMD is an autoencoder-based deep learning model that learns Koopman eigenfunctions for graph data and embeds the non-linear graph dynamics into a latent linear space. We show the effectiveness of our method in both simulated data and the HCP resting-state fMRI data. In the HCP data, DeepGraphDMD provides novel insights into cognitive brain functions by discovering two major network modes related to fluid and crystallized intelligence.

1.Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning

Authors:Shaoming Peng

Abstract: Multi-robot path finding in dynamic environments is a highly challenging classic problem. In the movement process, robots need to avoid collisions with other moving robots while minimizing their travel distance. Previous methods for this problem either continuously replan paths using heuristic search methods to avoid conflicts or choose appropriate collision avoidance strategies based on learning approaches. The former may result in long travel distances due to frequent replanning, while the latter may have low learning efficiency due to low sample exploration and utilization, and causing high training costs for the model. To address these issues, we propose a path planning method, MAPPOHR, which combines heuristic search, empirical rules, and multi-agent reinforcement learning. The method consists of two layers: a real-time planner based on the multi-agent reinforcement learning algorithm, MAPPO, which embeds empirical rules in the action output layer and reward functions, and a heuristic search planner used to create a global guiding path. During movement, the heuristic search planner replans new paths based on the instructions of the real-time planner. We tested our method in 10 different conflict scenarios. The experiments show that the planning performance of MAPPOHR is better than that of existing learning and heuristic methods. Due to the utilization of empirical knowledge and heuristic search, the learning efficiency of MAPPOHR is higher than that of existing learning methods.

2.Egocentric Planning for Scalable Embodied Task Achievement

Authors:Xiaotian Liu, Hector Palacios, Christian Muise

Abstract: Embodied agents face significant challenges when tasked with performing actions in diverse environments, particularly in generalizing across object types and executing suitable actions to accomplish tasks. Furthermore, agents should exhibit robustness, minimizing the execution of illegal actions. In this work, we present Egocentric Planning, an innovative approach that combines symbolic planning and Object-oriented POMDPs to solve tasks in complex environments, harnessing existing models for visual perception and natural language processing. We evaluated our approach in ALFRED, a simulated environment designed for domestic tasks, and demonstrated its high scalability, achieving an impressive 36.07% unseen success rate in the ALFRED benchmark and winning the ALFRED challenge at CVPR Embodied AI workshop. Our method requires reliable perception and the specification or learning of a symbolic description of the preconditions and effects of the agent's actions, as well as what object types reveal information about others. It is capable of naturally scaling to solve new tasks beyond ALFRED, as long as they can be solved using the available skills. This work offers a solid baseline for studying end-to-end and hybrid methods that aim to generalize to new tasks, including recent approaches relying on LLMs, but often struggle to scale to long sequences of actions or produce robust plans for novel tasks.

3.Energy-Efficient UAV-Assisted IoT Data Collection via TSP-Based Solution Space Reduction

Authors:Sivaram Krishnan, Mahyar Nemati, Seng W. Loke, Jihong Park, Jinho Choi

Abstract: This paper presents a wireless data collection framework that employs an unmanned aerial vehicle (UAV) to efficiently gather data from distributed IoT sensors deployed in a large area. Our approach takes into account the non-zero communication ranges of the sensors to optimize the flight path of the UAV, resulting in a variation of the Traveling Salesman Problem (TSP). We prove mathematically that the optimal waypoints for this TSP-variant problem are restricted to the boundaries of the sensor communication ranges, greatly reducing the solution space. Building on this finding, we develop a low-complexity UAV-assisted sensor data collection algorithm, and demonstrate its effectiveness in a selected use case where we minimize the total energy consumption of the UAV and sensors by jointly optimizing the UAV's travel distance and the sensors' communication ranges.

4.Knowledge Graph Reasoning over Entities and Numerical Values

Authors:Jiaxin Bai, Chen Luo, Zheng Li, Qingyu Yin, Bing Yin, Yangqiu Song

Abstract: A complex logic query in a knowledge graph refers to a query expressed in logic form that conveys a complex meaning, such as where did the Canadian Turing award winner graduate from? Knowledge graph reasoning-based applications, such as dialogue systems and interactive search engines, rely on the ability to answer complex logic queries as a fundamental task. In most knowledge graphs, edges are typically used to either describe the relationships between entities or their associated attribute values. An attribute value can be in categorical or numerical format, such as dates, years, sizes, etc. However, existing complex query answering (CQA) methods simply treat numerical values in the same way as they treat entities. This can lead to difficulties in answering certain queries, such as which Australian Pulitzer award winner is born before 1927, and which drug is a pain reliever and has fewer side effects than Paracetamol. In this work, inspired by the recent advances in numerical encoding and knowledge graph reasoning, we propose numerical complex query answering. In this task, we introduce new numerical variables and operations to describe queries involving numerical attribute values. To address the difference between entities and numerical values, we also propose the framework of Number Reasoning Network (NRN) for alternatively encoding entities and numerical values into separate encoding structures. During the numerical encoding process, NRN employs a parameterized density function to encode the distribution of numerical values. During the entity encoding process, NRN uses established query encoding methods for the original CQA problem. Experimental results show that NRN consistently improves various query encoding methods on three different knowledge graphs and achieves state-of-the-art results.

5.An Architecture for Deploying Reinforcement Learning in Industrial Environments

Authors:Georg Schäfer, Reuf Kozlica, Stefan Wegenkittl, Stefan Huber

Abstract: Industry 4.0 is driven by demands like shorter time-to-market, mass customization of products, and batch size one production. Reinforcement Learning (RL), a machine learning paradigm shown to possess a great potential in improving and surpassing human level performance in numerous complex tasks, allows coping with the mentioned demands. In this paper, we present an OPC UA based Operational Technology (OT)-aware RL architecture, which extends the standard RL setting, combining it with the setting of digital twins. Moreover, we define an OPC UA information model allowing for a generalized plug-and-play like approach for exchanging the RL agent used. In conclusion, we demonstrate and evaluate the architecture, by creating a proof of concept. By means of solving a toy example, we show that this architecture can be used to determine the optimal policy using a real control system.

6.A Modular Test Bed for Reinforcement Learning Incorporation into Industrial Applications

Authors:Reuf Kozlica, Georg Schäfer, Simon Hirländer, Stefan Wegenkittl

Abstract: This application paper explores the potential of using reinforcement learning (RL) to address the demands of Industry 4.0, including shorter time-to-market, mass customization, and batch size one production. Specifically, we present a use case in which the task is to transport and assemble goods through a model factory following predefined rules. Each simulation run involves placing a specific number of goods of random color at the entry point. The objective is to transport the goods to the assembly station, where two rivets are installed in each product, connecting the upper part to the lower part. Following the installation of rivets, blue products must be transported to the exit, while green products are to be transported to storage. The study focuses on the application of reinforcement learning techniques to address this problem and improve the efficiency of the production process.

7.Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task

Authors:Reuf Kozlica, Stefan Wegenkittl, Simon Hirländer

Abstract: This paper presents a comparison between two well-known deep Reinforcement Learning (RL) algorithms: Deep Q-Learning (DQN) and Proximal Policy Optimization (PPO) in a simulated production system. We utilize a Petri Net (PN)-based simulation environment, which was previously proposed in related work. The performance of the two algorithms is compared based on several evaluation metrics, including average percentage of correctly assembled and sorted products, average episode length, and percentage of successful episodes. The results show that PPO outperforms DQN in terms of all evaluation metrics. The study highlights the advantages of policy-based algorithms in problems with high-dimensional state and action spaces. The study contributes to the field of deep RL in context of production systems by providing insights into the effectiveness of different algorithms and their suitability for different tasks.

8.Accelerating science with human-aware artificial intelligence

Authors:Jamshid Sourati, James Evans

Abstract: Artificial intelligence (AI) models trained on published scientific findings have been used to invent valuable materials and targeted therapies, but they typically ignore the human scientists who continually alter the landscape of discovery. Here we show that incorporating the distribution of human expertise by training unsupervised models on simulated inferences cognitively accessible to experts dramatically improves (up to 400%) AI prediction of future discoveries beyond those focused on research content alone, especially when relevant literature is sparse. These models succeed by predicting human predictions and the scientists who will make them. By tuning human-aware AI to avoid the crowd, we can generate scientifically promising "alien" hypotheses unlikely to be imagined or pursued without intervention until the distant future, which hold promise to punctuate scientific advance beyond questions currently pursued. Accelerating human discovery or probing its blind spots, human-aware AI enables us to move toward and beyond the contemporary scientific frontier.

9.OMNI: Open-endedness via Models of human Notions of Interestingness

Authors:Jenny Zhang, Joel Lehman, Kenneth Stanley, Jeff Clune

Abstract: Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also $\textit{interesting}$ (e.g., worthwhile and novel). We propose solving this problem by $\textit{Open-endedness via Models of human Notions of Interestingness}$ (OMNI). The insight is that we can utilize large (language) models (LMs) as a model of interestingness (MoI), because they $\textit{already}$ internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that LM-based MoIs improve open-ended learning by focusing on tasks that are both learnable $\textit{and interesting}$, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.

1.The Survey, Taxonomy, and Future Directions of Trustworthy AI: A Meta Decision of Strategic Decisions

Authors:Caesar Wu, Yuan-Fang Lib, Pascal Bouvry

Abstract: When making strategic decisions, we are often confronted with overwhelming information to process. The situation can be further complicated when some pieces of evidence are contradicted each other or paradoxical. The challenge then becomes how to determine which information is useful and which ones should be eliminated. This process is known as meta-decision. Likewise, when it comes to using Artificial Intelligence (AI) systems for strategic decision-making, placing trust in the AI itself becomes a meta-decision, given that many AI systems are viewed as opaque "black boxes" that process large amounts of data. Trusting an opaque system involves deciding on the level of Trustworthy AI (TAI). We propose a new approach to address this issue by introducing a novel taxonomy or framework of TAI, which encompasses three crucial domains: articulate, authentic, and basic for different levels of trust. To underpin these domains, we create ten dimensions to measure trust: explainability/transparency, fairness/diversity, generalizability, privacy, data governance, safety/robustness, accountability, reproducibility, reliability, and sustainability. We aim to use this taxonomy to conduct a comprehensive survey and explore different TAI approaches from a strategic decision-making perspective.

2.Parallel Neurosymbolic Integration with Concordia

Authors:Jonathan Feldstein, Modestas Jurčius, Efthymia Tsamoura

Abstract: Parallel neurosymbolic architectures have been applied effectively in NLP by distilling knowledge from a logic theory into a deep model.However, prior art faces several limitations including supporting restricted forms of logic theories and relying on the assumption of independence between the logic and the deep network. We present Concordia, a framework overcoming the limitations of prior art. Concordia is agnostic both to the deep network and the logic theory offering support for a wide range of probabilistic theories. Our framework can support supervised training of both components and unsupervised training of the neural component. Concordia has been successfully applied to tasks beyond NLP and data classification, improving the accuracy of state-of-the-art on collective activity detection, entity linking and recommendation tasks.

3.From Temporal to Contemporaneous Iterative Causal Discovery in the Presence of Latent Confounders

Authors:Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz, Gal Novik

Abstract: We present a constraint-based algorithm for learning causal structures from observational time-series data, in the presence of latent confounders. We assume a discrete-time, stationary structural vector autoregressive process, with both temporal and contemporaneous causal relations. One may ask if temporal and contemporaneous relations should be treated differently. The presented algorithm gradually refines a causal graph by learning long-term temporal relations before short-term ones, where contemporaneous relations are learned last. This ordering of causal relations to be learnt leads to a reduction in the required number of statistical tests. We validate this reduction empirically and demonstrate that it leads to higher accuracy for synthetic data and more plausible causal graphs for real-world data compared to state-of-the-art algorithms.

4.Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork

Authors:Hasra Dodampegama, Mohan Sridharan

Abstract: Ad hoc teamwork refers to the problem of enabling an agent to collaborate with teammates without prior coordination. Data-driven methods represent the state of the art in ad hoc teamwork. They use a large labeled dataset of prior observations to model the behavior of other agent types and to determine the ad hoc agent's behavior. These methods are computationally expensive, lack transparency, and make it difficult to adapt to previously unseen changes, e.g., in team composition. Our recent work introduced an architecture that determined an ad hoc agent's behavior based on non-monotonic logical reasoning with prior commonsense domain knowledge and predictive models of other agents' behavior that were learned from limited examples. In this paper, we substantially expand the architecture's capabilities to support: (a) online selection, adaptation, and learning of the models that predict the other agents' behavior; and (b) collaboration with teammates in the presence of partial observability and limited communication. We illustrate and experimentally evaluate the capabilities of our architecture in two simulated multiagent benchmark domains for ad hoc teamwork: Fort Attack and Half Field Offense. We show that the performance of our architecture is comparable or better than state of the art data-driven baselines in both simple and complex scenarios, particularly in the presence of limited training data, partial observability, and changes in team composition.

5.Cross Modal Data Discovery over Structured and Unstructured Data Lakes

Authors:Mohamed Y. Eltabakh, Mayuresh Kunjir, Ahmed Elmagarmid, Mohammad Shahmeer Ahmad

Abstract: Organizations are collecting increasingly large amounts of data for data driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such mixture of datasets makes the problem of discovering elements (e.g., tables or documents) that are relevant to a user's query or an analytical task very challenging. Despite the recent efforts in data discovery, the problem remains widely open especially in the two fronts of (1) discovering relationships and relatedness across structured and unstructured datasets where existing techniques suffer from either scalability, being customized for a specific problem type (e.g., entity matching or data integration), or demolishing the structural properties on its way, and (2) developing a holistic system for integrating various similarity measurements and sketches in an effective way to boost the discovery accuracy. In this paper, we propose a new data discovery system, named CMDL, for addressing these two limitations. CMDL supports the data discovery process over both structured and unstructured data while retaining the structural properties of tables.

6.STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

Authors:Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila McIlraith

Abstract: Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL-E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two steps: adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, bypassing the need for costly human text annotations. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 costs just $60 to train and can follow a wide range of short-horizon open-ended text and visual instructions in Minecraft. STEVE-1 sets a new bar for open-ended instruction following in Minecraft with low-level controls (mouse and keyboard) and raw pixel inputs, far outperforming previous baselines. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, training scripts, and evaluation tools are made available for further research.

7.chemSKI with tokens: world building and economy in the SKI universe

Authors:Marius Buliga

Abstract: chemSKI with tokens is a confluent graph rewrite system where all rewrites are local, which moreover can be used to do SKI calculus reductions. The graph rewrites of chemSKI are made conservative by the use of tokens. We thus achieve several goals: conservative rewrites in a chemical style, a solution to the problem of new edge names in a distributed, decentralized graphical reduction and a new estimation of the cost of a combinatory calculus computation. This formalism can be used either as an artificial chemistry or as a model of a virtual decentralized machine which performs only local reductions. A programs repository and the same article with simulations are available at github at

1.Medication Recommendation via Domain Knowledge Informed Deep Learning

Authors:Sicen Liu, Xiaolong Wang, Xianbing Zhao, Hao Chen

Abstract: Medication recommendation is a fundamental yet crucial branch of healthcare, which provides opportunities to support clinical physicians with more accurate medication prescriptions for patients with complex health conditions. Learning from electronic health records (EHR) to recommend medications is the most common way in previous studies. However, most of them neglect incorporating domain knowledge according to the clinical manifestations in the EHR of the patient. To address these issues, we propose a novel \textbf{D}omain \textbf{K}nowledge \textbf{I}nformed \textbf{Net}work (DKINet) to integrate domain knowledge with observable clinical manifestations of the patient, which is the first dynamic domain knowledge informed framework toward medication recommendation. In particular, we first design a knowledge-driven encoder to capture the domain information and then develop a data-driven encoder to integrate domain knowledge into the observable EHR. To endow the model with the capability of temporal decision, we design an explicit medication encoder for learning the longitudinal dependence of the patient. Extensive experiments on three publicly available datasets verify the superiority of our method. The code will be public upon acceptance.

2.Knowledge Base Question Answering for Space Debris Queries

Authors:Paul Darm, Antonio Valerio Miceli-Barone, Shay B. Cohen, Annalisa Riccardi

Abstract: Space agencies execute complex satellite operations that need to be supported by the technical knowledge contained in their extensive information systems. Knowledge bases (KB) are an effective way of storing and accessing such information at scale. In this work we present a system, developed for the European Space Agency (ESA), that can answer complex natural language queries, to support engineers in accessing the information contained in a KB that models the orbital space debris environment. Our system is based on a pipeline which first generates a sequence of basic database operations, called a %program sketch, from a natural language question, then specializes the sketch into a concrete query program with mentions of entities, attributes and relations, and finally executes the program against the database. This pipeline decomposition approach enables us to train the system by leveraging out-of-domain data and semi-synthetic data generated by GPT-3, thus reducing overfitting and shortcut learning even with limited amount of in-domain training data. Our code can be found at \url{}.

3.Human Control: Definitions and Algorithms

Authors:Ryan Carey, Tom Everitt

Abstract: How can humans stay in control of advanced artificial intelligence systems? One proposal is corrigibility, which requires the agent to follow the instructions of a human overseer, without inappropriately influencing them. In this paper, we formally define a variant of corrigibility called shutdown instructability, and show that it implies appropriate shutdown behavior, retention of human autonomy, and avoidance of user harm. We also analyse the related concepts of non-obstruction and shutdown alignment, three previously proposed algorithms for human control, and one new algorithm.

4.Human or Not? A Gamified Approach to the Turing Test

Authors:Daniel Jannai, Amos Meron, Barak Lenz, Yoav Levine, Yoav Shoham

Abstract: We present "Human or Not?", an online game inspired by the Turing test, that measures the capability of AI chatbots to mimic humans in dialog, and of humans to tell bots from other humans. Over the course of a month, the game was played by over 1.5 million users who engaged in anonymous two-minute chat sessions with either another human or an AI language model which was prompted to behave like humans. The task of the players was to correctly guess whether they spoke to a person or to an AI. This largest scale Turing-style test conducted to date revealed some interesting facts. For example, overall users guessed the identity of their partners correctly in only 68% of the games. In the subset of the games in which users faced an AI bot, users had even lower correct guess rates of 60% (that is, not much higher than chance). This white paper details the development, deployment, and results of this unique experiment. While this experiment calls for many extensions and refinements, these findings already begin to shed light on the inevitable near future which will commingle humans and AI.

1.Temporally Layered Architecture for Efficient Continuous Control

Authors:Devdhar Patel, Terrence Sejnowski, Hava Siegelmann

Abstract: We present a temporally layered architecture (TLA) for temporally adaptive control with minimal energy expenditure. The TLA layers a fast and a slow policy together to achieve temporal abstraction that allows each layer to focus on a different time scale. Our design draws on the energy-saving mechanism of the human brain, which executes actions at different timescales depending on the environment's demands. We demonstrate that beyond energy saving, TLA provides many additional advantages, including persistent exploration, fewer required decisions, reduced jerk, and increased action repetition. We evaluate our method on a suite of continuous control tasks and demonstrate the significant advantages of TLA over existing methods when measured over multiple important metrics. We also introduce a multi-objective score to qualitatively assess continuous control policies and demonstrate a significantly better score for TLA. Our training algorithm uses minimal communication between the slow and fast layers to train both policies simultaneously, making it viable for future applications in distributed control.

2.Large-scale Ridesharing DARP Instances Based on Real Travel Demand

Authors:David Fiedler, Jan Mrkos

Abstract: Accurately predicting the real-life performance of algorithms solving the Dial-a-Ride Problem (DARP) in the context of Mobility on Demand (MoD) systems with ridesharing requires evaluating them on representative instances. However, the benchmarking of state-of-the-art DARP solution methods has been limited to small, artificial instances or outdated non-public instances, hindering direct comparisons. With the rise of large MoD systems and the availability of open travel demand datasets for many US cities, there is now an opportunity to evaluate these algorithms on standardized, realistic, and representative instances. Despite the significant challenges involved in processing obfuscated and diverse datasets, we have developed a methodology using which we have created a comprehensive set of large-scale demand instances based on real-world data. These instances cover diverse use cases, one of which is demonstrated in an evaluation of two established DARP methods: the insertion heuristic and optimal vehicle-group assignment method. We publish the full results of both methods in a standardized format. The results show significant differences between areas in all measured quantities, emphasizing the importance of evaluating methods across different cities.

3.Bottom-Up Grounding in the Probabilistic Logic Programming System Fusemate

Authors:Peter Baumgartner, Elena Tartaglia

Abstract: This paper introduces the Fusemate probabilistic logic programming system. Fusemate's inference engine comprises a grounding component and a variable elimination method for probabilistic inference. Fusemate differs from most other systems by grounding the program in a bottom-up way instead of the common top-down way. While bottom-up grounding is attractive for a number of reasons, e.g., for dynamically creating distributions of varying support sizes, it makes it harder to control the amount of ground clauses generated. We address this problem by interleaving grounding (along program stratification) with a query-guided relevance test. This test prunes ground rules whose heads are inconsistent with the query dynamically extended by the ground rules so far. We present our method in detail and demonstrate it with examples that involve ``time'', such as (hidden) Markov models. Our experiments demonstrate competitive or better performance compared to a state-of-the probabilistic logic programming system, in particular for high branching problems.

4.IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics

Authors:Jia-Qi Yang, Yu-Cheng Xu, Jia-Lei Shen, Ke-Bin Fan, De-Chuan Zhan, Yang Yang

Abstract: Aiding humans with scientific designs is one of the most exciting of artificial intelligence (AI) and machine learning (ML), due to their potential for the discovery of new drugs, design of new materials and chemical compounds, etc. However, scientific design typically requires complex domain knowledge that is not familiar to AI researchers. Further, scientific studies involve professional skills to perform experiments and evaluations. These obstacles prevent AI researchers from developing specialized methods for scientific designs. To take a step towards easy-to-understand and reproducible research of scientific design, we propose a benchmark for the inverse design of nanophotonic devices, which can be verified computationally and accurately. Specifically, we implemented three different nanophotonic design problems, namely a radiative cooler, a selective emitter for thermophotovoltaics, and structural color filters, all of which are different in design parameter spaces, complexity, and design targets. The benchmark environments are implemented with an open-source simulator. We further implemented 10 different inverse design algorithms and compared them in a reproducible and fair framework. The results revealed the strengths and weaknesses of existing methods, which shed light on several future directions for developing more efficient inverse design algorithms. Our benchmark can also serve as the starting point for more challenging scientific design problems. The code of IDToolkit is available at

5.DHRL-FNMR: An Intelligent Multicast Routing Approach Based on Deep Hierarchical Reinforcement Learning in SDN

Authors:Miao Ye, Chenwei Zhao, Xingsi Xue, Jinqiang Li, Hongwen Hu, Yejin Yang, Qiuxiang Jiang

Abstract: The optimal multicast tree problem in the Software-Defined Networking (SDN) multicast routing is an NP-hard combinatorial optimization problem. Although existing SDN intelligent solution methods, which are based on deep reinforcement learning, can dynamically adapt to complex network link state changes, these methods are plagued by problems such as redundant branches, large action space, and slow agent convergence. In this paper, an SDN intelligent multicast routing algorithm based on deep hierarchical reinforcement learning is proposed to circumvent the aforementioned problems. First, the multicast tree construction problem is decomposed into two sub-problems: the fork node selection problem and the construction of the optimal path from the fork node to the destination node. Second, based on the information characteristics of SDN global network perception, the multicast tree state matrix, link bandwidth matrix, link delay matrix, link packet loss rate matrix, and sub-goal matrix are designed as the state space of intrinsic and meta controllers. Then, in order to mitigate the excessive action space, our approach constructs different action spaces at the upper and lower levels. The meta-controller generates an action space using network nodes to select the fork node, and the intrinsic controller uses the adjacent edges of the current node as its action space, thus implementing four different action selection strategies in the construction of the multicast tree. To facilitate the intelligent agent in constructing the optimal multicast tree with greater speed, we developed alternative reward strategies that distinguish between single-step node actions and multi-step actions towards multiple destination nodes.

6.Strategic Reasoning with Language Models

Authors:Kanishk Gandhi, Dorsa Sadigh, Noah D. Goodman

Abstract: Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations. Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining. Large Language Models (LLMs), with their ability to comprehend and generate complex, context-rich language, could prove powerful as tools for strategic gameplay. This paper introduces an approach that uses pretrained LLMs with few-shot chain-of-thought examples to enable strategic reasoning for AI agents. Our approach uses systematically generated demonstrations of reasoning about states, values, and beliefs to prompt the model. Using extensive variations of simple matrix games, we show that strategies that are derived based on systematically generated prompts generalize almost perfectly to new game structures, alternate objectives, and hidden information. Additionally, we demonstrate our approach can lead to human-like negotiation strategies in realistic scenarios without any extra training or fine-tuning. Our results highlight the ability of LLMs, guided by systematic reasoning demonstrations, to adapt and excel in diverse strategic scenarios.

7.Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

Authors:Catalin Mitelut, Ben Smith, Peter Vamplew

Abstract: The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems and that preservation of long-term agency of humans may be a more robust standard, and one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions which focuses on forward-looking agency evaluations and argue that AI systems - not humans - must be increasingly tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural-networks and reinforcement learning from internal states.

1.Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning

Authors:Hiroshi Nakahara, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii

Abstract: Analysis of invasive sports such as soccer is challenging because the game situation changes continuously in time and space, and multiple agents individually recognize the game situation and make decisions. Previous studies using deep reinforcement learning have often considered teams as a single agent and valued the teams and players who hold the ball in each discrete event. Then it was challenging to value the actions of multiple players, including players far from the ball, in a spatiotemporally continuous state space. In this paper, we propose a method of valuing possible actions for on- and off-ball soccer players in a single holistic framework based on multi-agent deep reinforcement learning. We consider a discrete action space in a continuous state space that mimics that of Google research football and leverages supervised learning for actions in reinforcement learning. In the experiment, we analyzed the relationships with conventional indicators, season goals, and game ratings by experts, and showed the effectiveness of the proposed method. Our approach can assess how multiple players move continuously throughout the game, which is difficult to be discretized or labeled but vital for teamwork, scouting, and fan engagement.

2.On the Correspondence Between Monotonic Max-Sum GNNs and Datalog

Authors:David Tena Cucala, Bernardo Cuenca Grau, Boris Motik, Egor V. Kostylev

Abstract: Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.

3.Towards a Unifying Model of Rationality in Multiagent Systems

Authors:Robert Loftin, Mustafa Mert Çelikok, Frans A. Oliehoek

Abstract: Multiagent systems deployed in the real world need to cooperate with other agents (including humans) nearly as effectively as these agents cooperate with one another. To design such AI, and provide guarantees of its effectiveness, we need to clearly specify what types of agents our AI must be able to cooperate with. In this work we propose a generic model of socially intelligent agents, which are individually rational learners that are also able to cooperate with one another (in the sense that their joint behavior is Pareto efficient). We define rationality in terms of the regret incurred by each agent over its lifetime, and show how we can construct socially intelligent agents for different forms of regret. We then discuss the implications of this model for the development of "robust" MAS that can cooperate with a wide variety of socially intelligent agents.

4.Reason to explain: Interactive contrastive explanations (REASONX)

Authors:Laura State, Salvatore Ruggieri, Franco Turini

Abstract: Many high-performing machine learning models are not interpretable. As they are increasingly used in decision scenarios that can critically affect individuals, it is necessary to develop tools to better understand their outputs. Popular explanation methods include contrastive explanations. However, they suffer several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of interactivity. While (dialogue-like) interactivity is important to better communicate an explanation, background knowledge has the potential to significantly improve their quality, e.g., by adapting the explanation to the needs of the end-user. To close this gap, we present REASONX, an explanation tool based on Constraint Logic Programming (CLP). REASONX provides interactive contrastive explanations that can be augmented by background knowledge, and allows to operate under a setting of under-specified information, leading to increased flexibility in the provided explanations. REASONX computes factual and constrative decision rules, as well as closest constrative examples. It provides explanations for decision trees, which can be the ML models under analysis, or global/local surrogate models of any ML model. While the core part of REASONX is built on CLP, we also provide a program layer that allows to compute the explanations via Python, making the tool accessible to a wider audience. We illustrate the capability of REASONX on a synthetic data set, and on a a well-developed example in the credit domain. In both cases, we can show how REASONX can be flexibly used and tailored to the needs of the user.

5.An Emergency Disposal Decision-making Method with Human--Machine Collaboration

Authors:Yibo Guo, Jingyi Xue, Yingkang Zhang, Mingliang Xu

Abstract: Rapid developments in artificial intelligence technology have led to unmanned systems replacing human beings in many fields requiring high-precision predictions and decisions. In modern operational environments, all job plans are affected by emergency events such as equipment failures and resource shortages, making a quick resolution critical. The use of unmanned systems to assist decision-making can improve resolution efficiency, but their decision-making is not interpretable and may make the wrong decisions. Current unmanned systems require human supervision and control. Based on this, we propose a collaborative human--machine method for resolving unplanned events using two phases: task filtering and task scheduling. In the task filtering phase, we propose a human--machine collaborative decision-making algorithm for dynamic tasks. The GACRNN model is used to predict the state of the job nodes, locate the key nodes, and generate a machine-predicted resolution task list. A human decision-maker supervises the list in real time and modifies and confirms the machine-predicted list through the human--machine interface. In the task scheduling phase, we propose a scheduling algorithm that integrates human experience constraints. The steps to resolve an event are inserted into the normal job sequence to schedule the resolution. We propose several human--machine collaboration methods in each phase to generate steps to resolve an unplanned event while minimizing the impact on the original job plan.

6.Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity

Authors:Yiran Mao, Madeline G. Reinecke, Markus Kunesch, Edgar A. Duéñez-Guzmán, Ramona Comanescu, Julia Haas, Joel Z. Leibo

Abstract: Is it possible to evaluate the moral cognition of complex artificial agents? In this work, we take a look at one aspect of morality: `doing the right thing for the right reasons.' We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans to facilitate like-for-like comparison. Morally-motivated behavior should persist despite mounting cost; by measuring an agent's sensitivity to this cost, we gain deeper insight into underlying motivations. We apply this evaluation to a particular set of deep reinforcement learning agents, trained by memory-based meta-reinforcement learning. Our results indicate that agents trained with a reward function that includes other-regarding preferences perform helping behavior in a way that is less sensitive to increasing cost than agents trained with more self-interested preferences.

7.Controllable Path of Destruction

Authors:Matthew Siper, Sam Earle, Zehua Jiang, Ahmed Khalifa, Julian Togelius

Abstract: Path of Destruction (PoD) is a self-supervised method for learning iterative generators. The core idea is to produce a training set by destroying a set of artifacts, and for each destructive step create a training instance based on the corresponding repair action. A generator trained on this dataset can then generate new artifacts by ``repairing'' from arbitrary states. The PoD method is very data-efficient in terms of original training examples and well-suited to functional artifacts composed of categorical data, such as game levels and discrete 3D structures. In this paper, we extend the Path of Destruction method to allow designer control over aspects of the generated artifacts. Controllability is introduced by adding conditional inputs to the state-action pairs that make up the repair trajectories. We test the controllable PoD method in a 2D dungeon setting, as well as in the domain of small 3D Lego cars.

1.Physical Deep Reinforcement Learning: Safety and Unknown Unknowns

Authors:Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: In this paper, we propose the Phy-DRL: a physics-model-regulated deep reinforcement learning framework for safety-critical autonomous systems. The Phy-DRL is unique in three innovations: i) proactive unknown-unknowns training, ii) conjunctive residual control (i.e., integration of data-driven control and physics-model-based control) and safety- \& stability-sensitive reward, and iii) physics-model-based neural network editing, including link editing and activation editing. Thanks to the concurrent designs, the Phy-DRL is able to 1) tolerate unknown-unknowns disturbances, 2) guarantee mathematically provable safety and stability, and 3) strictly comply with physical knowledge pertaining to Bellman equation and reward. The effectiveness of the Phy-DRL is finally validated by an inverted pendulum and a quadruped robot. The experimental results demonstrate that compared with purely data-driven DRL, Phy-DRL features remarkably fewer learning parameters, accelerated training and enlarged reward, while offering enhanced model robustness and safety assurance.

2.A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents

Authors:Sukai Huang, Nir Lipovetzky, Trevor Cohn

Abstract: Teaching agents to follow complex written instructions has been an important yet elusive goal. One technique for improving learning efficiency is language reward shaping (LRS), which is used in reinforcement learning (RL) to reward actions that represent progress towards a sparse reward. We argue that the apparent success of LRS is brittle, and prior positive findings can be attributed to weak RL baselines. Specifically, we identified suboptimal LRS designs that reward partially matched trajectories, and we characterised a novel type of reward perturbation that addresses this issue based on the concept of loosening task constraints. We provided theoretical and empirical evidence that agents trained using LRS rewards converge more slowly compared to pure RL agents.

1.TransWorldNG: Traffic Simulation via Foundation Model

Authors:Ding Wang, Xuhong Wang, Liang Chen, Shengyue Yao, Ming Jing, Honghai Li, Li Li, Shiqiang Bao, Fei-Yue Wang, Yilun Lin

Abstract: Traffic simulation is a crucial tool for transportation decision-making and policy development. However, achieving realistic simulations in the face of the high dimensionality and heterogeneity of traffic environments is a longstanding challenge. In this paper, we present TransWordNG, a traffic simulator that uses Data-driven algorithms and Graph Computing techniques to learn traffic dynamics from real data. The functionality and structure of TransWorldNG are introduced, which utilize a foundation model for transportation management and control. The results demonstrate that TransWorldNG can generate more realistic traffic patterns compared to traditional simulators. Additionally, TransWorldNG exhibits better scalability, as it shows linear growth in computation time as the scenario scale increases. To the best of our knowledge, this is the first traffic simulator that can automatically learn traffic patterns from real-world data and efficiently generate accurate and realistic traffic environments.

2.On the Planning Abilities of Large Language Models -- A Critical Investigation

Authors:Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati

Abstract: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs as a source of heuristic guidance for other agents (AI planners) in their planning tasks. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the heuristic mode show more promise. In the heuristic mode, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners and additionally show that external verifiers can help provide feedback on the generated plans and back-prompt the LLM for better plan generation.

3.Learning Assumption-based Argumentation Frameworks

Authors:Maurizio Proietti, Francesca Toni

Abstract: We propose a novel approach to logic-based learning which generates assumption-based argumentation (ABA) frameworks from positive and negative examples, using a given background knowledge. These ABA frameworks can be mapped onto logic programs with negation as failure that may be non-stratified. Whereas existing argumentation-based methods learn exceptions to general rules by interpreting the exceptions as rebuttal attacks, our approach interprets them as undercutting attacks. Our learning technique is based on the use of transformation rules, including some adapted from logic program transformation rules (notably folding) as well as others, such as rote learning and assumption introduction. We present a general strategy that applies the transformation rules in a suitable order to learn stratified frameworks, and we also propose a variant that handles the non-stratified case. We illustrate the benefits of our approach with a number of examples, which show that, on one hand, we are able to easily reconstruct other logic-based learning approaches and, on the other hand, we can work out in a very simple and natural way problems that seem to be hard for existing techniques.

4.A Diagnosis Algorithms for a Rotary Indexing Machine

Authors:Maria Krantz, Oliver Niggemann

Abstract: Rotary Indexing Machines (RIMs) are widely used in manufacturing due to their ability to perform multiple production steps on a single product without manual repositioning, reducing production time and improving accuracy and consistency. Despite their advantages, little research has been done on diagnosing faults in RIMs, especially from the perspective of the actual production steps carried out on these machines. Long downtimes due to failures are problematic, especially for smaller companies employing these machines. To address this gap, we propose a diagnosis algorithm based on the product perspective, which focuses on the product being processed by RIMs. The algorithm traces the steps that a product takes through the machine and is able to diagnose possible causes in case of failure. We also analyze the properties of RIMs and how these influence the diagnosis of faults in these machines. Our contributions are three-fold. Firstly, we provide an analysis of the properties of RIMs and how they influence the diagnosis of faults in these machines. Secondly, we suggest a diagnosis algorithm based on the product perspective capable of diagnosing faults in such a machine. Finally, we test this algorithm on a model of a rotary indexing machine, demonstrating its effectiveness in identifying faults and their root causes.

5.Understanding the Capabilities of Large Language Models for Automated Planning

Authors:Vishal Pallagani, Bharath Muppasani, Keerthiram Murugesan, Francesca Rossi, Biplav Srivastava, Lior Horesh, Francesco Fabiano, Andrea Loreggia

Abstract: Automated planning is concerned with developing efficient algorithms to generate plans or sequences of actions to achieve a specific goal in a given environment. Emerging Large Language Models (LLMs) can answer questions, write high-quality programming code, and predict protein folding, showcasing their versatility in solving various tasks beyond language-based problems. In this paper, we aim to explore how LLMs can also be used for automated planning. To do so, we seek to answer four key questions. Firstly, we want to understand the extent to which LLMs can be used for plan generation. Secondly, we aim to identify which pre-training data is most effective in facilitating plan generation. Thirdly, we investigate whether fine-tuning or prompting is a more effective approach for plan generation. Finally, we explore whether LLMs are capable of plan generalization. By answering these questions, the study seeks to shed light on the capabilities of LLMs in solving complex planning problems and provide insights into the most effective approaches for using LLMs in this context.

6.UpMax: User partitioning for MaxSAT

Authors:Pedro Orvalho, Vasco Manquinho, Ruben Martins

Abstract: It has been shown that Maximum Satisfiability (MaxSAT) problem instances can be effectively solved by partitioning the set of soft clauses into several disjoint sets. The partitioning methods can be based on clause weights (e.g., stratification) or based on graph representations of the formula. Afterwards, a merge procedure is applied to guarantee that an optimal solution is found. This paper proposes a new framework called UpMax that decouples the partitioning procedure from the MaxSAT solving algorithms. As a result, new partitioning procedures can be defined independently of the MaxSAT algorithm to be used. Moreover, this decoupling also allows users that build new MaxSAT formulas to propose partition schemes based on knowledge of the problem to be solved. We illustrate this approach using several problems and show that partitioning has a large impact on the performance of unsatisfiability-based MaxSAT algorithms.

7.Voyager: An Open-Ended Embodied Agent with Large Language Models

Authors:Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar

Abstract: We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at

8.Ghost in the Minecraft: Generally Capable Agents for Open-World Enviroments via Large Language Models with Text-based Knowledge and Memory

Authors:Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai

Abstract: The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, the current research landscape predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the current leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of Reinforcement Learning (RL) based controllers used in existing methods. To tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel framework integrates Large Language Models (LLMs) with text-based knowledge and memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate on the "ObtainDiamond" task, demonstrating superior robustness compared to traditional RL-based controllers. Notably, our agent is the first to procure all items in the Minecraft Overworld technology tree, demonstrating its extensive capabilities. GITM does not need any GPU for training, but a single CPU node with 32 CPU cores is enough. This research shows the potential of LLMs in developing capable agents for handling long-horizon, complex tasks and adapting to uncertainties in open-world environments. See the project website at

1.Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

Authors:Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.

2.ECHo: Event Causality Inference via Human-centric Reasoning

Authors:Yuxi Xie, Guanzhen Li, Min-Yen Kan

Abstract: We introduce ECHo, a diagnostic dataset of event causality inference grounded in visual-and-linguistic social scenarios. ECHo employs real-world human-centric deductive information collected from crime drama, bridging the gap in multimodal reasoning towards higher social intelligence through the elicitation of intermediate Theory-of-Mind (ToM). We propose a unified framework aligned with the Chain-of-Thought (CoT) paradigm to assess the reasoning capability of current AI systems. This ToM-enhanced CoT pipeline can accommodate and integrate various large foundation models in zero-shot visual-and-linguistic understanding. With this framework, we scrutinize the advanced large language and multimodal models via three complementary human-centric ECHo tasks. Further analysis demonstrates ECHo as a challenging dataset to expose imperfections and inconsistencies in reasoning.

3.Anthropomorphization of AI: Opportunities and Risks

Authors:Ameet Deshpande, Tanmay Rajpurohit, Karthik Narasimhan, Ashwin Kalyan

Abstract: Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts -- children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With widespread adoption of AI systems, and the push from stakeholders to make it human-like through alignment techniques, human voice, and pictorial avatars, the tendency for users to anthropomorphize it increases significantly. We take a dyadic approach to understanding this phenomenon with large language models (LLMs) by studying (1) the objective legal implications, as analyzed through the lens of the recent blueprint of AI bill of rights and the (2) subtle psychological aspects customization and anthropomorphization. We find that anthropomorphized LLMs customized for different user bases violate multiple provisions in the legislative blueprint. In addition, we point out that anthropomorphization of LLMs affects the influence they can have on their users, thus having the potential to fundamentally change the nature of human-AI interaction, with potential for manipulation and negative influence. With LLMs being hyper-personalized for vulnerable groups like children and patients among others, our work is a timely and important contribution. We propose a conservative strategy for the cautious use of anthropomorphization to improve trustworthiness of AI systems.

4.Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

Authors:Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati

Abstract: There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. To address the fact that LLMs may not generate a fully functional PDDL model initially, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and humans. For users who lack a background in PDDL, we show that LLMs can translate PDDL into natural language and effectively encode corrective feedback back to the underlying domain model. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct domain models at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work. On two IPC domains and a Household domain that is more complicated than commonly used benchmarks such as ALFWorld, we demonstrate that GPT-4 can be leveraged to produce high-quality PDDL models for over 40 actions, and the corrected PDDL models are then used to successfully solve 48 challenging planning tasks. Resources including the source code will be released at:

5.In-Context Impersonation Reveals Large Language Models' Strengths and Biases

Authors:Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata

Abstract: In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social identity or domain expertise. In a multi-armed bandit task, we find that LLMs pretending to be children of different ages recover human-like developmental stages of exploration. In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts. Finally, we test whether LLMs' impersonations are complementary to visual information when describing different categories. We find that impersonation can improve performance: an LLM prompted to be a bird expert describes birds better than one prompted to be a car expert. However, impersonation can also uncover LLMs' biases: an LLM prompted to be a man describes cars better than one prompted to be a woman. These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.

6.GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking

Authors:Jiayan Guo, Lun Du, Hengyu Liu

Abstract: Large language models~(LLM) like ChatGPT have become indispensable to artificial general intelligence~(AGI), demonstrating excellent performance in various natural language processing tasks. In the real world, graph data is ubiquitous and an essential part of AGI and prevails in domains like social network analysis, bioinformatics and recommender systems. The training corpus of large language models often includes some algorithmic components, which allows them to achieve certain effects on some graph data-related problems. However, there is still little research on their performance on a broader range of graph-structured data. In this study, we conduct an extensive investigation to assess the proficiency of LLMs in comprehending graph data, employing a diverse range of structural and semantic-related tasks. Our analysis encompasses 10 distinct tasks that evaluate the LLMs' capabilities in graph understanding. Through our study, we not only uncover the current limitations of language models in comprehending graph structures and performing associated reasoning tasks but also emphasize the necessity for further advancements and novel approaches to enhance their graph processing capabilities. Our findings contribute valuable insights towards bridging the gap between language models and graph understanding, paving the way for more effective graph mining and knowledge extraction.

7.Guessing Winning Policies in LTL Synthesis by Semantic Learning

Authors:Jan Kretinsky, Tobias Meggendorfer, Maximilian Prokop, Sabine Rieder

Abstract: We provide a learning-based technique for guessing a winning strategy in a parity game originating from an LTL synthesis problem. A cheaply obtained guess can be useful in several applications. Not only can the guessed strategy be applied as best-effort in cases where the game's huge size prohibits rigorous approaches, but it can also increase the scalability of rigorous LTL synthesis in several ways. Firstly, checking whether a guessed strategy is winning is easier than constructing one. Secondly, even if the guess is wrong in some places, it can be fixed by strategy iteration faster than constructing one from scratch. Thirdly, the guess can be used in on-the-fly approaches to prioritize exploration in the most fruitful directions. In contrast to previous works, we (i)~reflect the highly structured logical information in game's states, the so-called semantic labelling, coming from the recent LTL-to-automata translations, and (ii)~learn to reflect it properly by learning from previously solved games, bringing the solving process closer to human-like reasoning.

8.A Mini Review on the utilization of Reinforcement Learning with OPC UA

Authors:Simon Schindler, Martin Uray, Stefan Huber

Abstract: Reinforcement Learning (RL) is a powerful machine learning paradigm that has been applied in various fields such as robotics, natural language processing and game playing achieving state-of-the-art results. Targeted to solve sequential decision making problems, it is by design able to learn from experience and therefore adapt to changing dynamic environments. These capabilities make it a prime candidate for controlling and optimizing complex processes in industry. The key to fully exploiting this potential is the seamless integration of RL into existing industrial systems. The industrial communication standard Open Platform Communications UnifiedArchitecture (OPC UA) could bridge this gap. However, since RL and OPC UA are from different fields,there is a need for researchers to bridge the gap between the two technologies. This work serves to bridge this gap by providing a brief technical overview of both technologies and carrying out a semi-exhaustive literature review to gain insights on how RL and OPC UA are applied in combination. With this survey, three main research topics have been identified, following the intersection of RL with OPC UA. The results of the literature review show that RL is a promising technology for the control and optimization of industrial processes, but does not yet have the necessary standardized interfaces to be deployed in real-world scenarios with reasonably low effort.

9.Ethics and Deep Learning

Authors:Travis LaCroix, Simon J. D. Prince

Abstract: This article appears as chapter 21 of Prince (2023, Understanding Deep Learning); a complete draft of the textbook is available here: This chapter considers potential harms arising from the design and use of AI systems. These include algorithmic bias, lack of explainability, data privacy violations, militarization, fraud, and environmental concerns. The aim is not to provide advice on being more ethical. Instead, the goal is to express ideas and start conversations in key areas that have received attention in philosophy, political science, and the broader social sciences.

10.Discounting in Strategy Logic

Authors:Munyque Mittelmann, Aniello Murano, Laurent Perrussel

Abstract: Discounting is an important dimension in multi-agent systems as long as we want to reason about strategies and time. It is a key aspect in economics as it captures the intuition that the far-away future is not as important as the near future. Traditional verification techniques allow to check whether there is a winning strategy for a group of agents but they do not take into account the fact that satisfying a goal sooner is different from satisfying it after a long wait. In this paper, we augment Strategy Logic with future discounting over a set of discounted functions D, denoted SLdisc[D]. We consider "until" operators with discounting functions: the satisfaction value of a specification in SLdisc[D] is a value in [0, 1], where the longer it takes to fulfill requirements, the smaller the satisfaction value is. We motivate our approach with classical examples from Game Theory and study the complexity of model-checking SLdisc[D]-formulas.

11."What if?" in Probabilistic Logic Programming

Authors:Rafael Kiesel, Kilian Rückschloß, Felix Weitkämper

Abstract: A ProbLog program is a logic program with facts that only hold with a specified probability. In this contribution we extend this ProbLog language by the ability to answer "What if" queries. Intuitively, a ProbLog program defines a distribution by solving a system of equations in terms of mutually independent predefined Boolean random variables. In the theory of causality, Judea Pearl proposes a counterfactual reasoning for such systems of equations. Based on Pearl's calculus, we provide a procedure for processing these counterfactual queries on ProbLog programs, together with a proof of correctness and a full implementation. Using the latter, we provide insights into the influence of different parameters on the scalability of inference. Finally, we also show that our approach is consistent with CP-logic, i.e. with the causal semantics for logic programs with annotated with disjunctions.

12.Model evaluation for extreme risks

Authors:Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

Abstract: Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through "alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.

13.Measuring and Mitigating Constraint Violations of In-Context Learning for Utterance-to-API Semantic Parsing

Authors:Shufan Wang, Sebastien Jean, Sailik Sengupta, James Gung, Nikolaos Pappas, Yi Zhang

Abstract: In executable task-oriented semantic parsing, the system aims to translate users' utterances in natural language to machine-interpretable programs (API calls) that can be executed according to pre-defined API specifications. With the popularity of Large Language Models (LLMs), in-context learning offers a strong baseline for such scenarios, especially in data-limited regimes. However, LLMs are known to hallucinate and therefore pose a formidable challenge in constraining generated content. Thus, it remains uncertain if LLMs can effectively perform task-oriented utterance-to-API generation where respecting API's structural and task-specific constraints is crucial. In this work, we seek to measure, analyze and mitigate such constraints violations. First, we identify the categories of various constraints in obtaining API-semantics from task-oriented utterances, and define fine-grained metrics that complement traditional ones. Second, we leverage these metrics to conduct a detailed error analysis of constraints violations seen in state-of-the-art LLMs, which motivates us to investigate two mitigation strategies: Semantic-Retrieval of Demonstrations (SRD) and API-aware Constrained Decoding (API-CD). Our experiments show that these strategies are effective at reducing constraints violations and improving the quality of the generated API calls, but require careful consideration given their implementation complexity and latency.

1.XRoute Environment: A Novel Reinforcement Learning Environment for Routing

Authors:Zhanwen Zhou, Hankz Hankui Zhuo, Xiaowu Zhang, Qiyuan Deng

Abstract: Routing is a crucial and time-consuming stage in modern design automation flow for advanced technology nodes. Great progress in the field of reinforcement learning makes it possible to use those approaches to improve the routing quality and efficiency. However, the scale of the routing problems solved by reinforcement learning-based methods in recent studies is too small for these methods to be used in commercial EDA tools. We introduce the XRoute Environment, a new reinforcement learning environment where agents are trained to select and route nets in an advanced, end-to-end routing framework. Novel algorithms and ideas can be quickly tested in a safe and reproducible manner in it. The resulting environment is challenging, easy to use, customize and add additional scenarios, and it is available under a permissive open-source license. In addition, it provides support for distributed deployment and multi-instance experiments. We propose two tasks for learning and build a full-chip test bed with routing benchmarks of various region sizes. We also pre-define several static routing regions with different pin density and number of nets for easier learning and testing. For net ordering task, we report baseline results for two widely used reinforcement learning algorithms (PPO and DQN) and one searching-based algorithm (TritonRoute). The XRoute Environment will be available at

2.ChipGPT: How far are we from natural language hardware design

Authors:Kaiyan Chang, Ying Wang, Haimeng Ren, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li

Abstract: As large language models (LLMs) like ChatGPT exhibited unprecedented machine intelligence, it also shows great performance in assisting hardware engineers to realize higher-efficiency logic design via natural language interaction. To estimate the potential of the hardware design process assisted by LLMs, this work attempts to demonstrate an automated design environment that explores LLMs to generate hardware logic designs from natural language specifications. To realize a more accessible and efficient chip development flow, we present a scalable four-stage zero-code logic design framework based on LLMs without retraining or finetuning. At first, the demo, ChipGPT, begins by generating prompts for the LLM, which then produces initial Verilog programs. Second, an output manager corrects and optimizes these programs before collecting them into the final design space. Eventually, ChipGPT will search through this space to select the optimal design under the target metrics. The evaluation sheds some light on whether LLMs can generate correct and complete hardware logic designs described by natural language for some specifications. It is shown that ChipGPT improves programmability, and controllability, and shows broader design optimization space compared to prior work and native LLMs alone.

3.Message Intercommunication for Inductive Relation Reasoning

Authors:Ke Liang, Lingyuan Meng, Sihang Zhou, Siwei Wang, Wenxuan Tu, Yue Liu, Meng Liu, Xinwang Liu

Abstract: Inductive relation reasoning for knowledge graphs, aiming to infer missing links between brand-new entities, has drawn increasing attention. The models developed based on Graph Inductive Learning, called GraIL-based models, have shown promising potential for this task. However, the uni-directional message-passing mechanism hinders such models from exploiting hidden mutual relations between entities in directed graphs. Besides, the enclosing subgraph extraction in most GraIL-based models restricts the model from extracting enough discriminative information for reasoning. Consequently, the expressive ability of these models is limited. To address the problems, we propose a novel GraIL-based inductive relation reasoning model, termed MINES, by introducing a Message Intercommunication mechanism on the Neighbor-Enhanced Subgraph. Concretely, the message intercommunication mechanism is designed to capture the omitted hidden mutual information. It introduces bi-directed information interactions between connected entities by inserting an undirected/bi-directed GCN layer between uni-directed RGCN layers. Moreover, inspired by the success of involving more neighbors in other graph-based tasks, we extend the neighborhood area beyond the enclosing subgraph to enhance the information collection for inductive relation reasoning. Extensive experiments on twelve inductive benchmark datasets demonstrate that our MINES outperforms existing state-of-the-art models, and show the effectiveness of our intercommunication mechanism and reasoning on the neighbor-enhanced subgraph.

4.Simulating News Recommendation Ecosystem for Fun and Profit

Authors:Guangping Zhang, Dongsheng Li, Hansu Gu, Tun Lu, Li Shang, Ning Gu

Abstract: Understanding the evolution of online news communities is essential for designing more effective news recommender systems. However, due to the lack of appropriate datasets and platforms, the existing literature is limited in understanding the impact of recommender systems on this evolutionary process and the underlying mechanisms, resulting in sub-optimal system designs that may affect long-term utilities. In this work, we propose SimuLine, a simulation platform to dissect the evolution of news recommendation ecosystems and present a detailed analysis of the evolutionary process and underlying mechanisms. SimuLine first constructs a latent space well reflecting the human behaviors, and then simulates the news recommendation ecosystem via agent-based modeling. Based on extensive simulation experiments and the comprehensive analysis framework consisting of quantitative metrics, visualization, and textual explanations, we analyze the characteristics of each evolutionary phase from the perspective of life-cycle theory, and propose a relationship graph illustrating the key factors and affecting mechanisms. Furthermore, we explore the impacts of recommender system designing strategies, including the utilization of cold-start news, breaking news, and promotion, on the evolutionary process, which shed new light on the design of recommender systems.

5.Training Transitive and Commutative Multimodal Transformers with LoReTTa

Authors:Manuel Tran, Amal Lahiani, Yashin Dicente Cid, Fabian J. Theis, Tingying Peng, Eldad Klaiman

Abstract: Collecting a multimodal dataset with two paired modalities A and B or B and C is difficult in practice. Obtaining a dataset with three aligned modalities A, B, and C is even more challenging. For example, some public medical datasets have only genetic sequences and microscopic images for one patient, and only genetic sequences and radiological images for another - but no dataset includes both microscopic and radiological images for the same patient. This makes it difficult to integrate and combine all modalities into a large pre-trained neural network. We introduce LoReTTa (Linking mOdalities with a tRansitive and commutativE pre-Training sTrAtegy) to address this understudied problem. Our self-supervised framework combines causal masked modeling with the rules of commutativity and transitivity to transition within and between different modalities. Thus, it can model the relation A -> C with A -> B -> C. Given a dataset containing only the disjoint combinations (A, B) and (B, C), we show that a transformer pre-trained with LoReTTa can handle any modality combination at inference time, including the never-seen pair (A, C) and the triplet (A, B, C). We evaluate our approach on a multimodal dataset derived from MNIST containing speech, vision, and language, as well as a real-world medical dataset containing mRNA, miRNA, and RPPA samples from TCGA. Compared to traditional pre-training methods, we observe up to a 100-point reduction in perplexity for autoregressive generation tasks and up to a 15% improvement in classification accuracy for previously unseen modality pairs during the pre-training phase.

1.Diversity and Inclusion in Artificial Intelligence

Authors:Didar Zowghi, Francesca da Rimini

Abstract: To date, there has been little concrete practical advice about how to ensure that diversity and inclusion considerations should be embedded within both specific Artificial Intelligence (AI) systems and the larger global AI ecosystem. In this chapter, we present a clear definition of diversity and inclusion in AI, one which positions this concept within an evolving and holistic ecosystem. We use this definition and conceptual framing to present a set of practical guidelines primarily aimed at AI technologists, data scientists and project leaders.

2.Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning

Authors:Chi Han, Qizheng He, Charles Yu, Xinya Du, Hanghang Tong, Heng Ji

Abstract: Probabilistic logical rule learning has shown great strength in logical rule mining and knowledge graph completion. It learns logical rules to predict missing edges by reasoning on existing edges in the knowledge graph. However, previous efforts have largely been limited to only modeling chain-like Horn clauses such as $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$. This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables $x$, $y$ and $z$. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph. It is an interpretable representation while allowing for differentiable optimization. We can then incorporate LERP into probabilistic logical rule learning to learn more expressive rules. Empirical results demonstrate that with LERP, our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods. Moreover, we find that our model can discover a more expressive family of logical rules. LERP can also be further combined with embedding learning methods like TransE to make it more interpretable.

3.GraphCare: Enhancing Healthcare Predictions with Open-World Personalized Knowledge Graphs

Authors:Pengcheng Jiang, Cao Xiao, Adam Cross, Jimeng Sun

Abstract: Clinical predictive models often rely on patients electronic health records (EHR), but integrating medical knowledge to enhance predictions and decision-making is challenging. This is because personalized predictions require personalized knowledge graphs (KGs), which are difficult to generate from patient EHR data. To address this, we propose GraphCare, an open-world framework that leverages external KGs to improve EHR-based predictions. Our method extracts knowledge from large language models (LLMs) and external biomedical KGs to generate patient-specific KGs, which are then used to train our proposed Bi-attention AugmenTed BAT graph neural network GNN for healthcare predictions. We evaluate GraphCare on two public datasets: MIMIC-III and MIMIC-IV. Our method outperforms baseline models in four vital healthcare prediction tasks: mortality, readmission, length-of-stay, and drug recommendation, improving AUROC on MIMIC-III by average margins of 10.4%, 3.8%, 2.0%, and 1.5%, respectively. Notably, GraphCare demonstrates a substantial edge in scenarios with limited data availability. Our findings highlight the potential of using external KGs in healthcare prediction tasks and demonstrate the promise of GraphCare in generating personalized KGs for promoting personalized medicine.

4.Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations

Authors:Keisuke Fujii, Kazushi Tsutsui, Atom Scott, Hiroshi Nakahara, Naoya Takeishi, Yoshinobu Kawahara

Abstract: Modeling of real-world biological multi-agents is a fundamental problem in various scientific and engineering fields. Reinforcement learning (RL) is a powerful framework to generate flexible and diverse behaviors in cyberspace; however, when modeling real-world biological multi-agents, there is a domain gap between behaviors in the source (i.e., real-world data) and the target (i.e., cyberspace for RL), and the source environment parameters are usually unknown. In this paper, we propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios. We adopt an approach that combines RL and supervised learning by selecting actions of demonstrations in RL based on the minimum distance of dynamic time warping for utilizing the information of the unknown source dynamics. This approach can be easily applied to many existing neural network architectures and provide us with an RL model balanced between reproducibility as imitation and generalization ability to obtain rewards in cyberspace. In the experiments, using chase-and-escape and football tasks with the different dynamics between the unknown source and target environments, we show that our approach achieved a balance between the reproducibility and the generalization ability compared with the baselines. In particular, we used the tracking data of professional football players as expert demonstrations in football and show successful performances despite the larger gap between behaviors in the source and target environments than the chase-and-escape task.

5.Road Planning for Slums via Deep Reinforcement Learning

Authors:Yu Zheng, Hongyuan Su, Jingtao Ding, Depeng Jin, Yong Li

Abstract: Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming which cannot generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically layout roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading.

6.Transforming Geospatial Ontologies by Homomorphisms

Authors:Xiuzhan Guo, Wei Huang, Min Luo, Priya Rangarajan

Abstract: In this paper, we study the (geospatial) ontologies we are interested in together as an ontology (a geospatial ontology) system, consisting of a set of the (geospatial) ontologies and a set of ontology operations. A homomorphism between two ontology systems is a function between two sets of ontologies, which preserves these ontology operations. We view clustering a set of the ontologies we are interested in as partitioning the set or defining an equivalence relation on the set or forming a quotient set of the set or obtaining the surjective image of the set. Each ontology system homomorphism can be factored as a surjective clustering to a quotient space, followed by an embedding. Ontology (merging) systems, natural partial orders on the systems, and ontology merging closures in the systems are then transformed under ontology system homomorphisms, given by quotients and embeddings.

7.Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman

Authors:Jannis Weil, Johannes Czech, Tobias Meuser, Kristian Kersting

Abstract: In combination with Reinforcement Learning, Monte-Carlo Tree Search has shown to outperform human grandmasters in games such as Chess, Shogi and Go with little to no prior domain knowledge. However, most classical use cases only feature up to two players. Scaling the search to an arbitrary number of players presents a computational challenge, especially if decisions have to be planned over a longer time horizon. In this work, we investigate techniques that transform general-sum multiplayer games into single-player and two-player games that consider other agents to act according to given opponent models. For our evaluation, we focus on the challenging Pommerman environment which involves partial observability, a long time horizon and sparse rewards. In combination with our search methods, we investigate the phenomena of opponent modeling using heuristics and self-play. Overall, we demonstrate the effectiveness of our multiplayer search variants both in a supervised learning and reinforcement learning setting.

8.NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection

Authors:David Herron, Ernesto Jiménez-Ruiz, Giacomo Tarroni, Tillman Weyde

Abstract: NeSy4VRD is a multifaceted resource designed to support the development of neurosymbolic AI (NeSy) research. NeSy4VRD re-establishes public access to the images of the VRD dataset and couples them with an extensively revised, quality-improved version of the VRD visual relationship annotations. Crucially, NeSy4VRD provides a well-aligned, companion OWL ontology that describes the dataset domain.It comes with open source infrastructure that provides comprehensive support for extensibility of the annotations (which, in turn, facilitates extensibility of the ontology), and open source code for loading the annotations to/from a knowledge graph. We are contributing NeSy4VRD to the computer vision, NeSy and Semantic Web communities to help foster more NeSy research using OWL-based knowledge graphs.

1.TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

Authors:Shubhra Kanti Karmaker Santu, Dongji Feng

Abstract: While LLMs have shown great success in understanding and generating text in traditional conversational settings, their potential for performing ill-defined complex tasks is largely under-studied. Indeed, we are yet to conduct comprehensive benchmarking studies with multiple LLMs that are exclusively focused on a complex task. However, conducting such benchmarking studies is challenging because of the large variations in LLMs' performance when different prompt types/styles are used and different degrees of detail are provided in the prompts. To address this issue, the paper proposes a general taxonomy that can be used to design prompts with specific properties in order to perform a wide range of complex tasks. This taxonomy will allow future benchmarking studies to report the specific categories of prompts used as part of the study, enabling meaningful comparisons across different studies. Also, by establishing a common standard through this taxonomy, researchers will be able to draw more accurate conclusions about LLMs' performance on a specific complex task.

2.SelfzCoT: a Self-Prompt Zero-shot CoT from Semantic-level to Code-level for a Better Utilization of LLMs

Authors:IokTong Lei, ZhiDong Deng

Abstract: This paper show a work on better use of LLMs with SelfzCoT a self-prompt zero-shot CoT. Specifically, on the zero-shot arithmetic reasoning tasks, the accuracy of the proposed SelfzCoT is improved with GSM8K from 40.50% to 82.34%, with MultiArith from 79.3% to 94.7%, with ADDSUB from 74.70% to 94.10%, with SingleEq from 78.70% to 91.30%, with AQUA from 31.90% to 82.33%, and with SVAMP from 63.70% to 79.70%. Totally, using the first two lasting path activations to LLM and particularly, the code-level self-prompt, the SelfzCoT has a huge improvement on all six zero-shot arithmetic reasoning tasks. Additionally, our modified zero-shot CoT (MzCoT) also achieves remarkable performance in the reasoning tasks. The accuracy of the proposed MzCoT is enhanced with GSM8K from 40.50% to 76.32%, with MultiArith from 79.3% to 96.97%, with ADDSUB from 74.70% to 92.39%, with SingleEq from 78.70% to 94.60%, with AQUA from 31.90% to 79.90%, and with SVAMP from 63.70% to 81.50%. Notably, SelfzCoT has the best performance on GSM8K among all the recent zero-shot methods.

3.Testing System Intelligence

Authors:Joseph Sifakis

Abstract: We discuss the adequacy of tests for intelligent systems and practical problems raised by their implementation. We propose the replacement test as the ability of a system to replace successfully another system performing a task in a given context. We show how it can characterize salient aspects of human intelligence that cannot be taken into account by the Turing test. We argue that building intelligent systems passing the replacement test involves a series of technical problems that are outside the scope of current AI. We present a framework for implementing the proposed test and validating the properties of the intelligent systems. We discuss the inherent limitations of intelligent system validation and advocate new theoretical foundations for extending existing rigorous test methods. We suggest that the replacement test, based on the complementarity of skills between human and machine, can lead to a multitude of intelligence concepts reflecting the ability to combine data-based and symbolic knowledge to varying degrees.

4.Terraforming -- Environment Manipulation during Disruptions for Multi-Agent Pickup and Delivery

Authors:David Vainshtein, Yaakov Sherma, Kiril Solovey, Oren Salzman

Abstract: In automated warehouses, teams of mobile robots fulfill the packaging process by transferring inventory pods to designated workstations while navigating narrow aisles formed by tightly packed pods. This problem is typically modeled as a Multi-Agent Pickup and Delivery (MAPD) problem, which is then solved by repeatedly planning collision-free paths for agents on a fixed graph, as in the Rolling-Horizon Collision Resolution (RHCR) algorithm. However, existing approaches make the limiting assumption that agents are only allowed to move pods that correspond to their current task, while considering the other pods as stationary obstacles (even though all pods are movable). This behavior can result in unnecessarily long paths which could otherwise be avoided by opening additional corridors via pod manipulation. To this end, we explore the implications of allowing agents the flexibility of dynamically relocating pods. We call this new problem Terraforming MAPD (tMAPD) and develop an RHCR-based approach to tackle it. As the extra flexibility of terraforming comes at a significant computational cost, we utilize this capability judiciously by identifying situations where it could make a significant impact on the solution quality. In particular, we invoke terraforming in response to disruptions that often occur in automated warehouses, e.g., when an item is dropped from a pod or when agents malfunction. Empirically, using our approach for tMAPD, where disruptions are modeled via a stochastic process, we improve throughput by over 10%, reduce the maximum service time (the difference between the drop-off time and the pickup time of a pod) by more than 50%, without drastically increasing the runtime, compared to the MAPD setting.

5.Trustworthy Federated Learning: A Survey

Authors:Asadullah Tariq, Mohamed Adel Serhani, Farag Sallabi, Tariq Qayyum, Ezedin S. Barka, Khaled A. Shuaib

Abstract: Federated Learning (FL) has emerged as a significant advancement in the field of Artificial Intelligence (AI), enabling collaborative model training across distributed devices while maintaining data privacy. As the importance of FL increases, addressing trustworthiness issues in its various aspects becomes crucial. In this survey, we provide an extensive overview of the current state of Trustworthy FL, exploring existing solutions and well-defined pillars relevant to Trustworthy . Despite the growth in literature on trustworthy centralized Machine Learning (ML)/Deep Learning (DL), further efforts are necessary to identify trustworthiness pillars and evaluation metrics specific to FL models, as well as to develop solutions for computing trustworthiness levels. We propose a taxonomy that encompasses three main pillars: Interpretability, Fairness, and Security & Privacy. Each pillar represents a dimension of trust, further broken down into different notions. Our survey covers trustworthiness challenges at every level in FL settings. We present a comprehensive architecture of Trustworthy FL, addressing the fundamental principles underlying the concept, and offer an in-depth analysis of trust assessment mechanisms. In conclusion, we identify key research challenges related to every aspect of Trustworthy FL and suggest future research directions. This comprehensive survey serves as a valuable resource for researchers and practitioners working on the development and implementation of Trustworthy FL systems, contributing to a more secure and reliable AI landscape.

6.Trustworthy, responsible, ethical AI in manufacturing and supply chains: synthesis and emerging research questions

Authors:Alexandra Brintrup, George Baryannis, Ashutosh Tiwari, Svetan Ratchev, Giovanna Martinez-Arellano, Jatinder Singh

Abstract: While the increased use of AI in the manufacturing sector has been widely noted, there is little understanding on the risks that it may raise in a manufacturing organisation. Although various high level frameworks and definitions have been proposed to consolidate potential risks, practitioners struggle with understanding and implementing them. This lack of understanding exposes manufacturing to a multitude of risks, including the organisation, its workers, as well as suppliers and clients. In this paper, we explore and interpret the applicability of responsible, ethical, and trustworthy AI within the context of manufacturing. We then use a broadened adaptation of a machine learning lifecycle to discuss, through the use of illustrative examples, how each step may result in a given AI trustworthiness concern. We additionally propose a number of research questions to the manufacturing research community, in order to help guide future research so that the economic and societal benefits envisaged by AI in manufacturing are delivered safely and responsibly.

7.Flexible and Inherently Comprehensible Knowledge Representation for Data-Efficient Learning and Trustworthy Human-Machine Teaming in Manufacturing Environments

Authors:Vedran Galetić, Alistair Nottle

Abstract: Trustworthiness of artificially intelligent agents is vital for the acceptance of human-machine teaming in industrial manufacturing environments. Predictable behaviours and explainable (and understandable) rationale allow humans collaborating with (and building) these agents to understand their motivations and therefore validate decisions that are made. To that aim, we make use of G\"ardenfors's cognitively inspired Conceptual Space framework to represent the agent's knowledge using concepts as convex regions in a space spanned by inherently comprehensible quality dimensions. A simple typicality quantification model is built on top of it to determine fuzzy category membership and classify instances interpretably. We apply it on a use case from the manufacturing domain, using objects' physical properties obtained from cobots' onboard sensors and utilisation properties from crowdsourced commonsense knowledge available at public knowledge bases. Such flexible knowledge representation based on property decomposition allows for data-efficient representation learning of typically highly specialist or specific manufacturing artefacts. In such a setting, traditional data-driven (e.g., computer vision-based) classification approaches would struggle due to training data scarcity. This allows for comprehensibility of an AI agent's acquired knowledge by the human collaborator thus contributing to trustworthiness. We situate our approach within an existing explainability framework specifying explanation desiderata. We provide arguments for our system's applicability and appropriateness for different roles of human agents collaborating with the AI system throughout its design, validation, and operation.

8.Introspective Tips: Large Language Model for In-Context Decision Making

Authors:Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

Abstract: The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ ``Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLM in in-contxt decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

9.Tune-Mode ConvBN Blocks For Efficient Transfer Learning

Authors:Kaichao You, Anchang Bao, Guo Qin, Meng Cao, Ping Huang, Jiulong Shan, Mingsheng Long

Abstract: Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and model validation, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability; Eval mode is widely used in transfer learning but lacks efficiency. To solve the dilemma, we theoretically reveal the reason behind the diminished training stability observed in the Deploy mode. Subsequently, we propose a novel Tune mode to bridge the gap between Eval mode and Deploy mode. The proposed Tune mode is as stable as Eval mode for transfer learning, and its computational efficiency closely matches that of the Deploy mode. Through extensive experiments in both object detection and classification tasks, carried out across various datasets and model architectures, we demonstrate that the proposed Tune mode does not hurt the original performance while significantly reducing GPU memory footprint and training time, thereby contributing an efficient solution to transfer learning with convolutional networks.

10.Applying Ising Machines to Multi-objective QUBOs

Authors:Mayowa Ayodele, Richard Allmendinger, Manuel López-Ibáñez, Arnaud Liefooghe, Matthieu Parizy

Abstract: Multi-objective optimisation problems involve finding solutions with varying trade-offs between multiple and often conflicting objectives. Ising machines are physical devices that aim to find the absolute or approximate ground states of an Ising model. To apply Ising machines to multi-objective problems, a weighted sum objective function is used to convert multi-objective into single-objective problems. However, deriving scalarisation weights that archives evenly distributed solutions across the Pareto front is not trivial. Previous work has shown that adaptive weights based on dichotomic search, and one based on averages of previously explored weights can explore the Pareto front quicker than uniformly generated weights. However, these adaptive methods have only been applied to bi-objective problems in the past. In this work, we extend the adaptive method based on averages in two ways: (i)~we extend the adaptive method of deriving scalarisation weights for problems with two or more objectives, and (ii)~we use an alternative measure of distance to improve performance. We compare the proposed method with existing ones and show that it leads to the best performance on multi-objective Unconstrained Binary Quadratic Programming (mUBQP) instances with 3 and 4 objectives and that it is competitive with the best one for instances with 2 objectives.

11.Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes

Authors:Aran Nayebi, Rishi Rajalingham, Mehrdad Jazayeri, Guangyu Robert Yang

Abstract: Humans and animals have a rich and flexible understanding of the physical world, which enables them to infer the underlying dynamical trajectories of objects and events, plausible future states, and use that to plan and anticipate the consequences of actions. However, the neural mechanisms underlying these computations are unclear. We combine a goal-driven modeling approach with dense neurophysiological data and high-throughput human behavioral readouts to directly impinge on this question. Specifically, we construct and evaluate several classes of sensory-cognitive networks to predict the future state of rich, ethologically-relevant environments, ranging from self-supervised end-to-end models with pixel-wise or object-centric objectives, to models that future predict in the latent space of purely static image-based or dynamic video-based pretrained foundation models. We find strong differentiation across these model classes in their ability to predict neural and behavioral data both within and across diverse environments. In particular, we find that neural responses are currently best predicted by models trained to predict the future state of their environment in the latent space of pretrained foundation models optimized for dynamic scenes in a self-supervised manner. Notably, models that future predict in the latent space of video foundation models that are optimized to support a diverse range of sensorimotor tasks, reasonably match both human behavioral error patterns and neural dynamics across all environmental scenarios that we were able to test. Overall, these findings suggest that the neural mechanisms and behaviors of primate mental simulation are thus far most consistent with being optimized to future predict on dynamic, reusable visual representations that are useful for embodied AI more generally.

12.Monte-Carlo Search for an Equilibrium in Dec-POMDPs

Authors:Yang You, Vincent Thomas, Francis Colas, Olivier Buffet

Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP complete), but seeking a Nash equilibrium -- each agent policy being a best response to the other agents -- is more accessible, and allowed addressing infinite-horizon problems with solutions in the form of finite state controllers. In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent's FSC node by node. A related process is used to heuristically derive initial FSCs. Experiment with benchmarks shows that MC-JESP is competitive with exisiting Dec-POMDP solvers, even better than many offline methods using explicit models.

13.Summarizing Strategy Card Game AI Competition

Authors:Jakub Kowalski, Radosław Miernik

Abstract: This paper concludes five years of AI competitions based on Legends of Code and Magic (LOCM), a small Collectible Card Game (CCG), designed with the goal of supporting research and algorithm development. The game was used in a number of events, including Community Contests on the CodinGame platform, and Strategy Card Game AI Competition at the IEEE Congress on Evolutionary Computation and IEEE Conference on Games. LOCM has been used in a number of publications related to areas such as game tree search algorithms, neural networks, evaluation functions, and CCG deckbuilding. We present the rules of the game, the history of organized competitions, and a listing of the participant and their approaches, as well as some general advice on organizing AI competitions for the research community. Although the COG 2022 edition was announced to be the last one, the game remains available and can be played using an online leaderboard arena.

1.Ambient Technology & Intelligence

Authors:Amos Okomayin, Tosin Ige

Abstract: Today, we have a mixture of young and older individuals, people with special needs, and people who can care for themselves. Over 1 billion people are estimated to be disabled; this figure corresponds to about 15% of the world's population, with 3.8% (approximately 190 million people) accounting for people aged 15 and up (Organization, 2011). The number of people with disabilities is upward due to the increase in chronic health conditions and many other things. These and other factors have made the need for proper care facilities urgent in today's society. Several care facilities are built to help people with disabilities live their everyday lives and not be left out of the community.

2.Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend

Authors:Chong Yu, Tao Chen, Zhongxue Gan

Abstract: Adversarial attack is commonly regarded as a huge threat to neural networks because of misleading behavior. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve the adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples. This distribution mismatch and the mutual learning mechanism with the same learning ratio applied in prior art defense strategies is the main cause leading the accuracy degradation for benign samples. The proposed AdvAmd is demonstrated to steadily heal the accuracy degradation and even leads to a certain accuracy boost of common neural models on benign classification, object detection, and segmentation tasks. The efficacy of the AdvAmd is contributed by three key components: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and the smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities) through quantitative and ablation experiments.

3.Numeric Magnitude Comparison Effects in Large Language Models

Authors:Raj Sanjay Shah, Vijay Marupudi, Reba Koenen, Khushi Bhardwaj, Sashank Varma

Abstract: Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text. In contrast, neuroscience research has identified distinct neural representations for numbers and words. In this work, we investigate how well popular LLMs capture the magnitudes of numbers (e.g., that $4 < 5$) from a behavioral lens. Prior research on the representational capabilities of LLMs evaluates whether they show human-level performance, for instance, high overall accuracy on standard benchmarks. Here, we ask a different question, one inspired by cognitive science: How closely do the number representations of LLMscorrespond to those of human language users, who typically demonstrate the distance, size, and ratio effects? We depend on a linking hypothesis to map the similarities among the model embeddings of number words and digits to human response times. The results reveal surprisingly human-like representations across language models of different architectures, despite the absence of the neural circuitry that directly supports these representations in the human brain. This research shows the utility of understanding LLMs using behavioral benchmarks and points the way to future work on the number of representations of LLMs and their cognitive plausibility.

4.Transforming Human-Centered AI Collaboration: Redefining Embodied Agents Capabilities through Interactive Grounded Language Instructions

Authors:Shrestha Mohanty, Negar Arabzadeh, Julia Kiseleva, Artem Zholus, Milagro Teruel, Ahmed Awadallah, Yuxuan Sun, Kavya Srinet, Arthur Szlam

Abstract: Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly. This skill is evident from a young age as we acquire new abilities and solve problems by imitating others or following natural language instructions. The research community is actively pursuing the development of interactive "embodied agents" that can engage in natural conversations with humans and assist them with real-world tasks. These agents must possess the ability to promptly request feedback in case communication breaks down or instructions are unclear. Additionally, they must demonstrate proficiency in learning new vocabulary specific to a given domain. In this paper, we made the following contributions: (1) a crowd-sourcing tool for collecting grounded language instructions; (2) the largest dataset of grounded language instructions; and (3) several state-of-the-art baselines. These contributions are suitable as a foundation for further research.

5.Constructing a personalized AI assistant for shear wall layout using Stable Diffusion

Authors:Lufeng Wang, Jiepeng Liu, Guozhong Cheng, En Liu, Wei Chen

Abstract: Shear wall structures are widely used in high-rise residential buildings, and the layout of shear walls requires many years of design experience and iterative trial and error. Currently, there are methods based on heuristic algorithms, but they generate results too slowly. Those based on Generative Adversarial Networks (GANs) or Graph Neural Networks (GNNs) can only generate single arrangements and require large amounts of training data. At present, Stable Diffusion is being widely used, and by using the Low-Rank Adaptation (LoRA) method to fine-tune large models with small amounts of data, good generative results can be achieved. Therefore, this paper proposes a personalized AI assistant for shear wall layout based on Stable Diffusion, which has been proven to produce good generative results through testing.

6.AIwriting: Relations Between Image Generation and Digital Writing

Authors:Scott Rettberg, Talan Memmott, Jill Walker Rettberg, Jason Nelson, Patrick Lichty

Abstract: During 2022, both transformer-based AI text generation sys-tems such as GPT-3 and AI text-to-image generation systems such as DALL-E 2 and Stable Diffusion made exponential leaps forward and are unquestionably altering the fields of digital art and electronic literature. In this panel a group of electronic literature authors and theorists consider new oppor-tunities for human creativity presented by these systems and present new works have produced during the past year that specifically address these systems as environments for literary expressions that are translated through iterative interlocutive processes into visual representations. The premise that binds these presentations is that these systems and the works gener-ated must be considered from a literary perspective, as they originate in human writing. In works ranging from a visual memoir of the personal experience of a health crisis, to interac-tive web comics, to architectures based on abstract poetic language, to political satire, four artists explore the capabili-ties of these writing environments for new genres of literary artist practice, while a digital culture theorist considers the origins and effects of the particular training datasets of human language and images on which these new hybrid forms are based.

7.Non-deterministic approximation operators: ultimate operators, semi-equilibrium semantics and aggregates (full version)

Authors:Jesse Heyninck, Bart Bogaerts

Abstract: Approximation fixpoint theory (AFT) is an abstract and general algebraic framework for studying the semantics of non-monotonic logics. In recent work, AFT was generalized to non-deterministic operators, i.e.\ operators whose range are sets of elements rather than single elements. In this paper, we make three further contributions to non-deterministic AFT: (1) we define and study ultimate approximations of non-deterministic operators, (2) we give an algebraic formulation of the semi-equilibrium semantics by Amendola, et al., and (3) we generalize the characterisations of disjunctive logic programs to disjunctive logic programs with aggregates.

8.Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

Authors:Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant datasets domain differences; while ArtFlow is introduced to reduce the discrepancies between datasets further. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF with the state-of-the-art domain adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability.

9.A Generalist Dynamics Model for Control

Authors:Ingmar Schubert, Jingwei Zhang, Jake Bruce, Sarah Bechtle, Emilio Parisotto, Martin Riedmiller, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Nicolas Heess

Abstract: We investigate the use of transformer sequence models as dynamics models (TDMs) for control. In a number of experiments in the DeepMind control suite, we find that first, TDMs perform well in a single-environment learning setting when compared to baseline models. Second, TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist model is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist model is applied to an unseen environment without any further training. We further demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. This makes TDMs a promising ingredient for a foundation model of control.

10.A method for the ethical analysis of brain-inspired AI

Authors:Michele Farisco, Gianluca Baldassarre, Emilio Cartoni, Antonia Leach, Mihai A. Petrovici, Achim Rosemann, Arleen Salles, Bernd Stahl, Sacha J. van Albada

Abstract: Despite its successes, to date Artificial Intelligence (AI) is still characterized by a number of shortcomings with regards to different application domains and goals. These limitations are arguably both conceptual (e.g., related to underlying theoretical models, such as symbolic vs. connectionist), and operational (e.g., related to robustness and ability to generalize). Biologically inspired AI, and more specifically brain-inspired AI, promises to provide further biological aspects beyond those that are already traditionally included in AI, making it possible to assess and possibly overcome some of its present shortcomings. This article examines some conceptual, technical, and ethical issues raised by the development and use of brain-inspired AI. Against this background, the paper asks whether there is anything ethically unique about brain-inspired AI. The aim of the paper is to introduce a method that has a heuristic nature and that can be applied to identify and address the ethical issues arising from brain-inspired AI. The conclusion resulting from the application of this method is that, compared to traditional AI, brain-inspired AI raises new foundational ethical issues and some new practical ethical issues, and exacerbates some of the issues raised by traditional AI.

11.Prevention is better than cure: a case study of the abnormalities detection in the chest

Authors:Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek

Abstract: Prevention is better than cure. This old truth applies not only to the prevention of diseases but also to the prevention of issues with AI models used in medicine. The source of malfunctioning of predictive models often lies not in the training process but reaches the data acquisition phase or design of the experiment phase. In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. Errors made at the data collection stage make it difficult to validate the model correctly. Based on this use case, we show how to monitor data and model balance (fairness) throughout the life cycle of a predictive model, from data acquisition to parity analysis of model scores.

12.Generalized Planning in PDDL Domains with Pretrained Large Language Models

Authors:Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Pack Kaelbling, Michael Katz

Abstract: Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.

13.Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

Authors:Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Wenhai Wang, Shouling Ji

Abstract: Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although Neural Language Models achieve significant performance in this field, an emerging trend is combining neural models with external knowledge. Most previous approaches rely on the sentence-level retrieval and combination paradigm (retrieval of similar code snippets and use of the corresponding code and summary pairs) on the encoder side. However, this paradigm is coarse-grained and cannot directly take advantage of the high-quality retrieved summary tokens on the decoder side. In this paper, we explore a fine-grained token-level retrieval-augmented mechanism on the decoder side to help the vanilla neural model generate a better code summary. Furthermore, to mitigate the limitation of token-level retrieval on capturing contextual code semantics, we propose to integrate code semantics into summary tokens. Extensive experiments and human evaluation reveal that our token-level retrieval-augmented approach significantly improves performance and is more interpretive.

14.Lightweight Online Learning for Sets of Related Problems in Automated Reasoning

Authors:Haoze Wu, Christopher Hahn, Florian Lonsing, Makai Mann, Raghuram Ramanujan, Clark Barrett

Abstract: We present Self-Driven Strategy Learning (sdsl), a lightweight online learning methodology for automated reasoning tasks that involve solving a set of related problems. sdsl automatically gathers information, in form of a dataset, while solving earlier problems. It utilizes the learned data to adjust the solving strategy for later problems by fitting a machine learning model to the obtained data on the fly. We formally define the approach as a set of abstract transition rules. We describe a concrete instance of the sdsl calculus which uses conditional sampling for generating data and random forests as the underlying machine learning model. We implement the approach on top of the Kissat solver and show that the combination of Kissat+sdsl certifies larger bounds and finds more counter-examples than other state-of-the-art bounded model checking approaches on benchmarks obtained from the latest Hardware Model Checking Competition.

15.A Simple Generative Model of Logical Reasoning and Statistical Learning

Authors:Hiroyuki Kido

Abstract: Statistical learning and logical reasoning are two major fields of AI expected to be unified for human-like machine intelligence. Most existing work considers how to combine existing logical and statistical systems. However, there is no theory of inference so far explaining how basic approaches to statistical learning and logical reasoning stem from a common principle. Inspired by the fact that much empirical work in neuroscience suggests Bayesian (or probabilistic generative) approaches to brain function including learning and reasoning, we here propose a simple Bayesian model of logical reasoning and statistical learning. The theory is statistically correct as it satisfies Kolmogorov's axioms, is consistent with both Fenstad's representation theorem and maximum likelihood estimation and performs exact Bayesian inference with a linear-time complexity. The theory is logically correct as it is a data-driven generalisation of uncertain reasoning from consistency, possibility, inconsistency and impossibility. The theory is correct in terms of machine learning as its solution to generation and prediction tasks on the MNIST dataset is not only empirically reasonable but also theoretically correct against the K nearest neighbour method. We simply model how data causes symbolic knowledge in terms of its satisfiability in formal logic. Symbolic reasoning emerges as a result of the process of going the causality forwards and backwards. The forward and backward processes correspond to an interpretation and inverse interpretation in formal logic, respectively. The inverse interpretation differentiates our work from the mainstream often referred to as inverse entailment, inverse deduction or inverse resolution. The perspective gives new insights into learning and reasoning towards human-like machine intelligence.

16.SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation

Authors:Junkai Zhou, Liang Pang, Huawei Shen, Xueqi Cheng

Abstract: Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue. However, for the persona-based dialogue generation task, consistency and coherence are also key factors, which are great challenges for language models. Existing works mainly focus on valuable data filtering, model structure modifying, or objective function designing, while their improvements are limited and hard to generalize to all types of pre-trained language models. However, we find that language models can produce consistent and coherent responses if we consider enough generations. Thus, the problems lay in large-scale response generation and target response selection. In this work, a simple but effective two-stage SimOAP strategy is proposed, i.e., over-sampling and post-evaluation. The over-sampling stage takes large-scale responses from existing trained models efficiently via off-the-shelf distilling and compressing methods, and the post-evaluation stage selects a good response based on multiple well-designed evaluation metrics from large-scale candidates. Experimental results show that the proposed plug-in SimOAP strategy improves the backbone models and outperforms the baseline strategies in both automatic and human evaluations.

17.Parallel development of social preferences in fish and machines

Authors:Joshua McGraw, Donsuk Lee, Justin Wood

Abstract: What are the computational foundations of social grouping? Traditional approaches to this question have focused on verbal reasoning or simple (low-dimensional) quantitative models. In the real world, however, social preferences emerge when high-dimensional learning systems (brains and bodies) interact with high-dimensional sensory inputs during an animal's embodied interactions with the world. A deep understanding of social grouping will therefore require embodied models that learn directly from sensory inputs using high-dimensional learning mechanisms. To this end, we built artificial neural networks (ANNs), embodied those ANNs in virtual fish bodies, and raised the artificial fish in virtual fish tanks that mimicked the rearing conditions of real fish. We then compared the social preferences that emerged in real fish versus artificial fish. We found that when artificial fish had two core learning mechanisms (reinforcement learning and curiosity-driven learning), artificial fish developed fish-like social preferences. Like real fish, the artificial fish spontaneously learned to prefer members of their own group over members of other groups. The artificial fish also spontaneously learned to self-segregate with their in-group, akin to self-segregation behavior seen in nature. Our results suggest that social grouping can emerge from three ingredients: (1) reinforcement learning, (2) intrinsic motivation, and (3) early social experiences with in-group members. This approach lays a foundation for reverse engineering animal-like social behavior with image-computable models, bridging the divide between high-dimensional sensory inputs and social preferences.

1.River of No Return: Graph Percolation Embeddings for Efficient Knowledge Graph Reasoning

Authors:Kai Wang, Siqiang Luo, Dan Lin

Abstract: We study Graph Neural Networks (GNNs)-based embedding techniques for knowledge graph (KG) reasoning. For the first time, we link the path redundancy issue in the state-of-the-art KG reasoning models based on path encoding and message passing to the transformation error in model training, which brings us new theoretical insights into KG reasoning, as well as high efficacy in practice. On the theoretical side, we analyze the entropy of transformation error in KG paths and point out query-specific redundant paths causing entropy increases. These findings guide us to maintain the shortest paths and remove redundant paths for minimized-entropy message passing. To achieve this goal, on the practical side, we propose an efficient Graph Percolation Process motivated by the percolation model in Fluid Mechanics, and design a lightweight GNN-based KG reasoning framework called Graph Percolation Embeddings (GraPE). GraPE outperforms previous state-of-the-art methods in both transductive and inductive reasoning tasks while requiring fewer training parameters and less inference time.

2.A Fusion Model: Towards a Virtual, Physical and Cognitive Integration and its Principles

Authors:Hao Lan Zhang, Yun Xue, Yifan Lu, Sanghyuk Lee

Abstract: Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), digital twin, Metaverse and other related digital technologies have attracted much attention in recent years. These new emerging technologies are changing the world significantly. This research introduces a fusion model, i.e. Fusion Universe (FU), where the virtual, physical, and cognitive worlds are merged together. Therefore, it is crucial to establish a set of principles for the fusion model that is compatible with our physical universe laws and principles. This paper investigates several aspects that could affect immersive and interactive experience; and proposes the fundamental principles for Fusion Universe that can integrate physical and virtual world seamlessly.

3.An efficient solver for ASP(Q)

Authors:Wolfgang Faber, Giuseppe Mazzotta, Francesco Ricca

Abstract: Answer Set Programming with Quantifiers ASP(Q) extends Answer Set Programming (ASP) to allow for declarative and modular modeling of problems from the entire polynomial hierarchy. The first implementation of ASP(Q), called qasp, was based on a translation to Quantified Boolean Formulae (QBF) with the aim of exploiting the well-developed and mature QBF-solving technology. However, the implementation of the QBF encoding employed in qasp is very general and might produce formulas that are hard to evaluate for existing QBF solvers because of the large number of symbols and sub-clauses. In this paper, we present a new implementation that builds on the ideas of qasp and features both a more efficient encoding procedure and new optimized encodings of ASP(Q) programs in QBF. The new encodings produce smaller formulas (in terms of the number of quantifiers, variables, and clauses) and result in a more efficient evaluation process. An algorithm selection strategy automatically combines several QBF-solving back-ends to further increase performance. An experimental analysis, conducted on known benchmarks, shows that the new system outperforms qasp.

4.A Survey on Causal Discovery: Theory and Practice

Authors:Alessio Zanga, Fabio Stella

Abstract: Understanding the laws that govern a phenomenon is the core of scientific progress. This is especially true when the goal is to model the interplay between different aspects in a causal fashion. Indeed, causal inference itself is specifically designed to quantify the underlying relationships that connect a cause to its effect. Causal discovery is a branch of the broader field of causality in which causal graphs is recovered from data (whenever possible), enabling the identification and estimation of causal effects. In this paper, we explore recent advancements in a unified manner, provide a consistent overview of existing algorithms developed under different settings, report useful tools and data, present real-world applications to understand why and how these methods can be fruitfully exploited.

5.Risk Assessment of Lymph Node Metastases in Endometrial Cancer Patients: A Causal Approach

Authors:Alessio Zanga, Alice Bernasconi, Peter J. F. Lucas, Hanny Pijnenborg, Casper Reijnen, Marco Scutari, Fabio Stella

Abstract: Assessing the pre-operative risk of lymph node metastases in endometrial cancer patients is a complex and challenging task. In principle, machine learning and deep learning models are flexible and expressive enough to capture the dynamics of clinical risk assessment. However, in this setting we are limited to observational data with quality issues, missing values, small sample size and high dimensionality: we cannot reliably learn such models from limited observational data with these sources of bias. Instead, we choose to learn a causal Bayesian network to mitigate the issues above and to leverage the prior knowledge on endometrial cancer available from clinicians and physicians. We introduce a causal discovery algorithm for causal Bayesian networks based on bootstrap resampling, as opposed to the single imputation used in related works. Moreover, we include a context variable to evaluate whether selection bias results in learning spurious associations. Finally, we discuss the strengths and limitations of our findings in light of the presence of missing data that may be missing-not-at-random, which is common in real-world clinical settings.

6.Finding an $ε$-close Variation of Parameters in Bayesian Networks

Authors:Bahare Salmani, Joost-Pieter Katoen

Abstract: This paper addresses the $\epsilon$-close parameter tuning problem for Bayesian Networks (BNs): find a minimal $\epsilon$-close amendment of probability entries in a given set of (rows in) conditional probability tables that make a given quantitative constraint on the BN valid. Based on the state-of-the-art "region verification" techniques for parametric Markov chains, we propose an algorithm whose capabilities go beyond any existing techniques. Our experiments show that $\epsilon$-close tuning of large BN benchmarks with up to 8 parameters is feasible. In particular, by allowing (i) varied parameters in multiple CPTs and (ii) inter-CPT parameter dependencies, we treat subclasses of parametric BNs that have received scant attention so far.

7.Unveiling the Potential of Counterfactuals Explanations in Employability

Authors:Raphael Mazzine Barbosa de Oliveira, Sofie Goethals, Dieter Brughmans, David Martens

Abstract: In eXplainable Artificial Intelligence (XAI), counterfactual explanations are known to give simple, short, and comprehensible justifications for complex model decisions. However, we are yet to see more applied studies in which they are applied in real-world cases. To fill this gap, this study focuses on showing how counterfactuals are applied to employability-related problems which involve complex machine learning algorithms. For these use cases, we use real data obtained from a public Belgian employment institution (VDAB). The use cases presented go beyond the mere application of counterfactuals as explanations, showing how they can enhance decision support, comply with legal requirements, guide controlled changes, and analyze novel insights.

8.Multi-Agent Reinforcement Learning: Methods, Applications, Visionary Prospects, and Challenges

Authors:Ziyuan Zhou, Guanjun Liu, Ying Tang

Abstract: Multi-agent reinforcement learning (MARL) is a widely used Artificial Intelligence (AI) technique. However, current studies and applications need to address its scalability, non-stationarity, and trustworthiness. This paper aims to review methods and applications and point out research trends and visionary prospects for the next decade. First, this paper summarizes the basic methods and application scenarios of MARL. Second, this paper outlines the corresponding research methods and their limitations on safety, robustness, generalization, and ethical constraints that need to be addressed in the practical applications of MARL. In particular, we believe that trustworthy MARL will become a hot research topic in the next decade. In addition, we suggest that considering human interaction is essential for the practical application of MARL in various societies. Therefore, this paper also analyzes the challenges while MARL is applied to human-machine interaction.

9.Neuro-Symbolic AI for Compliance Checking of Electrical Control Panels

Authors:Vito Barbara University of Calabria, Massimo Guarascio ICAR-CNR, Nicola Leone University of Calabria, Giuseppe Manco ICAR-CNR, Alessandro Quarta Sapienza University of Rome, Francesco Ricca University of Calabria, Ettore Ritacco University of Udine

Abstract: Artificial Intelligence plays a main role in supporting and improving smart manufacturing and Industry 4.0, by enabling the automation of different types of tasks manually performed by domain experts. In particular, assessing the compliance of a product with the relative schematic is a time-consuming and prone-to-error process. In this paper, we address this problem in a specific industrial scenario. In particular, we define a Neuro-Symbolic approach for automating the compliance verification of the electrical control panels. Our approach is based on the combination of Deep Learning techniques with Answer Set Programming (ASP), and allows for identifying possible anomalies and errors in the final product even when a very limited amount of training data is available. The experiments conducted on a real test case provided by an Italian Company operating in electrical control panel production demonstrate the effectiveness of the proposed approach.

10.An Empirical Study on the Language Modal in Visual Question Answering

Authors:Daowan Peng, Wei Wei, Xian-Ling Mao, Yuanyuan Fu, Dangyang Chen

Abstract: Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language priors bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that, 1) apart from prior bias caused by question types, there is a notable influence of postfix-related bias in inducing biases, and 2) training VQA models with word-sequence-related variant questions demonstrated improved performance on the out-of-distribution benchmark, and the LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.

11.Pragmatic Reasoning in Structured Signaling Games

Authors:Emil Carlsson, Devdatt Dubhashi

Abstract: In this work we introduce a structured signaling game, an extension of the classical signaling game with a similarity structure between meanings in the context, along with a variant of the Rational Speech Act (RSA) framework which we call structured-RSA (sRSA) for pragmatic reasoning in structured domains. We explore the behavior of the sRSA in the domain of color and show that pragmatic agents using sRSA on top of semantic representations, derived from the World Color Survey, attain efficiency very close to the information theoretic limit after only 1 or 2 levels of recursion. We also explore the interaction between pragmatic reasoning and learning in multi-agent reinforcement learning framework. Our results illustrate that artificial agents using sRSA develop communication closer to the information theoretic frontier compared to agents using RSA and just reinforcement learning. We also find that the ambiguity of the semantic representation increases as the pragmatic agents are allowed to perform deeper reasoning about each other during learning.

12.Curriculum Learning in Job Shop Scheduling using Reinforcement Learning

Authors:Constantin Waubert de Puiseau, Hasan Tercan, Tobias Meisen

Abstract: Solving job shop scheduling problems (JSSPs) with a fixed strategy, such as a priority dispatching rule, may yield satisfactory results for several problem instances but, nevertheless, insufficient results for others. From this single-strategy perspective finding a near optimal solution to a specific JSSP varies in difficulty even if the machine setup remains the same. A recent intensively researched and promising method to deal with difficulty variability is Deep Reinforcement Learning (DRL), which dynamically adjusts an agent's planning strategy in response to difficult instances not only during training, but also when applied to new situations. In this paper, we further improve DLR as an underlying method by actively incorporating the variability of difficulty within the same problem size into the design of the learning process. We base our approach on a state-of-the-art methodology that solves JSSP by means of DRL and graph neural network embeddings. Our work supplements the training routine of the agent by a curriculum learning strategy that ranks the problem instances shown during training by a new metric of problem instance difficulty. Our results show that certain curricula lead to significantly better performances of the DRL solutions. Agents trained on these curricula beat the top performance of those trained on randomly distributed training data, reaching 3.2% shorter average makespans.

13.People Talking and AI Listening: How Stigmatizing Language in EHR Notes Affect AI Performance

Authors:Yizhi Liu Gordon, Weiguang Wang Gordon, Guodong Gordon, Gao, Ritu Agarwal

Abstract: Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly so for black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.

14.Improving Link Prediction in Social Networks Using Local and Global Features: A Clustering-based Approach

Authors:Safiye Ghasemi, Amin Zarei

Abstract: Link prediction problem has increasingly become prominent in many domains such as social network analyses, bioinformatics experiments, transportation networks, criminal investigations and so forth. A variety of techniques has been developed for link prediction problem, categorized into 1) similarity based approaches which study a set of features to extract similar nodes; 2) learning based approaches which extract patterns from the input data; 3) probabilistic statistical approaches which optimize a set of parameters to establish a model which can best compute formation probability. However, existing literatures lack approaches which utilize strength of each approach by integrating them to achieve a much more productive one. To tackle the link prediction problem, we propose an approach based on the combination of first and second group methods; the existing studied works use just one of these categories. Our two-phase developed method firstly determines new features related to the position and dynamic behavior of nodes, which enforce the approach more efficiency compared to approaches using mere measures. Then, a subspace clustering algorithm is applied to group social objects based on the computed similarity measures which differentiate the strength of clusters; basically, the usage of local and global indices and the clustering information plays an imperative role in our link prediction process. Some extensive experiments held on real datasets including Facebook, Brightkite and HepTh indicate good performances of our proposal method. Besides, we have experimentally verified our approach with some previous techniques in the area to prove the supremacy of ours.

15.Explainable Multi-Agent Reinforcement Learning for Temporal Queries

Authors:Kayla Boggess, Sarit Kraus, Lu Feng

Abstract: As multi-agent reinforcement learning (MARL) systems are increasingly deployed throughout society, it is imperative yet challenging for users to understand the emergent behaviors of MARL agents in complex environments. This work presents an approach for generating policy-level contrastive explanations for MARL to answer a temporal user query, which specifies a sequence of tasks completed by agents with possible cooperation. The proposed approach encodes the temporal query as a PCTL logic formula and checks if the query is feasible under a given MARL policy via probabilistic model checking. Such explanations can help reconcile discrepancies between the actual and anticipated multi-agent behaviors. The proposed approach also generates correct and complete explanations to pinpoint reasons that make a user query infeasible. We have successfully applied the proposed approach to four benchmark MARL domains (up to 9 agents in one domain). Moreover, the results of a user study show that the generated explanations significantly improve user performance and satisfaction.

1.Can we forget how we learned? Representing states in iterated belief revision}

Authors:Paolo Liberatore

Abstract: The three most common representations of states in iterated belief revision are compared: explicit, by levels and by history. The first is a connected preorder between models, the second is a list of formulae representing equivalence classes, the third is the sequence of the previous revisions. The latter depends on the revision semantics and on history rewriting, and the latter depends on the allowed rewritings. All mechanisms represent all possible states. A rewritten history of lexicographic revision is more efficient than the other considered representations in terms of size with arbitrary history rewritings. Establishing the redundancy of such a history is a mild rewriting. It is coNP-complete in the general case, and is hard even on histories of two revisions or revisions of arbitrary length of Horn formulae, and is polynomial on histories of two Horn formulae. A minor technical result is a polynomial-time algorithm for establishing whether a Horn formula is equivalent to the negation of another Horn formula.

2.Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning

Authors:Hao Chen, Yiming Zhang, Qi Zhang, Hantao Yang, Xiaomeng Hu, Xuetao Ma, Yifan Yanggong, Junbo Zhao

Abstract: Instruction tuning for large language models (LLMs) has gained attention from researchers due to its ability to unlock the potential of LLMs in following instructions. While instruction tuning offers advantages for facilitating the adaptation of large language models (LLMs) to downstream tasks as a fine-tuning approach, training models with tens of millions or even billions of parameters on large amounts of data results in unaffordable computational costs. To address this, we focus on reducing the data used in LLM instruction tuning to decrease training costs and improve data efficiency, dubbed as Low Training Data Instruction Tuning (LTD Instruction Tuning). Specifically, this paper conducts a preliminary exploration into reducing the data used in LLM training and identifies several observations regarding task specialization for LLM training, such as the optimization of performance for a specific task, the number of instruction types required for instruction tuning, and the amount of data required for task-specific models. The results suggest that task-specific models can be trained using less than 0.5% of the original dataset, with a 2% improvement in performance over those trained on full task-related data.

3.Rounding Meets Approximate Model Counting

Authors:Jiong Yang, Kuldeep S. Meel

Abstract: The problem of model counting, also known as #SAT, is to compute the number of models or satisfying assignments of a given Boolean formula $F$. Model counting is a fundamental problem in computer science with a wide range of applications. In recent years, there has been a growing interest in using hashing-based techniques for approximate model counting that provide $(\varepsilon, \delta)$-guarantees: i.e., the count returned is within a $(1+\varepsilon)$-factor of the exact count with confidence at least $1-\delta$. While hashing-based techniques attain reasonable scalability for large enough values of $\delta$, their scalability is severely impacted for smaller values of $\delta$, thereby preventing their adoption in application domains that require estimates with high confidence. The primary contribution of this paper is to address the Achilles heel of hashing-based techniques: we propose a novel approach based on rounding that allows us to achieve a significant reduction in runtime for smaller values of $\delta$. The resulting counter, called RoundMC, achieves a substantial runtime performance improvement over the current state-of-the-art counter, ApproxMC. In particular, our extensive evaluation over a benchmark suite consisting of 1890 instances shows that RoundMC solves 204 more instances than ApproxMC, and achieves a $4\times$ speedup over ApproxMC.

4.Establishing Shared Query Understanding in an Open Multi-Agent System

Authors:Nikolaos Kondylidis, Ilaria Tiddi, Annette ten Teije

Abstract: We propose a method that allows to develop shared understanding between two agents for the purpose of performing a task that requires cooperation. Our method focuses on efficiently establishing successful task-oriented communication in an open multi-agent system, where the agents do not know anything about each other and can only communicate via grounded interaction. The method aims to assist researchers that work on human-machine interaction or scenarios that require a human-in-the-loop, by defining interaction restrictions and efficiency metrics. To that end, we point out the challenges and limitations of such a (diverse) setup, while also restrictions and requirements which aim to ensure that high task performance truthfully reflects the extent to which the agents correctly understand each other. Furthermore, we demonstrate a use-case where our method can be applied for the task of cooperative query answering. We design the experiments by modifying an established ontology alignment benchmark. In this example, the agents want to query each other, while representing different databases, defined in their own ontologies that contain different and incomplete knowledge. Grounded interaction here has the form of examples that consists of common instances, for which the agents are expected to have similar knowledge. Our experiments demonstrate successful communication establishment under the required restrictions, and compare different agent policies that aim to solve the task in an efficient manner.

5.A sequential transit network design algorithm with optimal learning under correlated beliefs

Authors:Gyugeun Yoon, Joseph Y. J. Chow

Abstract: Mobility service route design requires potential demand information to well accommodate travel demand within the service region. Transit planners and operators can access various data sources including household travel survey data and mobile device location logs. However, when implementing a mobility system with emerging technologies, estimating demand level becomes harder because of more uncertainties with user behaviors. Therefore, this study proposes an artificial intelligence-driven algorithm that combines sequential transit network design with optimal learning. An operator gradually expands its route system to avoid risks from inconsistency between designed routes and actual travel demand. At the same time, observed information is archived to update the knowledge that the operator currently uses. Three learning policies are compared within the algorithm: multi-armed bandit, knowledge gradient, and knowledge gradient with correlated beliefs. For validation, a new route system is designed on an artificial network based on public use microdata areas in New York City. Prior knowledge is reproduced from the regional household travel survey data. The results suggest that exploration considering correlations can achieve better performance compared to greedy choices in general. In future work, the problem may incorporate more complexities such as demand elasticity to travel time, no limitations to the number of transfers, and costs for expansion.

6.Growing and Serving Large Open-domain Knowledge Graphs

Authors:Ihab F. Ilyas, JP Lacerda, Yunyao Li, Umar Farooq Minhas, Ali Mousavi, Jeffrey Pound, Theodoros Rekatsinas, Chiraag Sumanth

Abstract: Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related entities service, and support for entity linking. We then describe how our platform, including graph embeddings, can be leveraged to create a Semantic Annotation service that links unstructured Web documents to entities in our KG. Semantic annotation of the Web effectively expands our knowledge graph with edges to open-domain Web content which can be used in various search and ranking problems. Finally, we leverage annotated Web documents to drive Open-domain Knowledge Extraction. This targeted extraction framework identifies important coverage issues in the KG, then finds relevant data sources for target entities on the Web and extracts missing information to enrich the KG. Finally, we describe adaptations to our knowledge platform needed to construct and serve private personal knowledge on-device. This includes private incremental KG construction, cross-device knowledge sync, and global knowledge enrichment.

7.Efficient Computation of General Modules for ALC Ontologies (Extended Version)

Authors:Hui Yang, Patrick Koopmann, Yue Ma, Nicole Bidoit

Abstract: We present a method for extracting general modules for ontologies formulated in the description logic ALC. A module for an ontology is an ideally substantially smaller ontology that preserves all entailments for a user-specified set of terms. As such, it has applications such as ontology reuse and ontology analysis. Different from classical modules, general modules may use axioms not explicitly present in the input ontology, which allows for additional conciseness. So far, general modules have only been investigated for lightweight description logics. We present the first work that considers the more expressive description logic ALC. In particular, our contribution is a new method based on uniform interpolation supported by some new theoretical results. Our evaluation indicates that our general modules are often smaller than classical modules and uniform interpolants computed by the state-of-the-art, and compared with uniform interpolants, can be computed in a significantly shorter time. Moreover, our method can be used for, and in fact improves, the computation of uniform interpolants and classical modules.

8.The Hardness of Reasoning about Probabilities and Causality

Authors:Benito van der Zander, Markus Bläser, Maciej Liśkiewicz

Abstract: We study formal languages which are capable of fully expressing quantitative probabilistic reasoning and do-calculus reasoning for causal effects, from a computational complexity perspective. We focus on satisfiability problems whose instance formulas allow expressing many tasks in probabilistic and causal inference. The main contribution of this work is establishing the exact computational complexity of these satisfiability problems. We introduce a new natural complexity class, named succ$\exists$R, which can be viewed as a succinct variant of the well-studied class $\exists$R, and show that the problems we consider are complete for succ$\exists$R. Our results imply even stronger algorithmic limitations than were proven by Fagin, Halpern, and Megiddo (1990) and Moss\'{e}, Ibeling, and Icard (2022) for some variants of the standard languages used commonly in probabilistic and causal inference.

9.What's the Problem, Linda? The Conjunction Fallacy as a Fairness Problem

Authors:Jose Alvarez Colmenares

Abstract: The field of Artificial Intelligence (AI) is focusing on creating automated decision-making (ADM) systems that operate as close as possible to human-like intelligence. This effort has pushed AI researchers into exploring cognitive fields like psychology. The work of Daniel Kahneman and the late Amos Tversky on biased human decision-making, including the study of the conjunction fallacy, has experienced a second revival because of this. Under the conjunction fallacy a human decision-maker will go against basic probability laws and rank as more likely a conjunction over one of its parts. It has been proven overtime through a set of experiments with the Linda Problem being the most famous one. Although this interdisciplinary effort is welcomed, we fear that AI researchers ignore the driving force behind the conjunction fallacy as captured by the Linda Problem: the fact that Linda must be stereotypically described as a woman. In this paper we revisit the Linda Problem and formulate it as a fairness problem. In doing so we introduce perception as a parameter of interest through the structural causal perception framework. Using an illustrative decision-making example, we showcase the proposed conceptual framework and its potential impact for developing fair ADM systems.

10.Deep Reinforcement Learning to Maximize Arterial Usage during Extreme Congestion

Authors:Ashutosh Dutta, Milan Jain, Arif Khan, Arun Sathanur

Abstract: Collisions, crashes, and other incidents on road networks, if left unmitigated, can potentially cause cascading failures that can affect large parts of the system. Timely handling such extreme congestion scenarios is imperative to reduce emissions, enhance productivity, and improve the quality of urban living. In this work, we propose a Deep Reinforcement Learning (DRL) approach to reduce traffic congestion on multi-lane freeways during extreme congestion. The agent is trained to learn adaptive detouring strategies for congested freeway traffic such that the freeway lanes along with the local arterial network in proximity are utilized optimally, with rewards being congestion reduction and traffic speed improvement. The experimental setup is a 2.6-mile-long 4-lane freeway stretch in Shoreline, Washington, USA with two exits and associated arterial roads simulated on a microscopic and continuous multi-modal traffic simulator SUMO (Simulation of Urban MObility) while using parameterized traffic profiles generated using real-world traffic data. Our analysis indicates that DRL-based controllers can improve average traffic speed by 21\% when compared to no-action during steep congestion. The study further discusses the trade-offs involved in the choice of reward functions, the impact of human compliance on agent performance, and the feasibility of knowledge transfer from one agent to other to address data sparsity and scaling issues.

1.Tracking Progress in Multi-Agent Path Finding

Authors:Bojie Shen, Zhe Chen, Muhammad Aamir Cheema, Daniel D. Harabor, Peter J. Stuckey

Abstract: Multi-Agent Path Finding (MAPF) is an important core problem for many new and emerging industrial applications. Many works appear on this topic each year, and a large number of substantial advancements and performance improvements have been reported. Yet measuring overall progress in MAPF is difficult: there are many potential competitors, and the computational burden for comprehensive experimentation is prohibitively large. Moreover, detailed data from past experimentation is usually unavailable. In this work, we introduce a set of methodological and visualisation tools which can help the community establish clear indicators for state-of-the-art MAPF performance and which can facilitate large-scale comparisons between MAPF solvers. Our objectives are to lower the barrier of entry for new researchers and to further promote the study of MAPF, since progress in the area and the main challenges are made much clearer.

2.SAT-Based PAC Learning of Description Logic Concepts

Authors:Balder ten Cate, Maurice Funk, Jean Christoph Jung, Carsten Lutz

Abstract: We propose bounded fitting as a scheme for learning description logic concepts in the presence of ontologies. A main advantage is that the resulting learning algorithms come with theoretical guarantees regarding their generalization to unseen examples in the sense of PAC learning. We prove that, in contrast, several other natural learning algorithms fail to provide such guarantees. As a further contribution, we present the system SPELL which efficiently implements bounded fitting for the description logic $\mathcal{ELH}^r$ based on a SAT solver, and compare its performance to a state-of-the-art learner.

3.MADDM: Multi-Advisor Dynamic Binary Decision-Making by Maximizing the Utility

Authors:Zhaori Guo, Timothy J. Norman, Enrico H. Gerding

Abstract: Being able to infer ground truth from the responses of multiple imperfect advisors is a problem of crucial importance in many decision-making applications, such as lending, trading, investment, and crowd-sourcing. In practice, however, gathering answers from a set of advisors has a cost. Therefore, finding an advisor selection strategy that retrieves a reliable answer and maximizes the overall utility is a challenging problem. To address this problem, we propose a novel strategy for optimally selecting a set of advisers in a sequential binary decision-making setting, where multiple decisions need to be made over time. Crucially, we assume no access to ground truth and no prior knowledge about the reliability of advisers. Specifically, our approach considers how to simultaneously (1) select advisors by balancing the advisors' costs and the value of making correct decisions, (2) learn the trustworthiness of advisers dynamically without prior information by asking multiple advisers, and (3) make optimal decisions without access to the ground truth, improving this over time. We evaluate our algorithm through several numerical experiments. The results show that our approach outperforms two other methods that combine state-of-the-art models.

4.An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations

Authors:Achille Fokoue, Ibrahim Abdelaziz, Maxwell Crouse, Shajith Ikbal, Akihiro Kishimoto, Guilherme Lima, Ndivhuwo Makondo, Radu Marinescu

Abstract: Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements and, as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA; an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses this problem by using 1) improved Graph Neural Networks for learning name-invariant formula representations that is tailored for their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains with improvements up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.

5.Question-Answering System Extracts Information on Injection Drug Use from Clinical Progress Notes

Authors:Maria Mahbub, Ian Goethert, Ioana Danciu, Kathryn Knight, Sudarshan Srinivasan, Suzanne Tamang, Karine Rozenberg-Ben-Dror, Hugo Solares, Susana Martins, Edmon Begoli, Gregory D. Peterson

Abstract: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Disease (ICD) code and the only place IDU information can be indicated are unstructured free-text clinical progress notes. Although natural language processing (NLP) can efficiently extract this information from unstructured data, there are no validated tools. To address this gap in clinical information, we design and demonstrate a question-answering (QA) framework to extract information on IDU from clinical progress notes. Unlike other methods discussed in the literature, the QA model is able to extract various types of information without being constrained by predefined entities, relations, or concepts. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. This paper also demonstrates the QA model's ability to extract IDU-related information on temporally out-of-distribution data. The results indicate that the majority (51%) of the extracted information by the QA model exactly matches the gold-standard answer and 73% of them contain the gold-standard answer with some additional surrounding words.

6.Python Tool for Visualizing Variability of Pareto Fronts over Multiple Runs

Authors:Shuhei Watanabe

Abstract: Hyperparameter optimization is crucial to achieving high performance in deep learning. On top of the performance, other criteria such as inference time or memory requirement often need to be optimized due to some practical reasons. This motivates research on multi-objective optimization (MOO). However, Pareto fronts of MOO methods are often shown without considering the variability caused by random seeds and this makes the performance stability evaluation difficult. Although there is a concept named empirical attainment surface to enable the visualization with uncertainty over multiple runs, there is no major Python package for empirical attainment surface. We, therefore, develop a Python package for this purpose and describe the usage. The package is available at

1.Identify, Estimate and Bound the Uncertainty of Reinforcement Learning for Autonomous Driving

Authors:Weitao Zhou, Zhong Cao, Nanshan Deng, Kun Jiang, Diange Yang

Abstract: Deep reinforcement learning (DRL) has emerged as a promising approach for developing more intelligent autonomous vehicles (AVs). A typical DRL application on AVs is to train a neural network-based driving policy. However, the black-box nature of neural networks can result in unpredictable decision failures, making such AVs unreliable. To this end, this work proposes a method to identify and protect unreliable decisions of a DRL driving policy. The basic idea is to estimate and constrain the policy's performance uncertainty, which quantifies potential performance drop due to insufficient training data or network fitting errors. By constraining the uncertainty, the DRL model's performance is always greater than that of a baseline policy. The uncertainty caused by insufficient data is estimated by the bootstrapped method. Then, the uncertainty caused by the network fitting error is estimated using an ensemble network. Finally, a baseline policy is added as the performance lower bound to avoid potential decision failures. The overall framework is called uncertainty-bound reinforcement learning (UBRL). The proposed UBRL is evaluated on DRL policies with different amounts of training data, taking an unprotected left-turn driving case as an example. The result shows that the UBRL method can identify potentially unreliable decisions of DRL policy. The UBRL guarantees to outperform baseline policy even when the DRL policy is not well-trained and has high uncertainty. Meanwhile, the performance of UBRL improves with more training data. Such a method is valuable for the DRL application on real-road driving and provides a metric to evaluate a DRL policy.

2.Scalable Coupling of Deep Learning with Logical Reasoning

Authors:Marianne Defresne, Sophie Barbe, Thomas Schiex

Abstract: In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. Our loss function solves one of the main limitations of Besag's pseudo-loglikelihood, enabling learning of high energies. We empirically show it is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs as the symbolic, visual or many-solutions Sudoku problems as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and \textit{a posteriori} control over predictions.

1.HAHE: Hierarchical Attention for Hyper-Relational Knowledge Graphs in Global and Local Level

Authors:Haoran Luo, Haihong E, Yuhao Yang, Yikai Guo, Mingzhi Sun, Tianyu Yao, Zichen Tang, Kaiyang Wan, Meina Song, Wei Lin

Abstract: Link Prediction on Hyper-relational Knowledge Graphs (HKG) is a worthwhile endeavor. HKG consists of hyper-relational facts (H-Facts), composed of a main triple and several auxiliary attribute-value qualifiers, which can effectively represent factually comprehensive information. The internal structure of HKG can be represented as a hypergraph-based representation globally and a semantic sequence-based representation locally. However, existing research seldom simultaneously models the graphical and sequential structure of HKGs, limiting HKGs' representation. To overcome this limitation, we propose a novel Hierarchical Attention model for HKG Embedding (HAHE), including global-level and local-level attention. The global-level attention can model the graphical structure of HKG using hypergraph dual-attention layers, while the local-level attention can learn the sequential structure inside H-Facts via heterogeneous self-attention layers. Experiment results indicate that HAHE achieves state-of-the-art performance in link prediction tasks on HKG standard datasets. In addition, HAHE addresses the issue of HKG multi-position prediction for the first time, increasing the applicability of the HKG link prediction task. Our code is publicly available.

2.A data-driven rutting depth short-time prediction model with metaheuristic optimization for asphalt pavements based on RIOHTrack

Authors:Zhuoxuan Li, Iakov Korovin, Xinli Shi, Sergey Gorbachev, Nadezhda Gorbacheva, Wei Huang, Jinde Cao

Abstract: Rutting of asphalt pavements is a crucial design criterion in various pavement design guides. A good road transportation base can provide security for the transportation of oil and gas in road transportation. This study attempts to develop a robust artificial intelligence model to estimate different asphalt pavements' rutting depth clips, temperature, and load axes as primary characteristics. The experiment data were obtained from 19 asphalt pavements with different crude oil sources on a 2.038 km long full-scale field accelerated pavement test track (RIOHTrack, Road Track Institute) in Tongzhou, Beijing. In addition, this paper also proposes to build complex networks with different pavement rutting depths through complex network methods and the Louvain algorithm for community detection. The most critical structural elements can be selected from different asphalt pavement rutting data, and similar structural elements can be found. An extreme learning machine algorithm with residual correction (RELM) is designed and optimized using an independent adaptive particle swarm algorithm. The experimental results of the proposed method are compared with several classical machine learning algorithms, with predictions of Average Root Mean Squared Error, Average Mean Absolute Error, and Average Mean Absolute Percentage Error for 19 asphalt pavements reaching 1.742, 1.363, and 1.94\% respectively. The experiments demonstrate that the RELM algorithm has an advantage over classical machine learning methods in dealing with non-linear problems in road engineering. Notably, the method ensures the adaptation of the simulated environment to different levels of abstraction through the cognitive analysis of the production environment parameters.

3.FastDiagP: An Algorithm for Parallelized Direct Diagnosis

Authors:Viet-Man Le, Cristian Vidal Silva, Alexander Felfernig, David Benavides, José Galindo, Thi Ngoc Trang Tran

Abstract: Constraint-based applications attempt to identify a solution that meets all defined user requirements. If the requirements are inconsistent with the underlying constraint set, algorithms that compute diagnoses for inconsistent constraints should be implemented to help users resolve the "no solution could be found" dilemma. FastDiag is a typical direct diagnosis algorithm that supports diagnosis calculation without predetermining conflicts. However, this approach faces runtime performance issues, especially when analyzing complex and large-scale knowledge bases. In this paper, we propose a novel algorithm, so-called FastDiagP, which is based on the idea of speculative programming. This algorithm extends FastDiag by integrating a parallelization mechanism that anticipates and pre-calculates consistency checks requested by FastDiag. This mechanism helps to provide consistency checks with fast answers and boosts the algorithm's runtime performance. The performance improvements of our proposed algorithm have been shown through empirical results using the Linux- configuration knowledge base.

4.Data quality dimensions for fair AI

Authors:Camilla Quaresmini, Giuseppe Primiero

Abstract: AI systems are not intrinsically neutral and biases trickle in any type of technological tool. In particular when dealing with people, AI algorithms reflect technical errors originating with mislabeled data. As they feed wrong and discriminatory classifications, perpetuating structural racism and marginalization, these systems are not systematically guarded against bias. In this article we consider the problem of bias in AI systems from the point of view of Information Quality dimensions. We illustrate potential improvements of a bias mitigation tool in gender classification errors, referring to two typically difficult contexts: the classification of non-binary individuals and the classification of transgender individuals. The identification of data quality dimensions to implement in bias mitigation tool may help achieve more fairness. Hence, we propose to consider this issue in terms of completeness, consistency, timeliness and reliability, and offer some theoretical results.

1.Mixture of personality improved Spiking actor network for efficient multi-agent cooperation

Authors:Xiyun Li, Ziyi Ni, Jingqing Ruan, Linghui Meng, Jing Shi, Tielin Zhang, Bo Xu

Abstract: Adaptive human-agent and agent-agent cooperation are becoming more and more critical in the research area of multi-agent reinforcement learning (MARL), where remarked progress has been made with the help of deep neural networks. However, many established algorithms can only perform well during the learning paradigm but exhibit poor generalization during cooperation with other unseen partners. The personality theory in cognitive psychology describes that humans can well handle the above cooperation challenge by predicting others' personalities first and then their complex actions. Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different types of personality in MoP, and dynamic and spiking neurons are incorporated into the SAN for the efficient reinforcement learning. The benchmark Overcooked task, containing a strong requirement for cooperative cooking, is selected to test the proposed MoP-SAN. The experimental results show that the MoP-SAN can achieve both high performances during not only the learning paradigm but also the generalization test (i.e., cooperation with other unseen agents) paradigm where most counterpart deep actor networks failed. Necessary ablation experiments and visualization analyses were conducted to explain why MoP and SAN are effective in multi-agent reinforcement learning scenarios while DNN performs poorly in the generalization test.

2.A Glimpse in ChatGPT Capabilities and its impact for AI research

Authors:Frank Joublin, Antonello Ceravola, Joerg Deigmoeller, Michael Gienger, Mathias Franzius, Julian Eggert

Abstract: Large language models (LLMs) have recently become a popular topic in the field of Artificial Intelligence (AI) research, with companies such as Google, Amazon, Facebook, Amazon, Tesla, and Apple (GAFA) investing heavily in their development. These models are trained on massive amounts of data and can be used for a wide range of tasks, including language translation, text generation, and question answering. However, the computational resources required to train and run these models are substantial, and the cost of hardware and electricity can be prohibitive for research labs that do not have the funding and resources of the GAFA. In this paper, we will examine the impact of LLMs on AI research. The pace at which such models are generated as well as the range of domains covered is an indication of the trend which not only the public but also the scientific community is currently experiencing. We give some examples on how to use such models in research by focusing on GPT3.5/ChatGPT3.4 and ChatGPT4 at the current state and show that such a range of capabilities in a single system is a strong sign of approaching general intelligence. Innovations integrating such models will also expand along the maturation of such AI systems and exhibit unforeseeable applications that will have important impacts on several aspects of our societies.

3.Building Interoperable Electronic Health Records as Purpose-Driven Knowledge Graphs

Authors:Simone Bocca, Alessio Zamboni, Gabor Bella, Yamini Chandrashekar, Mayukh Bagchi, Gabriel Kuper, Paolo Bouquet, Fausto Giunchiglia

Abstract: When building a new application we are increasingly confronted with the need of reusing and integrating pre-existing knowledge. Nevertheless, it is a fact that this prior knowledge is virtually impossible to reuse as-is. This is true also in domains, e.g., eHealth, where a lot of effort has been put into developing high-quality standards and reference ontologies, e.g. FHIR1. In this paper, we propose an integrated methodology, called iTelos, which enables data and knowledge reuse towards the construction of Interoperable Electronic Health Records (iEHR). The key intuition is that the data level and the schema level of an application should be developed independently, thus allowing for maximum flexibility in the reuse of the prior knowledge, but under the overall guidance of the needs to be satisfied, formalized as competence queries. This intuition is implemented by codifying all the requirements, including those concerning reuse, as part of a purpose defined a priori, which is then used to drive a middle-out development process where the application schema and data are continuously aligned. The proposed methodology is validated through its application to a large-scale case study.

4.Few-shot Link Prediction on N-ary Facts

Authors:Jiyao Wei, Saiping Guan, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Abstract: N-ary facts composed of a primary triple (head entity, relation, tail entity) and an arbitrary number of auxiliary attribute-value pairs, are prevalent in real-world knowledge graphs (KGs). Link prediction on n-ary facts is to predict a missing element in an n-ary fact. This helps populate and enrich KGs and further promotes numerous downstream applications. Previous studies usually require a substantial amount of high-quality data to understand the elements in n-ary facts. However, these studies overlook few-shot relations, which have limited labeled instances, yet are common in real-world scenarios. Thus, this paper introduces a new task, few-shot link prediction on n-ary facts. It aims to predict a missing entity in an n-ary fact with limited labeled instances. We further propose a model for Few-shot Link prEdict on N-ary facts, thus called FLEN, which consists of three modules: the relation learning, support-specific adjusting, and query inference modules. FLEN captures relation meta information from limited instances to predict a missing entity in a query instance. To validate the effectiveness of FLEN, we construct three datasets based on existing benchmark data. Our experimental results show that FLEN significantly outperforms existing related models in both few-shot link prediction on n-ary facts and binary facts.

1.Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey

Authors:Simon Razniewski, Hiba Arnaout, Shrestha Ghosh, Fabian Suchanek

Abstract: General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many of them are constructed pragmatically from Web sources, and are thus far from complete. This poses challenges for the consumption as well as the curation of their content. While several surveys target the problem of completing incomplete KBs, the first problem is arguably to know whether and where the KB is incomplete in the first place, and to which degree. In this survey we discuss how knowledge about completeness, recall, and negation in KBs can be expressed, extracted, and inferred. We cover (i) the logical foundations of knowledge representation and querying under partial closed-world semantics; (ii) the estimation of this information via statistical patterns; (iii) the extraction of information about recall from KBs and text; (iv) the identification of interesting negative statements; and (v) relaxed notions of relative recall. This survey is targeted at two types of audiences: (1) practitioners who are interested in tracking KB quality, focusing extraction efforts, and building quality-aware downstream applications; and (2) data management, knowledge base and semantic web researchers who wish to understand the state of the art of knowledge bases beyond the open-world assumption. Consequently, our survey presents both fundamental methodologies and their working, and gives practice-oriented recommendations on how to choose between different approaches for a problem at hand.

2.Measuring Rule-based LTLf Process Specifications: A Probabilistic Data-driven Approach

Authors:Alessio Cecconi, Luca Barbaro, Claudio Di Ciccio, Arik Senderovich

Abstract: Declarative process specifications define the behavior of processes by means of rules based on Linear Temporal Logic on Finite Traces (LTLf). In a mining context, these specifications are inferred from, and checked on, multi-sets of runs recorded by information systems (namely, event logs). To this end, being able to gauge the degree to which process data comply with a specification is key. However, existing mining and verification techniques analyze the rules in isolation, thereby disregarding their interplay. In this paper, we introduce a framework to devise probabilistic measures for declarative process specifications. Thereupon, we propose a technique that measures the degree of satisfaction of specifications over event logs. To assess our approach, we conduct an evaluation with real-world data, evidencing its applicability in discovery, checking, and drift detection contexts.

3.Egocentric Hierarchical Visual Semantics

Authors:Luca Erculiani, Andrea Bontempelli, Andrea Passerini, Fausto Giunchiglia

Abstract: We are interested in aligning how people think about objects and what machines perceive, meaning by this the fact that object recognition, as performed by a machine, should follow a process which resembles that followed by humans when thinking of an object associated with a certain concept. The ultimate goal is to build systems which can meaningfully interact with their users, describing what they perceive in the users' own terms. As from the field of Lexical Semantics, humans organize the meaning of words in hierarchies where the meaning of, e.g., a noun, is defined in terms of the meaning of a more general noun, its genus, and of one or more differentiating properties, its differentia. The main tenet of this paper is that object recognition should implement a hierarchical process which follows the hierarchical semantic structure used to define the meaning of words. We achieve this goal by implementing an algorithm which, for any object, recursively recognizes its visual genus and its visual differentia. In other words, the recognition of an object is decomposed in a sequence of steps where the locally relevant visual features are recognized. This paper presents the algorithm and a first evaluation.

4.Distributional Multi-Objective Decision Making

Authors:Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers

Abstract: For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.

5.Representation Learning for Person or Entity-centric Knowledge Graphs: an application in Healthcare

Authors:Christos Theodoropoulos, Natasha Mulligan, Thaddeus Stappenbeck, Joao Bettencourt-Silva

Abstract: Knowledge graphs (KGs) are a popular way to organise information based on ontologies or schemas and have been used across a variety of scenarios from search to recommendation. Despite advances in KGs, representing knowledge remains a non-trivial task across industries and it is especially challenging in the biomedical and healthcare domains due to complex interdependent relations between entities, heterogeneity, lack of standardization, and sparseness of data. KGs are used to discover diagnoses or prioritize genes relevant to disease, but they often rely on schemas that are not centred around a node or entity of interest, such as a person. Entity-centric KGs are relatively unexplored but hold promise in representing important facets connected to a central node and unlocking downstream tasks beyond graph traversal and reasoning, such as generating graph embeddings and training graph neural networks for a wide range of predictive tasks. This paper presents an end-to-end representation learning framework to extract entity-centric KGs from structured and unstructured data. We introduce a star-shaped ontology to represent the multiple facets of a person and use it to guide KG creation. Compact representations of the graphs are created leveraging graph neural networks and experiments are conducted using different levels of heterogeneity or explicitness. A readmission prediction task is used to evaluate the results of the proposed framework, showing a stable system, robust to missing data, that outperforms a range of baseline machine learning classifiers. We highlight that this approach has several potential applications across domains and is open-sourced. Lastly, we discuss lessons learned, challenges, and next steps for the adoption of the framework in practice.

1.Adaptive Learning Path Navigation Based on Knowledge Tracing and Reinforcement Learning

Authors:Jyun-Yi Chen, Saeed Saeedvand, I-Wei Lai

Abstract: This paper introduces the Adaptive Learning Path Navigation (ALPN) system, a scalable approach for creating adaptive learning paths within E-learning systems. The ALPN system employs an attention-based Knowledge Tracing (AKT) model to evaluate students' knowledge states and a decision-making model using Proximal Policy Optimization (PPO) to suggest customized learning materials. The proposed system accommodates students' needs by considering personalization parameters such as learning objectives, time constraints, and knowledge backgrounds. Through an iterative process of recommendation and knowledge state updating, the ALPN system produces highly adaptive learning paths. Experimental results reveal the outstanding performance of the proposed system, providing good insights into the future development of E-learning systems.

2.Multi-source Education Knowledge Graph Construction and Fusion for College Curricula

Authors:Zeju Li, Linya Cheng, Chunhong Zhang, Xinning Zhu, Hui Zhao

Abstract: The field of education has undergone a significant transformation due to the rapid advancements in Artificial Intelligence (AI). Among the various AI technologies, Knowledge Graphs (KGs) using Natural Language Processing (NLP) have emerged as powerful visualization tools for integrating multifaceted information. In the context of university education, the availability of numerous specialized courses and complicated learning resources often leads to inferior learning outcomes for students. In this paper, we propose an automated framework for knowledge extraction, visual KG construction, and graph fusion, tailored for the major of Electronic Information. Furthermore, we perform data analysis to investigate the correlation degree and relationship between courses, rank hot knowledge concepts, and explore the intersection of courses. Our objective is to enhance the learning efficiency of students and to explore new educational paradigms enabled by AI. The proposed framework is expected to enable students to better understand and appreciate the intricacies of their field of study by providing them with a comprehensive understanding of the relationships between the various concepts and courses.

1.GPT for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

Authors:Noah Hollmann, Samuel Müller, Frank Hutter

Abstract: As the field of automated machine learning (AutoML) advances, it becomes increasingly important to include domain knowledge within these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that utilizes an LLM to generate additional semantically meaningful features for tabular datasets based on their descriptions. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE enhances performance on 11 out of 14 datasets, ties on 2 and looses on 1 - boosting mean ROC AUC performance from 0.798 to 0.822 across all datasets. On the evaluated datasets, this improvement is similar to the average improvement achieved by using a random forest (AUC 0.782) instead of logistic regression (AUC 0.754). Furthermore, our method offers valuable insights into the rationale behind the generated features by providing a textual explanation for each generated feature. CAAFE paves the way for more extensive (semi-)automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems. For reproducability, we release our code and a simple demo.

2.Assessing Trustworthiness of Autonomous Systems

Authors:Gregory Chance, Dhaminda B. Abeywickrama, Beckett LeClair, Owen Kerr, Kerstin Eder

Abstract: As Autonomous Systems (AS) become more ubiquitous in society, more responsible for our safety and our interaction with them more frequent, it is essential that they are trustworthy. Assessing the trustworthiness of AS is a mandatory challenge for the verification and development community. This will require appropriate standards and suitable metrics that may serve to objectively and comparatively judge trustworthiness of AS across the broad range of current and future applications. The meta-expression `trustworthiness' is examined in the context of AS capturing the relevant qualities that comprise this term in the literature. Recent developments in standards and frameworks that support assurance of autonomous systems are reviewed. A list of key challenges are identified for the community and we present an outline of a process that can be used as a trustworthiness assessment framework for AS.

3.Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects

Authors:Kehui Tan, Tianqi Pang, Chenyou Fan

Abstract: This perspective paper proposes a series of interactive scenarios that utilize Artificial Intelligence (AI) to enhance classroom teaching, such as dialogue auto-completion, knowledge and style transfer, and assessment of AI-generated content. By leveraging recent developments in Large Language Models (LLMs), we explore the potential of AI to augment and enrich teacher-student dialogues and improve the quality of teaching. Our goal is to produce innovative and meaningful conversations between teachers and students, create standards for evaluation, and improve the efficacy of AI-for-Education initiatives. In Section 3, we discuss the challenges of utilizing existing LLMs to effectively complete the educated tasks and present a unified framework for addressing diverse education dataset, processing lengthy conversations, and condensing information to better accomplish more downstream tasks. In Section 4, we summarize the pivoting tasks including Teacher-Student Dialogue Auto-Completion, Expert Teaching Knowledge and Style Transfer, and Assessment of AI-Generated Content (AIGC), providing a clear path for future research. In Section 5, we also explore the use of external and adjustable LLMs to improve the generated content through human-in-the-loop supervision and reinforcement learning. Ultimately, this paper seeks to highlight the potential for AI to aid the field of education and promote its further exploration.

4.Improving LaCAM for Scalable Eventually Optimal Multi-Agent Pathfinding

Authors:Keisuke Okumura

Abstract: This study extends the recently-developed LaCAM algorithm for multi-agent pathfinding (MAPF). LaCAM is a sub-optimal search-based algorithm that uses lazy successor generation to dramatically reduce the planning effort. We present two enhancements. First, we propose its anytime version, called LaCAM*, which eventually converges to optima, provided that solution costs are accumulated transition costs. Second, we improve the successor generation to quickly obtain initial solutions. Exhaustive experiments demonstrate their utility. For instance, LaCAM* sub-optimally solved 99% of the instances retrieved from the MAPF benchmark, where the number of agents varied up to a thousand, within ten seconds on a standard desktop PC, while ensuring eventual convergence to optima; developing a new horizon of MAPF algorithms.

5.Causal Discovery with Stage Variables for Health Time Series

Authors:Bharat Srikishan, Samantha Kleinberg

Abstract: Using observational data to learn causal relationships is essential when randomized experiments are not possible, such as in healthcare. Discovering causal relationships in time-series health data is even more challenging when relationships change over the course of a disease, such as medications that are most effective early on or for individuals with severe disease. Stage variables such as weeks of pregnancy, disease stages, or biomarkers like HbA1c, can influence what causal relationships are true for a patient. However, causal inference within each stage is often not possible due to limited amounts of data, and combining all data risks incorrect or missed inferences. To address this, we propose Causal Discovery with Stage Variables (CDSV), which uses stage variables to reweight data from multiple time-series while accounting for different causal relationships in each stage. In simulated data, CDSV discovers more causes with fewer false discoveries compared to baselines, in eICU it has a lower FDR than baselines, and in MIMIC-III it discovers more clinically relevant causes of high blood pressure.

1.Human Values in Multiagent Systems

Authors:Nardine Osman, Mark d'Inverno

Abstract: One of the major challenges we face with ethical AI today is developing computational systems whose reasoning and behaviour are provably aligned with human values. Human values, however, are notorious for being ambiguous, contradictory and ever-changing. In order to bridge this gap, and get us closer to the situation where we can formally reason about implementing values into AI, this paper presents a formal representation of values, grounded in the social sciences. We use this formal representation to articulate the key challenges for achieving value-aligned behaviour in multiagent systems (MAS) and a research roadmap for addressing them.

2.A computational framework of human values for ethical AI

Authors:Nardine Osman, Mark d'Inverno

Abstract: In the diverse array of work investigating the nature of human values from psychology, philosophy and social sciences, there is a clear consensus that values guide behaviour. More recently, a recognition that values provide a means to engineer ethical AI has emerged. Indeed, Stuart Russell proposed shifting AI's focus away from simply ``intelligence'' towards intelligence ``provably aligned with human values''. This challenge -- the value alignment problem -- with others including an AI's learning of human values, aggregating individual values to groups, and designing computational mechanisms to reason over values, has energised a sustained research effort. Despite this, no formal, computational definition of values has yet been proposed. We address this through a formal conceptual framework rooted in the social sciences, that provides a foundation for the systematic, integrated and interdisciplinary investigation into how human values can support designing ethical AI.

1.VSRQ: Quantitative Assessment Method for Safety Risk of Vehicle Intelligent Connected System

Authors:Tian Zhang, Wenshan Guan, Hao Miao, Xiujie Huang, Zhiquan Liu, Chaonan Wang, Quanlong Guan, Liangda Fang, Zhifei Duan

Abstract: The field of intelligent connected in modern vehicles continues to expand, and the functions of vehicles become more and more complex with the development of the times. This has also led to an increasing number of vehicle vulnerabilities and many safety issues. Therefore, it is particularly important to identify high-risk vehicle intelligent connected systems, because it can inform security personnel which systems are most vulnerable to attacks, allowing them to conduct more thorough inspections and tests. In this paper, we develop a new model for vehicle risk assessment by combining I-FAHP with FCA clustering: VSRQ model. We extract important indicators related to vehicle safety, use fuzzy cluster analys (FCA) combined with fuzzy analytic hierarchy process (FAHP) to mine the vulnerable components of the vehicle intelligent connected system, and conduct priority testing on vulnerable components to reduce risks and ensure vehicle safety. We evaluate the model on OpenPilot and experimentally demonstrate the effectiveness of the VSRQ model in identifying the safety of vehicle intelligent connected systems. The experiment fully complies with ISO 26262 and ISO/SAE 21434 standards, and our model has a higher accuracy rate than other models. These results provide a promising new research direction for predicting the security risks of vehicle intelligent connected systems and provide typical application tasks for VSRQ. The experimental results show that the accuracy rate is 94.36%, and the recall rate is 73.43%, which is at least 14.63% higher than all other known indicators.

2.Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

Authors:Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, Dacheng Tao

Abstract: With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep learning (DL) have demonstrated their strong abilities in various areas, including language, vision, remote sensing (RS), and agrifood systems applications. However, the overall impact of AI on agrifood systems remains unclear. In this paper, we thoroughly review how AI techniques can transform agrifood systems and contribute to the modern agrifood industry. Firstly, we summarize the data acquisition methods in agrifood systems, including acquisition, storage, and processing techniques. Secondly, we present a progress review of AI methods in agrifood systems, specifically in agriculture, animal husbandry, and fishery, covering topics such as agrifood classification, growth monitoring, yield prediction, and quality assessment. Furthermore, we highlight potential challenges and promising research opportunities for transforming modern agrifood systems with AI. We hope this survey could offer an overall picture to newcomers in the field and serve as a starting point for their further research.

3.Human Machine Co-adaption Interface via Cooperation Markov Decision Process System

Authors:Kairui Guo, Adrian Cheng, Yaqi Li, Jun Li, Rob Duffield, Steven W. Su

Abstract: This paper aims to develop a new human-machine interface to improve rehabilitation performance from the perspective of both the user (patient) and the machine (robot) by introducing the co-adaption techniques via model-based reinforcement learning. Previous studies focus more on robot assistance, i.e., to improve the control strategy so as to fulfill the objective of Assist-As-Needed. In this study, we treat the full process of robot-assisted rehabilitation as a co-adaptive or mutual learning process and emphasize the adaptation of the user to the machine. To this end, we proposed a Co-adaptive MDPs (CaMDPs) model to quantify the learning rates based on cooperative multi-agent reinforcement learning (MARL) in the high abstraction layer of the systems. We proposed several approaches to cooperatively adjust the Policy Improvement among the two agents in the framework of Policy Iteration. Based on the proposed co-adaptive MDPs, the simulation study indicates the non-stationary problem can be mitigated using various proposed Policy Improvement approaches.

4.An Ontology Design Pattern for Role-Dependent Names

Authors:Rushrukh Rayan, Cogan Shimizu, Pascal Hitzler

Abstract: We present an ontology design pattern for modeling Names as part of Roles, to capture scenarios where an Agent performs different Roles using different Names associated with the different Roles. Examples of an Agent performing a Role using different Names are rather ubiquitous, e.g., authors who write under different pseudonyms, or different legal names for citizens of more than one country. The proposed pattern is a modified merger of a standard Agent Role and a standard Name pattern stub.

5.Why Oatmeal is Cheap: Kolmogorov Complexity and Procedural Generation

Authors:Younès Rabii, Michael Cook

Abstract: Although procedural generation is popular among game developers, academic research on the topic has primarily focused on new applications, with some research into empirical analysis. In this paper we relate theoretical work in information theory to the generation of content for games. We prove that there is a relationship between the Kolomogorov complexity of the most complex artifact a generator can produce, and the size of that generator's possibility space. In doing so, we identify the limiting relationship between the knowledge encoded in a generator, the density of its output space, and the intricacy of the artifacts it produces. We relate our result to the experience of expert procedural generator designers, and illustrate it with some examples.

6.Continual Reasoning: Non-Monotonic Reasoning in Neurosymbolic AI using Continual Learning

Authors:Sofoklis Kyriakopoulos, Artur S. d'Avila Garcez

Abstract: Despite the extensive investment and impressive recent progress at reasoning by similarity, deep learning continues to struggle with more complex forms of reasoning such as non-monotonic and commonsense reasoning. Non-monotonicity is a property of non-classical reasoning typically seen in commonsense reasoning, whereby a reasoning system is allowed (differently from classical logic) to jump to conclusions which may be retracted later, when new information becomes available. Neural-symbolic systems such as Logic Tensor Networks (LTN) have been shown to be effective at enabling deep neural networks to achieve reasoning capabilities. In this paper, we show that by combining a neural-symbolic system with methods from continual learning, LTN can obtain a higher level of accuracy when addressing non-monotonic reasoning tasks. Continual learning is added to LTNs by adopting a curriculum of learning from knowledge and data with recall. We call this process Continual Reasoning, a new methodology for the application of neural-symbolic systems to reasoning tasks. Continual Reasoning is applied to a prototypical non-monotonic reasoning problem as well as other reasoning examples. Experimentation is conducted to compare and analyze the effects that different curriculum choices may have on overall learning and reasoning results. Results indicate significant improvement on the prototypical non-monotonic reasoning problem and a promising outlook for the proposed approach on statistical relational learning examples.

7.Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems

Authors:Stefan Kramer, Mattia Cerrato, Sašo Džeroski, Ross King

Abstract: The paper surveys automated scientific discovery, from equation discovery and symbolic regression to autonomous discovery systems and agents. It discusses the individual approaches from a "big picture" perspective and in context, but also discusses open issues and recent topics like the various roles of deep neural networks in this area, aiding in the discovery of human-interpretable knowledge. Further, we will present closed-loop scientific discovery systems, starting with the pioneering work on the Adam system up to current efforts in fields from material science to astronomy. Finally, we will elaborate on autonomy from a machine learning perspective, but also in analogy to the autonomy levels in autonomous driving. The maximal level, level five, is defined to require no human intervention at all in the production of scientific knowledge. Achieving this is one step towards solving the Nobel Turing Grand Challenge to develop AI Scientists: AI systems capable of making Nobel-quality scientific discoveries highly autonomously at a level comparable, and possibly superior, to the best human scientists by 2050.

8.Contextual Reasoning for Scene Generation (Technical Report)

Authors:Loris Bozzato, Thomas Eiter, Rafael Kiesel, Daria Stepanova

Abstract: We present a continuation to our previous work, in which we developed the MR-CKR framework to reason with knowledge overriding across contexts organized in multi-relational hierarchies. Reasoning is realized via ASP with algebraic measures, allowing for flexible definitions of preferences. In this paper, we show how to apply our theoretical work to real autonomous-vehicle scene data. Goal of this work is to apply MR-CKR to the problem of generating challenging scenes for autonomous vehicle learning. In practice, most of the scene data for AV learning models common situations, thus it might be difficult to capture cases where a particular situation occurs (e.g. partial occlusions of a crossing pedestrian). The MR-CKR model allows for data organization exploiting the multi-dimensionality of such data (e.g., temporal and spatial). Reasoning over multiple contexts enables the verification and configuration of scenes, using the combination of different scene ontologies. We describe a framework for semantically guided data generation, based on a combination of MR-CKR and Algebraic Measures. The framework is implemented in a proof-of-concept prototype exemplifying some cases of scene generation.

9.Calibrated Explanations: with Uncertainty Information and Counterfactuals

Authors:Helena Lofstrom, Tuwe Lofstrom, Ulf Johansson, Cecilia Sonstrod

Abstract: Artificial Intelligence (AI) has become an integral part of decision support systems (DSSs) in various domains, but the lack of transparency in the predictive models used in AI-based DSSs can lead to misuse or disuse. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance, but they suffer from drawbacks such as instability. To address these issues, we propose a new feature importance explanation method, Calibrated Explanations (CE), which is based on Venn-Abers and calibrates the underlying model while generating feature importance explanations. CE provides fast, reliable, stable, and robust explanations, along with uncertainty quantification of the probability estimates and feature importance weights. Furthermore, the method is model agnostic with easily understood conditional rules and can also generate counterfactual explanations with uncertainty quantification.

1.An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework

Authors:JaeYoon Kim, Junyu Xuan, Christy Liang, Farookh Hussain

Abstract: Most exploration research on reinforcement learning (RL) has paid attention to `the way of exploration', which is `how to explore'. The other exploration research, `when to explore', has not been the main focus of RL exploration research. \textcolor{black}{The issue of `when' of a monolithic exploration in the usual RL exploration behaviour binds an exploratory action to an exploitational action of an agent. Recently, a non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals.} The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously. We describe the initial research of an autonomous multi-mode exploration of non-monolithic behaviour in an options framework. The higher performance of our method is shown against the existing non-monolithic exploration method through comparative experimental results.

2.Uncertain Machine Ethical Decisions Using Hypothetical Retrospection

Authors:Simon Kolker, Louise Dennis, Ramon Fraga Pereira, Mengwei Xu

Abstract: We propose the use of the hypothetical retrospection argumentation procedure, developed by Sven Hansson, to improve existing approaches to machine ethical reasoning by accounting for probability and uncertainty from a position of Philosophy that resonates with humans. Actions are represented with a branching set of potential outcomes, each with a state, utility, and either a numeric or poetic probability estimate. Actions are chosen based on comparisons between sets of arguments favouring actions from the perspective of their branches, even those branches that led to an undesirable outcome. This use of arguments allows a variety of philosophical theories for ethical reasoning to be used, potentially in flexible combination with each other. We implement the procedure, applying consequentialist and deontological ethical theories, independently and concurrently, to an autonomous library system use case. We introduce a a preliminary framework that seems to meet the varied requirements of a machine ethics system: versatility under multiple theories and a resonance with humans that enables transparency and explainability.

3.Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making

Authors:Axel Abels, Tom Lenaerts, Vito Trianni, Ann Nowé

Abstract: Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide here new algorithms that explicitly consider and adapt to the relationship between problem instances and experts' knowledge. We first propose and highlight the drawbacks of a naive approach based on nearest neighbor queries. To address these drawbacks we then introduce a novel algorithm - expertise trees - that constructs decision trees enabling the learner to select appropriate models. We provide theoretical insights and empirically validate the improved performance of our novel approach on a range of problems for which existing methods proved to be inadequate.

1.Neurosymbolic AI - Why, What, and How

Authors:Amit Sheth, Kaushik Roy, Manas Gaur

Abstract: Humans interact with the environment using a combination of perception - transforming sensory inputs from their environment into symbols, and cognition - mapping symbols to knowledge about the environment for supporting abstraction, reasoning by analogy, and long-term planning. Human perception-inspired machine perception, in the context of AI, refers to large-scale pattern recognition from raw data using neural networks trained using self-supervised learning objectives such as next-word prediction or object recognition. On the other hand, machine cognition encompasses more complex computations, such as using knowledge of the environment to guide reasoning, analogy, and long-term planning. Humans can also control and explain their cognitive functions. This seems to require the retention of symbolic mappings from perception outputs to knowledge about their environment. For example, humans can follow and explain the guidelines and safety constraints driving their decision-making in safety-critical applications such as healthcare, criminal justice, and autonomous driving. This article introduces the rapidly emerging paradigm of Neurosymbolic AI combines neural networks and knowledge-guided symbolic approaches to create more capable and flexible AI systems. These systems have immense potential to advance both algorithm-level (e.g., abstraction, analogy, reasoning) and application-level (e.g., explainable and safety-constrained decision-making) capabilities of AI systems.

2.Explanation through Reward Model Reconciliation using POMDP Tree Search

Authors:Benjamin D. Kraske, Anshu Saksena, Anna L. Buczak, Zachary N. Sunberg

Abstract: As artificial intelligence (AI) algorithms are increasingly used in mission-critical applications, promoting user-trust of these systems will be essential to their success. Ensuring users understand the models over which algorithms reason promotes user trust. This work seeks to reconcile differences between the reward model that an algorithm uses for online partially observable Markov decision (POMDP) planning and the implicit reward model assumed by a human user. Action discrepancies, differences in decisions made by an algorithm and user, are leveraged to estimate a user's objectives as expressed in weightings of a reward function.

1.Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model

Authors:Shishi Xiao, Suizi Huang, Yue Lin, Yilin Ye, Wei Zeng

Abstract: Pictorial visualization seamlessly integrates data and semantic context into visual representation, conveying complex information in a manner that is both engaging and informative. Extensive studies have been devoted to developing authoring tools to simplify the creation of pictorial visualizations. However, mainstream works mostly follow a retrieving-and-editing pipeline that heavily relies on retrieved visual elements from a dedicated corpus, which often compromise the data integrity. Text-guided generation methods are emerging, but may have limited applicability due to its predefined recognized entities. In this work, we propose ChartSpark, a novel system that embeds semantic context into chart based on text-to-image generative model. ChartSpark generates pictorial visualizations conditioned on both semantic context conveyed in textual inputs and data information embedded in plain charts. The method is generic for both foreground and background pictorial generation, satisfying the design practices identified from an empirical research into existing pictorial visualizations. We further develop an interactive visual interface that integrates a text analyzer, editing module, and evaluation module to enable users to generate, modify, and assess pictorial visualizations. We experimentally demonstrate the usability of our tool, and conclude with a discussion of the potential of using text-to-image generative model combined with interactive interface for visualization design.

2.Imbalanced Node Classification Beyond Homophilic Assumption

Authors:Jie Liu, Mengting He, Guangtao Wang, Nguyen Quoc Viet Hung, Xuequn Shang, Hongzhi Yin

Abstract: Imbalanced node classification widely exists in real-world networks where graph neural networks (GNNs) are usually highly inclined to majority classes and suffer from severe performance degradation on classifying minority class nodes. Various imbalanced node classification methods have been proposed recently which construct synthetic nodes and edges w.r.t. minority classes to balance the label and topology distribution. However, they are all based on the homophilic assumption that nodes of the same label tend to connect despite the wide existence of heterophilic edges in real-world graphs. Thus, they uniformly aggregate features from both homophilic and heterophilic neighbors and rely on feature similarity to generate synthetic edges, which cannot be applied to imbalanced graphs in high heterophily. To address this problem, we propose a novel GraphSANN for imbalanced node classification on both homophilic and heterophilic graphs. Firstly, we propose a unified feature mixer to generate synthetic nodes with both homophilic and heterophilic interpolation in a unified way. Next, by randomly sampling edges between synthetic nodes and existing nodes as candidate edges, we design an adaptive subgraph extractor to adaptively extract the contextual subgraphs of candidate edges with flexible ranges. Finally, we develop a multi-filter subgraph encoder that constructs different filter channels to discriminatively aggregate neighbor's information along the homophilic and heterophilic edges. Extensive experiments on eight datasets demonstrate the superiority of our model for imbalanced node classification on both homophilic and heterophilic graphs.

3.MultiZenoTravel: a Tunable Benchmark for Multi-Objective Planning with Known Pareto Front

Authors:Alexandre Quemy, Marc Schoenauer, Johann Dreo

Abstract: Multi-objective AI planning suffers from a lack of benchmarks exhibiting known Pareto Fronts. In this work, we propose a tunable benchmark generator, together with a dedicated solver that provably computes the true Pareto front of the resulting instances. First, we prove a proposition allowing us to characterize the optimal plans for a constrained version of the problem, and then show how to reduce the general problem to the constrained one. Second, we provide a constructive way to find all the Pareto-optimal plans and discuss the complexity of the algorithm. We provide an implementation that allows the solver to handle realistic instances in a reasonable time. Finally, as a practical demonstration, we used this solver to find all Pareto-optimal plans between the two largest airports in the world, considering the routes between the 50 largest airports, spherical distances between airports and a made-up risk.

4.Prompt Engineering for Healthcare: Methodologies and Applications

Authors:Jiaqi Wang, Enze Shi, Sigang Yu, Zihao Wu, Chong Ma, Haixing Dai, Qiushi Yang, Yanqing Kang, Jinru Wu, Huawen Hu, Chenxi Yue, Haiyang Zhang, Yiheng Liu, Xiang Li, Bao Ge, Dajiang Zhu, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang

Abstract: This review will introduce the latest advances in prompt engineering in the field of natural language processing (NLP) for the medical domain. First, we will provide a brief overview of the development of prompt engineering and emphasize its significant contributions to healthcare NLP applications such as question-answering systems, text summarization, and machine translation. With the continuous improvement of general large language models, the importance of prompt engineering in the healthcare domain is becoming increasingly prominent. The aim of this article is to provide useful resources and bridges for healthcare NLP researchers to better explore the application of prompt engineering in this field. We hope that this review can provide new ideas and inspire ample possibilities for research and application in medical NLP.

5.NeuralKG-ind: A Python Library for Inductive Knowledge Graph Representation Learning

Authors:Wen Zhang, Zhen Yao, Mingyang Chen, Zhiwei Huang, Huajun Chen

Abstract: Since the dynamic characteristics of knowledge graphs, many inductive knowledge graph representation learning (KGRL) works have been proposed in recent years, focusing on enabling prediction over new entities. NeuralKG-ind is the first library of inductive KGRL as an important update of NeuralKG library. It includes standardized processes, rich existing methods, decoupled modules, and comprehensive evaluation metrics. With NeuralKG-ind, it is easy for researchers and engineers to reproduce, redevelop, and compare inductive KGRL methods. The library, experimental methodologies, and model re-implementing results of NeuralKG-ind are all publicly released at .

6.Benchmark dataset and instance generator for Real-World Three-Dimensional Bin Packing Problems

Authors:Eneko Osaba, Esther Villar-Rodriguez, Sebastián V. Romero

Abstract: In this paper, a benchmark for real-world bin packing problems is proposed. This dataset is composed of 12 instances comprehending different levels of problem complexity regarding size (with the number of packages ranging from 38 to 53) and user-defined requirements. In fact, several real-world oriented restrictions have been considered for building these instances: i) items and bins dimensions, ii) weight restrictions, iii) affinities among packages categories iv) preferences for package ordering and v) load balancing. Besides the data, we also provide an own-developed Python script for the dataset generation, coined as Q4RealBPP-DataGen. The benchmark was firstly proposed to evaluate quantum solvers, therefore the characteristic of this set of instances were designed according to the current limitations of quantum devices. Additionally, the dataset generator is included to allow the construction of general-purpose benchmarks. The data introduced on this paper provides a baseline that will encourage quantum computing researchers to work on real-world bin packing problems

7.LitCQD: Multi-Hop Reasoning in Incomplete Knowledge Graphs with Numeric Literals

Authors:Caglar Demir, Michel Wiebesiek, Renzhong Lu, Axel-Cyrille Ngonga Ngomo, Stefan Heindorf

Abstract: Most real-world knowledge graphs, including Wikidata, DBpedia, and Yago are incomplete. Answering queries on such incomplete graphs is an important, but challenging problem. Recently, a number of approaches, including complex query decomposition (CQD), have been proposed to answer complex, multi-hop queries with conjunctions and disjunctions on such graphs. However, all state-of-the-art approaches only consider graphs consisting of entities and relations, neglecting literal values. In this paper, we propose LitCQD -- an approach to answer complex, multi-hop queries where both the query and the knowledge graph can contain numeric literal values: LitCQD can answer queries having numerical answers or having entity answers satisfying numerical constraints. For example, it allows to query (1)~persons living in New York having a certain age, and (2)~the average age of persons living in New York. We evaluate LitCQD on query types with and without literal values. To evaluate LitCQD, we generate complex, multi-hop queries and their expected answers on a version of the FB15k-237 dataset that was extended by literal values.

8.A New Class of Explanations for Classifiers with Non-Binary Features

Authors:Chunxi Ji, Adnan Darwiche

Abstract: Two types of explanations have received significant attention in the literature recently when analyzing the decisions made by classifiers. The first type explains why a decision was made and is known as a sufficient reason for the decision, also an abductive or PI-explanation. The second type explains why some other decision was not made and is known as a necessary reason for the decision, also a contrastive or counterfactual explanation. These explanations were defined for classifiers with binary, discrete and, in some cases, continuous features. We show that these explanations can be significantly improved in the presence of non-binary features, leading to a new class of explanations that relay more information about decisions and the underlying classifiers. Necessary and sufficient reasons were also shown to be the prime implicates and implicants of the complete reason for a decision, which can be obtained using a quantification operator. We show that our improved notions of necessary and sufficient reasons are also prime implicates and implicants but for an improved notion of complete reason obtained by a new quantification operator that we define and study in this paper.

9.Metric Temporal Equilibrium Logic over Timed Traces

Authors:Arvid Becker, Pedro Cabalar, Martín Diéguez, Torsten Schaub, Anna Schuhmann

Abstract: In temporal extensions of Answer Set Programming (ASP) based on linear-time, the behavior of dynamic systems is captured by sequences of states. While this representation reflects their relative order, it abstracts away the specific times associated with each state. However, timing constraints are important in many applications like, for instance, when planning and scheduling go hand in hand. We address this by developing a metric extension of linear-time temporal equilibrium logic, in which temporal operators are constrained by intervals over natural numbers. The resulting Metric Equilibrium Logic provides the foundation of an ASP-based approach for specifying qualitative and quantitative dynamic constraints. To this end, we define a translation of metric formulas into monadic first-order formulas and give a correspondence between their models in Metric Equilibrium Logic and Monadic Quantified Equilibrium Logic, respectively. Interestingly, our translation provides a blue print for implementation in terms of ASP modulo difference constraints.

10.Comparison of SAT-based and ASP-based Algorithms for Inconsistency Measurement

Authors:Isabelle Kuhlmann, Anna Gessler, Vivien Laszlo, Matthias Thimm

Abstract: We present algorithms based on satisfiability problem (SAT) solving, as well as answer set programming (ASP), for solving the problem of determining inconsistency degrees in propositional knowledge bases. We consider six different inconsistency measures whose respective decision problems lie on the first level of the polynomial hierarchy. Namely, these are the contension inconsistency measure, the forgetting-based inconsistency measure, the hitting set inconsistency measure, the max-distance inconsistency measure, the sum-distance inconsistency measure, and the hit-distance inconsistency measure. In an extensive experimental analysis, we compare the SAT-based and ASP-based approaches with each other, as well as with a set of naive baseline algorithms. Our results demonstrate that overall, both the SAT-based and the ASP-based approaches clearly outperform the naive baseline methods in terms of runtime. The results further show that the proposed ASP-based approaches perform superior to the SAT-based ones with regard to all six inconsistency measures considered in this work. Moreover, we conduct additional experiments to explain the aforementioned results in greater detail.

11.Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability

Authors:Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, Korinna Bade

Abstract: Analysis of how semantic concepts are represented within Convolutional Neural Networks (CNNs) is a widely used approach in Explainable Artificial Intelligence (XAI) for interpreting CNNs. A motivation is the need for transparency in safety-critical AI-based systems, as mandated in various domains like automated driving. However, to use the concept representations for safety-relevant purposes, like inspection or error retrieval, these must be of high quality and, in particular, stable. This paper focuses on two stability goals when working with concept representations in computer vision CNNs: stability of concept retrieval and of concept attribution. The guiding use-case is a post-hoc explainability framework for object detection (OD) CNNs, towards which existing concept analysis (CA) methods are successfully adapted. To address concept retrieval stability, we propose a novel metric that considers both concept separation and consistency, and is agnostic to layer and concept representation dimensionality. We then investigate impacts of concept abstraction level, number of concept training samples, CNN size, and concept representation dimensionality on stability. For concept attribution stability we explore the effect of gradient instability on gradient-based explainability methods. The results on various CNNs for classification and object detection yield the main findings that (1) the stability of concept retrieval can be enhanced through dimensionality reduction via data aggregation, and (2) in shallow layers where gradient instability is more pronounced, gradient smoothing techniques are advised. Finally, our approach provides valuable insights into selecting the appropriate layer and concept representation dimensionality, paving the way towards CA in safety-critical XAI applications.

12.Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers

Authors:Johannes Czech, Jannis Blüml, Kristian Kersting

Abstract: While transformers have gained the reputation as the "Swiss army knife of AI", no one has challenged them to master the game of chess, one of the classical AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero does not master the game of chess, mainly because ViTs are too slow. Even making them more efficient using a combination of MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, resulting in a greater boost of up to 180 Elo points over AlphaZero.

13.Are Emergent Abilities of Large Language Models a Mirage?

Authors:Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

Abstract: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, one can choose a metric which leads to the inference of an emergent ability or another metric which does not. Thus, our alternative suggests that existing claims of emergent abilities are creations of the researcher's analyses, not fundamental changes in model behavior on specific tasks with scale. We present our explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities, (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how similar metric decisions suggest apparent emergent abilities on vision tasks in diverse deep network architectures (convolutional, autoencoder, transformers). In all three analyses, we find strong supporting evidence that emergent abilities may not be a fundamental property of scaling AI models.

1.Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

Authors:Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin

Abstract: Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In the recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.

2.Interpretable Neural-Symbolic Concept Reasoning

Authors:Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Charlotte Magister, Alberto Tonda, Pietro Lio', Frederic Precioso, Mateja Jamnik, Giuseppe Marra

Abstract: Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly, but they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically-consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves up to +25% w.r.t. state-of-the-art interpretable concept-based models on challenging benchmarks (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii), facilitates the generation of counterfactual examples providing the learnt rules as guidance.

3.Categorical Foundations of Explainable AI: A Unifying Formalism of Structures and Semantics

Authors:Pietro Barbiero, Stefano Fioravanti, Francesco Giannini, Alberto Tonda, Pietro Lio, Elena Di Lavore

Abstract: Explainable AI (XAI) aims to answer ethical and legal questions associated with the deployment of AI models. However, a considerable number of domain-specific reviews highlight the need of a mathematical foundation for the key notions in the field, considering that even the term "explanation" still lacks a precise definition. These reviews also advocate for a sound and unifying formalism for explainable AI, to avoid the emergence of ill-posed questions, and to help researchers navigate a rapidly growing body of knowledge. To the authors knowledge, this paper is the first attempt to fill this gap by formalizing a unifying theory of XAI. Employing the framework of category theory, and feedback monoidal categories in particular, we first provide formal definitions for all essential terms in explainable AI. Then we propose a taxonomy of the field following the proposed structure, showing how the introduced theory can be used to categorize all the main classes of XAI systems currently studied in literature. In summary, the foundation of XAI proposed in this paper represents a significant tool to properly frame future research lines, and a precious guidance for new researchers approaching the field.

4.Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach

Authors:Junlin Lu, Patrick Mannion, Karl Mason

Abstract: Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems, based on observed behavior trajectories in the environment. The proposed method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. The performance of the proposed DWPI approach is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. The Dynamic Weight-based Preference Inference algorithm also maintains its performance when inferring preferences for sub-optimal behavior demonstrations. In addition to its impressive performance, the Dynamic Weight-based Preference Inference algorithm does not require any interactions during training with the agent whose preferences are inferred, all that is required is a trajectory of observed behavior.

5.Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making

Authors:Junlin Lu

Abstract: It is challenging to quantify numerical preferences for different objectives in a multi-objective decision-making problem. However, the demonstrations of a user are often accessible. We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations. The algorithm is evaluated in three environments with two baseline methods. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system, where one of the agents is enabled to infer the preferences of an opponent using our preference inference algorithm.

6.Why not both? Complementing explanations with uncertainty, and the role of self-confidence in Human-AI collaboration

Authors:Ioannis Papantonis, Vaishak Belle

Abstract: AI and ML models have already found many applications in critical domains, such as healthcare and criminal justice. However, fully automating such high-stakes applications can raise ethical or fairness concerns. Instead, in such cases, humans should be assisted by automated systems so that the two parties reach a joint decision, stemming out of their interaction. In this work we conduct an empirical study to identify how uncertainty estimates and model explanations affect users' reliance, understanding, and trust towards a model, looking for potential benefits of bringing the two together. Moreover, we seek to assess how users' behaviour is affected by their own self-confidence in their abilities to perform a certain task, while we also discuss how the latter may distort the outcome of an analysis based on agreement and switching percentages.

7.Standpoint Linear Temporal Logic

Authors:Nicola Gigante, Lucia {Gomez Alvarez}, Tim S. Lyon

Abstract: Many complex scenarios require the coordination of agents possessing unique points of view and distinct semantic commitments. In response, standpoint logic (SL) was introduced in the context of knowledge integration, allowing one to reason with diverse and potentially conflicting viewpoints by means of indexed modalities. Another multi-modal logic of import is linear temporal logic (LTL) - a formalism used to express temporal properties of systems and processes, having prominence in formal methods and fields related to artificial intelligence. In this paper, we present standpoint linear temporal logic (SLTL), a new logic that combines the temporal features of LTL with the multi-perspective modelling capacity of SL. We define the logic SLTL, its syntax, and its semantics, establish its decidability and complexity, and provide a terminating tableau calculus to automate SLTL reasoning. Conveniently, this offers a clear path to extend existing LTL reasoners with practical reasoning support for temporal reasoning in multi-perspective settings.

8.Large Language Models Are State-of-the-Art Evaluators of Code Generation

Authors:Terry Yue Zhuo

Abstract: Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine translation and summarization, their applicability in code generation tasks remains limited without human involvement. The complexity of programming concepts required for such tasks makes it difficult to develop evaluation metrics that align with human judgment. Token-matching-based metrics, such as BLEU, have demonstrated weak correlations with human practitioners in code generation tasks. Moreover, the utilization of human-written test suites to evaluate functional correctness can be challenging in domains with low resources. To overcome these obstacles, we propose a new evaluation framework based on the GPT-3.5 (\texttt{GPT-3.5-turbo}), for code generation assessments. Our framework addresses the limitations of existing approaches by achieving superior correlations with functional correctness and human preferences, without the need for test oracles or references. We evaluate the efficacy of our framework on two different tasks and four programming languages, comparing its performance with the state-of-the-art CodeBERTScore metric, which relies on a pre-trained model. Our results demonstrate that our framework surpasses CodeBERTScore, delivering high levels of accuracy and consistency across various programming languages and tasks. We also make our evaluation framework and datasets available to the public at \url{}, encouraging further research in the evaluation of code generation.

9.Pushing the Boundaries of Tractable Multiperspective Reasoning: A Deduction Calculus for Standpoint EL+

Authors:Lucía {Gómez Álvarez}, Sebastian Rudolph, Hannes Strass

Abstract: Standpoint EL is a multi-modal extension of the popular description logic EL that allows for the integrated representation of domain knowledge relative to diverse standpoints or perspectives. Advantageously, its satisfiability problem has recently been shown to be in PTime, making it a promising framework for large-scale knowledge integration. In this paper, we show that we can further push the expressivity of this formalism, arriving at an extended logic, called Standpoint EL+, which allows for axiom negation, role chain axioms, self-loops, and other features, while maintaining tractability. This is achieved by designing a satisfiability-checking deduction calculus, which at the same time addresses the need for practical algorithms. We demonstrate the feasibility of our calculus by presenting a prototypical Datalog implementation of its deduction rules.

10.ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT

Authors:Solomon Ubani, Suleyman Olcay Polat, Rodney Nielsen

Abstract: In this paper, we investigate the use of data obtained from prompting a large generative language model, ChatGPT, to generate synthetic training data with the aim of augmenting data in low resource scenarios. We show that with appropriate task-specific ChatGPT prompts, we outperform the most popular existing approaches for such data augmentation. Furthermore, we investigate methodologies for evaluating the similarity of the augmented data generated from ChatGPT with the aim of validating and assessing the quality of the data generated.

1.A Portrait of Emotion: Empowering Self-Expression through AI-Generated Art

Authors:Yoon Kyung Lee, Yong-Ha Park, Sowon Hahn

Abstract: We investigated the potential and limitations of generative artificial intelligence (AI) in reflecting the authors' cognitive processes through creative expression. The focus is on the AI-generated artwork's ability to understand human intent (alignment) and visually represent emotions based on criteria such as creativity, aesthetic, novelty, amusement, and depth. Results show a preference for images based on the descriptions of the authors' emotions over the main events. We also found that images that overrepresent specific elements or stereotypes negatively impact AI alignment. Our findings suggest that AI could facilitate creativity and the self-expression of emotions. Our research framework with generative AIs can help design AI-based interventions in related fields (e.g., mental health education, therapy, and counseling).

2.Evaluation of Regularization-based Continual Learning Approaches: Application to HAR

Authors:Bonpagna Kann UGA, M-PSI, Sandra Castellanos-Paez UGA, M-PSI, Philippe Lalanda UGA, M-PSI

Abstract: Pervasive computing allows the provision of services in many important areas, including the relevant and dynamic field of health and well-being. In this domain, Human Activity Recognition (HAR) has gained a lot of attention in recent years. Current solutions rely on Machine Learning (ML) models and achieve impressive results. However, the evolution of these models remains difficult, as long as a complete retraining is not performed. To overcome this problem, the concept of Continual Learning is very promising today and, more particularly, the techniques based on regularization. These techniques are particularly interesting for their simplicity and their low cost. Initial studies have been conducted and have shown promising outcomes. However, they remain very specific and difficult to compare. In this paper, we provide a comprehensive comparison of three regularization-based methods that we adapted to the HAR domain, highlighting their strengths and limitations. Our experiments were conducted on the UCI HAR dataset and the results showed that no single technique outperformed all others in all scenarios considered.

3.Neuro-symbolic Zero-Shot Code Cloning with Cross-Language Intermediate Representation

Authors:Krishnam Hasija, Shrishti Pradhan, Manasi Patwardhan, Raveendra Kumar Medicherla, Lovekesh Vig, Ravindra Naik

Abstract: In this paper, we define a neuro-symbolic approach to address the task of finding semantically similar clones for the codes of the legacy programming language COBOL, without training data. We define a meta-model that is instantiated to have an Intermediate Representation (IR) in the form of Abstract Syntax Trees (ASTs) common across codes in C and COBOL. We linearize the IRs using Structure Based Traversal (SBT) to create sequential inputs. We further fine-tune UnixCoder, the best-performing model for zero-shot cross-programming language code search, for the Code Cloning task with the SBT IRs of C code-pairs, available in the CodeNet dataset. This allows us to learn latent representations for the IRs of the C codes, which are transferable to the IRs of the COBOL codes. With this fine-tuned UnixCoder, we get a performance improvement of 12.85 [email protected] over the pre-trained UniXCoder model, in a zero-shot setting, on the COBOL test split synthesized from the CodeNet dataset. This demonstrates the efficacy of our meta-model based approach to facilitate cross-programming language transfer.

4.Conjunctive Query Based Constraint Solving For Feature Model Configuration

Authors:Alexander Felfernig, Viet-Man Le, Sebastian Lubos

Abstract: Feature model configuration can be supported on the basis of various types of reasoning approaches. Examples thereof are SAT solving, constraint solving, and answer set programming (ASP). Using these approaches requires technical expertise of how to define and solve the underlying configuration problem. In this paper, we show how to apply conjunctive queries typically supported by today's relational database systems to solve constraint satisfaction problems (CSP) and -- more specifically -- feature model configuration tasks. This approach allows the application of a wide-spread database technology to solve configuration tasks and also allows for new algorithmic approaches when it comes to the identification and resolution of inconsistencies.

5.Optimizing Energy Efficiency in Metro Systems Under Uncertainty Disturbances Using Reinforcement Learning

Authors:Haiqin Xie, Cheng Wang, Shicheng Li, Yue Zhang, Shanshan Wang, Xiaoping Lu

Abstract: In the realm of urban transportation, metro systems serve as crucial and sustainable means of public transit. However, their substantial energy consumption poses a challenge to the goal of sustainability. Disturbances such as delays and passenger flow changes can further exacerbate this issue by negatively affecting energy efficiency in metro systems. To tackle this problem, we propose a policy-based reinforcement learning approach that reschedules the metro timetable and optimizes energy efficiency in metro systems under disturbances by adjusting the dwell time and cruise speed of trains. Our experiments conducted in a simulation environment demonstrate the superiority of our method over baseline methods, achieving a traction energy consumption reduction of up to 10.9% and an increase in regenerative braking energy utilization of up to 47.9%. This study provides an effective solution to the energy-saving problem of urban rail transit.

6.The Roles of Symbols in Neural-based AI: They are Not What You Think!

Authors:Daniel L. Silver, Tom M. Mitchell

Abstract: We propose that symbols are first and foremost external communication tools used between intelligent agents that allow knowledge to be transferred in a more efficient and effective manner than having to experience the world directly. But, they are also used internally within an agent through a form of self-communication to help formulate, describe and justify subsymbolic patterns of neural activity that truly implement thinking. Symbols, and our languages that make use of them, not only allow us to explain our thinking to others and ourselves, but also provide beneficial constraints (inductive bias) on learning about the world. In this paper we present relevant insights from neuroscience and cognitive science, about how the human brain represents symbols and the concepts they refer to, and how today's artificial neural networks can do the same. We then present a novel neuro-symbolic hypothesis and a plausible architecture for intelligent agents that combines subsymbolic representations for symbols and concepts for learning and reasoning. Our hypothesis and associated architecture imply that symbols will remain critical to the future of intelligent systems NOT because they are the fundamental building blocks of thought, but because they are characterizations of subsymbolic processes that constitute thought.

7.Unlocking the Potential of Collaborative AI -- On the Socio-technical Challenges of Federated Machine Learning

Authors:Tobias Müller, Milena Zahn, Florian Matthes

Abstract: The disruptive potential of AI systems roots in the emergence of big data. Yet, a significant portion is scattered and locked in data silos, leaving its potential untapped. Federated Machine Learning is a novel AI paradigm enabling the creation of AI models from decentralized, potentially siloed data. Hence, Federated Machine Learning could technically open data silos and therefore unlock economic potential. However, this requires collaboration between multiple parties owning data silos. Setting up collaborative business models is complex and often a reason for failure. Current literature lacks guidelines on which aspects must be considered to successfully realize collaborative AI projects. This research investigates the challenges of prevailing collaborative business models and distinct aspects of Federated Machine Learning. Through a systematic literature review, focus group, and expert interviews, we provide a systemized collection of socio-technical challenges and an extended Business Model Canvas for the initial viability assessment of collaborative AI projects.

8.Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery

Authors:Debadutta Dash, Rahul Thapa, Juan M. Banda, Akshay Swaminathan, Morgan Cheatham, Mehr Kashyap, Nikesh Kotecha, Jonathan H. Chen, Saurabh Gombar, Lance Downing, Rachel Pedreira, Ethan Goh, Angel Arnaout, Garret Kenn Morris, Honor Magon, Matthew P Lungren, Eric Horvitz, Nigam H. Shah

Abstract: Despite growing interest in using large language models (LLMs) in healthcare, current explorations do not assess the real-world utility and safety of LLMs in clinical settings. Our objective was to determine whether two LLMs can serve information needs submitted by physicians as questions to an informatics consultation service in a safe and concordant manner. Sixty six questions from an informatics consult service were submitted to GPT-3.5 and GPT-4 via simple prompts. 12 physicians assessed the LLM responses' possibility of patient harm and concordance with existing reports from an informatics consultation service. Physician assessments were summarized based on majority vote. For no questions did a majority of physicians deem either LLM response as harmful. For GPT-3.5, responses to 8 questions were concordant with the informatics consult report, 20 discordant, and 9 were unable to be assessed. There were 29 responses with no majority on "Agree", "Disagree", and "Unable to assess". For GPT-4, responses to 13 questions were concordant, 15 discordant, and 3 were unable to be assessed. There were 35 responses with no majority. Responses from both LLMs were largely devoid of overt harm, but less than 20% of the responses agreed with an answer from an informatics consultation service, responses contained hallucinated references, and physicians were divided on what constitutes harm. These results suggest that while general purpose LLMs are able to provide safe and credible responses, they often do not meet the specific information need of a given question. A definitive evaluation of the usefulness of LLMs in healthcare settings will likely require additional research on prompt engineering, calibration, and custom-tailoring of general purpose models.

1.Adaptive Path-Memory Network for Temporal Knowledge Graph Reasoning

Authors:Hao Dong, Zhiyuan Ning, Pengyang Wang, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

Abstract: Temporal knowledge graph (TKG) reasoning aims to predict the future missing facts based on historical information and has gained increasing research interest recently. Lots of works have been made to model the historical structural and temporal characteristics for the reasoning task. Most existing works model the graph structure mainly depending on entity representation. However, the magnitude of TKG entities in real-world scenarios is considerable, and an increasing number of new entities will arise as time goes on. Therefore, we propose a novel architecture modeling with relation feature of TKG, namely aDAptivE path-MemOry Network (DaeMon), which adaptively models the temporal path information between query subject and each object candidate across history time. It models the historical information without depending on entity representation. Specifically, DaeMon uses path memory to record the temporal path information derived from path aggregation unit across timeline considering the memory passing strategy between adjacent timestamps. Extensive experiments conducted on four real-world TKG datasets demonstrate that our proposed model obtains substantial performance improvement and outperforms the state-of-the-art up to 4.8% absolute in MRR.

2.Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

Authors:Min Yang, Guanjun Liu, Ziyuan Zhou

Abstract: Traditional multi-agent reinforcement learning algorithms are difficultly applied in a large-scale multi-agent environment. The introduction of mean field theory has enhanced the scalability of multi-agent reinforcement learning in recent years. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability affects the agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method to capture more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or weighted mean field to update the average actions of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and leads to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph--Attention (GAMFQ) to remedy this flaw. GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. This graph attention module consists of a graph attention encoder and a differentiable attention mechanism, and this mechanism outputs a dynamic graph to represent the effectiveness of neighborhood agents against central agents. The mean--field module approximates the effect of a neighborhood agent on a central agent as the average effect of effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms baselines including the state-of-the-art partially observable mean-field reinforcement learning algorithms.

3.Disagreement amongst counterfactual explanations: How transparency can be deceptive

Authors:Dieter Brughmans, Lissa Melis, David Martens

Abstract: Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations resulted in a boom in the algorithms generating them. However, not every algorithm creates uniform explanations for the same instance. Even though in some contexts multiple possible explanations are beneficial, there are circumstances where diversity amongst counterfactual explanations results in a potential disagreement problem among stakeholders. Ethical issues arise when for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide tend to start including the right to explanations for data-driven, high-stakes decisions in their policies, these ethical issues should be understood and addressed. Our literature review on the disagreement problem in XAI reveals that this problem has never been empirically assessed for counterfactual explanations. Therefore, in this work, we conduct a large-scale empirical analysis, on 40 datasets, using 12 explanation-generating methods, for two black-box models, yielding over 192.0000 explanations. Our study finds alarmingly high disagreement levels between the methods tested. A malicious user is able to both exclude and include desired features when multiple counterfactual explanations are available. This disagreement seems to be driven mainly by the dataset characteristics and the type of counterfactual algorithm. XAI centers on the transparency of algorithmic decision-making, but our analysis advocates for transparency about this self-proclaimed transparency

4.On the Computation of Meaning, Language Models and Incomprehensible Horrors

Authors:Michael Timothy Bennett

Abstract: We integrate foundational theories of meaning with a mathematical formalism of artificial general intelligence (AGI) to offer a comprehensive mechanistic explanation of meaning, communication, and symbol emergence. This synthesis holds significance for both AGI and broader debates concerning the nature of language, as it unifies pragmatics, logical truth conditional semantics, Peircean semiotics, and a computable model of enactive cognition, addressing phenomena that have traditionally evaded mechanistic explanation. By examining the conditions under which a machine can generate meaningful utterances or comprehend human meaning, we establish that the current generation of language models do not possess the same understanding of meaning as humans nor intend any meaning that we might attribute to their responses. To address this, we propose simulating human feelings and optimising models to construct weak representations. Our findings shed light on the relationship between meaning and intelligence, and how we can build machines that comprehend and intend meaning.

5.Empowering Wildlife Guardians: An Equitable Digital Stewardship and Reward System for Biodiversity Conservation using Deep Learning and 3/4G Camera Traps

Authors:Paul Fergus, Carl Chalmers, Steven Longmore, Serge Wich, Carmen Warmenhove, Jonathan Swart, Thuto Ngongwane, André Burger, Jonathan Ledgard, Erik Meijaard

Abstract: The biodiversity of our planet is under threat, with approximately one million species expected to become extinct within decades. The reason; negative human actions, which include hunting, overfishing, pollution, and the conversion of land for urbanisation and agricultural purposes. Despite significant investment from charities and governments for activities that benefit nature, global wildlife populations continue to decline. Local wildlife guardians have historically played a critical role in global conservation efforts and have shown their ability to achieve sustainability at various levels. In 2021, COP26 recognised their contributions and pledged US$1.7 billion per year; however, this is a fraction of the global biodiversity budget available (between US$124 billion and US$143 billion annually) given they protect 80% of the planets biodiversity. This paper proposes a radical new solution based on "Interspecies Money," where animals own their own money. Creating a digital twin for each species allows animals to dispense funds to their guardians for the services they provide. For example, a rhinoceros may release a payment to its guardian each time it is detected in a camera trap as long as it remains alive and well. To test the efficacy of this approach 27 camera traps were deployed over a 400km2 area in Welgevonden Game Reserve in Limpopo Province in South Africa. The motion-triggered camera traps were operational for ten months and, using deep learning, we managed to capture images of 12 distinct animal species. For each species, a makeshift bank account was set up and credited with {\pounds}100. Each time an animal was captured in a camera and successfully classified, 1 penny (an arbitrary amount - mechanisms still need to be developed to determine the real value of species) was transferred from the animal account to its associated guardian.

6.A optimization framework for herbal prescription planning based on deep reinforcement learning

Authors:Kuo Yang, Zecong Yu, Xin Su, Xiong He, Ning Wang, Qiguang Zheng, Feidie Yu, Zhuang Liu, Tiancai Wen, Xuezhong Zhou

Abstract: Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription planning framework based on deep reinforcement learning for chronic disease treatment (PrescDRL). PrescDRL is a sequential herbal prescription optimization model that focuses on long-term effectiveness rather than achieving maximum reward at every step, thereby ensuring better patient outcomes. We constructed a high-quality benchmark dataset for sequential diagnosis and treatment of diabetes and evaluated PrescDRL against this benchmark. Our results showed that PrescDRL achieved a higher curative effect, with the single-step reward improving by 117% and 153% compared to doctors. Furthermore, PrescDRL outperformed the benchmark in prescription prediction, with precision improving by 40.5% and recall improving by 63%. Overall, our study demonstrates the potential of using artificial intelligence to improve clinical intelligent diagnosis and treatment in TCM.

7.Centralized control for multi-agent RL in a complex Real-Time-Strategy game

Authors:Roger Creus Castanyer

Abstract: Multi-agent Reinforcement learning (MARL) studies the behaviour of multiple learning agents that coexist in a shared environment. MARL is more challenging than single-agent RL because it involves more complex learning dynamics: the observations and rewards of each agent are functions of all other agents. In the context of MARL, Real-Time Strategy (RTS) games represent very challenging environments where multiple players interact simultaneously and control many units of different natures all at once. In fact, RTS games are so challenging for the current RL methods, that just being able to tackle them with RL is interesting. This project provides the end-to-end experience of applying RL in the Lux AI v2 Kaggle competition, where competitors design agents to control variable-sized fleets of units and tackle a multi-variable optimization, resource gathering, and allocation problem in a 1v1 scenario against other competitors. We use a centralized approach for training the RL agents, and report multiple design decisions along the process. We provide the source code of the project:

8.Seeing is not always believing: A Quantitative Study on Human Perception of AI-Generated Images

Authors:Zeyu Lu, Di Huang, Lei Bai, Xihui Liu, Jingjing Qu, Wanli Ouyang

Abstract: Photos serve as a way for humans to record what they experience in their daily lives, and they are often regarded as trustworthy sources of information. However, there is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos, which can create confusion and diminish trust in photographs. This study aims to answer the question of whether the current state-of-the-art AI-based visual content generation models can consistently deceive human eyes and convey false information. By conducting a high-quality quantitative study with fifty participants, we reveal, for the first time, that humans cannot distinguish between real photos and AI-created fake photos to a significant degree 38.7%. Our study also finds that an individual's background, such as their gender, age, and experience with AI-generated content (AIGC), does not significantly affect their ability to distinguish AI-generated images from real photographs. However, we do observe that there tend to be certain defects in AI-generated images that serve as cues for people to distinguish between real and fake photos. We hope that our study can raise awareness of the potential risks of AI-generated images and encourage further research to prevent the spread of false information. From a positive perspective, AI-generated images have the potential to revolutionize various industries and create a better future for humanity if they are used and regulated properly.

1.Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization

Authors:Mingli Zhu, Shaokui Wei, Li Shen, Yanbo Fan, Baoyuan Wu

Abstract: Backdoor defense, which aims to detect or mitigate the effect of malicious triggers introduced by attackers, is becoming increasingly critical for machine learning security and integrity. Fine-tuning based on benign data is a natural defense to erase the backdoor effect in a backdoored model. However, recent studies show that, given limited benign data, vanilla fine-tuning has poor defense performance. In this work, we provide a deep study of fine-tuning the backdoored model from the neuron perspective and find that backdoorrelated neurons fail to escape the local minimum in the fine-tuning process. Inspired by observing that the backdoorrelated neurons often have larger norms, we propose FTSAM, a novel backdoor defense paradigm that aims to shrink the norms of backdoor-related neurons by incorporating sharpness-aware minimization with fine-tuning. We demonstrate the effectiveness of our method on several benchmark datasets and network architectures, where it achieves state-of-the-art defense performance. Overall, our work provides a promising avenue for improving the robustness of machine learning models against backdoor attacks.

2.Human intuition as a defense against attribute inference

Authors:Marcin Waniek, Navya Suri, Abdullah Zameek, Bedoor AlShebli, Talal Rahwan

Abstract: Attribute inference - the process of analyzing publicly available data in order to uncover hidden information - has become a major threat to privacy, given the recent technological leap in machine learning. One way to tackle this threat is to strategically modify one's publicly available data in order to keep one's private information hidden from attribute inference. We evaluate people's ability to perform this task, and compare it against algorithms designed for this purpose. We focus on three attributes: the gender of the author of a piece of text, the country in which a set of photos was taken, and the link missing from a social network. For each of these attributes, we find that people's effectiveness is inferior to that of AI, especially when it comes to hiding the attribute in question. Moreover, when people are asked to modify the publicly available information in order to hide these attributes, they are less likely to make high-impact modifications compared to AI. This suggests that people are unable to recognize the aspects of the data that are critical to an inference algorithm. Taken together, our findings highlight the limitations of relying on human intuition to protect privacy in the age of AI, and emphasize the need for algorithmic support to protect private information from attribute inference.

3.Towards a Praxis for Intercultural Ethics in Explainable AI

Authors:Chinasa T. Okolo

Abstract: Explainable AI (XAI) is often promoted with the idea of helping users understand how machine learning models function and produce predictions. Still, most of these benefits are reserved for those with specialized domain knowledge, such as machine learning developers. Recent research has argued that making AI explainable can be a viable way of making AI more useful in real-world contexts, especially within low-resource domains in the Global South. While AI has transcended borders, a limited amount of work focuses on democratizing the concept of explainable AI to the "majority world", leaving much room to explore and develop new approaches within this space that cater to the distinct needs of users within culturally and socially-diverse regions. This article introduces the concept of an intercultural ethics approach to AI explainability. It examines how cultural nuances impact the adoption and use of technology, the factors that impede how technical concepts such as AI are explained, and how integrating an intercultural ethics approach in the development of XAI can improve user understanding and facilitate efficient usage of these methods.

4.Data-driven Knowledge Fusion for Deep Multi-instance Learning

Authors:Yu-Xuan Zhang, Zhengchun Zhou, Xingxing He, Avik Ranjan Adhikary, Bapi Dutta

Abstract: Multi-instance learning (MIL) is a widely-applied technique in practical applications that involve complex data structures. MIL can be broadly categorized into two types: traditional methods and those based on deep learning. These approaches have yielded significant results, especially with regards to their problem-solving strategies and experimental validation, providing valuable insights for researchers in the MIL field. However, a considerable amount of knowledge is often trapped within the algorithm, leading to subsequent MIL algorithms that solely rely on the model's data fitting to predict unlabeled samples. This results in a significant loss of knowledge and impedes the development of more intelligent models. In this paper, we propose a novel data-driven knowledge fusion for deep multi-instance learning (DKMIL) algorithm. DKMIL adopts a completely different idea from existing deep MIL methods by analyzing the decision-making of key samples in the data set (referred to as the data-driven) and using the knowledge fusion module designed to extract valuable information from these samples to assist the model's training. In other words, this module serves as a new interface between data and the model, providing strong scalability and enabling the use of prior knowledge from existing algorithms to enhance the learning ability of the model. Furthermore, to adapt the downstream modules of the model to more knowledge-enriched features extracted from the data-driven knowledge fusion module, we propose a two-level attention module that gradually learns shallow- and deep-level features of the samples to achieve more effective classification. We will prove the scalability of the knowledge fusion module while also verifying the efficacy of the proposed architecture by conducting experiments on 38 data sets across 6 categories.

5.Development of a Trust-Aware User Simulator for Statistical Proactive Dialog Modeling in Human-AI Teams

Authors:Matthias Kraus, Ron Riekenbrauck, Wolfgang Minker

Abstract: The concept of a Human-AI team has gained increasing attention in recent years. For effective collaboration between humans and AI teammates, proactivity is crucial for close coordination and effective communication. However, the design of adequate proactivity for AI-based systems to support humans is still an open question and a challenging topic. In this paper, we present the development of a corpus-based user simulator for training and testing proactive dialog policies. The simulator incorporates informed knowledge about proactive dialog and its effect on user trust and simulates user behavior and personal information, including socio-demographic features and personality traits. Two different simulation approaches were compared, and a task-step-based approach yielded better overall results due to enhanced modeling of sequential dependencies. This research presents a promising avenue for exploring and evaluating appropriate proactive strategies in a dialog game setting for improving Human-AI teams.

6.MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures

Authors:Arthur Vervaet

Abstract: Within today's large-scale systems, one anomaly can impact millions of users. Detecting such events in real-time is essential to maintain the quality of services. It allows the monitoring team to prevent or diminish the impact of a failure. Logs are a core part of software development and maintenance, by recording detailed information at runtime. Such log data are universally available in nearly all computer systems. They enable developers as well as system maintainers to monitor and dissect anomalous events. For Cloud computing companies and large online platforms in general, growth is linked to the scaling potential. Automatizing the anomaly detection process is a promising way to ensure the scalability of monitoring capacities regarding the increasing volume of logs generated by modern systems. In this paper, we will introduce MoniLog, a distributed approach to detect real-time anomalies within large-scale environments. It aims to detect sequential and quantitative anomalies within a multi-source log stream. MoniLog is designed to structure a log stream and perform the monitoring of anomalous sequences. Its output classifier learns from the administrator's actions to label and evaluate the criticality level of anomalies.

7.Geometric Relational Embeddings: A Survey

Authors:Bo Xiong, Mojtaba Nayyeri, Ming Jin, Yunjie He, Michael Cochez, Shirui Pan, Steffen Staab

Abstract: Geometric relational embeddings map relational data as geometric objects that combine vector information suitable for machine learning and structured/relational information for structured/relational reasoning, typically in low dimensions. Their preservation of relational structures and their appealing properties and interpretability have led to their uptake for tasks such as knowledge graph completion, ontology and hierarchy reasoning, logical query answering, and hierarchical multi-label classification. We survey methods that underly geometric relational embeddings and categorize them based on (i) the embedding geometries that are used to represent the data; and (ii) the relational reasoning tasks that they aim to improve. We identify the desired properties (i.e., inductive biases) of each kind of embedding and discuss some potential future work.

8.Hierarchical State Abstraction Based on Structural Information Principles

Authors:Xianghua Zeng, Hao Peng, Angsheng Li, Chunyang Liu, Lifang He, Philip S. Yu

Abstract: State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities resulting in essential information loss, affecting their performances on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, an unsupervised, adaptive hierarchical state clustering method without requiring manual assistance is presented, and meanwhile, an optimal encoding tree is generated. On each non-root tree node, a new aggregation function and condition structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.

9.Reinforcement Learning with Knowledge Representation and Reasoning: A Brief Survey

Authors:Chao Yu, Xuejing Zheng, Hankz Hankui Zhuo, Hai Wan, Weilin Luo

Abstract: Reinforcement Learning(RL) has achieved tremendous development in recent years, but still faces significant obstacles in addressing complex real-life problems due to the issues of poor system generalization, low sample efficiency as well as safety and interpretability concerns. The core reason underlying such dilemmas can be attributed to the fact that most of the work has focused on the computational aspect of value functions or policies using a representational model to describe atomic components of rewards, states and actions etc, thus neglecting the rich high-level declarative domain knowledge of facts, relations and rules that can be either provided a priori or acquired through reasoning over time. Recently, there has been a rapidly growing interest in the use of Knowledge Representation and Reasoning(KRR) methods, usually using logical languages, to enable more abstract representation and efficient learning in RL. In this survey, we provide a preliminary overview on these endeavors that leverage the strengths of KRR to help solving various problems in RL, and discuss the challenging open problems and possible directions for future work in this area.

10.Combining Monte Carlo Tree Search and Heuristic Search for Weighted Vertex Coloring

Authors:Cyril Grelier, Olivier Goudet, Jin-Kao Hao

Abstract: This work investigates the Monte Carlo Tree Search (MCTS) method combined with dedicated heuristics for solving the Weighted Vertex Coloring Problem. In addition to the basic MCTS algorithm, we study several MCTS variants where the conventional random simulation is replaced by other simulation strategies including greedy and local search heuristics. We conduct experiments on well-known benchmark instances to assess these combined MCTS variants. We provide empirical evidence to shed light on the advantages and limits of each simulation strategy. This is an extension of the work of Grelier and al. presented at EvoCOP2022.

11.Stubborn: An Environment for Evaluating Stubbornness between Agents with Aligned Incentives

Authors:Ram Rachum, Yonatan Nakar, Reuth Mirsky

Abstract: Recent research in multi-agent reinforcement learning (MARL) has shown success in learning social behavior and cooperation. Social dilemmas between agents in mixed-sum settings have been studied extensively, but there is little research into social dilemmas in fullycooperative settings, where agents have no prospect of gaining reward at another agent's expense. While fully-aligned interests are conducive to cooperation between agents, they do not guarantee it. We propose a measure of "stubbornness" between agents that aims to capture the human social behavior from which it takes its name: a disagreement that is gradually escalating and potentially disastrous. We would like to promote research into the tendency of agents to be stubborn, the reactions of counterpart agents, and the resulting social dynamics. In this paper we present Stubborn, an environment for evaluating stubbornness between agents with fully-aligned incentives. In our preliminary results, the agents learn to use their partner's stubbornness as a signal for improving the choices that they make in the environment.

12.A Survey on Multi-Resident Activity Recognition in Smart Environments

Authors:Farhad MortezaPour Shiri, Thinagaran Perumal, Norwati Mustapha, Raihani Mohamed, Mohd Anuaruddin Bin Ahmadon, Shingo Yamaguchi

Abstract: Human activity recognition (HAR) is a rapidly growing field that utilizes smart devices, sensors, and algorithms to automatically classify and identify the actions of individuals within a given environment. These systems have a wide range of applications, including assisting with caring tasks, increasing security, and improving energy efficiency. However, there are several challenges that must be addressed in order to effectively utilize HAR systems in multi-resident environments. One of the key challenges is accurately associating sensor observations with the identities of the individuals involved, which can be particularly difficult when residents are engaging in complex and collaborative activities. This paper provides a brief overview of the design and implementation of HAR systems, including a summary of the various data collection devices and approaches used for human activity identification. It also reviews previous research on the use of these systems in multi-resident environments and offers conclusions on the current state of the art in the field.

1.RPLKG: Robust Prompt Learning with Knowledge Graph

Authors:Yewon Kim, YongTaek Lim, Dokyung Yoon, KyungWoo Song

Abstract: Large-scale pre-trained models have been known that they are transferable, and they generalize well on the unseen dataset. Recently, multimodal pre-trained models such as CLIP show significant performance improvement in diverse experiments. However, when the labeled dataset is limited, the generalization of a new dataset or domain is still challenging. To improve the generalization performance on few-shot learning, there have been diverse efforts, such as prompt learning and adapter. However, the current few-shot adaptation methods are not interpretable, and they require a high computation cost for adaptation. In this study, we propose a new method, robust prompt learning with knowledge graph (RPLKG). Based on the knowledge graph, we automatically design diverse interpretable and meaningful prompt sets. Our model obtains cached embeddings of prompt sets after one forwarding from a large pre-trained model. After that, model optimizes the prompt selection processes with GumbelSoftmax. In this way, our model is trained using relatively little memory and learning time. Also, RPLKG selects the optimal interpretable prompt automatically, depending on the dataset. In summary, RPLKG is i) interpretable, ii) requires small computation resources, and iii) easy to incorporate prior human knowledge. To validate the RPLKG, we provide comprehensive experimental results on few-shot learning, domain generalization and new class generalization setting. RPLKG shows a significant performance improvement compared to zero-shot learning and competitive performance against several prompt learning methods using much lower resources.

2.Approximate Shielding of Atari Agents for Safe Exploration

Authors:Alexander W. Goodall, Francesco Belardinelli

Abstract: Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding - another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.

3.Semantics, Ontology and Explanation

Authors:Giancarlo Guizzardi, Nicola Guarino

Abstract: The terms 'semantics' and 'ontology' are increasingly appearing together with 'explanation', not only in the scientific literature, but also in organizational communication. However, all of these terms are also being significantly overloaded. In this paper, we discuss their strong relation under particular interpretations. Specifically, we discuss a notion of explanation termed ontological unpacking, which aims at explaining symbolic domain descriptions (conceptual models, knowledge graphs, logical specifications) by revealing their ontological commitment in terms of their assumed truthmakers, i.e., the entities in one's ontology that make the propositions in those descriptions true. To illustrate this idea, we employ an ontological theory of relations to explain (by revealing the hidden semantics of) a very simple symbolic model encoded in the standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in properly supporting semantic interoperability tasks. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the area of Artificial Intelligence.

1.Mastering Asymmetrical Multiplayer Game with Multi-Agent Asymmetric-Evolution Reinforcement Learning

Authors:Chenglu Sun AI Lab, Netease, Yichi Zhang AI Lab, Netease, Yu Zhang AI Lab, Netease, Ziling Lu AI Lab, Netease, Jingbin Liu AI Lab, Netease, Sijia Xu AI Lab, Netease, Weidong Zhang AI Lab, Netease

Abstract: Asymmetrical multiplayer (AMP) game is a popular game genre which involves multiple types of agents competing or collaborating with each other in the game. It is difficult to train powerful agents that can defeat top human players in AMP games by typical self-play training method because of unbalancing characteristics in their asymmetrical environments. We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in AMP game. We designed adaptive data adjustment (ADA) and environment randomization (ER) to optimize the AET process. We tested our method in a complex AMP game named Tom \& Jerry, and our AIs trained without using any human data can achieve a win rate of 98.5% against top human players over 65 matches. The ablation experiments indicated that the proposed modules are beneficial to the framework.

2.Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks

Authors:Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, Gareth Tyson

Abstract: The release of ChatGPT has uncovered a range of possibilities whereby large language models (LLMs) can substitute human intelligence. In this paper, we seek to understand whether ChatGPT has the potential to reproduce human-generated label annotations in social computing tasks. Such an achievement could significantly reduce the cost and complexity of social computing research. As such, we use ChatGPT to re-label five seminal datasets covering stance detection (2x), sentiment analysis, hate speech, and bot detection. Our results highlight that ChatGPT does have the potential to handle these data annotation tasks, although a number of challenges remain. ChatGPT obtains an average precision 0.609. Performance is highest for the sentiment analysis dataset, with ChatGPT correctly annotating 64.9% of tweets. Yet, we show that performance varies substantially across individual labels. We believe this work can open up new lines of analysis and act as a basis for future research into the exploitation of ChatGPT for human annotation tasks.

3.Towards a Benchmark for Scientific Understanding in Humans and Machines

Authors:Kristian Gonzalez Barman, Sascha Caron, Tom Claassen, Henk de Regt

Abstract: Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of and approaches to scientific understanding. In this Roadmap, we propose a framework to create a benchmark for scientific understanding, utilizing tools from philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be different under different circumstances. The Scientific Understanding Benchmark (SUB), which is formed by a set of these tests, allows for the evaluation and comparison of different approaches. Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation. By aligning machine and human scientific understanding we can improve their utility, ultimately advancing scientific understanding and helping to discover new insights within machines.

4.On the Potential of Artificial Intelligence Chatbots for Data Exploration of Federated Bioinformatics Knowledge Graphs

Authors:Ana-Claudia Sima, Tarcisio Mendes de Farias

Abstract: In this paper, we present work in progress on the role of artificial intelligence (AI) chatbots, such as ChatGPT, in facilitating data access to federated knowledge graphs. In particular, we provide examples from the field of bioinformatics, to illustrate the potential use of Conversational AI to describe datasets, as well as generate and explain (federated) queries across datasets for the benefit of domain experts.

1.An Ecosystem for Personal Knowledge Graphs: A Survey and Research Roadmap

Authors:Martin G. Skjæveland, Krisztian Balog, Nolwenn Bernard, Weronika Lajewska, Trond Linjordet

Abstract: This paper presents an ecosystem for personal knowledge graphs (PKG), commonly defined as resources of structured information about entities related to an individual, their attributes, and the relations between them. PKGs are a key enabler of secure and sophisticated personal data management and personalized services. However, there are challenges that need to be addressed before PKGs can achieve widespread adoption. One of the fundamental challenges is the very definition of what constitutes a PKG, as there are multiple interpretations of the term. We propose our own definition of a PKG, emphasizing the aspects of (1) data ownership by a single individual and (2) the delivery of personalized services as the primary purpose. We further argue that a holistic view of PKGs is needed to unlock their full potential, and propose a unified framework for PKGs, where the PKG is a part of a larger ecosystem with clear interfaces towards data services and data sources. A comprehensive survey and synthesis of existing work is conducted, with a mapping of the surveyed work into the proposed unified ecosystem. Finally, we identify open challenges and research opportunities for the ecosystem as a whole, as well as for the specific aspects of PKGs, which include population, representation and management, and utilization.

2.End-to-End Policy Gradient Method for POMDPs and Explainable Agents

Authors:Soichiro Nishimori, Sotetsu Koyamada, Shin Ishii

Abstract: Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.

1.Addressing Variable Dependency in GNN-based SAT Solving

Authors:Zhiyuan Yan, Min Li, Zhengyuan Shi, Wenjie Zhang, Yingcong Chen, Hongce Zhang

Abstract: Boolean satisfiability problem (SAT) is fundamental to many applications. Existing works have used graph neural networks (GNNs) for (approximate) SAT solving. Typical GNN-based end-to-end SAT solvers predict SAT solutions concurrently. We show that for a group of symmetric SAT problems, the concurrent prediction is guaranteed to produce a wrong answer because it neglects the dependency among Boolean variables in SAT problems. % We propose AsymSAT, a GNN-based architecture which integrates recurrent neural networks to generate dependent predictions for variable assignments. The experiment results show that dependent variable prediction extends the solving capability of the GNN-based method as it improves the number of solved SAT instances on large test sets.

2.Impact Of Explainable AI On Cognitive Load: Insights From An Empirical Study

Authors:Lukas-Valentin Herm

Abstract: While the emerging research field of explainable artificial intelligence (XAI) claims to address the lack of explainability in high-performance machine learning models, in practice, XAI targets developers rather than actual end-users. Unsurprisingly, end-users are often unwilling to use XAI-based decision support systems. Similarly, there is limited interdisciplinary research on end-users' behavior during XAI explanations usage, rendering it unknown how explanations may impact cognitive load and further affect end-user performance. Therefore, we conducted an empirical study with 271 prospective physicians, measuring their cognitive load, task performance, and task time for distinct implementation-independent XAI explanation types using a COVID-19 use case. We found that these explanation types strongly influence end-users' cognitive load, task performance, and task time. Further, we contextualized a mental efficiency metric, ranking local XAI explanation types best, to provide recommendations for future applications and implications for sociotechnical XAI research.

3.PaTeCon: A Pattern-Based Temporal Constraint Mining Method for Conflict Detection on Knowledge Graphs

Authors:Jianhao Chen, Junyang Ren, Wentao Ding, Yuzhong Qu

Abstract: Temporal facts, the facts for characterizing events that hold in specific time periods, are attracting rising attention in the knowledge graph (KG) research communities. In terms of quality management, the introduction of time restrictions brings new challenges to maintaining the temporal consistency of KGs and detecting potential temporal conflicts. Previous studies rely on manually enumerated temporal constraints to detect conflicts, which are labor-intensive and may have granularity issues. We start from the common pattern of temporal facts and constraints and propose a pattern-based temporal constraint mining method, PaTeCon. PaTeCon uses automatically determined graph patterns and their relevant statistical information over the given KG instead of human experts to generate time constraints. Specifically, PaTeCon dynamically attaches class restriction to candidate constraints according to their measuring scores.We evaluate PaTeCon on two large-scale datasets based on Wikidata and Freebase respectively. The experimental results show that pattern-based automatic constraint mining is powerful in generating valuable temporal constraints.

1.Automatic Textual Explanations of Concept Lattices

Authors:Johannes Hirth, Viktoria Horn, Gerd Stumme, Tom Hanika

Abstract: Lattices and their order diagrams are an essential tool for communicating knowledge and insights about data. This is in particular true when applying Formal Concept Analysis. Such representations, however, are difficult to comprehend by untrained users and in general in cases where lattices are large. We tackle this problem by automatically generating textual explanations for lattices using standard scales. Our method is based on the general notion of ordinal motifs in lattices for the special case of standard scales. We show the computational complexity of identifying a small number of standard scales that cover most of the lattice structure. For these, we provide textual explanation templates, which can be applied to any occurrence of a scale in any data domain. These templates are derived using principles from human-computer interaction and allow for a comprehensive textual explanation of lattices. We demonstrate our approach on the spices planner data set, which is a medium sized formal context comprised of fifty-six meals (objects) and thirty-seven spices (attributes). The resulting 531 formal concepts can be covered by means of about 100 standard scales.

2.Compositional Probabilistic and Causal Inference using Tractable Circuit Models

Authors:Benjie Wang, Marta Kwiatkowska

Abstract: Probabilistic circuits (PCs) are a class of tractable probabilistic models, which admit efficient inference routines depending on their structural properties. In this paper, we introduce md-vtrees, a novel structural formulation of (marginal) determinism in structured decomposable PCs, which generalizes previously proposed classes such as probabilistic sentential decision diagrams. Crucially, we show how mdvtrees can be used to derive tractability conditions and efficient algorithms for advanced inference queries expressed as arbitrary compositions of basic probabilistic operations, such as marginalization, multiplication and reciprocals, in a sound and generalizable manner. In particular, we derive the first polytime algorithms for causal inference queries such as backdoor adjustment on PCs. As a practical instantiation of the framework, we propose MDNets, a novel PC architecture using md-vtrees, and empirically demonstrate their application to causal inference.

3.'That Darned Sandstorm': A Study of Procedural Generation through Archaeological Storytelling

Authors:Florence Smith Nicholls, Michael Cook

Abstract: Procedural content generation has been applied to many domains, especially level design, but the narrative affordances of generated game environments are comparatively understudied. In this paper we present our first attempt to study these effects through the lens of what we call a generative archaeology game that prompts the player to archaeologically interpret the generated content of the game world. We report on a survey that gathered qualitative and quantitative data on the experiences of 187 participants playing the game Nothing Beside Remains. We provide some preliminary analysis of our intentional attempt to prompt player interpretation, and the unintentional effects of a glitch on the player experience of the game.

4.Deep Explainable Relational Reinforcement Learning: A Neuro-Symbolic Approach

Authors:Rishi Hazra, Luc De Raedt

Abstract: Despite numerous successes in Deep Reinforcement Learning (DRL), the learned policies are not interpretable. Moreover, since DRL does not exploit symbolic relational representations, it has difficulties in coping with structural changes in its environment (such as increasing the number of objects). Relational Reinforcement Learning, on the other hand, inherits the relational representations from symbolic planning to learn reusable policies. However, it has so far been unable to scale up and exploit the power of deep neural networks. We propose Deep Explainable Relational Reinforcement Learning (DERRL), a framework that exploits the best of both -- neural and symbolic worlds. By resorting to a neuro-symbolic approach, DERRL combines relational representations and constraints from symbolic planning with deep learning to extract interpretable policies. These policies are in the form of logical rules that explain how each decision (or action) is arrived at. Through several experiments, in setups like the Countdown Game, Blocks World, Gridworld, and Traffic, we show that the policies learned by DERRL can be applied to different configurations and contexts, hence generalizing to environmental modifications.

1.FairRec: Fairness Testing for Deep Recommender Systems

Authors:Huizhong Guo, Jinfeng Li, Jingyi Wang, Xiangyu Liu, Dongxia Wang, Zehong Hu, Rong Zhang, Hui Xue

Abstract: Deep learning-based recommender systems (DRSs) are increasingly and widely deployed in the industry, which brings significant convenience to people's daily life in different ways. However, recommender systems are also shown to suffer from multiple issues,e.g., the echo chamber and the Matthew effect, of which the notation of "fairness" plays a core role.While many fairness notations and corresponding fairness testing approaches have been developed for traditional deep classification models, they are essentially hardly applicable to DRSs. One major difficulty is that there still lacks a systematic understanding and mapping between the existing fairness notations and the diverse testing requirements for deep recommender systems, not to mention further testing or debugging activities. To address the gap, we propose FairRec, a unified framework that supports fairness testing of DRSs from multiple customized perspectives, e.g., model utility, item diversity, item popularity, etc. We also propose a novel, efficient search-based testing approach to tackle the new challenge, i.e., double-ended discrete particle swarm optimization (DPSO) algorithm, to effectively search for hidden fairness issues in the form of certain disadvantaged groups from a vast number of candidate groups. Given the testing report, by adopting a simple re-ranking mitigation strategy on these identified disadvantaged groups, we show that the fairness of DRSs can be significantly improved. We conducted extensive experiments on multiple industry-level DRSs adopted by leading companies. The results confirm that FairRec is effective and efficient in identifying the deeply hidden fairness issues, e.g., achieving 95% testing accuracy with half to 1/8 time.

2.On Existential First Order Queries Inference on Knowledge Graphs

Authors:Hang Yin, Zihao Wang, Yangqiu Song

Abstract: Reasoning on knowledge graphs is a challenging task because it utilizes observed information to predict the missing one. Specifically, answering first-order logic formulas is of particular interest because of its clear syntax and semantics. Recently, the query embedding method has been proposed which learns the embedding of a set of entities and treats logic operations as set operations. Though there has been much research following the same methodology, it lacks a systematic inspection from the standpoint of logic. In this paper, we characterize the scope of queries investigated previously and precisely identify the gap between it and the whole family of existential formulas. Moreover, we develop a new dataset containing ten new formulas and discuss the new challenges coming simultaneously. Finally, we propose a new search algorithm from fuzzy logic theory which is capable of solving new formulas and outperforming the previous methods in existing formulas.

3.Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning

Authors:Yash Satsangi, Paniz Behboudian

Abstract: A key challenge for a reinforcement learning (RL) agent is to incorporate external/expert1 advice in its learning. The desired goals of an algorithm that can shape the learning of an RL agent with external advice include (a) maintaining policy invariance; (b) accelerating the learning of the agent; and (c) learning from arbitrary advice [3]. To address this challenge this paper formulates the problem of incorporating external advice in RL as a multi-armed bandit called shaping-bandits. The reward of each arm of shaping bandits corresponds to the return obtained by following the expert or by following a default RL algorithm learning on the true environment reward.We show that directly applying existing bandit and shaping algorithms that do not reason about the non-stationary nature of the underlying returns can lead to poor results. Thus we propose UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES (LPIES) three different shaping algorithms built on different assumptions that reason about the long-term consequences of following the expert policy or the default RL algorithm. Our experiments in four different settings show that these proposed algorithms achieve the above-mentioned goals whereas the other algorithms fail to do so.

4.Just Tell Me: Prompt Engineering in Business Process Management

Authors:Kiran Busch, Alexander Rochlitzer, Diana Sola, Henrik Leopold

Abstract: GPT-3 and several other language models (LMs) can effectively address various natural language processing (NLP) tasks, including machine translation and text summarization. Recently, they have also been successfully employed in the business process management (BPM) domain, e.g., for predictive process monitoring and process extraction from text. This, however, typically requires fine-tuning the employed LM, which, among others, necessitates large amounts of suitable training data. A possible solution to this problem is the use of prompt engineering, which leverages pre-trained LMs without fine-tuning them. Recognizing this, we argue that prompt engineering can help bring the capabilities of LMs to BPM research. We use this position paper to develop a research agenda for the use of prompt engineering for BPM research by identifying the associated potentials and challenges.

1.IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function

Authors:Shivani Bathla, Vinita Vasudevan

Abstract: Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.

2.Emergence of Symbols in Neural Networks for Semantic Understanding and Communication

Authors:Yang Chen, Liangxuan Guo, Shan Yu

Abstract: Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, planning, etc., is essential and unique for human intelligence. Current deep neural networks are still far behind human's ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with ability of symbol creation, semantic understanding and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we found that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We hope that it will be instrumental in producing more capable systems in the future that can synergize the strengths of connectionist and symbolic approaches for AI.

3.Power-seeking can be probable and predictive for trained agents

Authors:Victoria Krakovna, Janos Kramar

Abstract: Power-seeking behavior is a key source of risk from advanced AI, but our theoretical understanding of this phenomenon is relatively limited. Building on existing theoretical results demonstrating power-seeking incentives for most reward functions, we investigate how the training process affects power-seeking incentives and show that they are still likely to hold for trained agents under some simplifying assumptions. We formally define the training-compatible goal set (the set of goals consistent with the training rewards) and assume that the trained agent learns a goal from this set. In a setting where the trained agent faces a choice to shut down or avoid shutdown in a new situation, we prove that the agent is likely to avoid shutdown. Thus, we show that power-seeking incentives can be probable (likely to arise for trained agents) and predictive (allowing us to predict undesirable behavior in new situations).

1.Multi-agent Policy Reciprocity with Theoretical Guarantee

Authors:Haozhi Wang, Yinchuan Li, Qing Wang, Yunfeng Shao, Jianye Hao

Abstract: Modern multi-agent reinforcement learning (RL) algorithms hold great potential for solving a variety of real-world problems. However, they do not fully exploit cross-agent knowledge to reduce sample complexity and improve performance. Although transfer RL supports knowledge sharing, it is hyperparameter sensitive and complex. To solve this problem, we propose a novel multi-agent policy reciprocity (PR) framework, where each agent can fully exploit cross-agent policies even in mismatched states. We then define an adjacency space for mismatched states and design a plug-and-play module for value iteration, which enables agents to infer more precise returns. To improve the scalability of PR, deep PR is proposed for continuous control tasks. Moreover, theoretical analysis shows that agents can asymptotically reach consensus through individual perceived rewards and converge to an optimal value function, which implies the stability and effectiveness of PR, respectively. Experimental results on discrete and continuous environments demonstrate that PR outperforms various existing RL and transfer RL methods.

2.Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics

Authors:Philipp Dominic Siedler

Abstract: Finding a balance between collaboration and competition is crucial for artificial agents in many real-world applications. We investigate this using a Multi-Agent Reinforcement Learning (MARL) setup on the back of a high-impact problem. The accumulation and yearly growth of plastic in the ocean cause irreparable damage to many aspects of oceanic health and the marina system. To prevent further damage, we need to find ways to reduce macroplastics from known plastic patches in the ocean. Here we propose a Graph Neural Network (GNN) based communication mechanism that increases the agents' observation space. In our custom environment, agents control a plastic collecting vessel. The communication mechanism enables agents to develop a communication protocol using a binary signal. While the goal of the agent collective is to clean up as much as possible, agents are rewarded for the individual amount of macroplastics collected. Hence agents have to learn to communicate effectively while maintaining high individual performance. We compare our proposed communication mechanism with a multi-agent baseline without the ability to communicate. Results show communication enables collaboration and increases collective performance significantly. This means agents have learned the importance of communication and found a balance between collaboration and competition.

1.If consciousness is dynamically relevant, artificial intelligence isn't conscious

Authors:Johannes Kleiner, Tim Ludwig

Abstract: We demonstrate that if consciousness is relevant for the temporal evolution of a system's states -- that is, if it is dynamically relevant -- then AI systems cannot be conscious. That is because AI systems run on CPUs, GPUs, TPUs or other processors which have been designed and verified to adhere to computational dynamics that systematically preclude or suppress deviations. The design and verification preclude or suppress, in particular, potential consciousness-related dynamical effects, so that if consciousness is dynamically relevant, AI systems cannot be conscious.

2.Artificial Collective Intelligence Engineering: a Survey of Concepts and Perspectives

Authors:Roberto Casadei

Abstract: Collectiveness is an important property of many systems--both natural and artificial. By exploiting a large number of individuals, it is often possible to produce effects that go far beyond the capabilities of the smartest individuals, or even to produce intelligent collective behaviour out of not-so-intelligent individuals. Indeed, collective intelligence, namely the capability of a group to act collectively in a seemingly intelligent way, is increasingly often a design goal of engineered computational systems--motivated by recent techno-scientific trends like the Internet of Things, swarm robotics, and crowd computing, just to name a few. For several years, the collective intelligence observed in natural and artificial systems has served as a source of inspiration for engineering ideas, models, and mechanisms. Today, artificial and computational collective intelligence are recognised research topics, spanning various techniques, kinds of target systems, and application domains. However, there is still a lot of fragmentation in the research panorama of the topic within computer science, and the verticality of most communities and contributions makes it difficult to extract the core underlying ideas and frames of reference. The challenge is to identify, place in a common structure, and ultimately connect the different areas and methods addressing intelligent collectives. To address this gap, this paper considers a set of broad scoping questions providing a map of collective intelligence research, mostly by the point of view of computer scientists and engineers. Accordingly, it covers preliminary notions, fundamental concepts, and the main research perspectives, identifying opportunities and challenges for researchers on artificial and computational collective intelligence engineering.

3.Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

Authors:Yash Shukla, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov

Abstract: Despite advances in Reinforcement Learning, many sequential decision making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from the logical specification still remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where the vertices correspond to tasks, and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g, teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g, Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives.

4.Human-AI Co-Creation Approach to Find Forever Chemicals Replacements

Authors:Juliana Jansen Ferreira, Vinícius Segura, Joana G. R. Souza, Gabriel D. J. Barbosa, João Gallas, Renato Cerqueira, Dmitry Zubarev

Abstract: Generative models are a powerful tool in AI for material discovery. We are designing a software framework that supports a human-AI co-creation process to accelerate finding replacements for the ``forever chemicals''-- chemicals that enable our modern lives, but are harmful to the environment and the human health. Our approach combines AI capabilities with the domain-specific tacit knowledge of subject matter experts to accelerate the material discovery. Our co-creation process starts with the interaction between the subject matter experts and a generative model that can generate new molecule designs. In this position paper, we discuss our hypothesis that these subject matter experts can benefit from a more iterative interaction with the generative model, asking for smaller samples and ``guiding'' the exploration of the discovery space with their knowledge.

1.NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking

Authors:Jason Yik, Soikat Hasan Ahmed, Zergham Ahmed, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, Douwe den Blanken, Petrut Bogdan, Sander Bohte, Younes Bouhadjar, Sonia Buckley, Gert Cauwenberghs, Federico Corradi, Guido de Croon, Andreea Danielescu, Anurag Daram, Mike Davies, Yigit Demirag, Jason Eshraghian, Jeremy Forest, Steve Furber, Michael Furlong, Aditya Gilra, Giacomo Indiveri, Siddharth Joshi, Vedant Karia, Lyes Khacef, James C. Knight, Laura Kriener, Rajkumar Kubendran, Dhireesha Kudithipudi, Gregor Lenz, Rajit Manohar, Christian Mayr, Konstantinos Michmizos, Dylan Muir, Emre Neftci, Thomas Nowotny, Fabrizio Ottati, Ayca Ozcelikkale, Noah Pacik-Nelson, Priyadarshini Panda, Sun Pao-Sheng, Melika Payvand, Christian Pehle, Mihai A. Petrovici, Christoph Posch, Alpha Renner, Yulia Sandamirskaya, Clemens JS Schaefer, André van Schaik, Johannes Schemmel, Catherine Schuman, Jae-sun Seo, Sumit Bam Shrestha, Manolis Sifalakis, Amos Sironi, Kenneth Stewart, Terrence C. Stewart, Philipp Stratmann, Guangzhi Tang, Jonathan Timcheck, Marian Verhelst, Craig M. Vineyard, Bernhard Vogginger, Amirreza Yousefzadeh, Biyan Zhou, Fatima Tuz Zohora, Charlotte Frenkel, Vijay Janapa Reddi

Abstract: The field of neuromorphic computing holds great promise in terms of advancing computing efficiency and capabilities by following brain-inspired principles. However, the rich diversity of techniques employed in neuromorphic research has resulted in a lack of clear standards for benchmarking, hindering effective evaluation of the advantages and strengths of neuromorphic methods compared to traditional deep-learning-based methods. This paper presents a collaborative effort, bringing together members from academia and the industry, to define benchmarks for neuromorphic computing: NeuroBench. The goals of NeuroBench are to be a collaborative, fair, and representative benchmark suite developed by the community, for the community. In this paper, we discuss the challenges associated with benchmarking neuromorphic solutions, and outline the key features of NeuroBench. We believe that NeuroBench will be a significant step towards defining standards that can unify the goals of neuromorphic computing and drive its technological progress. Please visit for the latest updates on the benchmark tasks and metrics.

2.Artificial Intelligence/Operations Research Workshop 2 Report Out

Authors:John Dickerson, Bistra Dilkina, Yu Ding, Swati Gupta, Pascal Van Hentenryck, Sven Koenig, Ramayya Krishnan, Radhika Kulkarni, Catherine Gill, Haley Griffin, Maddy Hunter, Ann Schwartz

Abstract: This workshop Report Out focuses on the foundational elements of trustworthy AI and OR technology, and how to ensure all AI and OR systems implement these elements in their system designs. Four sessions on various topics within Trustworthy AI were held, these being Fairness, Explainable AI/Causality, Robustness/Privacy, and Human Alignment and Human-Computer Interaction. Following discussions of each of these topics, workshop participants also brainstormed challenge problems which require the collaboration of AI and OR researchers and will result in the integration of basic techniques from both fields to eventually benefit societal needs.

3.Ordinal Motifs in Lattices

Authors:Johannes Hirth, Viktoria Horn, Gerd Stumme, Tom Hanika

Abstract: Lattices are a commonly used structure for the representation and analysis of relational and ontological knowledge. In particular, the analysis of these requires a decomposition of a large and high-dimensional lattice into a set of understandably large parts. With the present work we propose /ordinal motifs/ as analytical units of meaning. We study these ordinal substructures (or standard scales) through (full) scale-measures of formal contexts from the field of formal concept analysis. We show that the underlying decision problems are NP-complete and provide results on how one can incrementally identify ordinal motifs to save computational effort. Accompanying our theoretical results, we demonstrate how ordinal motifs can be leveraged to retrieve basic meaning from a medium sized ordinal data set.

4.EVKG: An Interlinked and Interoperable Electric Vehicle Knowledge Graph for Smart Transportation System

Authors:Yanlin Qi, Gengchen Mai, Rui Zhu, Michael Zhang

Abstract: Over the past decade, the electric vehicle industry has experienced unprecedented growth and diversification, resulting in a complex ecosystem. To effectively manage this multifaceted field, we present an EV-centric knowledge graph (EVKG) as a comprehensive, cross-domain, extensible, and open geospatial knowledge management system. The EVKG encapsulates essential EV-related knowledge, including EV adoption, electric vehicle supply equipment, and electricity transmission network, to support decision-making related to EV technology development, infrastructure planning, and policy-making by providing timely and accurate information and analysis. To enrich and contextualize the EVKG, we integrate the developed EV-relevant ontology modules from existing well-known knowledge graphs and ontologies. This integration enables interoperability with other knowledge graphs in the Linked Data Open Cloud, enhancing the EVKG's value as a knowledge hub for EV decision-making. Using six competency questions, we demonstrate how the EVKG can be used to answer various types of EV-related questions, providing critical insights into the EV ecosystem. Our EVKG provides an efficient and effective approach for managing the complex and diverse EV industry. By consolidating critical EV-related knowledge into a single, easily accessible resource, the EVKG supports decision-makers in making informed choices about EV technology development, infrastructure planning, and policy-making. As a flexible and extensible platform, the EVKG is capable of accommodating a wide range of data sources, enabling it to evolve alongside the rapidly changing EV landscape.

1.The Governance of Physical Artificial Intelligence

Authors:Yingbo Li, Anamaria-Beatrice Spulber, Yucong Duan

Abstract: Physical artificial intelligence can prove to be one of the most important challenges of the artificial intelligence. The governance of physical artificial intelligence would define its responsible intelligent application in the society.

2.FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

Authors:Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang

Abstract: We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25{\deg} latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80\% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^{2}/s^2$. In addition, the inference cost of each iteration is merely 600ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.

3.Revisiting Dense Retrieval with Unanswerable Counterfactuals

Authors:Yongho Song, Dahyun Lee, Kyungjae Lee, Jinyeong Yeo

Abstract: The retriever-reader framework is popular for open-domain question answering (ODQA), where a retriever samples for the reader a set of relevant candidate passages from a large corpus. A key assumption behind this method is that high relevance scores from the retriever likely indicate high answerability from the reader, which implies a high probability that the retrieved passages contain answers to a given question. In this work, we empirically dispel this belief and observe that recent dense retrieval models based on DPR often rank unanswerable counterfactual passages higher than their answerable original passages. To address such answer-unawareness in dense retrievers, we seek to use counterfactual samples as additional training resources to better synchronize the relevance measurement of DPR with the answerability of question-passage pairs. Specifically, we present counterfactually-Pivoting Contrastive Learning (PiCL), a novel representation learning approach for passage retrieval that leverages counterfactual samples as pivots between positive and negative samples in their learned embedding space. We incorporate PiCL into the retriever training to show the effectiveness of PiCL on ODQA benchmarks and the robustness of the learned models.

4.Almost optimal manipulation of a pair of alternatives

Authors:Jacek Szybowski, Konrad Kułakowski, Sebastian Ernst

Abstract: The role of an expert in the decision-making process is crucial, as the final recommendation depends on his disposition, clarity of mind, experience, and knowledge of the problem. However, the recommendation also depends on their honesty. But what if the expert is dishonest? Then, the answer on how difficult it is to manipulate in a given case becomes essential. In the presented work, we consider manipulation of a ranking obtained by comparing alternatives in pairs. More specifically, we propose an algorithm for finding an almost optimal way to swap the positions of two selected alternatives. Thanks to this, it is possible to determine how difficult such manipulation is in a given case. Theoretical considerations are illustrated by a practical example.

5.Retention Is All You Need

Authors:Karishma Mohiuddin, Mirza Ariful Alam, Mirza Mohtashim Alam, Pascal Welke, Michael Martin, Jens Lehmann, Sahar Vahdati

Abstract: Skilled employees are usually seen as the most important pillar of an organization. Despite this, most organizations face high attrition and turnover rates. While several machine learning models have been developed for analyzing attrition and its causal factors, the interpretations of those models remain opaque. In this paper, we propose the HR-DSS approach, which stands for Human Resource Decision Support System, and uses explainable AI for employee attrition problems. The system is designed to assist human resource departments in interpreting the predictions provided by machine learning models. In our experiments, eight machine learning models are employed to provide predictions, and the results achieved by the best-performing model are further processed by the SHAP explainability process. We optimize both the correctness and explanation of the results. Furthermore, using "What-if-analysis", we aim to observe plausible causes for attrition of an individual employee. The results show that by adjusting the specific dominant features of each individual, employee attrition can turn into employee retention through informative business decisions. Reducing attrition is not only a problem for any specific organization but also, in some countries, becomes a significant societal problem that impacts the well-being of both employers and employees.

6.BotTriNet: A Unified and Efficient Embedding for Social Bots Detection via Metric Learning

Authors:Jun Wu, Xuesong Ye, Man Yan Yuet

Abstract: A persistently popular topic in online social networks is the rapid and accurate discovery of bot accounts to prevent their invasion and harassment of genuine users. We propose a unified embedding framework called BOTTRINET, which utilizes textual content posted by accounts for bot detection based on the assumption that contexts naturally reveal account personalities and habits. Content is abundant and valuable if the system efficiently extracts bot-related information using embedding techniques. Beyond the general embedding framework that generates word, sentence, and account embeddings, we design a triplet network to tune the raw embeddings (produced by traditional natural language processing techniques) for better classification performance. We evaluate detection accuracy and f1score on a real-world dataset CRESCI2017, comprising three bot account categories and five bot sample sets. Our system achieves the highest average accuracy of 98.34% and f1score of 97.99% on two content-intensive bot sets, outperforming previous work and becoming state-of-the-art. It also makes a breakthrough on four content-less bot sets, with an average accuracy improvement of 11.52% and an average f1score increase of 16.70%.

7.Synthetic Data in Healthcare

Authors:Daniel McDuff, Theodore Curran, Achuta Kadambi

Abstract: Synthetic data are becoming a critical tool for building artificially intelligent systems. Simulators provide a way of generating data systematically and at scale. These data can then be used either exclusively, or in conjunction with real data, for training and testing systems. Synthetic data are particularly attractive in cases where the availability of ``real'' training examples might be a bottleneck. While the volume of data in healthcare is growing exponentially, creating datasets for novel tasks and/or that reflect a diverse set of conditions and causal relationships is not trivial. Furthermore, these data are highly sensitive and often patient specific. Recent research has begun to illustrate the potential for synthetic data in many areas of medicine, but no systematic review of the literature exists. In this paper, we present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine. We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.

8.When do you need Chain-of-Thought Prompting for ChatGPT?

Authors:Jiuhai Chen, Lichang Chen, Heng Huang, Tianyi Zhou

Abstract: Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models~(LLMs). For example, by simply adding CoT instruction ``Let's think step-by-step'' to each input query of MultiArith dataset, GPT-3's accuracy can be improved from 17.7\% to 78.7\%. However, it is not clear whether CoT is still effective on more recent instruction finetuned (IFT) LLMs such as ChatGPT. Surprisingly, on ChatGPT, CoT is no longer effective for certain tasks such as arithmetic reasoning while still keeping effective on other reasoning tasks. Moreover, on the former tasks, ChatGPT usually achieves the best performance and can generate CoT even without being instructed to do so. Hence, it is plausible that ChatGPT has already been trained on these tasks with CoT and thus memorized the instruction so it implicitly follows such an instruction when applied to the same queries, even without CoT. Our analysis reflects a potential risk of overfitting/bias toward instructions introduced in IFT, which becomes more common in training LLMs. In addition, it indicates possible leakage of the pretraining recipe, e.g., one can verify whether a dataset and instruction were used in training ChatGPT. Our experiments report new baseline results of ChatGPT on a variety of reasoning tasks and shed novel insights into LLM's profiling, instruction memorization, and pretraining dataset leakage.

1.Bodily expressed emotion understanding through integrating Laban movement analysis

Authors:Chenyan Wu, Dolzodmaa Davaasuren, Tal Shafir, Rachelle Tsachor, James Z. Wang

Abstract: Body movements carry important information about a person's emotions or mental state and are essential in daily communication. Enhancing the ability of machines to understand emotions expressed through body language can improve the communication of assistive robots with children and elderly users, provide psychiatric professionals with quantitative diagnostic and prognostic assistance, and aid law enforcement in identifying deception. This study develops a high-quality human motor element dataset based on the Laban Movement Analysis movement coding system and utilizes that to jointly learn about motor elements and emotions. Our long-term ambition is to integrate knowledge from computing, psychology, and performing arts to enable automated understanding and analysis of emotion and mental state through body language. This work serves as a launchpad for further research into recognizing emotions through analysis of human movement.

2.MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs

Authors:Jifan Yu, Mengying Lu, Qingyang Zhong, Zijun Yao, Shangqing Tu, Zhengshan Liao, Xiaoya Li, Manli Li, Lei Hou, Hai-Tao Zheng, Juanzi Li, Jie Tang

Abstract: Student modeling, the task of inferring a student's learning characteristics through their interactions with coursework, is a fundamental issue in intelligent education. Although the recent attempts from knowledge tracing and cognitive diagnosis propose several promising directions for improving the usability and effectiveness of current models, the existing public datasets are still insufficient to meet the need for these potential solutions due to their ignorance of complete exercising contexts, fine-grained concepts, and cognitive labels. In this paper, we present MoocRadar, a fine-grained, multi-aspect knowledge repository consisting of 2,513 exercise questions, 5,600 knowledge concepts, and over 12 million behavioral records. Specifically, we propose a framework to guarantee a high-quality and comprehensive annotation of fine-grained concepts and cognitive labels. The statistical and experimental results indicate that our dataset provides the basis for the future improvements of existing methods. Moreover, to support the convenient usage for researchers, we release a set of tools for data querying, model adaption, and even the extension of our repository, which are now available at

1.Implementing Dynamic Programming in Computability Logic Web

Authors:Keehang Kwon

Abstract: We present a novel definition of an algorithm and its corresponding algorithm language called CoLweb. The merit of CoLweb [1] is that it makes algorithm design so versatile. That is, it forces us to a high-level, proof-carrying, distributed-style approach to algorithm design for both non-distributed computing and distributed one. We argue that this approach simplifies algorithm design. In addition, it unifies other approaches including recursive logical/functional algorithms, imperative algorithms, object-oriented imperative algorithms, neural-nets, interaction nets, proof-carrying code, etc. As an application, we refine Horn clause definitions into two kinds: blind-univerally-quantified (BUQ) ones and parallel-universally-quantified (PUQ) ones. BUQ definitions corresponds to the traditional ones such as those in Prolog where knowledgebase is $not$ expanding and its proof procedure is based on the backward chaining. On the other hand, in PUQ definitions, knowledgebase is $expanding$ and its proof procedure leads to forward chaining and {\it automatic memoization}.

2.A Brief Review of Explainable Artificial Intelligence in Healthcare

Authors:Zahra Sadeghi, Roohallah Alizadehsani, Mehmet Akif Cifci, Samina Kausar, Rizwan Rehman, Priyakshi Mahanta, Pranjal Kumar Bora, Ammar Almasri, Rami S. Alkhawaldeh, Sadiq Hussain, Bilal Alatas, Afshin Shoeibi, Hossein Moosaei, Milan Hladik, Saeid Nahavandi, Panos M. Pardalos

Abstract: XAI refers to the techniques and methods for building AI applications which assist end users to interpret output and predictions of AI models. Black box AI applications in high-stakes decision-making situations, such as medical domain have increased the demand for transparency and explainability since wrong predictions may have severe consequences. Model explainability and interpretability are vital successful deployment of AI models in healthcare practices. AI applications' underlying reasoning needs to be transparent to clinicians in order to gain their trust. This paper presents a systematic review of XAI aspects and challenges in the healthcare domain. The primary goals of this study are to review various XAI methods, their challenges, and related machine learning models in healthcare. The methods are discussed under six categories: Features-oriented methods, global methods, concept models, surrogate models, local pixel-based methods, and human-centric methods. Most importantly, the paper explores XAI role in healthcare problems to clarify its necessity in safety-critical applications. The paper intends to establish a comprehensive understanding of XAI-related applications in the healthcare field by reviewing the related experimental results. To facilitate future research for filling research gaps, the importance of XAI models from different viewpoints and their limitations are investigated.

3.Regularization of the policy updates for stabilizing Mean Field Games

Authors:Talal Algumaei, Ruben Solozabal, Reda Alami, Hakim Hacid, Merouane Debbah, Martin Takac

Abstract: This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL) where multiple agents interact in the same environment and whose goal is to maximize the individual returns. Challenges arise when scaling up the number of agents due to the resultant non-stationarity that the many agents introduce. In order to address this issue, Mean Field Games (MFG) rely on the symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates on the mean-field distribution. This work presents a different approach to stabilize the learning based on proximal updates on the mean-field policy. We name our algorithm \textit{Mean Field Proximal Policy Optimization (MF-PPO)}, and we empirically show the effectiveness of our method in the OpenSpiel framework.

4.G2PTL: A Pre-trained Model for Delivery Address and its Applications in Logistics System

Authors:Lixia Wu, Jianlin Liu, Junhong Lou, Haoyuan Hu, Jianbin Zheng, Haomin Wen, Chao Song, Shu He

Abstract: Text-based delivery addresses, as the data foundation for logistics systems, contain abundant and crucial location information. How to effectively encode the delivery address is a core task to boost the performance of downstream tasks in the logistics system. Pre-trained Models (PTMs) designed for Natural Language Process (NLP) have emerged as the dominant tools for encoding semantic information in text. Though promising, those NLP-based PTMs fall short of encoding geographic knowledge in the delivery address, which considerably trims down the performance of delivery-related tasks in logistic systems such as Cainiao. To tackle the above problem, we propose a domain-specific pre-trained model, named G2PTL, a Geography-Graph Pre-trained model for delivery address in Logistics field. G2PTL combines the semantic learning capabilities of text pre-training with the geographical-relationship encoding abilities of graph modeling. Specifically, we first utilize real-world logistics delivery data to construct a large-scale heterogeneous graph of delivery addresses, which contains abundant geographic knowledge and delivery information. Then, G2PTL is pre-trained with subgraphs sampled from the heterogeneous graph. Comprehensive experiments are conducted to demonstrate the effectiveness of G2PTL through four downstream tasks in logistics systems on real-world datasets. G2PTL has been deployed in production in Cainiao's logistics system, which significantly improves the performance of delivery-related tasks.

5.PAC-Based Formal Verification for Out-of-Distribution Data Detection

Authors:Mohit Prashant, Arvind Easwaran

Abstract: Cyber-physical systems (CPS) like autonomous vehicles, that utilize learning components, are often sensitive to noise and out-of-distribution (OOD) instances encountered during runtime. As such, safety critical tasks depend upon OOD detection subsystems in order to restore the CPS to a known state or interrupt execution to prevent safety from being compromised. However, it is difficult to guarantee the performance of OOD detectors as it is difficult to characterize the OOD aspect of an instance, especially in high-dimensional unstructured data. To distinguish between OOD data and data known to the learning component through the training process, an emerging technique is to incorporate variational autoencoders (VAE) within systems and apply classification or anomaly detection techniques on their latent spaces. The rationale for doing so is the reduction of the data domain size through the encoding process, which benefits real-time systems through decreased processing requirements, facilitates feature analysis for unstructured data and allows more explainable techniques to be implemented. This study places probably approximately correct (PAC) based guarantees on OOD detection using the encoding process within VAEs to quantify image features and apply conformal constraints over them. This is used to bound the detection error on unfamiliar instances with user-defined confidence. The approach used in this study is to empirically establish these bounds by sampling the latent probability distribution and evaluating the error with respect to the constraint violations that are encountered. The guarantee is then verified using data generated from CARLA, an open-source driving simulator.

6.An Embedding-based Approach to Inconsistency-tolerant Reasoning with Inconsistent Ontologies

Authors:Keyu Wang, Site Li, Jiaye Li, Guilin Qi, Qiu Ji

Abstract: Inconsistency handling is an important issue in knowledge management. Especially in ontology engineering, logical inconsistencies may occur during ontology construction. A natural way to reason with an inconsistent ontology is to utilize the maximal consistent subsets of the ontology. However, previous studies on selecting maximum consistent subsets have rarely considered the semantics of the axioms, which may result in irrational inference. In this paper, we propose a novel approach to reasoning with inconsistent ontologies in description logics based on the embeddings of axioms. We first give a method for turning axioms into distributed semantic vectors to compute the semantic connections between the axioms. We then define an embedding-based method for selecting the maximum consistent subsets and use it to define an inconsistency-tolerant inference relation. We show the rationality of our inference relation by considering some logical properties. Finally, we conduct experiments on several ontologies to evaluate the reasoning power of our inference relation. The experimental results show that our embedding-based method can outperform existing inconsistency-tolerant reasoning methods based on maximal consistent subsets.

7.Using Language Models For Knowledge Acquisition in Natural Language Reasoning Problems

Authors:Fangzhen Lin, Ziyi Shou, Chengcai Chen

Abstract: For a natural language problem that requires some non-trivial reasoning to solve, there are at least two ways to do it using a large language model (LLM). One is to ask it to solve it directly. The other is to use it to extract the facts from the problem text and then use a theorem prover to solve it. In this note, we compare the two methods using ChatGPT and GPT4 on a series of logic word puzzles, and conclude that the latter is the right approach.

8.Grid-SD2E: A General Grid-Feedback in a System for Cognitive Learning

Authors:Jingyi Feng, Chenming Zhang

Abstract: Comprehending how the brain interacts with the external world through generated neural signals is crucial for determining its working mechanism, treating brain diseases, and understanding intelligence. Although many theoretical models have been proposed, they have thus far been difficult to integrate and develop. In this study, we were inspired in part by grid cells in creating a more general and robust grid module and constructing an interactive and self-reinforcing cognitive system together with Bayesian reasoning, an approach called space-division and exploration-exploitation with grid-feedback (Grid-SD2E). Here, a grid module can be used as an interaction medium between the outside world and a system, as well as a self-reinforcement medium within the system. The space-division and exploration-exploitation (SD2E) receives the 0/1 signals of a grid through its space-division (SD) module. The system described in this paper is also a theoretical model derived from experiments conducted by other researchers and our experience on neural decoding. Herein, we analyse the rationality of the system based on the existing theories in both neuroscience and cognitive science, and attempt to propose special and general rules to explain the different interactions between people and between people and the external world. What's more, based on this model, the smallest computing unit is extracted, which is analogous to a single neuron in the brain.

9.Robustness Benchmark of Road User Trajectory Prediction Models for Automated Driving

Authors:Manuel Muñoz Sánchez, Emilia Silvas, Jos Elfring, René van de Molengraft

Abstract: Accurate and robust trajectory predictions of road users are needed to enable safe automated driving. To do this, machine learning models are often used, which can show erratic behavior when presented with previously unseen inputs. In this work, two environment-aware models (MotionCNN and MultiPath++) and two common baselines (Constant Velocity and an LSTM) are benchmarked for robustness against various perturbations that simulate functional insufficiencies observed during model deployment in a vehicle: unavailability of road information, late detections, and noise. Results show significant performance degradation under the presence of these perturbations, with errors increasing up to +1444.8\% in commonly used trajectory prediction evaluation metrics. Training the models with similar perturbations effectively reduces performance degradation, with error increases of up to +87.5\%. We argue that despite being an effective mitigation strategy, data augmentation through perturbations during training does not guarantee robustness towards unforeseen perturbations, since identification of all possible on-road complications is unfeasible. Furthermore, degrading the inputs sometimes leads to more accurate predictions, suggesting that the models are unable to learn the true relationships between the different elements in the data.

10.Risk-Aware Distributed Multi-Agent Reinforcement Learning

Authors:Abdullah Al Maruf, Luyao Niu, Bhaskar Ramasubramanian, Andrew Clark, Radha Poovendran

Abstract: Autonomous cyber and cyber-physical systems need to perform decision-making, learning, and control in unknown environments. Such decision-making can be sensitive to multiple factors, including modeling errors, changes in costs, and impacts of events in the tails of probability distributions. Although multi-agent reinforcement learning (MARL) provides a framework for learning behaviors through repeated interactions with the environment by minimizing an average cost, it will not be adequate to overcome the above challenges. In this paper, we develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions. We use the conditional value-at-risk (CVaR) to characterize the cost function that is being minimized, and define a Bellman operator to characterize the value function associated to a given state-action pair. We prove that this operator satisfies a contraction property, and that it converges to the optimal value function. We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that value functions of individual agents reaches consensus. We identify several challenges that arise in the implementation of the CVaR QD-Learning algorithm, and present solutions to overcome these. We evaluate the CVaR QD-Learning algorithm through simulations, and demonstrate the effect of a risk parameter on value functions at consensus.