arXiv daily

Machine Learning (cs.LG)

Wed, 30 Aug 2023

1. FedCiR: Client-Invariant Representation Learning for Federated Non-IID Features

Authors: Zijian Li, Zehong Lin, Jiawei Shao, Yuyi Mao, Jun Zhang

Abstract: Federated learning (FL) is a distributed learning paradigm that maximizes the potential of data-driven models for edge devices without sharing their raw data. However, devices often have non-independent and identically distributed (non-IID) data, meaning their local data distributions can vary significantly. This heterogeneity in input data distributions across devices, commonly referred to as the feature shift problem, can adversely impact the training convergence and accuracy of the global model. To analyze the intrinsic causes of the feature shift problem, we develop a generalization error bound in FL, which motivates us to propose FedCiR, a client-invariant representation learning framework that enables clients to extract informative and client-invariant features. Specifically, we increase the mutual information between representations and labels to encourage representations to carry essential classification knowledge, and decrease the mutual information between the client set and representations conditioned on labels to promote client-invariant representations. We further incorporate two regularizers into the FL framework that bound these mutual information terms with an approximate global representation distribution, compensating for the absence of the ground-truth global representation distribution and thus achieving informative and client-invariant feature extraction. To approximate the global representation distribution, we propose a data-free mechanism performed by the server without compromising privacy. Extensive experiments demonstrate the effectiveness of our approach in achieving client-invariant representation learning and addressing data heterogeneity.
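
The two regularizers can be pictured with a short sketch. The following is a hypothetical reading of the objective, assuming the first mutual-information term reduces to a standard classification loss and the second to a KL penalty toward a server-provided Gaussian approximation of the global representation distribution; all names (fedcir_client_loss, global_mu, global_logvar) and the weight lam are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def fedcir_client_loss(encoder, head, x, y, global_mu, global_logvar, lam=0.1):
    z = encoder(x)                      # representations for a local batch
    ce = F.cross_entropy(head(z), y)    # keeps I(z; y) high via classification
    # Fit a diagonal Gaussian to the local batch of representations.
    mu, var = z.mean(dim=0), z.var(dim=0) + 1e-6
    global_var = global_logvar.exp()
    # KL( N(mu, var) || N(global_mu, global_var) ), summed over dimensions,
    # pulls local representations toward the approximate global distribution.
    kl = 0.5 * (global_logvar - var.log()
                + (var + (mu - global_mu) ** 2) / global_var - 1.0).sum()
    return ce + lam * kl
```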

2. Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models

Authors: Hritik Bansal, John Dang, Aditya Grover

Abstract: Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire and integrate, sparse feedback presents a structural design choice between ratings (e.g., score Response A on a scale of 1-7) and rankings (e.g., is Response A better than Response B?). In this work, we analyze the effect of this design choice on the alignment and evaluation of LLMs. We uncover an inconsistency problem wherein the preferences inferred from ratings and rankings disagree in 60% of cases for both human and AI annotators. Our subsequent analysis identifies various facets of annotator bias that explain this phenomenon; for example, human annotators rate denser responses higher while preferring accuracy in pairwise judgments. To our surprise, we also observe that the choice of feedback protocol has a significant effect on the evaluation of aligned LLMs. In particular, we find that LLMs that leverage ranking data for alignment (say, model X) are preferred over those that leverage rating data (say, model Y) under a rank-based evaluation protocol (is X/Y's response better than the reference response?) but not under a rating-based evaluation protocol (score X/Y's response on a scale of 1-7). Our findings thus shed light on critical gaps in methods for evaluating the real-world utility of language models, and on their strong dependence on the feedback protocol used for alignment. Our code and data are available at https://github.com/Hritikbansal/sparse_feedback.
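
As a toy illustration of the reported inconsistency measurement (not the authors' pipeline), one can derive a preference from each pair's scalar ratings and compare it with the annotator's direct pairwise choice; inconsistency_rate and its inputs are hypothetical names.

```python
def inconsistency_rate(pairs):
    """pairs: list of (rating_a, rating_b, ranked_winner), winner in {'a','b'}."""
    disagree = total = 0
    for rating_a, rating_b, winner in pairs:
        if rating_a == rating_b:
            continue                       # ties induce no preference
        induced = 'a' if rating_a > rating_b else 'b'
        total += 1
        disagree += induced != winner
    return disagree / total if total else 0.0

# Example: one consistent pair and one inconsistent pair.
print(inconsistency_rate([(6, 3, 'a'), (2, 5, 'a')]))  # -> 0.5
```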

3. Federated Two Stage Decoupling With Adaptive Personalization Layers

Authors: Hangyu Zhu, Yuxiang Fan, Zhenping Xie

Abstract: Federated learning has gained significant attention due to its groundbreaking ability to enable distributed learning while maintaining privacy constraints. However, as a consequence of data heterogeneity among decentralized devices, it inherently suffers from significant learning degradation and slow convergence. It is therefore natural to cluster homogeneous clients into the same group, aggregating model weights only within each group. While most existing clustered federated learning methods employ either model gradients or inference outputs as metrics for client partitioning, aiming to group similar devices together, heterogeneity may still persist within each cluster. Moreover, there is little research exploring the underlying reasons for determining the appropriate timing for clustering, resulting in the common practice of assigning each client to its own individual cluster, particularly in the context of highly non-independent and identically distributed (non-IID) data. In this paper, we introduce a two-stage decoupling federated learning algorithm with adaptive personalization layers, named FedTSDP, in which client clustering is performed twice, first according to inference outputs and then according to model weights. Hopkins amended sampling is adopted to determine the appropriate timing for clustering and the sampling weight of public unlabeled data. In addition, a simple yet effective approach is developed to adaptively adjust the personalization layers based on varying degrees of data skew. Experimental results show that our proposed method performs reliably in both IID and non-IID scenarios.
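
The abstract does not spell out the amended Hopkins procedure, but the textbook Hopkins statistic it builds on is easy to sketch: values near 1 indicate clusterable structure, values near 0.5 indicate roughly uniform data. A minimal NumPy version, purely illustrative:

```python
import numpy as np

def hopkins(X, m=None, rng=None):
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    m = m or max(1, n // 10)
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))        # uniform probe points
    idx = rng.choice(n, size=m, replace=False)  # sampled real points

    def nn_dist(P, exclude_self):
        D = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=-1)
        if exclude_self:
            D[np.arange(m), idx] = np.inf       # ignore distance to itself
        return D.min(axis=1)

    u = nn_dist(U, exclude_self=False)          # probe -> nearest data point
    w = nn_dist(X[idx], exclude_self=True)      # data -> nearest other point
    return u.sum() / (u.sum() + w.sum())
```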

4. MSGNN: Multi-scale Spatio-temporal Graph Neural Network for Epidemic Forecasting

Authors: Mingjie Qiu, Zhiyi Tan, Bing-kun Bao

Abstract: Infectious disease forecasting has been a key focus and has proved crucial in controlling epidemics. A recent trend is to develop forecasting models based on graph neural networks (GNNs). However, existing GNN-based methods suffer from two key limitations: (1) current models broaden receptive fields by scaling the depth of GNNs, which is insufficient to preserve the semantics of long-range connectivity between distant but epidemically related areas; and (2) previous approaches model epidemics within a single spatial scale, ignoring the multi-scale epidemic patterns derived from different scales. To address these deficiencies, we devise the Multi-scale Spatio-temporal Graph Neural Network (MSGNN) based on an innovative multi-scale view. Specifically, in the proposed MSGNN model, we first devise a novel graph learning module, which directly captures long-range connectivity from trans-regional epidemic signals and integrates it into a multi-scale graph. Based on the learned multi-scale graph, we utilize a newly designed graph convolution module to exploit multi-scale epidemic patterns. This module facilitates multi-scale epidemic modeling by mining both scale-shared and scale-specific patterns. Experimental results on forecasting new COVID-19 cases in the United States demonstrate the superiority of our method over the state of the art. Further analyses and visualizations also show that MSGNN offers not only accurate, but also robust and interpretable forecasting results.
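
To make the scale-shared/scale-specific idea concrete, here is a hypothetical two-scale message-passing layer, not the paper's architecture: fine nodes (e.g., counties) and coarse nodes (e.g., states, pooled via an assignment matrix S) share one weight matrix while each scale keeps its own.

```python
import torch
import torch.nn as nn

class TwoScaleGraphConv(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.shared = nn.Linear(dim, dim)        # scale-shared pattern
        self.fine = nn.Linear(dim, dim)          # fine-scale specific
        self.coarse = nn.Linear(dim, dim)        # coarse-scale specific

    def forward(self, h, A_fine, S):
        # h: [n, dim] fine features; A_fine: [n, n] adjacency; S: [m, n] pooling.
        A_coarse = S @ A_fine @ S.T              # induced coarse adjacency
        h_c = S @ h                              # pooled coarse features
        msg_f = A_fine @ (self.shared(h) + self.fine(h))
        msg_c = A_coarse @ (self.shared(h_c) + self.coarse(h_c))
        return torch.relu(msg_f + S.T @ msg_c)   # broadcast coarse back down
```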

5. Domain Generalization without Excess Empirical Risk

Authors: Ozan Sener, Vladlen Koltun

Abstract: Given data from diverse sets of distinct distributions, domain generalization aims to learn models that generalize to unseen distributions. A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty. We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness in joint optimization. We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk. This change guarantees that the domain generalization penalty cannot impair optimization of the empirical risk, i.e., in-distribution performance. To solve the proposed optimization problem, we demonstrate an exciting connection to rate-distortion theory and utilize its tools to design an efficient method. Our approach can be applied to any penalty-based domain generalization method, and we demonstrate its effectiveness by applying it to three exemplar methods from the literature, showing significant improvements.
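
A hedged sketch of the constrained recipe (not the authors' rate-distortion method, which is not reproduced here): first minimize the empirical risk alone, then minimize the penalty while a hinge term keeps the risk within epsilon of its optimum. All function names and constants are placeholders.

```python
import torch

def train_constrained(model, risk_fn, penalty_fn, params,
                      steps=1000, eps=1e-3, rho=100.0, lr=1e-2):
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):                    # phase 1: plain ERM
        opt.zero_grad()
        risk_fn(model).backward()
        opt.step()
    with torch.no_grad():
        risk_star = risk_fn(model).item()     # epsilon-optimality reference
    for _ in range(steps):                    # phase 2: penalty under constraint
        opt.zero_grad()
        gap = risk_fn(model) - risk_star - eps
        (penalty_fn(model) + rho * torch.relu(gap)).backward()
        opt.step()
    return model
```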

6. Minimum Width for Deep, Narrow MLP: A Diffeomorphism and the Whitney Embedding Theorem Approach

Authors: Geonho Hwang

Abstract: Recently, significant attention has been devoted to determining the minimum width required for the universal approximation property of deep, narrow MLPs. Among these problems, approximating a continuous function under the uniform norm is important and challenging, and the gap between the known lower and upper bounds has been hard to narrow. In this regard, we propose a novel upper bound for the minimum width, given by $\operatorname{max}(2d_x+1, d_y) + \alpha(\sigma)$, to achieve uniform approximation with deep, narrow MLPs, where $0\leq \alpha(\sigma)\leq 2$ is a constant depending on the activation function $\sigma$. We demonstrate this bound through two key proofs. First, we establish that deep, narrow MLPs with little additional width can approximate diffeomorphisms. Second, we utilize the Whitney embedding theorem to show that any continuous function can be approximated by embeddings, which further decompose into linear transformations and diffeomorphisms.
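
To make the bound concrete, a worked instance using the quoted formula (our arithmetic, not an example from the paper):

```latex
% For f : \mathbb{R}^3 \to \mathbb{R}^2 and an activation with the
% worst-case constant \alpha(\sigma) = 2, the quoted bound gives
w_{\min} \le \max(2 d_x + 1,\; d_y) + \alpha(\sigma)
         = \max(2 \cdot 3 + 1,\; 2) + 2 = 9 .
```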

7. Towards One-Shot Learning for Text Classification using Inductive Logic Programming

Authors: Ghazal Afroozi Milani (University of Surrey), Daniel Cyrus (University of Surrey), Alireza Tamaddoni-Nezhad (University of Surrey)

Abstract: With the ever-increasing potential of AI to perform personalised tasks, it is becoming essential to develop machine learning techniques that are data-efficient and do not require hundreds or thousands of training examples. In this paper, we explore an Inductive Logic Programming approach to one-shot text classification. In particular, we explore the framework of Meta-Interpretive Learning (MIL), using common-sense background knowledge extracted from ConceptNet. Results indicate that MIL can learn text classification rules from a small number of training examples. Moreover, the higher the complexity of the chosen examples, the higher the accuracy of the outcome.
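
As a small illustration of the background-knowledge step (the MIL rule learner itself runs in a logic-programming system and is not shown), one could pull common-sense IsA facts from ConceptNet's public API; conceptnet_isa is a hypothetical helper, not code from the paper.

```python
import requests

def conceptnet_isa(term, limit=5):
    """Return (term, category) pairs usable as isa/2 background facts."""
    resp = requests.get("http://api.conceptnet.io/query", params={
        "start": f"/c/en/{term}", "rel": "/r/IsA", "limit": limit})
    return [(e["start"]["label"], e["end"]["label"])
            for e in resp.json().get("edges", [])]

# A MIL system could treat each returned pair as a background predicate
# when classifying a sentence mentioning the term.
print(conceptnet_isa("football"))
```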

8. Cyclophobic Reinforcement Learning

Authors: Stefan Sylvius Wagner, Peter Arndt, Jan Robine, Stefan Harmeling

Abstract: In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they sometimes do not systematically explore the whole state space, akin to depth-first search versus breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty but punishes redundancy by avoiding cycles. By augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample-efficient than other state-of-the-art methods on a variety of tasks.
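
A minimal sketch of what a cyclophobic intrinsic reward could look like, assuming cycles are detected by hashing observations within an episode; the paper additionally applies this over hierarchical cropped views, which the sketch omits.

```python
class CyclophobicReward:
    def __init__(self, penalty=-0.1):
        self.penalty = penalty
        self.visited = set()

    def reset(self):
        self.visited.clear()               # cycles are counted per episode

    def __call__(self, obs):
        key = hash(obs if isinstance(obs, bytes) else bytes(str(obs), "utf8"))
        if key in self.visited:
            return self.penalty            # closing a cycle: punish
        self.visited.add(key)
        return 0.0                         # first visit: no intrinsic signal
```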

9. Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs

Authors: Jiani Liu, Qinghua Tao, Ce Zhu, Yipeng Liu, Xiaolin Huang, Johan A. K. Suykens

Abstract: Multitask learning (MTL) leverages task-relatedness to enhance performance. With the emergence of multimodal data, tasks can now be referenced by multiple indices. In this paper, we employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices and preserve their structural relations. Based on this representation, we propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least-squares support vector machines (LSSVMs), in which CP factorization is deployed over the coefficient tensor. Our approach models the task relation as a linear combination of shared factors weighted by task-specific factors and generalizes to both classification and regression problems. Through an alternating optimization scheme and the Lagrangian function, each subproblem is transformed into a convex problem, formulated as a quadratic program or a linear system in the dual form. In contrast to previous MTL frameworks, our decision function in the dual induces a weighted kernel function with a task-coupling term characterized by the similarities of the task-specific factors, better revealing the explicit relations across tasks in MTL. Experimental results validate the effectiveness and superiority of our proposed methods compared to existing state-of-the-art approaches in MTL. The implementation code will be available at https://github.com/liujiani0216/TSVM-MTL.
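
The CP-factorized coefficient tensor admits a compact sketch: tasks indexed by a pair (i, j) share feature-space factors A, while task-specific factors B and C weight their combination. Shapes and the rank below are illustrative, and the paper fits the factors through dual SVM/LSSVM subproblems rather than the random values used here.

```python
import numpy as np

def task_weights(A, B, C, i, j):
    """Weight vector for task (i, j): w_ij = A @ (B[i] * C[j])."""
    return A @ (B[i] * C[j])                # (d, R) @ (R,) -> (d,)

rng = np.random.default_rng(0)
d, m1, m2, R = 8, 3, 4, 2                   # features, task modes, CP rank
A = rng.normal(size=(d, R))                 # shared feature-space factors
B = rng.normal(size=(m1, R))                # factors for the first task index
C = rng.normal(size=(m2, R))                # factors for the second task index
x = rng.normal(size=d)
print(x @ task_weights(A, B, C, 1, 2))      # prediction for task (1, 2)
```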

10. Consensus of state-of-the-art mortality prediction models: From all-cause mortality to sudden death prediction

Authors: Dr Yola Jones, Dr Fani Deligianni, Dr Jeff Dalton, Dr Pierpaolo Pellicori, Professor John G F Cleland

Abstract: Worldwide, many millions of people die suddenly and unexpectedly each year, with or without a prior history of cardiovascular disease. Such events are sparse (once in a lifetime), many victims will not have had prior investigations for cardiac disease, and many different definitions of sudden death exist. Accordingly, sudden death is hard to predict. This analysis used NHS Electronic Health Records (EHRs) for people aged $\geq$50 years living in the Greater Glasgow and Clyde (GG&C) region in 2010 (n = 380,000) to try to overcome these challenges. We investigated whether medical history, blood tests, prescription of medicines, and hospitalisations might, in combination, predict a heightened risk of sudden death. We compared the performance of models trained to predict either sudden death or all-cause mortality. We built six models for each outcome of interest: three taken from state-of-the-art research (BEHRT, Deepr and Deep Patient), and three of our own creation. We trained these using two different data representations: a language-based representation and a sparse temporal matrix. We used global interpretability to understand the most important features of each model and compared how much agreement there was amongst models using Rank Biased Overlap. Accounting for correlated variables without increasing the complexity of the interpretability technique is challenging; we overcame this by clustering features into groups and comparing the most important groups for each model. We found the agreement between models to be much higher when accounting for correlated variables. Our analysis emphasises the challenge of predicting sudden death and the need for better understanding and interpretation of machine learning models applied to healthcare.
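
Rank Biased Overlap itself is simple to state; below is the standard truncated-sum approximation (Webber et al., 2010), with illustrative feature names rather than results from the study. The parameter p controls how heavily the top of each ranking is weighted.

```python
def rbo(ranking_a, ranking_b, p=0.9):
    """Truncated Rank Biased Overlap between two ranked lists."""
    depth = min(len(ranking_a), len(ranking_b))
    score, seen_a, seen_b = 0.0, set(), set()
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        score += (p ** (d - 1)) * len(seen_a & seen_b) / d
    return (1 - p) * score

# Identical rankings approach 1; disjoint rankings give 0.
print(rbo(["age", "bnp", "egfr"], ["age", "egfr", "bnp"]))
```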

11. Application of Zone Method based Machine Learning and Physics-Informed Neural Networks in Reheating Furnaces

Authors: Ujjal Kr Dutta, Aldo Lipani, Chuan Wang, Yukun Hu

Abstract: Despite the high economic relevance of Foundation Industries, certain components of their manufacturing chain, such as reheating furnaces, are energy-intensive. Notable reductions in energy consumption could be obtained by reducing the overall heating time in furnaces. Computer-integrated Machine Learning (ML) and Artificial Intelligence (AI) powered control systems in furnaces could be enablers in achieving Net-Zero goals in Foundation Industries for sustainable manufacturing. In this work, owing to the infeasibility of obtaining good-quality data in scenarios like reheating furnaces, a computational model based on Hottel's classical zone method has been used to generate data for training ML and Deep Learning (DL) regression models. Notably, the zone method provides an elegant way to model Radiative Heat Transfer (RHT), the dominant heat transfer mechanism in high-temperature processes inside heating furnaces. Using this data, an extensive comparison among a wide range of state-of-the-art, representative ML and DL methods has been made with respect to their temperature-prediction performance in varying furnace environments. Owing to its balance between inference time and model performance, DL stands out among its counterparts. To further enhance the Out-Of-Distribution (OOD) generalization capability of the trained DL models, we propose a Physics-Informed Neural Network (PINN) that incorporates prior physical knowledge through a set of novel Energy-Balance regularizers. Our setup is a generic framework, agnostic to the 3D geometry of the underlying furnace, and could accommodate any standard ML regression model to serve as a Digital Twin of the underlying physical processes, helping transition Foundation Industries towards Industry 4.0.
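
A generic sketch of the PINN idea described above, assuming the Energy-Balance regularizer penalizes a residual between radiative heat received (via the fourth-power law underlying the zone method) and the zone's heat load; the paper's exact regularizers are not reproduced here, and all tensor names are placeholders.

```python
import torch

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def pinn_loss(pred_T_zone, target_T_zone, T_source, exchange_area, q_load,
              lam=1.0):
    # Data term: match measured zone temperatures.
    data = torch.mean((pred_T_zone - target_T_zone) ** 2)
    # Physics term: radiative heat received should balance the zone's load,
    # using the T^4 law that underlies the zone method.
    q_rad = SIGMA * exchange_area * (T_source ** 4 - pred_T_zone ** 4)
    physics = torch.mean((q_rad - q_load) ** 2)
    return data + lam * physics
```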

12. Advanced Deep Regression Models for Forecasting Time Series Oil Production

Authors: Siavash Hosseini, Thangarajah Akilan

Abstract: Global oil demand is rapidly increasing and is expected to reach 106.3 million barrels per day by 2040. Thus, it is vital for hydrocarbon extraction industries to forecast their production in order to optimize their operations and avoid losses. Big companies have realized that exploiting the power of deep learning (DL) and the massive amount of data from various oil wells for this purpose can save a lot of operational costs and reduce unwanted environmental impacts. In this direction, researchers have proposed models using conventional machine learning (ML) techniques for oil production forecasting. However, these techniques are inappropriate for this problem, as they cannot capture the historical patterns found in time series data, resulting in inaccurate predictions. This research aims to overcome these issues by developing advanced data-driven regression models using sequential convolutions and long short-term memory (LSTM) units. Exhaustive analyses are conducted to select the optimal sequence length, model hyperparameters, and cross-well dataset formation to build highly generalized, robust models. A comprehensive experimental study on Volve oilfield data validates the proposed models. It reveals that the LSTM-based sequence learning model predicts oil production better than the 1-D convolutional neural network (CNN), with a mean absolute error (MAE) of 111.16 and an R2 score of 0.98. It is also found that the LSTM-based model outperforms all existing state-of-the-art solutions and achieves a 37% improvement over standard linear regression, which is considered the baseline model in this work.
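
A compact sketch of the kind of LSTM-based sequence regressor the study evaluates: a fixed-length window of past measurements mapped to the next production value. The layer sizes, window length, and feature count below are placeholders, not the tuned values from the paper.

```python
import torch
import torch.nn as nn

class ProductionLSTM(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                   # x: [batch, window, n_features]
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])     # regress from the last time step

model = ProductionLSTM(n_features=5)
window = torch.randn(2, 30, 5)              # two 30-step input windows
print(model(window).shape)                  # -> torch.Size([2, 1])
```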

13. survex: an R package for explaining machine learning survival models

Authors: Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N. Wright, Przemysław Biecek

Abstract: Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explaining any survival model by applying explainable artificial intelligence techniques. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, transparency and responsibility may be promoted in sensitive areas, such as biomedical research and healthcare applications.
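
survex itself is an R package, so rather than guessing at its API, here is a language-neutral sketch of one core idea it exposes, permutation variable importance, computed from any model's predictive loss; loss_fn stands in for a survival-specific metric such as the integrated Brier score, and all names are illustrative.

```python
import numpy as np

def permutation_importance(predict, loss_fn, X, y, rng=None):
    rng = rng or np.random.default_rng(0)
    base = loss_fn(y, predict(X))                   # reference loss
    scores = {}
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])        # break feature-target link
        scores[j] = loss_fn(y, predict(Xp)) - base  # loss increase = importance
    return scores
```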

14. Spatial Graph Coarsening: Weather and Weekday Prediction with London's Bike-Sharing Service using GNN

Authors: Yuta Sato, Pak Hei Lam, Shruti Gupta, Fareesah Hussain

Abstract: This study introduces the use of Graph Neural Networks (GNNs) to predict the weather and weekday of a given day in London from the Santander Cycles bike-sharing dataset, framed as a graph classification task. The proposed GNN models introduce (i) a concatenation operator for graph features with trained node embeddings and (ii) a graph coarsening operator based on geographical contiguity, namely "Spatial Graph Coarsening". Using node features describing land-use characteristics and the number of households around the bike stations, together with graph features of temperatures in the city, our proposed models outperformed the baseline model in both cross-entropy loss and accuracy on the validation dataset.
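
A sketch of what coarsening by geographical contiguity could look like: stations are pooled into their containing region, node features are summed, and inter-region edge weights accumulate the underlying station-to-station trips. The region assignment is an illustrative input, not derived from the Santander data.

```python
import numpy as np

def spatial_coarsen(A, X, region_of):
    """A: [n, n] trip counts; X: [n, f] node features; region_of: [n] ids."""
    regions = np.unique(region_of)
    S = (region_of[None, :] == regions[:, None]).astype(float)  # [m, n] pooling
    return S @ A @ S.T, S @ X            # coarse adjacency and features

A = np.array([[0, 4, 1], [4, 0, 2], [1, 2, 0]], dtype=float)
X = np.eye(3)                            # one-hot node features for the demo
print(spatial_coarsen(A, X, np.array([0, 0, 1])))
```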