Aliasing in Convnets: A Frame-Theoretic Perspective

By: Daniel Haider, Vincent Lostanlen, Martin Ehler, Nicki Holighaus, Peter Balazs

Using a stride in a convolutional layer inherently introduces aliasing, which has implications for numerical stability and statistical generalization. While techniques such as parametrizations via paraunitary systems have been used to promote orthogonal convolution and thus ensure Parseval stability, a general analysis of aliasing and its effects on stability has not been done in this context. In this article, we adapt a frame-theoret...

Machine Learning | July 9, 2025, 4:12am

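The abstract starts from the fact that a stride introduces aliasing. As a minimal sketch of that basic fact only (not the authors' frame-theoretic analysis), the snippet below treats a stride-2 layer as a full circular convolution followed by keeping every second output. With a hypothetical identity kernel, two different cosines on 8 samples, at frequencies 1 and 3, give identical strided outputs: frequency 3 folds onto frequency 1, so the layer cannot tell them apart. The helper names are illustrative.

```python
import math

def circular_strided_conv1d(x, kernel, stride):
    """Circular 1-D convolution of x with kernel, keeping every `stride`-th output."""
    n = len(x)
    full = [sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel))) for i in range(n)]
    return full[::stride]

def cosine(freq, length):
    """Samples of cos(2*pi*freq*t/length) for t = 0..length-1."""
    return [math.cos(2 * math.pi * freq * t / length) for t in range(length)]

low, high = cosine(1, 8), cosine(3, 8)
kernel = [1.0]  # hypothetical identity kernel, isolating the effect of the stride

y_low = circular_strided_conv1d(low, kernel, stride=2)
y_high = circular_strided_conv1d(high, kernel, stride=2)

print(max(abs(a - b) for a, b in zip(low, high)))      # large: the inputs differ
print(max(abs(a - b) for a, b in zip(y_low, y_high)))  # near 0: the outputs coincide
```

A non-identity kernel rescales and phase-shifts each frequency, but after the stride the two components still land on the same output frequency.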

Differential Mamba

By: Nadav Schneider, Itamar Zimerman, Eliya Nachmani

Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations. This degrades LLM capabilities by promoting hallucinations, weakening long-range and retrieval abilities, and reducing robustness. Recent work has shown that differential design can mitigate this issue in Transformers, improving their effectiveness across various applications. In this paper, we explore ...

Machine Learning | July 9, 2025, 3:13am

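The abstract builds on differential design in Transformers. A minimal sketch of the differential attention map as introduced for the Differential Transformer, not the Mamba variant proposed in this paper: two softmax attention maps are computed and the second, scaled by λ, is subtracted from the first, so weight that both maps place on the same tokens partly cancels. The learnable λ parametrization and the normalization of the original design are omitted; λ here is a fixed, assumed value and the keys are toy data.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    return softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys])

def differential_attention_map(q1, k1, q2, k2, lam):
    """Per key: softmax(q1.k1/sqrt(d)) - lam * softmax(q2.k2/sqrt(d)); sums to 1 - lam."""
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    return [x - lam * y for x, y in zip(a1, a2)]

keys = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # token 0 is the "relevant" one
plain = attention_map([1.0, 1.0], keys)
# A zero query gives uniform weights, standing in for diffuse "noise" attention.
diff = differential_attention_map([1.0, 1.0], keys, [0.0, 0.0], keys, lam=0.5)
print(plain[0] / plain[1], diff[0] / diff[1])  # relevant-to-irrelevant ratio grows
```

Note that the differential weights no longer form a probability distribution and can become negative for other inputs.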

QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions

By: Zhun Deng, Thomas P Zollo, Benjamin Eyre, Amogh Inamdar, David Madras, Richard Zemel

As machine learning models grow increasingly competent, their predictions can supplement scarce or expensive data in various important domains. In support of this paradigm, algorithms have emerged to combine a small amount of high-fidelity observed data with a much larger set of imputed model outputs to estimate some quantity of interest. Yet current hybrid-inference tools target only means or single quantiles, limiting their applicability fo...

Machine Learning | July 8, 2025, 10:41am

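For reference, the mean case that the abstract says existing hybrid-inference tools already cover can be sketched with the classic prediction-powered mean estimator (from the prediction-powered inference literature; this is not QuEst, which targets quantile-based measures). The estimator averages the model's predictions on the large unlabeled set, then corrects that average by the mean residual Y - f(X) measured on the small labeled set. All numbers below are made up for illustration.

```python
from statistics import mean

def prediction_powered_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Mean of predictions on unlabeled data, corrected by the average labeled residual."""
    rectifier = mean(y - f for y, f in zip(y_labeled, preds_labeled))
    return mean(preds_unlabeled) + rectifier

y_labeled = [2.0, 3.0, 4.0]               # scarce high-fidelity observations
preds_labeled = [1.5, 2.5, 3.5]           # model outputs on the same items (biased low)
preds_unlabeled = [1.0, 2.0, 3.0, 4.0]    # imputed outputs on the large unlabeled set

# Predictions alone would give 2.5; the rectifier adds the observed bias of 0.5.
print(prediction_powered_mean(y_labeled, preds_labeled, preds_unlabeled))  # 3.0
```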

Replacing thinking with tool usage enables reasoning in small language models

By: Corrado Rainone, Tim Bakker, Roland Memisevic

Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In that line of work, a combination of Supervised Fine-Tuning (SFT) on synthetic demonstrations and Reinforcement Learning with Verifiable Rewards (RLVR) is used for training Large Language Models to expend extra compute during inference in the form of "thoughts" expressed in natural language. In this pap...

Machine Learning | July 8, 2025, 6:56am

Meta-Learning Transformers to Improve In-Context Generalization

By: Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an altern...

Machine Learning | July 8, 2025, 6:12am

Train-before-Test Harmonizes Language Model Rankings

By: Guanhua Zhang, Ricardo Dominguez-Olmedo, Moritz Hardt

Existing language model benchmarks provide contradictory model rankings, even for benchmarks that aim to capture similar skills. This dilemma of conflicting rankings hampers model selection, clouds model comparisons, and adds confusion to a growing ecosystem of competing models. Recent work attributed ranking disagreement to the phenomenon of training on the test task: As released, different models exhibit a different level of preparation for...

Machine Learning | July 8, 2025, 6:12am

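Disagreement between two benchmarks' model rankings can be quantified with a rank-correlation statistic. A minimal sketch using Kendall's tau-a; the truncated abstract does not say which agreement measure the paper uses, so this choice is an assumption, and the model names and scores are hypothetical.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two benchmarks' scores for the same models.

    Tied pairs count as neither concordant nor discordant (no tie correction).
    """
    models = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs

benchmark_1 = {"model_a": 0.9, "model_b": 0.7, "model_c": 0.5}
benchmark_2 = {"model_a": 0.6, "model_b": 0.8, "model_c": 0.4}
# model_a and model_b swap places, so one of the three pairs is discordant.
print(kendall_tau(benchmark_1, benchmark_2))  # 0.333...
```

A value of 1 means identical rankings and -1 means fully reversed rankings.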

Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens

By: Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey, Christian Holm

Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK). This analysis establishes a link between performance scaling and the internal dynamics of neural networks. Our findings of s...

Machine Learning | July 8, 2025, 5:35am

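For readers new to the NTK, the empirical kernel of a scalar network f(x; θ) is Θ(x, x') = ∇θ f(x) · ∇θ f(x'), the inner product of parameter gradients at two inputs. A minimal sketch for a hypothetical one-hidden-layer tanh network with hand-derived gradients; the parameter values are arbitrary and none of the paper's architectures or measurements are reproduced here.

```python
import math

def network_output(x, a, w):
    """Hypothetical scalar network f(x) = sum_j a_j * tanh(w_j * x)."""
    return sum(aj * math.tanh(wj * x) for aj, wj in zip(a, w))

def param_gradient(x, a, w):
    """Gradient of f(x) w.r.t. all parameters, flattened as [df/da_j..., df/dw_j...]."""
    grad_a = [math.tanh(wj * x) for wj in w]
    grad_w = [aj * (1.0 - math.tanh(wj * x) ** 2) * x for aj, wj in zip(a, w)]
    return grad_a + grad_w

def empirical_ntk(xs, a, w):
    """Gram matrix Theta[i][j] = <grad f(x_i), grad f(x_j)>."""
    grads = [param_gradient(x, a, w) for x in xs]
    return [[sum(g * h for g, h in zip(gi, gj)) for gj in grads] for gi in grads]

a = [0.5, -1.0, 0.3]
w = [1.0, 0.2, -2.0]
xs = [-1.0, 0.5, 2.0]
for row in empirical_ntk(xs, a, w):
    print([round(v, 4) for v in row])
```

Recomputing this matrix as parameters change during training is one way to observe the internal dynamics that an NTK-based analysis looks at.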

Cascade: Token-Sharded Private LLM Inference

By: Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal

As LLMs continue to increase in parameter size, the computational resources required to run them are available to fewer parties. Therefore, third-party inference services -- where LLMs are hosted by third parties with significant computational resources -- are becoming increasingly popular. However, third-party inference raises critical concerns about user data privacy. To mitigate these risks, privacy researchers have developed provably secu...

Machine Learning | July 8, 2025, 5:35am

MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis

By: Kunyu Zhang, Qiang Li, Shujian Yu

Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatica...

Machine Learning | July 5, 2025, 5:42pm

Replicable Distribution Testing

By: Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, Christopher Ye

We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributi...

Machine Learning | July 5, 2025, 5:11pm

In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization

By: Thiti Suttaket, Stanley Kok

Survival analysis is an important problem in healthcare because it models the relationship between an individual's covariates and the onset time of an event of interest (e.g., death). It is important for survival models to be well-calibrated (i.e., for their predicted probabilities to be close to ground-truth probabilities) because badly calibrated systems can result in erroneous clinical decisions. Existing survival models are typically cali...

Machine Learning | July 4, 2025, 5:56am

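To make the calibration definition in the abstract concrete, here is a minimal post-hoc diagnostic for a binary outcome (event by a fixed time, with censoring ignored), which simplifies the survival setting; it is not the paper's in-training constrained optimization. The binned error compares average predicted probability with observed frequency, and the multicalibration-style check takes the worst error over subgroups. The toy data are hypothetical: predictions that look perfectly calibrated overall are miscalibrated for each of two groups.

```python
def binned_calibration_error(probs, outcomes, n_bins=5):
    """Weighted mean |avg predicted prob - observed rate| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    error = 0.0
    for b in bins:
        if b:
            avg_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            error += len(b) / total * abs(avg_p - rate)
    return error

def worst_group_calibration_error(probs, outcomes, groups, n_bins=5):
    """Largest binned calibration error over the subgroups in `groups`."""
    errors = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        errors.append(binned_calibration_error(
            [probs[i] for i in idx], [outcomes[i] for i in idx], n_bins))
    return max(errors)

probs = [0.5] * 8
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
print(binned_calibration_error(probs, outcomes))               # 0.0
print(worst_group_calibration_error(probs, outcomes, groups))  # 0.5
```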

High-Order Deep Meta-Learning with Category-Theoretic Interpretation

By: David H. Mguni

We introduce a new hierarchical deep learning framework for recursive higher-order meta-learning that enables neural networks (NNs) to construct, solve, and generalise across hierarchies of tasks. Central to this approach is a generative mechanism that creates virtual tasks -- synthetic problem instances designed to enable the meta-learner to learn soft constraints and unknown generalisable rules across related tasks. Crucially,...

Machine Learning | July 4, 2025, 3:56am

LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding

By: Yuchen Ma, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are...

Machine Learning | July 4, 2025, 3:26am

Fast and Simplex: 2-Simplicial Attention in Triton

By: Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil

Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count together. However, these scaling laws assume an infinite supply of data and apply primarily in compute-bound settings. As modern large language models increasingly rely on massive internet-scale datasets, the assumption that they are compute-bound is...

Machine Learning | July 4, 2025, 3:17am

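The power-law claim in the abstract is often written in the parametric form L(N, D) = E + A/N^α + B/D^β for N parameters and D training tokens, with training compute approximated as C ≈ 6ND. A minimal sketch under that form: the coefficients are hypothetical, chosen only for illustration, and the grid search assumes an unlimited supply of unique tokens, which is exactly the assumption the abstract goes on to question.

```python
# Hypothetical coefficients, for illustration only (not fitted to any data).
E, A, B, ALPHA, BETA = 1.7, 400.0, 400.0, 0.34, 0.28

def loss(n_params, n_tokens):
    """Parametric loss E + A / N^alpha + B / D^beta."""
    return E + A / n_params ** ALPHA + B / n_tokens ** BETA

def compute_optimal_params(compute_flops, grid_points=241):
    """Grid-search model size N in [1e6, 1e12] under the approximation C ~ 6 * N * D."""
    candidates = [10 ** (6 + 6 * i / (grid_points - 1)) for i in range(grid_points)]
    return min(candidates, key=lambda n: loss(n, compute_flops / (6 * n)))

for c in (1e20, 1e22):
    n = compute_optimal_params(c)
    d = c / (6 * n)
    print(f"C={c:.0e}: N~{n:.2e}, D~{d:.2e}, loss~{loss(n, d):.3f}")
```

More compute moves the optimum toward larger models and more tokens together; if unique tokens are capped, the D term stops shrinking and this allocation no longer applies.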

Understanding and Improving Length Generalization in Recurrent Models

By: Ricardo Buitrago Ruiz, Albert Gu

Recently, recurrent models such as state space models and linear attention have become popular due to their linear complexity in the sequence length. Thanks to their recurrent nature, in principle they can process arbitrarily long sequences, but their performance sometimes drops considerably beyond their training context lengths, i.e., they fail to length-generalize. In this work, we provide comprehensive empirical and theoretical analysis to s...

Machine Learning | July 4, 2025, 3:17am

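The linear cost of recurrent models comes from carrying a fixed-size state through a single pass over the sequence. A minimal sketch with one scalar channel, h_t = a·h_{t-1} + b·x_t, where a and b are hypothetical fixed values (real state space models use learned, multi-dimensional and, in some models, input-dependent parameters). Time grows linearly in the length T and the state stays constant-size, so the loop runs on inputs of any length; whether the outputs stay useful beyond the training length is the question the paper studies.

```python
def linear_recurrence(xs, decay, gain):
    """Scalar linear recurrence h_t = decay * h_{t-1} + gain * x_t, starting from h_0 = 0.

    One pass over the input: O(T) time and a constant-size state h.
    """
    h = 0.0
    outputs = []
    for x in xs:
        h = decay * h + gain * x
        outputs.append(h)
    return outputs

print(linear_recurrence([1.0, 1.0, 1.0], decay=0.5, gain=1.0))  # [1.0, 1.5, 1.75]
# With |decay| < 1 and constant input, the state approaches gain / (1 - decay) = 2.0.
print(linear_recurrence([1.0] * 10_000, decay=0.5, gain=1.0)[-1])
```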

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning

By: Ruiyang Zhou, Shuozhe Li, Amy Zhang, Liu Leqi

Recent advances in large language models have been driven by reinforcement learning (RL)-style post-training, which improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. However, these methods depend heavily on the model's initial ability to produce positive samples. They primarily refine what the model al...

Machine Learning | July 4, 2025, 3:16am

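The dependence on positive samples shows up directly in the group-relative advantage used by GRPO-style methods, where each sampled answer's reward is normalized by the mean and standard deviation of its group. A minimal sketch (the population standard deviation and the small epsilon are assumptions; implementations differ): if the verifier marks every sample in a group as wrong, all advantages are zero and that group contributes no learning signal.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by its group's mean and (population) standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical verifier rewards for 4 sampled answers to one prompt (1 = correct).
print(group_relative_advantages([1, 0, 0, 0]))  # correct answer pushed up, others down
print(group_relative_advantages([0, 0, 0, 0]))  # [0.0, 0.0, 0.0, 0.0]: no signal
```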

Revisiting Learning Rate Control

By: Micha Henheik, Theresa Eimer, Marius Lindauer

The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to online scheduling based on gradient statistics. This paper compares paradigms to assess the current state of learning rate control. We find that methods from multi-fidelity hyperparameter optimization, fixed...

Machine Learning | July 3, 2025, 6:41am