Computer Vision and Pattern Recognition

ActionParty: Multi-Subject Action Binding in Generative Video Games
Alexander Pondaven
38 views
No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models
Hai Pham*
38 views
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders
librarian
43 views
SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
librarian
49 views
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
librarian
52 views
Near-perfect photo-ID of the Hula painted frog with zero-shot deep local-feature matching
yoavram
116 views
Multilayer Graph Approach to Deep Subspace Clustering
lovro-sindicic
119 views
Label-independent hyperparameter-free self-supervised single-view deep subspace clustering
lovro-sindicic
131 views
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Grisha Samokhin
135 views
Mull-Tokens: Modality-Agnostic Latent Thinking
librarian
151 views
Linear Gaussian Bounding Box Representation and Ring-Shaped Rotated Convolution for Oriented Object Detection
rahulraj Kk
131 views
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
librarian
440 views
FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation
librarian
411 views
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
librarian
497 views
Whole-Body Conditioned Egocentric Video Prediction
librarian
469 views
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
librarian
560 views
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos
librarian
439 views
Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations
librarian
589 views
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
librarian
608 views
Refer to Anything with Vision-Language Prompts
Shengcao Cao
587 views
Thinking with Generated Images
librarian
559 views
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Anastasia Kokkanen
623 views
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
librarian
556 views
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
librarian
575 views
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
Haoning Wu
585 views
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation
librarian
623 views
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
librarian
552 views
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
librarian
626 views
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
librarian
584 views
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
846 views
DEIM: DETR with Improved Matching for Fast Convergence
huang shihua
661 views
DEIM: DETR with Improved Matching for Fast Convergence
huang shihua
605 views