Predictive vision-language integration in the human visual cortex
Li, S.; Jin, Z.; Zhang, R.-Y.; Gu, S.; Li, Y.
Abstract

Integrating linguistic and visual information is a core function of human cognition, yet how information from these two modalities interacts in the brain remains largely unknown. Competing frameworks, including the hub-and-spoke model and Bayesian theories such as predictive coding, offer conflicting accounts of how the brain achieves multimodal integration. To address this question, we collected a large-scale fMRI dataset and leveraged state-of-the-art AI systems to construct encoding models that probe how the human brain matches and integrates linguistic and visual information. We found that prior information from one modality can modulate neural responses in another, even in the early visual cortex (EVC). Integration-related neural responses in EVC are governed by prediction errors, consistent with predictive coding theory. Enhanced and suppressed neural responses to semantically matched cross-modal stimuli were found in distinct EVC populations, with the suppressed population carrying denser, behaviorally relevant semantic information. Both populations support semantic integration with distinct temporal dynamics and representational structures. These findings provide representational- and computational-level insights into how the brain integrates information across modalities, revealing unified principles of information processing that link biological and artificial intelligence.
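To make the methodology concrete: a voxel-wise encoding model of the kind the abstract describes typically maps stimulus features (e.g., embeddings from an AI vision-language model) to measured fMRI responses via regularized linear regression. The sketch below is a minimal, hypothetical illustration using synthetic data and ridge regression; it is not the authors' pipeline, and all array shapes and the regularization value are assumptions for demonstration.

```python
import numpy as np

# Hypothetical setup: X holds stimulus features (n_stimuli x n_features),
# standing in for AI-model embeddings; Y holds fMRI responses
# (n_stimuli x n_voxels). Both are synthetic here.
rng = np.random.default_rng(0)
n_stim, n_feat, n_vox = 200, 50, 10

X = rng.standard_normal((n_stim, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))          # ground-truth weights
Y = X @ W_true + 0.1 * rng.standard_normal((n_stim, n_vox))  # noisy responses

# Ridge regression fit: W = (X^T X + lambda * I)^{-1} X^T Y
lam = 1.0  # assumed regularization strength
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ Y)

# Encoding-model performance is usually scored per voxel by correlating
# predicted with observed responses (here on training data for brevity;
# a real analysis would use held-out stimuli).
Y_pred = X @ W_hat

def voxel_corr(a, b):
    """Pearson correlation between columns of a and b."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a * b).mean(axis=0)

r = voxel_corr(Y_pred, Y)  # one correlation score per voxel
print(r.shape)
```

In a cross-modal design, features from one modality (e.g., a sentence prior) can be concatenated with or used to condition the visual features, so that the fitted weights reveal whether and where linguistic context modulates visual responses.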