Learning hierarchical sequence representations across human cortex and hippocampus

Humans experience continuous sensory input as segmented units such as words and events. The brain's ability to discover such regularities is known as statistical learning, and the learned structure can be represented at multiple levels, from transitional probabilities between elements to the identity of the units themselves. In a new report published in Science Advances, Simon Henin and a team of scientists at the New York University School of Medicine, Yale University and the Max Planck Institute in the U.S. and Germany recorded sequence encoding in the cortex and hippocampus of human subjects exposed to auditory and visual sequences with temporal (time-based) regularities. Early processing stages tracked lower-level features such as syllables as well as learned units such as words, while later processing stages tracked only the learned units. The findings point to multiple parallel computational systems that support learning across hierarchically organized cortico-hippocampal circuits.

Understanding the code of speech

We receive continuous input from the world yet experience it in digestible chunks. With language, for example, humans extract meaningful sequences such as sentences, words and phrases from a continuous stream of sounds without clear acoustic boundaries or pauses between linguistic elements. This segmentation happens incidentally and effortlessly, and it is a core building block of development. The capacity of infants and adults to learn transitional probabilities between syllables or shapes is known as "statistical learning". However, the brain mechanisms supporting this cognitive function are poorly understood. Brain regions such as the hippocampus and the inferior frontal gyrus (IFG) are known to aid visual and auditory statistical learning. To understand this process, Henin et al. conducted intracranial recordings from 23 human epilepsy patients to provide mechanistic insight into how cortical areas respond to, and learn, the structure of the world. The findings also highlighted neural frequency tagging (NFT) as a versatile tool to investigate incidental learning in preverbal and nonverbal patient populations.

Pattern similarity results during auditory SL. Multidimensional scaling (MDS) of the distances between syllabic responses across electrodes showing significant (A) word + syll responses and (B) word-only responses, as well as (C) across electrodes from the hippocampus. Individual words are color-coded; subscripts represent ordinal position (e.g., “tu1pi2ro3”). Dot-dashed ellipses indicate grouping by TP, solid ellipses outline grouping by ordinal position, and dashed ellipses indicate grouping at the level of the individual words (color-coded). (D) Quantification of multivariate similarity for syllables in the auditory SL task. Left: Similarity by TP. Greater within-class similarity indicates stronger grouping of syllables with low TP (0.33) than syllables with high TP (1.0). A Friedman test indicated a main effect of electrode type on TP similarity (χ2 = 22.03, P < 0.001). Middle: Within versus between similarity for ordinal position. Greater within-class similarity indicates stronger grouping of syllables holding the same first, second, or third position in a word. A Friedman test indicated a significant main effect of electrode type (χ2 = 790.35, P < 0.001). Right: Within versus between similarity for word identity. Greater within-class similarity indicates grouping of syllables into individual words. A Friedman test indicated a significant main effect of electrode type (χ2 = 265.29, P < 0.001). ***P < 0.001 and **P < 0.01, Bonferroni-corrected Wilcoxon rank sum test; error bars denote the population SEM. Credit: Science Advances, doi: 10.1126/sciadv.abc4530

Behavioral evidence of auditory statistical learning

Henin et al. studied the neural circuits and computations underlying statistical learning by presenting 17 participants with auditory streams of syllables whose sequence structure they manipulated. In the structured streams, each syllable occupied the first, second or third position of a three-syllable word, or triplet; in the random streams, the transitional probabilities were low and uniform, with no word-level segmentation. For the auditory tasks, they generated 12 consonant-vowel syllables using MacTalk and concatenated them in MATLAB to create two sequences: a structured stream and a random stream. In the structured stream, Henin et al. manipulated the transitional probabilities between syllables so that four hidden words were embedded in the sequence, creating a continuous artificial language stream. Syllables were presented at a rate of 4 Hz, yielding a word rate of 1.33 Hz. The team did not inform the participants of the structure but instead asked them to perform a cover task, indicating syllable repetitions randomly embedded in the auditory streams.
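The design above can be sketched in a few lines: concatenating hidden triplet "words" makes within-word transitions deterministic while word-boundary transitions stay near 1/3. This is a minimal illustrative sketch, not the paper's stimulus code; the syllable labels (only "tupiro" appears in the figure caption) and counts are hypothetical.

```python
# Sketch of a structured stream built from four hidden three-syllable words.
# Syllables other than "tu/pi/ro" are invented for illustration.
import random
from collections import defaultdict

words = [["tu", "pi", "ro"], ["go", "la", "bu"],
         ["bi", "da", "ku"], ["pa", "do", "ti"]]

def make_structured_stream(n_words=400, seed=0):
    """Concatenate words in random order, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        w = rng.choice([w for w in words if w is not prev])
        stream.extend(w)
        prev = w
    return stream

def transition_probabilities(stream):
    """Empirical P(next syllable | current syllable)."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(stream, stream[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

stream = make_structured_stream()
tp = transition_probabilities(stream)
print(tp["tu"]["pi"])            # within-word transition: exactly 1.0
print(sorted(tp["ro"].values())) # word boundary: three options, each ~0.33
```

The only cue to word boundaries is this drop in transitional probability; there are no pauses in the stream, mirroring the absence of acoustic boundaries in the actual stimuli.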

Neural tracking of auditory statistical learning

Henin et al. obtained direct neurophysiological signals from 1,898 intracranial electrodes in 17 participants, comprehensively covering the frontal, parietal, occipital and temporal lobes as well as the hippocampus in both hemispheres. The participants also performed a two-alternative forced choice (2AFC) task in which they listened to two audio segments presented one after the other and selected the stream containing one of the hidden words. The neural responses originated predominantly in somatosensory/motor and temporal cortices. On average, word-rate coherence was significantly increased in the structured stream but not in the random stream, supporting NFT (neural frequency tagging) as a sensitive and robust assay of online statistical learning. Using NFT, the team tracked the representation of segmented units at two hierarchical levels of the stream, testing within-electrode phase coherence in the field potential and gamma band for the structured and random streams. Electrocorticography localized combined word and syllable coherence mainly to the superior temporal gyrus (STG), with smaller clusters in the motor cortex and pars opercularis. In parallel, a second tuning profile comprised electrodes with significant coherence exclusively at the word rate, located in the inferior frontal gyrus and the anterior temporal lobe (ATL). This anatomical grouping mirrored the neuroanatomy of the auditory processing hierarchy.

Analyzing auditory statistical learning and testing visual statistical learning

To understand what drives the neural frequency tagging (NFT) results, Henin et al. examined three statistical cues in the stream that could support segmentation: (1) transitional probabilities, (2) ordinal position and (3) word identity, each enabling distinct cognitive functions. As with the auditory statistical learning tasks, the team ran visual statistical learning tasks with the patient groups, constructing streams of fractal images similar to those used in previous work. As before, the participants were not informed of the structure but performed a cover task. Henin et al. then used NFT to identify brain areas exhibiting statistical learning in neurophysiological recordings from 1,606 intracranial electrodes in 12 patients, covering the frontal, parietal, temporal and occipital cortex. As in the auditory case, they observed an anatomical and hierarchical segregation between two temporal tuning profiles of electrodes: one showed significant entrainment at both the fractal and pair rates, clustered mostly in the occipital and parietal cortex, while the other showed significant entrainment at the pair rate only, in the frontal, parietal and temporal cortex.
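The pattern-similarity analyses described in the figure captions rest on a simple comparison: responses to items sharing a label (the same word, ordinal position, or transitional-probability class) should resemble each other more than responses to items with different labels. The sketch below illustrates that within- versus between-class logic on synthetic response vectors; the labels, dimensions and noise level are invented for illustration, and the paper's actual analysis used Friedman and Wilcoxon tests on electrode data rather than this cosine measure.

```python
# Within- vs between-class similarity on synthetic "electrode" responses.
# Items from the same word share a response center; noise blurs each trial.
import math, random

rng = random.Random(7)

def response(center, noise=0.5):
    """One noisy trial response around a class center (hypothetical)."""
    return [c + rng.gauss(0, noise) for c in center]

centers = {"word_A": [1.0] * 8, "word_B": [-1.0] * 8}
data = [(label, response(c)) for label, c in centers.items() for _ in range(20)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def mean_sim(pairs):
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

within = [(u, v) for i, (la, u) in enumerate(data)
          for lb, v in data[i + 1:] if la == lb]
between = [(u, v) for i, (la, u) in enumerate(data)
           for lb, v in data[i + 1:] if la != lb]

# Positive difference = responses group by word identity.
print(mean_sim(within) - mean_sim(between))
```

Swapping the labels from word identity to ordinal position or transitional-probability class, as in the paper's three cue analyses, changes only how the pairs are partitioned, not the similarity computation itself.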

Pattern similarity results during visual SL. MDS of the distances between responses to individual fractals across (A) pair-only, (B) pair + fractal, and (C) hippocampal electrodes. Pairs are color-coded; odd numbers refer to the first position, and even numbers refer to the second position. Dot-dashed ellipses outline grouping by TP/ordinal position in pair + fractal electrodes. Solid ellipses outline grouping by TP/ordinal position in pair-only electrodes. Dashed ellipses indicate grouping by pair in pair-only and hippocampal electrodes. (D) Comparison of multivariate pattern similarity for fractals in the visual SL task. Left: Within versus between similarity for low versus high TP. Greater within-class similarity indicates stronger grouping of fractals with a low TP (0.33) over fractals with a high TP (1.0). A Friedman test indicated a main effect of electrode type on TP similarity (χ2 = 19.3, P < 0.001). Middle: Within versus between similarity for ordinal position. Greater within-class similarity indicates grouping of fractals holding the same first or second position in a pair. A Friedman test indicated a main effect of electrode type (χ2 = 122.2, P < 0.001). Right: Within versus between similarity for pair identity. Greater within-class similarity indicates grouping of fractals into pairs. A Friedman test indicated a main effect of electrode type (χ2 = 40.04, P < 0.001). ***P < 0.001 and *P < 0.05, Wilcoxon rank sum test; error bars denote the population SEM. Credit: Science Advances, doi: 10.1126/sciadv.abc4530

Outlook

In this way, Simon Henin and colleagues used intracranial recordings in humans to describe how the brain tracks and learns structure within sensory input. Statistical learning was accompanied by rapid changes in neural representations, reflected in two functionally and anatomically distinct brain responses. These responses revealed an anatomical hierarchy: early sensory processing stages mapped onto the superior temporal gyrus and occipital cortex, while late, amodal processing stages mapped onto the inferior frontal gyrus and anterior temporal lobe. The patients extracted and represented nested structure within sensory streams in as little as two minutes, even without awareness of the process.

