<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2767-0279</journal-id>
<journal-title-group>
<journal-title>Glossa Psycholinguistics</journal-title>
</journal-title-group>
<issn pub-type="epub">2767-0279</issn>
<publisher>
<publisher-name>eScholarship Publishing</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5070/G601140229</article-id>
<article-categories>
<subj-group>
<subject>Regular article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Strong evidence for maintenance of gradient representations during language processing</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Bushong</surname>
<given-names>Wednesday</given-names>
</name>
<email>wb104@wellesley.edu</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Wellesley College, US</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-05-05">
<day>05</day>
<month>05</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>4</volume>
<issue>1</issue>
<elocation-id>15</elocation-id>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2025 The Author(s)</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://glossapsycholinguistics.journalpub.escholarship.org/articles/10.5070/G601140229/"/>
<abstract>
<p>To what degree listeners can maintain gradient subcategorical information about speech input in memory over time has been a matter of considerable debate. The literature has largely lacked formal computational models of potential mechanisms against which to compare human behavior. Here, we formalize several competing cognitive models of this process and quantitatively compare them to data from a series of behavioral experiments. We find consistently strong evidence in favor of models which allow for maintenance of subcategorical information over the course of an utterance. These results suggest that listeners are able to maintain relatively fine-grained details about prior linguistic input over long perceptual timescales. This work also highlights the importance of formalizing cognitive models of behavior to distinguish between competing theoretical mechanisms.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>1. Introduction</title>
<p>Spoken language understanding is a complex cognitive activity. Listeners need to decode their interlocutors&#8217; intended meaning from the acoustic signal they produce, but the percept of this signal is high-dimensional and corrupted by listener-internal, speaker-internal, and environmental noise. Thus, any given cue in the signal inevitably leaves some degree of uncertainty about the underlying linguistic unit (e.g. phonemes, syllables, words). In speech perception, this is typically referred to as the <italic>lack of invariance</italic> problem: there is no one-to-one mapping between acoustic cues and phonetic units (<xref ref-type="bibr" rid="B34">Lisker &amp; Abramson, 1967</xref>). The fundamental problem of real-time spoken language processing, then, is how listeners arrive at decisions about the identity of linguistic categories (like sounds, words, and syntactic structures) from inconclusive evidence.</p>
<p>One way for listeners to mitigate this problem is to make inferences based on multiple cues in the input, reducing (but not fully eliminating) uncertainty. In speech, each segment contains a multitude of cues that are relevant to inferring its category. In American English, for example, syllable-initial voicing (distinguishing, e.g., /t/ vs. /d/) is cued by voice-onset time, fundamental frequency, speech rate, burst duration, and other acoustic properties (<xref ref-type="bibr" rid="B12">Cooper et al., 1952</xref>; <xref ref-type="bibr" rid="B27">Kingston &amp; Diehl, 1994</xref>; <xref ref-type="bibr" rid="B33">Liberman, 1957</xref>; <xref ref-type="bibr" rid="B47">Port, 1979</xref>). Listeners are able to combine these phonetic cues to infer an underlying phonemic category (<xref ref-type="bibr" rid="B33">Liberman, 1957</xref>; <xref ref-type="bibr" rid="B35">Lisker &amp; Abramson, 1970</xref>). However, cues to segment identity do not always appear concurrently; they are also temporally distributed across the signal. For example, an important cue to syllable-final voicing is the duration of the <italic>previous</italic> vowel (<xref ref-type="bibr" rid="B28">Klatt, 1976</xref>). This temporal distribution also includes higher-level cues beyond acoustics. Later lexical context, for example, can provide cues to the identity of earlier segments &#8211; e.g., <italic>-ask</italic> following a segment acoustically manipulated to range between /t-d/ suggests that the earlier segment was more likely to be /t/ (<italic>task</italic> is a word, while <italic>dask</italic> is not). 
Indeed, listeners integrate these lexical cues with earlier acoustic information in spoken word recognition experiments (<xref ref-type="bibr" rid="B18">Ganong, 1980</xref>).<xref ref-type="fn" rid="n1">1</xref> This effect is particularly striking because it implies that listeners can maintain subcategorical information about the initial segment /t-d/ in memory over the course of the word: in order to successfully integrate early acoustic and later lexical cues, listeners must have access to the early cue in memory, so that it can be integrated with the later cue. But given human memory limitations, it is impossible for listeners to maintain every bit of the complex acoustic signal in memory indefinitely. Thus, it is critical to reconcile listeners&#8217; ability to integrate long-distance cues with the fundamental limitations of memory. The goal of the present work is to investigate to what degree listeners have access to subcategorical information about prior speech input over time, and what limitations there are (if any) on this process.</p>
<p>These questions are important to ask, because they address the foundational underpinnings of our theoretical understanding of language processing. In particular, a long-held belief in the psycholinguistic literature, often referred to as the <italic>immediacy assumption</italic>, is that listeners categorize incoming input as fast as possible and immediately discard low-level information in favor of categorical representations (<xref ref-type="bibr" rid="B9">Christiansen &amp; Chater, 2016</xref>; <xref ref-type="bibr" rid="B25">Just &amp; Carpenter, 1980</xref>). This assumption is driven by the (correct) observation that there are memory limitations on language processing. However, there is a large empirical and theoretical literature potentially suggesting that listeners can maintain significant amounts of subcategorical information for long periods of time. Indeed, most influential formal models of speech perception assume that some degree of low-level information is maintained by listeners over time. In connectionist models like TRACE and its descendants, this manifests as lingering activation for competitors as a result of network dynamics (<xref ref-type="bibr" rid="B38">Magnuson et al., 2020</xref>; <xref ref-type="bibr" rid="B40">McClelland &amp; Elman, 1986</xref>); in Bayesian models like Shortlist B, all relevant acoustic and sentential information is combined to produce a word categorization, implying that listeners keep track of these cues over time (<xref ref-type="bibr" rid="B45">Norris &amp; McQueen, 2008</xref>). 
However, the predictions of these models have primarily been tested in isolated word recognition (<xref ref-type="bibr" rid="B18">Ganong, 1980</xref>; <xref ref-type="bibr" rid="B22">Gwilliams et al., 2018</xref>; <xref ref-type="bibr" rid="B42">McMurray et al., 2002</xref>, <xref ref-type="bibr" rid="B43">2009</xref>; <xref ref-type="bibr" rid="B50">Toscano et al., 2010</xref>), and only sometimes are quantitatively compared to formal models.</p>
<p>An emerging area of empirical research investigates these effects at longer timescales. An influential study by Connine and colleagues presented participants with sentences like the following:</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(1)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>When the <bold>?ent</bold> in the <bold>fender</bold> was noticed, we sold the car.</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<p>They varied the acoustic features of the <bold>?</bold> segment between /t-d/, and a later word in the sentence semantically biased toward a <italic>tent</italic> or <italic>dent</italic> interpretation of the target word (<italic>fender</italic>, as in (1) above, vs. <italic>forest</italic>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). They found that participants&#8217; categorizations of the target word were influenced both by the acoustics of the initial target and the later semantic context. Other studies using similar stimuli and methods ranging from categorization to visual world eye-tracking have yielded similar results at strikingly long distances (up to 35 syllables away from a target word; <xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B3">Brown-Schmidt &amp; Toscano, 2017</xref>; <xref ref-type="bibr" rid="B6">Bushong &amp; Jaeger, 2019</xref>, <xref ref-type="bibr" rid="B7">2025</xref>; <xref ref-type="bibr" rid="B16">Falandays et al., 2020</xref>; <xref ref-type="bibr" rid="B49">Szostak &amp; Pitt, 2013</xref>; <xref ref-type="bibr" rid="B52">Zellou &amp; Dahan, 2019</xref>).</p>
<p>These findings have been interpreted to constitute evidence for subcategorical information maintenance. The key assumption is that if both early and late cues are used in categorization, listeners must have maintained subcategorical information about the early cue.<xref ref-type="fn" rid="n2">2</xref> This assumption is based on a (usually implicit) comparison between two basic models of listener strategies, which in this article we will call <italic>ideal integration</italic> and <italic>categorize-&amp;-discard</italic>. <xref ref-type="fig" rid="F1">Figure 1</xref> shows a basic demonstration of the ideal integration strategy, using example stimuli from the studies presented in this article (inspired by <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). Here, we use the term <italic>ideal</italic> in the sense of ideal observer models, which estimate the statistically optimal solution to perceptual cue integration problems (see, e.g., <xref ref-type="bibr" rid="B15">Ernst &amp; Banks, 2002</xref>).<xref ref-type="fn" rid="n3">3</xref> The listener maintains some information about an initial stimulus varying between /t-d/ &#8211; at minimum, their degree of uncertainty about whether the sound was /t/ or /d/, but potentially more detailed, such as the value of a relevant acoustic cue, like voice-onset time (VOT). When they encounter the later biasing context <italic>fender</italic>, they are able to integrate information from both sources to come to a categorization decision. Notice that if the listener makes an initial categorical commitment to /t/ or /d/, this could not happen. <xref ref-type="fig" rid="F2">Figure 2</xref> demonstrates this categorize-&amp;-discard approach: by the time the listener reaches <italic>fender</italic>, they are already committed to a /t/ response and have no access to a gradient representation with which to integrate the contextual information. 
This is the basic line of reasoning that has led researchers in this field to argue for subcategorical information maintenance.<xref ref-type="fn" rid="n4">4</xref></p>
<fig id="F1">
<caption>
<p><bold>Figure 1:</bold> Schematic of the <italic>ideal integration</italic> model, with maintenance of subcategorical information over time.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g1.png"/>
</fig>
<fig id="F2">
<caption>
<p><bold>Figure 2:</bold> Schematic of the <italic>categorize-&amp;-discard</italic> model, without maintenance of subcategorical information over time.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g2.png"/>
</fig>
<p>Surprisingly, this interpretation of the literature has gone largely unchallenged, even though there are many other possible word recognition strategies that listeners could engage in beyond the ideal integration and categorize-&amp;-discard options presented above. Indeed, it is quite possible to derive the central qualitative finding of these studies &#8211; that listeners&#8217; behavioral responses depend on both the acoustic properties of the target sound (e.g., the /t-d/ stimulus) and subsequent sentential context (<italic>fender</italic> vs. <italic>forest</italic>) &#8211; without requiring maintenance of subcategorical information. Imagine, for example, that listeners initially categorize the /t-d/ sound as /t/ or /d/, discarding all gradient subcategorical information about it. After they encounter the subsequent context, they can choose to switch their response if the context conflicts with their original categorization. As we describe in more detail below, this strategy would yield categorization responses that exhibit dependence on both the acoustic and subsequent contextual cues: exactly the qualitative pattern observed in previous research. In short, there are plausible scenarios under which the available empirical evidence is <italic>qualitatively</italic> compatible with models of spoken word recognition that do not allow subcategorical information maintenance. Thus, it is clear that relying on qualitative outcomes like the presence or absence of acoustic and contextual effects in experiments is not sufficient for distinguishing between different theories of subcategorical information maintenance. Instead, we need to mathematically formalize these theories and fit them quantitatively to behavioral data.</p>
<p>The goal of the present study is to develop formal models of subcategorical information maintenance and test them in behavioral experiments. We formalize and evaluate a range of plausible listener strategies, from all-or-nothing approaches (like the ideal integration and categorize-&amp;-discard models), to more nuanced strategies. Critically, by mathematically formalizing our models and fitting them quantitatively to behavioral data, we make our theoretical assumptions explicit, in contrast to previous work, which evaluates qualitative data patterns that could (in principle) be compatible with different theories.</p>
<p>First, we describe the formalization of each model, then present four behavioral experiments against which we fit our computational models. <xref ref-type="fig" rid="F3">Figure 3</xref> shows the general structure of the modeled task, closely following Connine et al. (<xref ref-type="bibr" rid="B11">1991</xref>). Listeners hear sentences that contain a target word whose onset varies acoustically between /t-d/ (by manipulating voice-onset time, VOT); additionally, a subsequent word in the sentence biases toward a particular interpretation of the target word. The listeners&#8217; task is to categorize the target word.</p>
<fig id="F3">
<caption>
<p><bold>Figure 3:</bold> General design of stimuli for all experiments in this article.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g3.png"/>
</fig>
</sec>
<sec>
<title>2. Models</title>
<p>We formalize five models of how listeners may (or may not) maintain subcategorical information: <italic>ideal integration, ambiguity-dependent, categorize-&amp;-discard, categorize-discard-&amp;-switch</italic>, and <italic>context-only</italic>. Section 2.2 describes the models, and <xref ref-type="fig" rid="F4">Figures 4</xref> and <xref ref-type="fig" rid="F5">5</xref> show the qualitative predictions of each model. For additional details about model fitting procedures, see the Supplementary Information (SI &#167;1 at the GitHub repository for this study).<xref ref-type="fn" rid="n5">5</xref></p>
<fig id="F4">
<caption>
<p><bold>Figure 4:</bold> Qualitative predictions of each model by VOT (the acoustic cue distinguishing /t/ and /d/) and context. We set the point of maximal ambiguity to the center of the displayed VOT range, and assume that the contextual evidence for either response (<italic>tent</italic> vs. <italic>dent</italic>) is symmetric around the neutral categorization function that would result from a neutral context (not shown). These choices make it easiest to see the influence of VOT and context on the predictions of the models. For the quantitative evaluation of the models, we do not make these assumptions.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g4.png"/>
</fig>
<fig id="F5">
<caption>
<p><bold>Figure 5:</bold> Qualitative predictions of the independent effects of (a) VOT and (b) context for each model. The ideal integration, ambiguity-dependent, and categorize-&amp;-discard models make identical predictions for the effect of VOT on categorizations; similarly, the ideal integration and context-only models make identical predictions for the effect of context on categorizations.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g5.png"/>
</fig>
<sec>
<title>2.1 Modeling preliminaries</title>
<p>Before we discuss each model in more detail, we first address two basic aspects of working with speech categorization data: first, how acoustic cues alone are expected to influence categorization responses; and second, in which space categorization data are most usefully analyzed.</p>
<sec>
<title>2.1.1 Predicting categorization responses from acoustic cues</title>
<p>It is important to address the major factor that impacts speech perception and can potentially change the qualitative and quantitative predictions of the cognitive models at hand: how listeners categorize voicing based on the acoustic cue of VOT alone. There are two issues to address here: (i) the link between listeners&#8217; underlying representations of acoustic evidence and their decisions; and (ii) how exactly VOT affects listeners&#8217; perception of voicing.</p>
<p>Following previous work on other questions in speech perception, all the models we present assume that listeners&#8217; categorization responses are proportional to their subjective posterior probabilities of the categories (Luce&#8217;s choice rule; <xref ref-type="bibr" rid="B37">R. D. Luce, 1963</xref>). The Luce choice rule has been found to provide a good fit to human categorization responses (<xref ref-type="bibr" rid="B10">Clayards et al., 2008</xref>; <xref ref-type="bibr" rid="B17">Feldman et al., 2009</xref>; <xref ref-type="bibr" rid="B29">Kleinschmidt &amp; Jaeger, 2015</xref>; <xref ref-type="bibr" rid="B30">Kronrod et al., 2016</xref>; <xref ref-type="bibr" rid="B36">P. A. Luce &amp; Pisoni, 1998</xref>). There have been other proposals for linking functions from underlying representations to decisions (see, e.g., <xref ref-type="bibr" rid="B39">Massaro &amp; Friedman, 1990</xref>). These choices would likely affect the quantitative predictions of the models presented here, but given that Luce&#8217;s choice rule is a standard assumption in the literature, we leave the exploration of alternative linking functions outside the scope of the present work.</p>
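As a concrete illustration, Luce's choice rule can be sketched in a few lines of Python. The posterior values below are hypothetical, chosen only to show the normalization step; they are not fitted estimates from any of the experiments reported here.

```python
def luce_choice(posteriors):
    """Luce's choice rule: the probability of each response is
    proportional to its subjective posterior probability."""
    total = sum(posteriors.values())
    return {category: p / total for category, p in posteriors.items()}

# Hypothetical (unnormalized) posterior evidence for one VOT token:
posteriors = {"/t/": 0.6, "/d/": 0.2}
response_probs = luce_choice(posteriors)
# /t/-response probability = 0.6 / (0.6 + 0.2)
```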
<p>Although not always described in these terms, most theories of speech perception agree that the predicted slope of the VOT effect on categorization depends on listeners&#8217; beliefs about both the means and variances of the /t/ and /d/ categories along the VOT continuum. This follows from the decision rules for categorization in common models of speech perception (e.g., <xref ref-type="bibr" rid="B36">P. A. Luce &amp; Pisoni, 1998</xref>; <xref ref-type="bibr" rid="B44">Norris, 1994</xref>; <xref ref-type="bibr" rid="B45">Norris &amp; McQueen, 2008</xref>; <xref ref-type="bibr" rid="B46">Oden &amp; Massaro, 1978</xref>). If two Gaussian categories (/t/ and /d/) have equal variance along VOT, an ideal observer will exhibit linear effects of VOT on the <italic>log-odds</italic> of /t/-responses. However, it is well established that voicing contrasts (including /t/ vs. /d/) exhibit unequal variances along the VOT continuum (<xref ref-type="bibr" rid="B34">Lisker &amp; Abramson, 1967</xref>). A standard ideal-observer model, thus, predicts positive quadratic effects of VOT on the log-odds of /t/-responses. Quadratic VOT effects can subtly change the qualitative and quantitative predictions of some of our models. For simplicity of visualization, we show model predictions here and in the main text without quadratic VOT effects. However, all fitted models assume that listeners&#8217; subjective <italic>p</italic>(<italic>t</italic>&#124;<italic>VOT</italic>) depends on both a linear and quadratic VOT component.</p>
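The link between unequal category variances and quadratic log-odds can be verified directly. In this minimal sketch, the Gaussian category parameters (means and standard deviations in ms of VOT, with /d/ given the smaller variance) are illustrative values only, not estimates from the present data.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    # Log density of a Gaussian; the log-likelihood of a VOT value under a category.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_odds_t(vot, mu_t=60.0, sd_t=15.0, mu_d=10.0, sd_d=5.0):
    """Log-odds of /t/ for an equal-prior ideal observer.
    Category parameters are illustrative, not fitted."""
    return gaussian_logpdf(vot, mu_t, sd_t) - gaussian_logpdf(vot, mu_d, sd_d)

# With unequal variances (sd_t != sd_d), the squared-VOT terms do not
# cancel, so the log-odds are quadratic in VOT: second differences along
# an evenly spaced VOT grid are constant, and positive when sd_d < sd_t.
vots = [20.0, 30.0, 40.0, 50.0]
los = [log_odds_t(v) for v in vots]
second_diffs = [los[i + 2] - 2 * los[i + 1] + los[i] for i in range(2)]
```

Setting `sd_t == sd_d` instead makes the second differences vanish, recovering the linear (equal-variance) case described above.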
</sec>
<sec>
<title>2.1.2 Log-odds vs. proportion space for visualizing and analyzing categorization data</title>
<p>Throughout this work, we discuss our cognitive models&#8217; predictions in log-odds space rather than proportion space, because log-odds space is where the models&#8217; predictions are most clearly distinguishable. Proportions are bounded, squashing model predictions into &#8220;S&#8221;-shaped curves: a linear effect of VOT in log-odds space surfaces as a non-linear effect in proportion space, and the non-linear effect of squared VOT in log-odds space surfaces as a quite similar non-linear function in proportion space. This makes it difficult to distinguish the predictions of our models in proportion space; we therefore visualize model predictions below in log-odds space, where the qualitative differences are most obvious.</p>
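The squashing effect of the logistic transform can be sketched directly; the slope and category boundary below are illustrative values, not parameters of our fitted models.

```python
import math

def logistic(x):
    # Inverse of the log-odds (logit) transform.
    return 1.0 / (1.0 + math.exp(-x))

# A linear effect of VOT in log-odds space (illustrative slope, boundary)...
vots = range(0, 70, 10)
log_odds = [0.15 * (v - 35) for v in vots]

# ...surfaces as a bounded, "S"-shaped curve in proportion space:
proportions = [logistic(lo) for lo in log_odds]

# The same one-unit shift in log-odds (e.g., from a biasing context)
# moves proportions much less near the extremes than near the boundary:
shift_at_boundary = logistic(1.0) - logistic(0.0)
shift_at_extreme = logistic(5.0) - logistic(4.0)
```

This compression near 0 and 1 is also why apparent context effects shrink at extreme VOTs in proportion space even under ideal integration, as discussed next.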
<p>Notably, comparing model predictions in proportion space also exacerbates the problems involved in making qualitative comparisons between model predictions and experimental data. Connine and colleagues, for example, infer that the smaller effects of context at more extreme VOTs observed in their data in proportion space constitute evidence for the ambiguity model (<xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). However, the ideal integration model also makes the prediction that context effects should be smaller at more extreme VOTs in proportion space. The model predictions only become qualitatively distinct when compared to each other in log-odds space. Furthermore, analyzing binary categorization data using linear methods like linear regression or ANOVA further underestimates effects at proportions close to 0 or 1 (<xref ref-type="bibr" rid="B24">Jaeger, 2008</xref>).</p>
</sec>
</sec>
<sec>
<title>2.2 Model descriptions</title>
<sec>
<title>2.2.1 Ideal integration</title>
<p>The <italic>ideal integration</italic> model holds that listeners maintain subcategorical information about the temporally first cue (here, the acoustic cue VOT) in memory for subsequent integration with a later cue (here, context). We use the term <italic>ideal</italic> in the sense of rational cue integration frameworks (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B15">Ernst &amp; Banks, 2002</xref>). These normative models provide an ideal baseline against which to compare human behavior. Under the ideal integration model, the listener always maintains subcategorical information about VOT, because ideal categorization requires access to at least <italic>p</italic>(<italic>category</italic>&#124;<italic>VOT</italic>) during integration with context. This model has been conceptually proposed in the past, but only qualitatively tested against behavioral data (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>).</p>
<p>If humans have no memory constraints and ideally integrate all cues available to them, their behavior should resemble the predictions of the ideal integration model. That is, /t/-responses should be conditioned on both VOT and context:</p>
<disp-formula id="FD1">
<label>&#160;&#160;&#160;&#160;&#160;&#160;&#160;(E1)</label>
<alternatives>
<mml:math id="Eq001-mml">
<mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">ideal</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mtext>/-response</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M1">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
{p_{ideal}}(/t{\rm{/ - response}})\; = \;p({\rm{/}}t{\rm{/|}}VOT,\;context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e1.gif"/>
</alternatives>
</disp-formula>
<p>After applying Bayes&#8217; Theorem, this yields:</p>
<disp-formula id="FD2">
<label>(E2)</label>
<alternatives>
<mml:math id="Eq002-mml">
<mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>,</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x2009;</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext 
mathvariant="italic">VOT</mml:mtext><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow>
</mml:math>
<tex-math id="M2">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
p(/t/{\rm{|}}VOT,\;context)\; = \;\frac{{p(VOT{\rm{|}}context,\;{\rm{/}}t{\rm{/}})\;p(context,\;{\rm{/}}t{\rm{/}})}}{{p(VOT, context)}}\; = \;\frac{{p(VOT{\rm{|}}context,\;{\rm{/}}t{\rm{/}})\;p({\rm{/}}t{\rm{/|}}context)}}{{p(VOT{\rm{|}}context)}}
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e2.gif"/>
</alternatives>
</disp-formula>
<p>Under the plausible assumption that VOT and context are conditionally independent (following <xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>):</p>
<disp-formula id="FD3">
<label>&#160;&#160;&#160;&#160;&#160;&#160;&#160;(E3)</label>
<alternatives>
<mml:math id="Eq003-mml">
<mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">ideal</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/-response</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>&#x221D;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>&#x007C;/</mml:mo><mml:mi>t</mml:mi><mml:mtext>/</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M3">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
{p_{ideal}}({\rm{/}}t{\rm{/ - response}})\; \propto \;p(VOT{\rm{|/}}t{\rm{/}})\;p({\rm{/}}t{\rm{/|}}context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e3.gif"/>
</alternatives>
</disp-formula>
<p>Translated to log-odds space, this results in a simple addition of the evidence from both cues (see <xref ref-type="fig" rid="F4">Figure 4a</xref>).</p>
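The additivity implied by (E3) can be sketched as follows. The acoustic log-odds and contextual probability below are illustrative inputs, not values estimated from our experiments.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def ideal_integration_log_odds(acoustic_log_odds, p_t_given_context):
    """(E3) in log-odds space: the acoustic log-likelihood ratio for
    /t/ vs. /d/ and the contextual log-odds of /t/ simply add."""
    return acoustic_log_odds + logit(p_t_given_context)

# A weakly /t/-biasing acoustic cue combined with a fender-like,
# /t/-biasing context (both values are illustrative):
combined = ideal_integration_log_odds(0.5, 0.8)
p_t_response = logistic(combined)
```

Because the two terms add, any shift in contextual evidence translates the VOT categorization function up or down by a constant amount in log-odds space, which is the signature pattern shown in Figure 4a.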
</sec>
<sec>
<title>2.2.2 Ambiguity-dependent integration</title>
<p>In contrast to the ideal integration model, under the <italic>ambiguity-dependent</italic> model, listeners store information about VOT to the extent to which it is perceptually ambiguous: the more ambiguous the VOT is (i.e., closer to a categorization probability of 50%), the more likely listeners should be to maintain information about VOT for subsequent integration with context. The ambiguity-dependent hypothesis &#8211; first conceptually proposed by Connine and colleagues (<xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>), and a generally accepted theory (<xref ref-type="bibr" rid="B14">Dahan, 2010</xref>; <xref ref-type="bibr" rid="B49">Szostak &amp; Pitt, 2013</xref>) &#8211; thus sees maintenance of subcategorical information as a special case: if the signal is relatively clear, then listeners immediately categorize and discard low-level information. Only when the perceptual input is ambiguous is information about it maintained in memory, so as to facilitate robust integration with subsequent cues. This can be seen as serving memory economy (for related proposals, see also <xref ref-type="bibr" rid="B14">Dahan, 2010</xref>).</p>
<p>Previous tests of this hypothesis have been limited to qualitative comparisons (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). Those studies have ruled out a categorical ambiguity-dependent model, in which subcategorical information is maintained only for maximally ambiguous input (for a critical review and qualitative comparison to the ideal integration model, see <xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>). Here we derive a quantitative model. There are several ways of instantiating the idea that information about VOT is only maintained if it is perceptually ambiguous. Here, we evaluate a gradient version of this hypothesis: the more ambiguous the VOT evidence, the more likely listeners are to maintain gradient representations of VOT to integrate with later context, instead of categorizing on the basis of VOT alone. We quantify the degree of perceptual ambiguity as:</p>
<disp-formula id="FD4">
<label>&#160;&#160;&#160;&#160;&#160;&#160;&#160;(E4)</label>
<alternatives>
<mml:math id="Eq004-mml">
<mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x2009;</mml:mo><mml:mtext>=</mml:mtext><mml:mo>&#x2009;</mml:mo><mml:mn>2</mml:mn><mml:mrow><mml:mo>|</mml:mo> <mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>&#x2013;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mn>0.5</mml:mn></mml:mrow> <mml:mo>|</mml:mo></mml:mrow></mml:mrow>
</mml:math>
<tex-math id="M4">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
\alpha \;{\rm{ = }}\;2\left| {p({\rm{/}}t{\rm{/|}}VOT)\;{\rm{--}}\;0.5} \right|
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e4.gif"/>
</alternatives>
</disp-formula>
<p>Here, <italic>&#945;</italic> is determined<xref ref-type="fn" rid="n6">6</xref> by the perceptual ambiguity of the stimulus: <italic>&#945;</italic> is minimized when <inline-formula>
<alternatives>
<mml:math id="Eq005-mml">
<mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M5">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
p(/t/{\rm{|}}VOT)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e9.gif"/>
</alternatives>
</inline-formula> is .5 (the maximally ambiguous stimulus), and <italic>&#945;</italic> is maximized when <inline-formula>
<alternatives>
<mml:math id="Eq006-mml">
<mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M6">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
p(/t/{\rm{|}}VOT)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e9.gif"/>
</alternatives>
</inline-formula> is 0 or 1 (the least ambiguous stimuli). We can then use <italic>&#945;</italic> as a weight in a mixture model that describes the relative probability of using VOT only or integrating VOT and context:</p>
<disp-formula id="FD5">
<label>(E5)</label>
<alternatives>
<mml:math id="Eq007-mml">
<mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">ambiguity</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mtext>/-response</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>&#x221D;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>+</mml:mtext><mml:mo>&#x2009;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2009;</mml:mo><mml:mo>&#x2013;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M7">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
{p_{ambiguity}}(/t{\rm{/ - response}})\; \propto \;\alpha p({\rm{/}}t{\rm{/|}}VOT)\;{\rm{ + }}\;(1\;{\rm{--}}\;\alpha)\;p({\rm{/}}t{\rm{/|}}VOT,\;context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e5.gif"/>
</alternatives>
</disp-formula>
<p>Intuitively, we can think of this as listeners <italic>not</italic> maintaining gradient representations of VOT over time on an <italic>&#945;</italic> proportion of trials. On the remaining 1&#8211;<italic>&#945;</italic> proportion of trials, listeners do maintain a gradient representation &#8211; notice that this portion of the equation is identical to the ideal integration model. This model predicts effects of both VOT and context on behavioral categorization responses, with context effects particularly pronounced in the center of the acoustic-perceptual continuum (see <xref ref-type="fig" rid="F4">Figure 4b</xref>).</p>
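<p>The mixture in (E4)&#8211;(E5) can be sketched directly; here p_t_vot stands for p(/t/|VOT), p_t_vot_context for the integrated term p(/t/|VOT, context), and the numerical test values are hypothetical.</p>

```python
def p_ambiguity(p_t_vot, p_t_vot_context):
    # E4: alpha is 0 for a maximally ambiguous VOT (p = .5)
    # and 1 at the least ambiguous VOTs (p = 0 or 1)
    alpha = 2 * abs(p_t_vot - 0.5)
    # E5: on an alpha proportion of trials, respond from VOT alone;
    # on the remaining 1 - alpha, integrate VOT with context
    return alpha * p_t_vot + (1 - alpha) * p_t_vot_context

# Maximally ambiguous VOT: the response is driven entirely by integration
assert p_ambiguity(0.5, 0.9) == 0.9
# Unambiguous VOT: context never enters the response
assert p_ambiguity(1.0, 0.2) == 1.0
```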
</sec>
<sec>
<title>2.2.3 Categorize-&amp;-discard</title>
<p>The next three models we consider do <italic>not</italic> maintain information about VOT in memory over time, but rather immediately categorize based on the first cue and then discard all subcategorical information about that cue. These models maximize memory economy at the cost of categorization accuracy. Under the simplest <italic>categorize-&amp;-discard</italic> model, listeners categorize the target word based on VOT, discard all subcategorical information about VOT, and never revisit the categorization decision. As this model never considers later sources of information, its categorization accuracy will necessarily be suboptimal. We formalize this model as simply making decisions on the basis of VOT alone:</p>
<disp-formula id="FD6">
<label>&#160;&#160;&#160;&#160;&#160;&#160;&#160;(E6)</label>
<alternatives>
<mml:math id="Eq008-mml">
<mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">cat</mml:mtext><mml:mo>&#x005F;</mml:mo><mml:mtext mathvariant="italic">discard</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/-response</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>=</mml:mtext><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M8">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
{p_{cat\_discard}}({\rm{/}}t{\rm{/ - response}})\;{\rm{ = }}\;p({\rm{/}}t{\rm{/|}}VOT)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e6.gif"/>
</alternatives>
</disp-formula>
<p>The hallmark predictions of this model are an effect of VOT, but a null effect of context, on behavioral responses (see <xref ref-type="fig" rid="F4">Figure 4c</xref>).</p>
</sec>
<sec>
<title>2.2.4 Categorize-discard-&amp;-switch</title>
<p>The second model of this class we consider also discards all subcategorical information about VOT immediately after having used it to categorize. However, under the <italic>categorize-discard-&amp;-switch</italic> model, listeners have a mechanism to take context into account: if context conflicts with the initial categorization decision, the listener will change their categorization response in proportion to the strength of the evidence from context. To give a specific example, suppose the listener initially categorizes a segment as /d/, but later evidence from context is more consistent with a /t/ interpretation; the listener will switch their categorization decision to /t/ with probability <inline-formula>
<alternatives>
<mml:math id="Eq009-mml">
<mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M9">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
p(/t/{\rm{|}}context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e10.gif"/>
</alternatives>
</inline-formula>.</p>
<disp-formula id="FD7">
<label>(E7)</label>
<alternatives>
<mml:math id="Eq010-mml">
<mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">cat</mml:mtext><mml:mo>&#x005F;</mml:mo><mml:mtext mathvariant="italic">switch</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mtext>/-response</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>&#x221D;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>+</mml:mtext><mml:mo>&#x2009;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2009;</mml:mo><mml:mo>&#x2013;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M10">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
{p_{cat\_switch}}(/t{\rm{/ - response}})\; \propto \;p({\rm{/}}t{\rm{/|}}VOT)\;{\rm{ + }}\;(1\;{\rm{--}}\;p({\rm{/}}t{\rm{/|}}VOT))\;p({\rm{/}}t{\rm{/|}}context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e7.gif"/>
</alternatives>
</disp-formula>
<p>Like the ambiguity-dependent model, we can think of the categorize-discard-&amp;-switch model as describing behavior across trials. Consider trials in the experiment containing /t/-biasing subsequent context. On some proportion of those trials <inline-formula>
<alternatives>
<mml:math id="Eq011-mml">
<mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M11">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
p(/t/{\rm{|}}VOT)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e9.gif"/>
</alternatives>
</inline-formula>, listeners would have categorized a stimulus as /t/, based on VOT alone. On the remaining trials, where listeners would have made a /d/ categorization based on VOT alone (i.e., <inline-formula>
<alternatives>
<mml:math id="Eq012-mml">
<mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2009;</mml:mo><mml:mo>&#x2013;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/</mml:mtext><mml:mi>t</mml:mi><mml:mtext>/&#x007C;</mml:mtext><mml:mtext mathvariant="italic">VOT</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M12">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
1\;{\rm{-}}\;p({\rm{/}}t{\rm{/|}}VOT)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e11.gif"/>
</alternatives>
</inline-formula> trials), they switch their response to /t/, proportional to <inline-formula>
<alternatives>
<mml:math id="Eq013-mml">
<mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M13">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
p(/t/{\rm{|}}context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e10.gif"/>
</alternatives>
</inline-formula>. The reverse occurs on trials with /d/-biasing subsequent context.</p>
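<p>This switching story can be written out compactly; the normalization over /t/- and /d/-response scores reflects our reading of the proportionality in (E7), and the function name is ours.</p>

```python
def p_cat_switch(p_t_vot, p_t_context):
    # Initial /t/ decisions, plus initial /d/ decisions that switch
    # to /t/ in proportion to /t/-biasing contextual evidence (E7)
    t_score = p_t_vot + (1 - p_t_vot) * p_t_context
    # The mirror-image score for /d/-responses
    d_score = (1 - p_t_vot) + p_t_vot * (1 - p_t_context)
    # Normalize over the two response options
    return t_score / (t_score + d_score)
```

<p>Note that even at a continuum endpoint (p(/t/|VOT) = 1), changing the context still moves the normalized response probability; this is the sense in which this model predicts context effects at perceptual endpoints.</p>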
<p>The categorize-discard-&amp;-switch model, like the ideal integration and ambiguity-dependent model, predicts effects of both VOT and context on categorization responses; however, context should affect perception more at perceptual endpoints (i.e., the reverse of the ambiguity-dependent model&#8217;s predictions; compare the orange and purple lines in <xref ref-type="fig" rid="F5">Figure 5b</xref>). This prediction is of particular relevance in light of recent studies that find evidence of numerically larger context effects at acoustic-perceptual endpoints (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>).</p>
<p>One more point of interest is the difference in how the ambiguity-dependent and categorize-discard-&amp;-switch models predict changes in the context effect across the VOT continuum. Under the ambiguity-dependent model, whether context enters the listener&#8217;s categorization process at all depends only on perceptual ambiguity; in the categorize-discard-&amp;-switch model, context always affects the listener&#8217;s categorization process, but whether listeners act on contextual evidence depends (indirectly) on the perceptual evidence. In general, the models we present here vary not only in <italic>how</italic> but also in <italic>when</italic> acoustic and contextual information enter listeners&#8217; decision-making processes. Teasing apart these distinctions further will likely require paradigms that allow for tracking the timecourse of listener interpretation.</p>
</sec>
<sec>
<title>2.2.5 Context-only</title>
<p>Finally, we entertain a model that uses only context in its categorization responses:</p>
<disp-formula id="FD8">
<label>&#160;&#160;&#160;&#160;&#160;&#160;&#160;(E8)</label>
<alternatives>
<mml:math id="Eq014-mml">
<mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo>&#x005F;</mml:mo><mml:mtext mathvariant="italic">only</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mtext>/t/-response</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>=</mml:mtext><mml:mo>&#x2009;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>/</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mtext mathvariant="italic">context</mml:mtext><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<tex-math id="M14">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
{p_{context\_only}}({\rm{/t/ - response}})\;{\rm{ = }}\;p(/t/{\rm{|}}context)
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e8.gif"/>
</alternatives>
</disp-formula>
<p>This model captures two potential mechanisms listeners may engage in during spoken word recognition. Firstly, this pattern of behavioral responses would be predicted if listeners ignore VOT entirely in the task (an unlikely but possible participant strategy in the present experiments). Secondly, responding based only on context also captures a more extreme version of the categorization-switching model we described above. Our categorize-discard-&amp;-switch model assumes that listeners only make switches when the later context conflicts with their original categorization. One possible alternative switching model is that listeners switch their categorization choices in proportion to the evidence from later context, regardless of whether that context matches their earlier categorization; this would predict an effect of context, but no main effect of VOT. While such a model is highly unlikely to provide an adequate fit to the data, given the strength of VOT effects observed in these kinds of experiments, it serves as an informative baseline against which to compare more complex models.</p>
</sec>
</sec>
<sec>
<title>2.3 Distinguishing the models</title>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows each model&#8217;s predictions for the effects of VOT and context on behavior, assuming the same underlying parameters. There is sizeable overlap in the models&#8217; qualitative <italic>and</italic> quantitative predictions for each factor. Thus, comparing our models to the empirical data qualitatively is unlikely to be fruitful. However, each model makes unique quantitative predictions about the <italic>joint distribution of VOT and context effects</italic>. We thus evaluate the models quantitatively against behavioral data from four perceptual categorization experiments.</p>
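<p>To make the overlap concrete, the five models&#8217; predicted /t/-response curves can be computed side by side. All parameter values below are invented for illustration (an assumed logistic categorization curve over VOT and a fixed /t/-biasing context), not fitted to data, and the normalization of the switching model reflects our reading of the proportionality in (E7).</p>

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

vot = np.linspace(10, 85, 16)            # VOT continuum (ms)
p_t_vot = inv_logit(0.25 * (vot - 45))   # assumed p(/t/|VOT) curve
p_t_ctx = 0.8                            # a /t/-biasing context

# (E3) ideal integration: log-odds of the two cues add
ideal = inv_logit(logit(p_t_vot) + logit(p_t_ctx))
# (E4-E5) ambiguity-dependent mixture of VOT-only and integration
alpha = 2 * np.abs(p_t_vot - 0.5)
ambiguity = alpha * p_t_vot + (1 - alpha) * ideal
# (E6) categorize-&-discard: VOT only
cat_discard = p_t_vot
# (E7) categorize-discard-&-switch, normalized over the two responses
t_score = p_t_vot + (1 - p_t_vot) * p_t_ctx
d_score = (1 - p_t_vot) + p_t_vot * (1 - p_t_ctx)
cat_switch = t_score / (t_score + d_score)
# (E8) context-only: flat across the VOT continuum
context_only = np.full_like(vot, p_t_ctx)
```

<p>Plotting these curves for both context conditions reproduces the qualitative overlap described above: every model except context-only yields a rising function of VOT, and the models separate mainly in where along the continuum context shifts the curve.</p>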
</sec>
</sec>
<sec>
<title>3. Experimental methods</title>
<p>We fit our models to four previously conducted behavioral experiments in our lab that used the same general paradigm (see <xref ref-type="fig" rid="F3">Figure 3</xref>). Experiment 1 was previously reported in Bicknell et al. (<xref ref-type="bibr" rid="B2">2025</xref>) as Experiment 2; Experiment 3 was reported in Bushong and Jaeger (<xref ref-type="bibr" rid="B6">2019</xref>) as the &#8220;high-conflict&#8221; group. Experiments 2 and 4 have not been previously reported.</p>
<p>Our experimental materials, full datasets, and analysis scripts can be found in our GitHub repository at <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.15237589">https://doi.org/10.5281/zenodo.15237589</ext-link>.</p>
<sec>
<title>3.1 Participants</title>
<p>Participants were recruited from Amazon Mechanical Turk. Each experiment took approximately 30 minutes to complete, and participants were compensated $3.00 for their participation.<xref ref-type="fn" rid="n7">7</xref> 48 participants were recruited for Experiments 1&#8211;2, and 60 were recruited for Experiments 3&#8211;4. All experiments were approved by the University of Rochester Research Subjects Review Board (RSRB).</p>
</sec>
<sec>
<title>3.2 Materials</title>
<p>Our experiments are inspired by the paradigm first introduced by Connine and colleagues (<xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). <xref ref-type="table" rid="T1">Table 1</xref> shows an example sentence item. Following Connine and colleagues, we manipulated context (tent-biasing vs. dent-biasing), distance (near, 3 syllables vs. far, 6&#8211;9 syllables), and voice-onset time (VOT, the acoustic cue distinguishing /t/ from /d/; 6 continuum steps in each experiment). For the purposes of the present work, we do not evaluate any differences between context distance conditions, though this is an important avenue for future research to explore.</p>
<table-wrap id="T1">
<caption>
<p><bold>Table 1:</bold> Example sentence item in each biasing context and distance condition.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Subsequent Context</bold></td>
<td align="left" valign="top"><bold>Distance</bold></td>
<td align="left" valign="top"><bold>Sentence</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Tent-biasing</td>
<td align="left" valign="top">Near (3 syllables)</td>
<td align="left" valign="top">When the [t/d]ent in the <bold>forest</bold> was well camouflaged, we began our hike.</td>
</tr>
<tr>
<td align="left" valign="top">Dent-biasing</td>
<td align="left" valign="top">Near (3 syllables)</td>
<td align="left" valign="top">When the [t/d]ent in the <bold>fender</bold> was<break/>well camouflaged, we sold the car.</td>
</tr>
<tr>
<td align="left" valign="top">Tent-biasing</td>
<td align="left" valign="top">Far (6&#8211;9 syllables)</td>
<td align="left" valign="top">When the [t/d]ent was noticed in the<break/><bold>forest</bold>, we stopped to rest.</td>
</tr>
<tr>
<td align="left" valign="top">Dent-biasing</td>
<td align="left" valign="top">Far (6&#8211;9 syllables)</td>
<td align="left" valign="top">When the [t/d]ent was noticed in the<break/><bold>fender</bold>, we sold the car.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Each participant heard seven sentence frames in each of the context, distance, and VOT condition combinations, resulting in a total of 168 sentences in each experiment.<xref ref-type="fn" rid="n8">8</xref></p>
</sec>
<sec>
<title>3.3 Procedure</title>
<p>Participants were instructed to listen to the sentence and report whether they heard the word <italic>tent</italic> or <italic>dent</italic>. Between experiments, we manipulated whether participants could make a response only after they had heard the entire sentence (&#8220;forced-response&#8221;), or were permitted to respond anytime during the sentence stimulus (&#8220;free-response&#8221;).</p>
<p>We chose this comparatively simple paradigm because it allows a clear linking function between the input (acoustic cues and subsequent context) and listeners&#8217; categorization decisions (for more discussion, see 2.1.1). By contrast, the link between subjective probabilities and more complex measures, such as fixation latency in visual-world eye-tracking experiments (<xref ref-type="bibr" rid="B3">Brown-Schmidt &amp; Toscano, 2017</xref>; <xref ref-type="bibr" rid="B43">McMurray et al., 2009</xref>), or MEG responses (<xref ref-type="bibr" rid="B22">Gwilliams et al., 2018</xref>), is less well understood. This rich temporal information has the potential to give us additional insight into the mechanisms listeners use when integrating incoming information, but it is not necessary to answer the basic question we seek to address in the current work.</p>
</sec>
<sec>
<title>3.4 Acoustic manipulation</title>
<p>We created a /t/&#8211;/d/ continuum following the procedure of previous studies (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). From our recordings of the full sentence stimuli, we took one recording of <italic>dent</italic> with a relatively short VOT (10 ms) and one recording of <italic>tent</italic> with a relatively long VOT (85 ms). We then created intermediate steps by replacing the /d/ portion of the <italic>dent</italic> recording with progressively longer portions of the /t/ from <italic>tent</italic> (e.g., the 15 ms step was created by taking the closure and burst of the /t/ recording plus 15 ms of VOT and splicing this onto the <italic>ent</italic> portion of the <italic>dent</italic> recording). The continuum created by this process then replaced the original target words in the full sentence recordings.</p>
<p>Experiments 1&#8211;2 use the same stimulus set used in Bicknell et al. (<xref ref-type="bibr" rid="B2">2025</xref>), and we used the same VOT steps as reported in that study. For Experiments 3&#8211;4, we developed a new stimulus set with an expanded set of sentence frames. We used the same VOT manipulation process on these stimuli; after stimulus creation, we conducted a norming study in order to choose the VOT points we would present to participants. The full details of the norming study are presented in SI &#167;2 at the GitHub repository for this study.</p>
</sec>
<sec>
<title>3.5 Data exclusions</title>
<p>Following previous work using this paradigm (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B6">Bushong &amp; Jaeger, 2019</xref>), participants were excluded from data analysis if they showed no effect of VOT, defined as the absence of a significant VOT coefficient in a simple logistic regression fitted to each participant&#8217;s responses. This resulted in the exclusion of 8, 11, 9, and 12 participants in Experiments 1&#8211;4, respectively. For the free-response experiments, we removed trials in which participants responded before hearing the biasing subsequent context (defined as responses made earlier than 200 ms after context-word offset, to account for motor planning). See <xref ref-type="table" rid="T2">Table 2</xref> for the number of observations remaining for each experiment after exclusions.</p>
<table-wrap id="T2">
<caption>
<p><bold>Table 2:</bold> Overview of each experiment after data exclusions. For the free-response experiments, we removed trials where participants responded before subsequent context, resulting in many fewer trials than their forced-response counterparts.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Experiment</bold></td>
<td align="left" valign="top"><bold>Response Type</bold></td>
<td align="left" valign="top"><bold># Participants</bold></td>
<td align="left" valign="top"><bold># Observations</bold></td>
<td align="left" valign="top"><bold>VOT Steps</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Forced-Response</td>
<td align="left" valign="top">40</td>
<td align="left" valign="top">6,720</td>
<td align="left" valign="top">10, 40, 50, 60, 70, 85</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 2</td>
<td align="left" valign="top">Free-Response</td>
<td align="left" valign="top">37</td>
<td align="left" valign="top">3,470</td>
<td align="left" valign="top">10, 40, 50, 60, 70, 85</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 3</td>
<td align="left" valign="top">Forced-Response</td>
<td align="left" valign="top">51</td>
<td align="left" valign="top">8,568</td>
<td align="left" valign="top">10, 30, 35, 40, 50, 85</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 4</td>
<td align="left" valign="top">Free-Response</td>
<td align="left" valign="top">48</td>
<td align="left" valign="top">4,723</td>
<td align="left" valign="top">10, 30, 35, 40, 50, 85</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>3.6 Model fitting and comparison</title>
<p>We employed Bayesian non-linear mixed-effects regression to test each of our formal cognitive models. The advantages of this approach are that (i) we can directly fit the equations derived above for each of the models, rather than relying on tests of qualitative predictions (such as patterns of significant results), and (ii) the Bayesian approach allows us to derive measures of evidentiary support based on posterior predictive accuracy. To implement these models, we used the non-linear formula feature of the brms package in R (<xref ref-type="bibr" rid="B5">B&#252;rkner et al., 2017</xref>; <xref ref-type="bibr" rid="B48">R Core Team, 2016</xref>).</p>
<p>We follow common practice and use weakly regularizing priors to facilitate model convergence. For fixed-effect parameters, we use Student-<italic>t</italic> priors centered on zero, with a scale of 2.5 units (following <xref ref-type="bibr" rid="B19">Gelman, 2008</xref>) and 3 degrees of freedom. For random-effect standard deviations, we use a Cauchy prior with location 0 and scale 2, and for random-effect correlations, we use an uninformative LKJ-Correlation prior with its only parameter set to 1 (<xref ref-type="bibr" rid="B32">Lewandowski et al., 2009</xref>), which corresponds to a uniform prior over correlation matrices. Each model was fit using four chains, with 1,000 post-warmup samples per chain (after thinning to every 4th sample to reduce auto-correlations), for a total of 4,000 posterior samples for each analysis. Each chain used 2,000 warmup samples to calibrate Stan&#8217;s No-U-Turn Sampler. All analyses reported here converged (e.g., all <inline-formula>
<alternatives>
<mml:math id="Eq015-mml">
<mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2009;</mml:mo><mml:mo>&#x2264;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover><mml:mi>s</mml:mi><mml:mo>&#x2009;</mml:mo><mml:mo>&#x226A;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mn>1.01</mml:mn></mml:mrow>
</mml:math>
<tex-math id="M15">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
1\; \le \;\hat Rs\; \ll \;1.01
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e12.gif"/>
</alternatives>
</inline-formula>).</p>
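<p>For intuition about this convergence criterion, the following Python sketch computes a minimal (non-split) version of the Gelman-Rubin potential scale reduction factor; Stan&#8217;s own diagnostic is a more robust rank-normalized split-chain variant, so this is illustrative only.</p>

```python
# Minimal (non-split) Gelman-Rubin R-hat: compares between-chain and
# within-chain variance. Values near 1 indicate the chains are sampling
# the same distribution; the text's criterion is R-hat < 1.01.
import numpy as np

def rhat(samples):
    """samples: array of shape (n_chains, n_draws_per_chain)."""
    m, n = samples.shape
    chain_means = samples.mean(axis=1)
    within = samples.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)          # B: scaled variance of chain means
    var_hat = (n - 1) / n * within + between / n   # pooled posterior-variance estimate
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 1000))                   # four well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [0.0], [2.0]])  # one chain stuck in its own mode
print(rhat(mixed))   # near 1: passes the criterion
print(rhat(stuck))   # well above 1.01: fails
```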
<p>To compare models against each other, we used the Watanabe-Akaike Information Criterion (WAIC, also known as the Widely Applicable Information Criterion; <xref ref-type="bibr" rid="B51">Watanabe &amp; Opper, 2010</xref>). The WAIC allows the comparison of non-nested models. It is an approximation of Bayesian leave-one-out (LOO) cross-validation, which provides a measure of a model&#8217;s predictive accuracy &#8211; specifically, its expected log predictive density (<italic>elpd</italic>; <xref ref-type="bibr" rid="B20">Gelman et al., 2014</xref>; <xref ref-type="bibr" rid="B51">Watanabe &amp; Opper, 2010</xref>). Exact LOO is computationally intensive, requiring the same model to be re-fit many times. Given the complexity of our models, refitting each model to thousands of observations for every experiment is computationally infeasible.</p>
<p>WAIC avoids this expensive computation by starting with a biased estimate of a model&#8217;s <italic>elpd</italic> (based on its within-sample predictive accuracy) and correcting it by the model&#8217;s effective number of parameters. This is particularly important in our case because several of our models have the same number of fitted parameters but differ in their effective number of parameters (compare the ideal integration and ambiguity-dependent models). We chose the WAIC, as opposed to other information criteria, because it averages over the posterior density of the model rather than relying on point estimates. This makes the WAIC useful for evaluating mixed-effects models like ours, which contain many parameters that may result in singular estimates (<xref ref-type="bibr" rid="B20">Gelman et al., 2014</xref>). Going forward, we will refer to the WAIC-estimated <italic>elpd</italic> as <italic>elpd<sub>waic</sub></italic>.</p>
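<p>Concretely, <italic>elpd<sub>waic</sub></italic> can be computed from a matrix of pointwise posterior log-likelihoods as the log pointwise predictive density (lppd) minus the effective number of parameters (the summed posterior variances of the pointwise log-likelihood; Gelman et al., 2014). The Python sketch below uses a toy normal-mean model purely for illustration.</p>

```python
# Sketch of the WAIC computation: elpd_waic = lppd - p_waic.
# log_lik has shape (n_posterior_draws, n_observations).
import numpy as np
from scipy.special import logsumexp

def elpd_waic(log_lik):
    s, n = log_lik.shape
    # lppd: log of the posterior-mean likelihood, per observation, summed
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(s))
    # p_waic: effective number of parameters
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))
    return lppd - p_waic

# Toy example: posterior draws for the mean of normally distributed data
rng = np.random.default_rng(2)
y = rng.normal(0.0, 1.0, 50)
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), 2000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
print(elpd_waic(log_lik))
```

<p>A badly misspecified model yields pointwise log-likelihoods that are lower on average, and hence a lower <italic>elpd<sub>waic</ssub></italic> &#8211; which is the basis of the pairwise comparisons reported below.</p>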
<p>There is no general rule of thumb for what difference in <italic>elpd<sub>waic</sub></italic> between models constitutes evidence for a difference. One proposal by Vehtari<xref ref-type="fn" rid="n9">9</xref> is a threshold of 5 times the standard error (SE) of the difference: 2.5 SEs to cover the 95% interval on the difference, multiplied by 2, since this is the upper limit on the error of the 99% interval estimated by Bengio and Grandvalet (<xref ref-type="bibr" rid="B1">2004</xref>). For our purposes, we classify an <italic>elpd<sub>waic</sub></italic> difference between 2.5 and 5 SEs as weak evidence, and a difference greater than 5 SEs as strong evidence.</p>
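<p>This decision rule is simple to state in code; the sketch below is illustrative, with example values taken from the Experiment 1 comparisons in Table 3.</p>

```python
# Evidence-classification rule from the text: a difference below 2.5 SEs
# is inconclusive, between 2.5 and 5 SEs is weak evidence, and above
# 5 SEs is strong evidence.
def classify_evidence(elpd_diff, se_diff):
    ratio = abs(elpd_diff) / se_diff
    if ratio > 5:
        return "strong"
    if ratio > 2.5:
        return "weak"
    return "inconclusive"

# Experiment 1 (Table 3): ideal vs. ambiguity-dependent, -25.6 (4.6);
# ambiguity-dependent vs. categorize-&-discard, -30 (7.9)
print(classify_evidence(-25.6, 4.6))  # strong (|diff| > 5 SEs)
print(classify_evidence(-30.0, 7.9))  # weak (2.5 SEs < |diff| < 5 SEs)
```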
</sec>
<sec>
<title>3.7 Assessing individual differences</title>
<p>Recent studies of cue integration in spoken word recognition have increasingly noted that there is sizable individual variability in cue use and weighting (<xref ref-type="bibr" rid="B13">Crinnion et al., 2024</xref>), including in some of our recent work using this paradigm (<xref ref-type="bibr" rid="B7">Bushong &amp; Jaeger, 2025</xref>). Thus, it is possible that the best-performing models fitted to an entire experiment might not accurately characterize any particular individual subject. The inclusion of random effects over subjects mitigates this issue slightly, but is inadequate for assessing whether different listeners use wholly different strategies.</p>
<p>To characterize possible individual differences in listener strategies, we fit each of our five models to each individual participant across the four experiments. After this process, we excluded from further analysis any subject for whom at least one model did not converge (which we define as at least one <inline-formula>
<alternatives>
<mml:math id="Eq016-mml">
<mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover><mml:mo>&#x2009;</mml:mo><mml:mo>&#x2265;</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mn>1.01</mml:mn></mml:mrow>
</mml:math>
<tex-math id="M16">
\documentclass[10pt]{article}
\usepackage{wasysym}
\usepackage[substack]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage[mathscr]{eucal}
\usepackage{mathrsfs}
\usepackage{pmc}
\usepackage[Euler]{upgreek}
\pagestyle{empty}
\oddsidemargin -1.0in
\begin{document}
\[
\hat R\; \ge \;1.01
\]
\end{document}
</tex-math>
<graphic xlink:href="glossapx-4-1-40229-e13.gif"/>
</alternatives>
</inline-formula> or &#8804;0.99, a slightly looser criterion than our standard for the aggregate models). We then conducted the <italic>elpd<sub>waic</sub></italic> model comparisons within each individual participant. We calculated which model was the best fit for each subject and the degree of evidence for that model over the next-best-fitting model (defined, as above, as a difference in <italic>elpd<sub>waic</sub></italic> of &gt;2.5 SE for weak evidence &gt;5 SE for strong evidence).</p>
</sec>
</sec>
<sec>
<title>4. Results</title>
<p>At the whole-experiment level, the model comparisons yielded strikingly similar results across all experiments. Models that maintain subcategorical information always outperformed models that do not; in fact, there was strong evidence against the categorize-&amp;-discard, categorize-discard-&amp;-switch, and context-only models, compared to the ideal integration and ambiguity-dependent models, in twenty-three out of twenty-four comparisons across the experiments. The ideal integration model was the best-fitting model across the board, strongly outperforming the ambiguity-dependent model in Experiments 1&#8211;2 and weakly outperforming it in Experiments 3&#8211;4. For full pairwise model comparisons for each experiment, see <xref ref-type="table" rid="T3">Table 3</xref>.</p>
<table-wrap id="T3">
<caption>
<p><bold>Table 3:</bold> Pairwise comparison of model fits (<italic>elpd<sub>waic</sub></italic>) for Experiments 1&#8211;4. Each cell shows the fit difference and the standard error of the difference in parentheses. Negative values indicate that the model listed in the row is a better fit than the model in the column (i.e., the top left cell shows the ideal integration model is a better fit than the ambiguity-dependent model for Experiment 1). Italicized cells indicate weak evidence for a difference (<italic>elpd<sub>waic</sub></italic> difference &gt; 2.5 SEs), with bolded cells indicating strong evidence (difference &gt; 5 SEs).</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Experiment 1</bold></td>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">cat.-discard</td>
<td align="left" valign="top">cat.-discard-switch</td>
<td align="left" valign="top">context-only</td>
</tr>
<tr>
<td align="left" valign="top">ideal</td>
<td align="left" valign="top"><bold>&#8211;25.6 (4.6)</bold></td>
<td align="left" valign="top"><bold>&#8211;55.6 (9.9)</bold></td>
<td align="left" valign="top"><bold>&#8211;1009.7 (42)</bold></td>
<td align="left" valign="top"><bold>&#8211;2415.4 (52.4)</bold></td>
</tr>
<tr>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><italic>&#8211;30 (7.9)</italic></td>
<td align="left" valign="top"><bold>&#8211;984.2 (43.9)</bold></td>
<td align="left" valign="top"><bold>&#8211;2389.8 (53)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;954.1 (45.5)</bold></td>
<td align="left" valign="top"><bold>&#8211;2359.8 (53.4)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard-switch</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;1405.6 (33)</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Experiment 2</bold></td>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">cat.-discard</td>
<td align="left" valign="top">cat.-discard-switch</td>
<td align="left" valign="top">context-only</td>
</tr>
<tr>
<td align="left" valign="top">ideal</td>
<td align="left" valign="top"><bold>&#8211;30.7 (5.5)</bold></td>
<td align="left" valign="top"><bold>&#8211;187.3 (17.8)</bold></td>
<td align="left" valign="top"><bold>&#8211;294.8 (25.3)</bold></td>
<td align="left" valign="top"><bold>&#8211;861 (33.9)</bold></td>
</tr>
<tr>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;156.6 (16.7)</bold></td>
<td align="left" valign="top"><bold>&#8211;264.1 (27.2)</bold></td>
<td align="left" valign="top"><bold>&#8211;830.3 (34.3)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;107.5 (30.9)</bold></td>
<td align="left" valign="top"><bold>&#8211;673.7 (38.9)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard-switch</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;566.2 (22.9)</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Experiment 3</bold></td>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">cat.-discard</td>
<td align="left" valign="top">cat.-discard-switch</td>
<td align="left" valign="top">context-only</td>
</tr>
<tr>
<td align="left" valign="top">ideal</td>
<td align="left" valign="top"><italic>&#8211;26.3 (6.6)</italic></td>
<td align="left" valign="top"><bold>&#8211;180.3 (17.6)</bold></td>
<td align="left" valign="top"><bold>&#8211;1481.4 (42.7)</bold></td>
<td align="left" valign="top"><bold>&#8211;2662.2 (57.1)</bold></td>
</tr>
<tr>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;154 (16.4)</bold></td>
<td align="left" valign="top"><bold>&#8211;1455.2 (44.9)</bold></td>
<td align="left" valign="top"><bold>&#8211;2635 (57.6)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;1301.1 (48)</bold></td>
<td align="left" valign="top"><bold>&#8211;2481.9 (59.5)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard-switch</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;1180.8 (44.3)</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Experiment 4</bold></td>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">cat.-discard</td>
<td align="left" valign="top">cat.-discard-switch</td>
<td align="left" valign="top">context-only</td>
</tr>
<tr>
<td align="left" valign="top">ideal</td>
<td align="left" valign="top"><italic>&#8211;17.1 (5.8)</italic></td>
<td align="left" valign="top"><bold>&#8211;130.9 (15.6)</bold></td>
<td align="left" valign="top"><bold>&#8211;579.5 (33.2)</bold></td>
<td align="left" valign="top"><bold>&#8211;1187.5 (42.8)</bold></td>
</tr>
<tr>
<td align="left" valign="top">ambiguity</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;113.8 (14.9)</bold></td>
<td align="left" valign="top"><bold>&#8211;562.4 (35.1)</bold></td>
<td align="left" valign="top"><bold>&#8211;1170.4 (42.7)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;448.6 (38.5)</bold></td>
<td align="left" valign="top"><bold>&#8211;1056.7 (45.4)</bold></td>
</tr>
<tr>
<td align="left" valign="top">cat-discard-switch</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top">&#160;</td>
<td align="left" valign="top"><bold>&#8211;608.1 (32.4)</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To illustrate the fit of the different models to listeners&#8217; responses, we visualize the predictions of all models for Experiment 2 in <xref ref-type="fig" rid="F6">Figure 6</xref>.<xref ref-type="fn" rid="n10">10</xref> These fits make clear why the <italic>a priori</italic> plausible categorize-discard-&amp;-switch model (and its more extreme counterpart, the context-only model) performed so badly: the model predicts quite a shallow effect of VOT, which does not match the relatively steep average slope we observe in behavior. By contrast, the ideal integration and ambiguity-dependent models fit the VOT effect quite well, while also explaining the presence of the context effect.</p>
<fig id="F6">
<caption>
<p><bold>Figure 6:</bold> Predictions of the five models fit to Experiment 2 in proportion space (left panel), log-odds space (center panel), and context effect predictions (right panel). Point ranges in the left panel show means and bootstrapped 95% confidence intervals over empirical by-subject means. Dashed lines and shaded regions are mean and 95% highest-density continuous interval (HDCI) of model predictions, drawn from 1,000 random posterior samples.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g6.png"/>
</fig>
<sec>
<title>4.1 Individual results</title>
<p>Of 176 participants, at least one model failed to converge for 25, leaving us with 151 participants with analyzable results. The results of the model comparisons are summarized in <xref ref-type="fig" rid="F7">Figure 7</xref>. Compared to the whole-experiment analyses, the results for individual participants were less clear. For every participant, the best-fitting model was not statistically distinguishable from the next-best-fitting model (i.e., <italic>elpd<sub>waic</sub></italic> difference &lt;2.5 SE). Numerically, for most participants the best-fitting model was the ideal integration model (62 participants, 41%), followed by categorize-&amp;-discard (61, 40.4%), ambiguity-dependent (14, 9.3%), categorize-discard-&amp;-switch (13, 8.6%), and context-only (1, 0.7%).</p>
<fig id="F7">
<caption>
<p><bold>Figure 7:</bold> Summary of models fit to individual participants. Each panel represents a model, and each position on the y-axis indicates the model it is compared against. Each point represents an individual subject. The position on the x-axis is the degree of evidence for the model represented by the panel. Shaded regions indicate degree of evidence for or against the model (gray: inconclusive evidence, light green/light red: weak evidence for/against, green/red: strong evidence for/against). Note that there was no subject for whom the best-fitting model performed significantly better than the <italic>next-best-fitting</italic> model. So while, for example, there are many instances of the ideal integration model being a significantly better fit than either the ambiguity-dependent or categorize-&amp;-discard model (see top-left panel), it was never the case that the model was a significantly better fit than <italic>both</italic> of those models within an individual participant.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-40229-g7.png"/>
</fig>
</sec>
</sec>
<sec>
<title>5. General discussion</title>
<p>There is a substantial body of work that seeks to answer the question of whether listeners are able to maintain subcategorical information about previous input (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B3">Brown-Schmidt &amp; Toscano, 2017</xref>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>; <xref ref-type="bibr" rid="B16">Falandays et al., 2020</xref>; <xref ref-type="bibr" rid="B18">Ganong, 1980</xref>; <xref ref-type="bibr" rid="B43">McMurray et al., 2009</xref>; <xref ref-type="bibr" rid="B49">Szostak &amp; Pitt, 2013</xref>; <xref ref-type="bibr" rid="B52">Zellou &amp; Dahan, 2019</xref>, inter alia). The inferences made by these studies have rested on the assumption that observing effects of both initial acoustic input and later contextual information on behavioral responses constitutes evidence that listeners have maintained gradient subcategorical information about prior input. While some studies have proposed conceptual cognitive models that can be compared to behavior (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>), there has been no concerted effort to formalize and quantitatively test these alternatives.</p>
<p>Here, we formalized five cognitive models that allow us to distinguish different kinds of information maintenance using results from perceptual categorization studies. Two of these models, ideal integration and ambiguity-dependent, were based on prior conceptual proposals in the literature (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). We introduced three additional models that assume listeners do not maintain any uncertainty about prior input after initial word recognition: the categorize-&amp;-discard models and the context-only model. The categorize-discard-&amp;-switch model is a novel contribution to this literature &#8211; to our knowledge, such a cognitive process has not previously been proposed to explain subsequent context effects. At first blush, this new model seemed to provide an alternative explanation for behavioral patterns that reflect both early and later cues: if listeners simply switch their categorizations when later information conflicts with their initial categorizations, one would expect exactly this pattern.</p>
<p>The quantitative comparison of the competing models yielded strikingly consistent results across experiments: the ideal integration model always outperformed the four non-ideal models. The ambiguity-dependent model was also a strong contender, but it always came in second to the ideal integration model, and in two of our four experiments the evidence against it in favor of the ideal integration model was strong. The three models that assume listeners discard subcategorical information were systematically worse, with our novel proposal, the categorize-discard-&amp;-switch model, consistently patterning second-worst. On the whole, these results very strongly suggest that listeners are capable of maintaining subcategorical information about input over long perceptual timescales (3&#8211;9 syllables).</p>
<p>Since there may be variability in strategies between participants, we also assessed model fits within individuals. These results were less conclusive: within each participant, the best-fitting model was statistically indistinguishable from the next-best fit, making it difficult to draw firm conclusions. The qualitative pattern of results, however, showed some divergence from the models fit to whole datasets: while the ideal integration model was the best-fitting model for a plurality of participants, the categorize-&amp;-discard model was a close second, and the ambiguity-dependent model was the best fit for only a small fraction of participants (as in the full-experiment results, the categorize-discard-&amp;-switch and context-only models were the worst-performing). We discuss these results further in 5.2.</p>
<p>The failure of the categorize-discard-&amp;-switch model illustrates the importance of formalizing and quantitatively testing theories. On its face, it appears to be a plausible competitor to models that assume maintenance of subcategorical information: it, too, predicts effects of both acoustic and contextual cues across time, and it thereby calls into question the assumption in previous work that finding these effects must imply maintenance of subcategorical information (<xref ref-type="bibr" rid="B2">Bicknell et al., 2025</xref>; <xref ref-type="bibr" rid="B3">Brown-Schmidt &amp; Toscano, 2017</xref>; <xref ref-type="bibr" rid="B11">Connine et al., 1991</xref>). When we quantitatively evaluated this model, however, it provided a very poor fit to the data. This work thus highlights the importance of directly fitting the quantitative predictions of cognitive models to behavioral data. With an eye to the future, we see two major avenues for advancement in this area.</p>
<sec>
<title>5.1 What kind of subcategorical information do listeners maintain?</title>
<p>While the present work reveals that listeners can maintain gradient representations of previous input, it is unclear what <italic>kind</italic> of information is contained in these representations. Throughout this paper, we use the general term <italic>subcategorical information</italic> to refer to any kind of representation of past input that is below the level of a categorical decision. But how detailed these representations are has significant implications for the language processing system. For example, listeners could maintain information about specific cue values over time, which would likely be a highly resource-intensive process. By contrast, listeners may maintain something as general as a probability distribution over possible categories, which would be less resource-intensive, but still sufficient to perform ideal cue integration (under some simplifying assumptions). It is also possible that there is some mixture of representations maintained over different timescales; listeners may maintain fine phonetic detail over limited timescales, moving to uncertainty over categories as more time passes. In this work, we aimed to show that maintenance of <italic>some kind</italic> of subcategorical information is possible over long timescales, but our paradigm cannot adjudicate between these different types of representations.</p>
<p>This issue is not trivially solvable. In a series of neuroimaging studies, Gwilliams et al. (<xref ref-type="bibr" rid="B22">2018</xref>, <xref ref-type="bibr" rid="B21">2022</xref>) use MEG to track, over time, neural activity in brain regions known to be associated with phonetic processing. In particular, Gwilliams et al. (<xref ref-type="bibr" rid="B21">2022</xref>) are able to decode phonetic features from these regions (as subjects listen to natural speech) at modest perceptual distances (&#8764;300 ms). However, these data do not necessarily disambiguate whether listeners have access to more detailed information: indeed, uncertainty about phonetic feature identity may paradoxically lead to <italic>worse</italic> decoding accuracy (particularly in noisy natural speech), precisely because listeners have access to information more detailed than the binary phonetic feature category level, leading to a higher degree of uncertainty at the category level. Furthermore, listeners could, in principle, make perceptual <italic>commitments</italic> while continuing to maintain subcategorical detail over time &#8211; what pattern of neural responses this would predict is unclear.</p>
<p>There is a second line of work that tackles the problem of representational detail behaviorally, using the perceptual recalibration paradigm. Caplan et al. (<xref ref-type="bibr" rid="B8">2021</xref>) find that lexical labeling following exposure to acoustically manipulated words failed to induce perceptual recalibration effects (in contrast to lexical labeling preceding acoustic information). Perceptual recalibration requires that listeners be able to track acoustic cues and re-map them to phonemic categories, so the absence of recalibration suggests listeners do not have access to representations as detailed as acoustic cue values at the time of lexical labeling. However, other work using a different accent adaptation paradigm has found effects with delayed lexical labeling (<xref ref-type="bibr" rid="B4">Burchill et al., 2018</xref>).</p>
<p>Given the results from the above lines of work, we find it likely that listeners in our studies maintain a more general form of uncertainty, such as a probability distribution over phonemic categories, rather than a more detailed representation of acoustic feature values. However, we cannot rule out the latter possibility, and testing between these alternatives is difficult. Careful model-building and highly controlled experiments are likely to be key to future work in this area.</p>
</sec>
<sec>
<title>5.2 How general is subcategorical information maintenance?</title>
<p>How generalizable our results are to naturalistic language comprehension depends on two factors: (i) how well our models, which are fit to entire experiments, capture what any particular <italic>individual</italic> listener does; and (ii) whether our task is reflective of typical language use.</p>
<p>Psycholinguistic experiments generally assume a <italic>modal</italic> language user &#8211; that is, we operate under the assumption that most humans share the same fundamental language production and comprehension processes, with some limited exceptions (for recent discussion of this issue, see <xref ref-type="bibr" rid="B41">McMurray et al., 2023</xref>). This is in contrast to an approach which views psycholinguistic processes as fundamentally variable and under which each individual&#8217;s behavioral patterns are considered. Thus, it is important in our work to at least begin to address to what degree our average results (here, the models fit to entire experiments) are sufficient descriptors of individual participants. To tackle this issue, we fit each of our models to individual participants. Unfortunately, given the small amount of data at the individual level, the results were inconclusive. The most notable result to us was that the categorize-&amp;-discard model performed much better on an individual level than at the aggregate level; in particular, it was the best-fitting model for nearly the same number of participants as the ideal integration model. The ambiguity-dependent model, by contrast, performed much worse for individuals than in the aggregate models. To some degree, this is likely driven by the strong VOT effects present within subjects. However, we want to avoid speculating about these results too much &#8211; there were no participants who had a statistically clear best-fitting model. Ultimately, to address the question of whether there is individual variation in the mechanisms of subcategorical information maintenance, we need a different approach. Future work should collect significantly more data per participant and fit one model that implements a mixture over the base models &#8211; in this way, one could get an estimate of the degree to which each model captures variation in strategies between individuals. 
Such an approach could also help us understand whether there may even be changes in strategy between trials.</p>
<p>A second concern about the generalizability of the present work concerns task. The experiments presented here use highly predictable, repetitive stimuli: participants are always asked about the first phoneme of the third word of the sentence, which is predictably followed by additional sentence context 3&#8211;9 syllables later. Thus, it is worth considering to what extent our results here reflect real, day-to-day language comprehension, versus a learned task strategy that develops based on exposure to our particular stimuli. To some degree, we can address this question empirically. If participants in our studies show effects of VOT and context from the very beginning of the experiment, this would constitute evidence (albeit limited) that use of these time-disjoint cues is not a task-dependent strategy that requires repeated exposure to our stimuli. To that end, we conducted additional trial analyses (presented in SI &#167;3 at the GitHub repository for this study) on each of our experiments. We find that there is strong evidence for effects of both context and VOT from the very first trial of the experiment. Of course, this does not constitute evidence that there are no task-specific adaptations that may result in behavioral patterns that are not present in natural language comprehension. To mitigate this problem, future work using a paradigm like ours should take steps to draw listeners&#8217; attention away from critical manipulations, including introducing filler items, probing alternative words in the sentences, and developing a larger set of sentence items to reduce repetition.</p>
<p>Even if we take as a given that the effects we find here are not task-dependent, it is worth asking whether subcategorical information maintenance is a static, unchanging <italic>mechanism</italic> of language comprehension, or is a <italic>strategy</italic> that is malleable and under listeners&#8217; control. To this point, we have used these terms interchangeably, but they imply quite different things about the language processing system. Consider what it would mean for subcategorical information maintenance to be a general mechanism: it would imply that listeners always maintain subcategorical representations about every segment of speech input on an indefinite timescale &#8211; the memory demands this process would imply seem immense (and, to some degree, contradictory to the general principle of incrementality in language processing; <xref ref-type="bibr" rid="B9">Christiansen &amp; Chater, 2016</xref>). Some work has begun to test whether subcategorical information maintenance can change across time; for example, Bushong and Jaeger (<xref ref-type="bibr" rid="B7">2025</xref>) propose that the expected utility of context modulates whether listeners maintain subcategorical information; they find that when sentence context is less informative, listeners subsequently down-weight its use in a spoken word recognition task. Some lexical garden-path studies have also started to investigate whether individual listeners&#8217; perceptual abilities modulate acoustic-lexical cue integration (<xref ref-type="bibr" rid="B26">Kapnoula et al., 2021</xref>). And, as we mentioned above, extensions to our current work to model mixtures of strategies may also begin to elucidate these processes. 
As of yet, however, there are no concrete theories of how perceptual, attentional, and memory processes together play a role in maintaining and updating linguistic representations of uncertainty in real-time spoken language understanding.<xref ref-type="fn" rid="n11">11</xref> We see this as a fruitful area for future work to address.</p>
</sec>
</sec>
<sec>
<title>6. Conclusion</title>
<p>The present work provides strong evidence that listeners can maintain subcategorical representations of previous linguistic input over long perceptual timescales, beyond the single word. The present results point to a need for broader theories of speech perception (and of language processing generally) to recognize that listeners have access to low-level information even after initial processing. Converging evidence from other domains (e.g., maintenance of uncertainty about syntactic parses over time; <xref ref-type="bibr" rid="B23">Hahn et al., 2022</xref>; <xref ref-type="bibr" rid="B31">Levy et al., 2009</xref>) suggests that maintaining intermediate representations of linguistic input may be the norm, rather than the exception, in the human language processing system.</p>
</sec>
</body>
<back>
<sec>
<title>Data accessibility statement</title>
<p>Our experimental materials, full datasets, and analysis scripts can be found in our GitHub repository, which can be accessed via this persistent link: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.15237589">https://doi.org/10.5281/zenodo.15237589</ext-link>.</p>
</sec>
<sec>
<title>Ethics and consent</title>
<p>All experiments were conducted when W.B. was affiliated with the University of Rochester; the experiments were approved by the University of Rochester Research Subjects Review Board (Case No. 00045955).</p>
</sec>
<sec>
<title>Acknowledgments</title>
<p>This research was funded by NICHD HD075797 to T. Florian Jaeger and NSF NRT 1449828 to Wednesday Bushong. The views expressed here do not necessarily reflect those of the funding agencies. I would like to thank T. Florian Jaeger for many helpful discussions of this project over the years, for key contributions to the model fitting procedures and data visualization, and for funding support. Evan Hamaguchi and Chelsea March assisted with stimulus sentence creation and recording. Thanks to Mike Tanenhaus and Aaron White for critical discussions about early forms of this work. I am grateful to those who read early drafts of this manuscript, especially Maleka Donaldson and Jennifer McLeer. Thanks to three anonymous peer reviewers for their insightful feedback on this work.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The author has no competing interests to declare.</p>
</sec>
<sec>
<title>Author contributions</title>
<p>W.B. conceptualized the models and experiments, collected and analyzed the data, and wrote the manuscript.</p>
</sec>
<sec>
<title>ORCID iD</title>
<p>Wednesday Bushong <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://orcid.org/0000-0002-1837-0689">https://orcid.org/0000-0002-1837-0689</ext-link></p>
</sec>
<fn-group>
<fn id="n1"><p>We use the terms <italic>speech perception</italic> and <italic>spoken word recognition</italic> interchangeably, as it is hard to disentangle whether word categorization effects reflect listeners&#8217; perception of a particular speech sound or a whole word.</p></fn>
<fn id="n2"><p>By <italic>subcategorical information</italic>, we mean that listeners maintain a representation with more detail than a simple categorical decision; this could range from maintenance of fine phonetic detail to a probability distribution over phonemic categories. We discuss this in more detail in Section 5.1.</p></fn>
<fn id="n3"><p>Notably, these models optimize categorization accuracy; they do not optimize other reasonable goals an organism might have, like categorization speed or memory economy.</p></fn>
<fn id="n4"><p>This argument was first presented by McClelland and Elman (<xref ref-type="bibr" rid="B40">1986</xref>) as a key motivation for TRACE, on the basis of the Ganong effect (<xref ref-type="bibr" rid="B18">Ganong, 1980</xref>).</p></fn>
<fn id="n5"><p>The GitHub repository containing full data, analyses, and supplementary information for this project can be accessed via this persistent link: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.15237589">https://doi.org/10.5281/zenodo.15237589</ext-link>.</p></fn>
<fn id="n6"><p>I.e., <italic>&#945;</italic> is not a free parameter in this model.</p></fn>
<fn id="n7"><p>These experiments were conducted between 2016&#8211;2018 and were based on a $6/hour compensation rate.</p></fn>
<fn id="n8"><p>For full sentence materials and details about which materials were used in which experiments, see the stimulus files in our GitHub repository.</p></fn>
<fn id="n9"><p><ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://discourse.mc-stan.org/t/interpreting-elpd-diff-loo-package/1628/2">https://discourse.mc-stan.org/t/interpreting-elpd-diff-loo-package/1628/2</ext-link>.</p></fn>
<fn id="n10"><p>We show Experiment 2 because this dataset had the largest overall context effect, which makes the qualitative differences between the model fits clearer. However, the model fits to the three other experiments showed the same quantitative and qualitative patterns (see Figures S3&#8211;6 in the SI at the GitHub repository for this study).</p></fn>
<fn id="n11"><p>Notably, some work in sentence processing has begun to address these issues (e.g., recent extensions of noisy-channel surprisal, such as <xref ref-type="bibr" rid="B23">Hahn et al., 2022</xref>).</p></fn>
</fn-group>
<ref-list>
<ref id="B1"><mixed-citation publication-type="journal"><string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name><surname>Grandvalet</surname>, <given-names>Y.</given-names></string-name> (<year>2004</year>). <article-title>No unbiased estimator of the variance of k-fold cross-validation</article-title>. <source>Journal of Machine Learning Research</source>, <volume>5</volume>(<month>Sep</month>), <fpage>1089</fpage>&#8211;<lpage>1105</lpage>.</mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="journal"><string-name><surname>Bicknell</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Bushong</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Tanenhaus</surname>, <given-names>M. K.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2025</year>). <article-title>Maintenance of subcategorical information during speech perception: Revisiting misunderstood limitations</article-title>. <source>Journal of Memory and Language</source>, <volume>140</volume>, <elocation-id>104565</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jml.2024.104565</pub-id></mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="journal"><string-name><surname>Brown-Schmidt</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Toscano</surname>, <given-names>J. C.</given-names></string-name> (<year>2017</year>). <article-title>Gradient acoustic information induces long-lasting referential uncertainty in short discourses</article-title>. <source>Language, Cognition and Neuroscience</source>, <volume>32</volume>(<issue>10</issue>), <fpage>1211</fpage>&#8211;<lpage>1228</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2017.1325508</pub-id></mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="journal"><string-name><surname>Burchill</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2018</year>). <article-title>Maintaining information about speech input during accent adaptation</article-title>. <source>PloS One</source>, <volume>13</volume>(<issue>8</issue>), <elocation-id>e0199358</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0199358</pub-id></mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="journal"><string-name><surname>B&#252;rkner</surname>, <given-names>P.-C.</given-names></string-name> (<year>2017</year>). <article-title>brms: An R package for Bayesian multilevel models using Stan</article-title>. <source>Journal of Statistical Software</source>, <volume>80</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v080.i01</pub-id></mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="journal"><string-name><surname>Bushong</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2019</year>). <article-title>Dynamic re-weighting of acoustic and contextual cues in spoken word recognition</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>146</volume>(<issue>2</issue>), <fpage>EL135</fpage>&#8211;<lpage>EL140</lpage>. <pub-id pub-id-type="doi">10.1121/1.5119271</pub-id></mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="journal"><string-name><surname>Bushong</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2025</year>). <article-title>Changes in informativity of sentential context affects its integration with subcategorical information about preceding speech</article-title>. <source>Journal of Experimental Psychology: Learning, Memory, and Cognition</source>. <pub-id pub-id-type="doi">10.1037/xlm0001443</pub-id></mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="journal"><string-name><surname>Caplan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Hafri</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Trueswell</surname>, <given-names>J. C.</given-names></string-name> (<year>2021</year>). <article-title>Now you hear me, later you don&#8217;t: The immediacy of linguistic computation and the representation of speech</article-title>. <source>Psychological Science</source>, <volume>32</volume>(<issue>3</issue>), <fpage>410</fpage>&#8211;<lpage>423</lpage>. <pub-id pub-id-type="doi">10.1177/0956797620968787</pub-id></mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="journal"><string-name><surname>Christiansen</surname>, <given-names>M. H.</given-names></string-name>, &amp; <string-name><surname>Chater</surname>, <given-names>N.</given-names></string-name> (<year>2016</year>). <article-title>The now-or-never bottleneck: A fundamental constraint on language</article-title>. <source>Behavioral and Brain Sciences</source>, <volume>39</volume>, <elocation-id>E62</elocation-id>. <pub-id pub-id-type="doi">10.1017/S0140525X1500031X</pub-id></mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="journal"><string-name><surname>Clayards</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Tanenhaus</surname>, <given-names>M. K.</given-names></string-name>, <string-name><surname>Aslin</surname>, <given-names>R. N.</given-names></string-name>, &amp; <string-name><surname>Jacobs</surname>, <given-names>R. A.</given-names></string-name> (<year>2008</year>). <article-title>Perception of speech reflects optimal use of probabilistic speech cues</article-title>. <source>Cognition</source>, <volume>108</volume>(<issue>3</issue>), <fpage>804</fpage>&#8211;<lpage>809</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2008.04.004</pub-id></mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="journal"><string-name><surname>Connine</surname>, <given-names>C. M.</given-names></string-name>, <string-name><surname>Blasko</surname>, <given-names>D. G.</given-names></string-name>, &amp; <string-name><surname>Hall</surname>, <given-names>M.</given-names></string-name> (<year>1991</year>). <article-title>Effects of subsequent sentence context in auditory word recognition: Temporal and linguistic constraints</article-title>. <source>Journal of Memory and Language</source>, <volume>30</volume>(<issue>1</issue>), <elocation-id>234</elocation-id>. <pub-id pub-id-type="doi">10.1016/0749-596x(91)90005-5</pub-id></mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="journal"><string-name><surname>Cooper</surname>, <given-names>F. S.</given-names></string-name>, <string-name><surname>Delattre</surname>, <given-names>P. C.</given-names></string-name>, <string-name><surname>Liberman</surname>, <given-names>A. M.</given-names></string-name>, <string-name><surname>Borst</surname>, <given-names>J. M.</given-names></string-name>, &amp; <string-name><surname>Gerstman</surname>, <given-names>L. J.</given-names></string-name> (<year>1952</year>). <article-title>Some experiments on the perception of synthetic speech sounds</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>24</volume>(<issue>6</issue>), <fpage>597</fpage>&#8211;<lpage>606</lpage>. <pub-id pub-id-type="doi">10.1121/1.1906940</pub-id></mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="journal"><string-name><surname>Crinnion</surname>, <given-names>A. M.</given-names></string-name>, <string-name><surname>Heffner</surname>, <given-names>C. C.</given-names></string-name>, &amp; <string-name><surname>Myers</surname>, <given-names>E. B.</given-names></string-name> (<year>2024</year>). <article-title>Individual differences in the use of top-down versus bottom-up cues to resolve phonetic ambiguity</article-title>. <source>Attention, Perception, &amp; Psychophysics</source>, <fpage>1</fpage>&#8211;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-024-02889-4</pub-id></mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="journal"><string-name><surname>Dahan</surname>, <given-names>D.</given-names></string-name> (<year>2010</year>). <article-title>The time course of interpretation in speech comprehension</article-title>. <source>Current Directions in Psychological Science</source>, <volume>19</volume>(<issue>2</issue>), <fpage>121</fpage>&#8211;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.1177/0963721410364726</pub-id></mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="journal"><string-name><surname>Ernst</surname>, <given-names>M. O.</given-names></string-name>, &amp; <string-name><surname>Banks</surname>, <given-names>M. S.</given-names></string-name> (<year>2002</year>). <article-title>Humans integrate visual and haptic information in a statistically optimal fashion</article-title>. <source>Nature</source>, <volume>415</volume>(<issue>6870</issue>), <fpage>429</fpage>&#8211;<lpage>433</lpage>. <pub-id pub-id-type="doi">10.1038/415429a</pub-id></mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="journal"><string-name><surname>Falandays</surname>, <given-names>J. B.</given-names></string-name>, <string-name><surname>Brown-Schmidt</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Toscano</surname>, <given-names>J. C.</given-names></string-name> (<year>2020</year>). <article-title>Long-lasting gradient activation of referents during spoken language processing</article-title>. <source>Journal of Memory and Language</source>, <volume>112</volume>, <elocation-id>104088</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jml.2020.104088</pub-id></mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="journal"><string-name><surname>Feldman</surname>, <given-names>N. H.</given-names></string-name>, <string-name><surname>Griffiths</surname>, <given-names>T. L.</given-names></string-name>, &amp; <string-name><surname>Morgan</surname>, <given-names>J. L.</given-names></string-name> (<year>2009</year>). <article-title>The influence of categories on perception: Explaining the perceptual magnet effect as optimal statistical inference</article-title>. <source>Psychological Review</source>, <volume>116</volume>(<issue>4</issue>), <fpage>752</fpage>&#8211;<lpage>782</lpage>. <pub-id pub-id-type="doi">10.1037/a0017196</pub-id></mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="journal"><string-name><surname>Ganong</surname>, <given-names>W. F.</given-names></string-name> (<year>1980</year>). <article-title>Phonetic categorization in auditory word perception</article-title>. <source>Journal of Experimental Psychology: Human Perception and Performance</source>, <volume>6</volume>(<issue>1</issue>), <fpage>110</fpage>&#8211;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1037//0096-1523.6.1.110</pub-id></mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="journal"><string-name><surname>Gelman</surname>, <given-names>A.</given-names></string-name> (<year>2008</year>). <article-title>Scaling regression inputs by dividing by two standard deviations</article-title>. <source>Statistics in Medicine</source>, <volume>27</volume>(<issue>15</issue>), <fpage>2865</fpage>&#8211;<lpage>2873</lpage>. <pub-id pub-id-type="doi">10.1002/sim.3107</pub-id></mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="journal"><string-name><surname>Gelman</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Hwang</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Vehtari</surname>, <given-names>A.</given-names></string-name> (<year>2014</year>). <article-title>Understanding predictive information criteria for Bayesian models</article-title>. <source>Statistics and Computing</source>, <volume>24</volume>(<issue>6</issue>), <fpage>997</fpage>&#8211;<lpage>1016</lpage>. <pub-id pub-id-type="doi">10.1007/s11222-013-9416-2</pub-id></mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="journal"><string-name><surname>Gwilliams</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>King</surname>, <given-names>J.-R.</given-names></string-name>, <string-name><surname>Marantz</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Poeppel</surname>, <given-names>D.</given-names></string-name> (<year>2022</year>). <article-title>Neural dynamics of phoneme sequences reveal position-invariant code for content and order</article-title>. <source>Nature Communications</source>, <volume>13</volume>(<issue>1</issue>), <elocation-id>6606</elocation-id>. <pub-id pub-id-type="doi">10.1038/s41467-022-34326-1</pub-id></mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="journal"><string-name><surname>Gwilliams</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Linzen</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Poeppel</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name><surname>Marantz</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <article-title>In spoken word recognition, the future predicts the past</article-title>. <source>Journal of Neuroscience</source>, <volume>38</volume>(<issue>35</issue>), <fpage>7585</fpage>&#8211;<lpage>7599</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0065-18.2018</pub-id></mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="journal"><string-name><surname>Hahn</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Futrell</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Levy</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name><surname>Gibson</surname>, <given-names>E.</given-names></string-name> (<year>2022</year>). <article-title>A resource-rational model of human processing of recursive linguistic structure</article-title>. <source>Proceedings of the National Academy of Sciences</source>, <volume>119</volume>(<issue>43</issue>), <elocation-id>e2122602119</elocation-id>. <pub-id pub-id-type="doi">10.1073/pnas.2122602119</pub-id></mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2008</year>). <article-title>Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models</article-title>. <source>Journal of Memory and Language</source>, <volume>59</volume>(<issue>4</issue>), <fpage>434</fpage>&#8211;<lpage>446</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2007.11.007</pub-id></mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="journal"><string-name><surname>Just</surname>, <given-names>M. A.</given-names></string-name>, &amp; <string-name><surname>Carpenter</surname>, <given-names>P. A.</given-names></string-name> (<year>1980</year>). <article-title>A theory of reading: From eye fixations to comprehension</article-title>. <source>Psychological Review</source>, <volume>87</volume>(<issue>4</issue>), <fpage>329</fpage>&#8211;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.1037//0033-295x.87.4.329</pub-id></mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="journal"><string-name><surname>Kapnoula</surname>, <given-names>E. C.</given-names></string-name>, <string-name><surname>Edwards</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>McMurray</surname>, <given-names>B.</given-names></string-name> (<year>2021</year>). <article-title>Gradient activation of speech categories facilitates listeners&#8217; recovery from lexical garden paths, but not perception of speech-in-noise</article-title>. <source>Journal of Experimental Psychology: Human Perception and Performance</source>, <volume>47</volume>(<issue>4</issue>), <elocation-id>578</elocation-id>. <pub-id pub-id-type="doi">10.1037/xhp0000900</pub-id></mixed-citation></ref>
<ref id="B27"><mixed-citation publication-type="journal"><string-name><surname>Kingston</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Diehl</surname>, <given-names>R. L.</given-names></string-name> (<year>1994</year>). <article-title>Phonetic knowledge</article-title>. <source>Language</source>, <volume>70</volume>(<issue>3</issue>), <fpage>419</fpage>&#8211;<lpage>454</lpage>. <pub-id pub-id-type="doi">10.2307/416481</pub-id></mixed-citation></ref>
<ref id="B28"><mixed-citation publication-type="journal"><string-name><surname>Klatt</surname>, <given-names>D. H.</given-names></string-name> (<year>1976</year>). <article-title>Linguistic uses of segmental duration in English: Acoustic and perceptual evidence</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>59</volume>(<issue>5</issue>), <fpage>1208</fpage>&#8211;<lpage>1221</lpage>. <pub-id pub-id-type="doi">10.1121/1.380986</pub-id></mixed-citation></ref>
<ref id="B29"><mixed-citation publication-type="journal"><string-name><surname>Kleinschmidt</surname>, <given-names>D. F.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2015</year>). <article-title>Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel</article-title>. <source>Psychological Review</source>, <volume>122</volume>(<issue>2</issue>), <fpage>148</fpage>&#8211;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.1037/a0038695</pub-id></mixed-citation></ref>
<ref id="B30"><mixed-citation publication-type="journal"><string-name><surname>Kronrod</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Coppess</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Feldman</surname>, <given-names>N. H.</given-names></string-name> (<year>2016</year>). <article-title>A unified account of categorical effects in phonetic perception</article-title>. <source>Psychonomic Bulletin &amp; Review</source>, <volume>23</volume>(<issue>6</issue>), <fpage>1681</fpage>&#8211;<lpage>1712</lpage>. <pub-id pub-id-type="doi">10.3758/s13423-016-1049-y</pub-id></mixed-citation></ref>
<ref id="B31"><mixed-citation publication-type="journal"><string-name><surname>Levy</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Bicknell</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Slattery</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name><surname>Rayner</surname>, <given-names>K.</given-names></string-name> (<year>2009</year>). <article-title>Eye movement evidence that readers maintain and act on uncertainty about past linguistic input</article-title>. <source>Proceedings of the National Academy of Sciences</source>, <volume>106</volume>(<issue>50</issue>), <fpage>21086</fpage>&#8211;<lpage>21090</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0907664106</pub-id></mixed-citation></ref>
<ref id="B32"><mixed-citation publication-type="journal"><string-name><surname>Lewandowski</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Kurowicka</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name><surname>Joe</surname>, <given-names>H.</given-names></string-name> (<year>2009</year>). <article-title>Generating random correlation matrices based on vines and extended onion method</article-title>. <source>Journal of Multivariate Analysis</source>, <volume>100</volume>(<issue>9</issue>), <fpage>1989</fpage>&#8211;<lpage>2001</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmva.2009.04.008</pub-id></mixed-citation></ref>
<ref id="B33"><mixed-citation publication-type="journal"><string-name><surname>Liberman</surname>, <given-names>A. M.</given-names></string-name> (<year>1957</year>). <article-title>Some results of research on speech perception</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>29</volume>(<issue>1</issue>), <fpage>117</fpage>&#8211;<lpage>123</lpage>. <pub-id pub-id-type="doi">10.1121/1.1908635</pub-id></mixed-citation></ref>
<ref id="B34"><mixed-citation publication-type="journal"><string-name><surname>Lisker</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Abramson</surname>, <given-names>A. S.</given-names></string-name> (<year>1967</year>). <article-title>Some effects of context on voice onset time in English stops</article-title>. <source>Language and Speech</source>, <volume>10</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1177/002383096701000101</pub-id></mixed-citation></ref>
<ref id="B35"><mixed-citation publication-type="journal"><string-name><surname>Lisker</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Abramson</surname>, <given-names>A. S.</given-names></string-name> (<year>1970</year>). <article-title>The voicing dimension: Some experiments in comparative phonetics</article-title>. <source>Proceedings of the 6th International Congress of Phonetic Sciences</source>, <volume>563</volume>, <fpage>563</fpage>&#8211;<lpage>567</lpage>.</mixed-citation></ref>
<ref id="B36"><mixed-citation publication-type="journal"><string-name><surname>Luce</surname>, <given-names>P. A.</given-names></string-name>, &amp; <string-name><surname>Pisoni</surname>, <given-names>D. B.</given-names></string-name> (<year>1998</year>). <article-title>Recognizing spoken words: The neighborhood activation model</article-title>. <source>Ear and Hearing</source>, <volume>19</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1097/00003446-199802000-00001</pub-id></mixed-citation></ref>
<ref id="B37"><mixed-citation publication-type="journal"><string-name><surname>Luce</surname>, <given-names>R. D.</given-names></string-name> (<year>1963</year>). <article-title>A threshold theory for simple detection experiments</article-title>. <source>Psychological Review</source>, <volume>70</volume>(<issue>1</issue>), <elocation-id>61</elocation-id>. <pub-id pub-id-type="doi">10.1037/h0039723</pub-id></mixed-citation></ref>
<ref id="B38"><mixed-citation publication-type="journal"><string-name><surname>Magnuson</surname>, <given-names>J. S.</given-names></string-name>, <string-name><surname>You</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Luthra</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Nam</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Escab&#237;</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Brown</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Allopenna</surname>, <given-names>P. D.</given-names></string-name>, <string-name><surname>Theodore</surname>, <given-names>R. M.</given-names></string-name>, <string-name><surname>Monto</surname>, <given-names>N.</given-names></string-name>, &amp; <string-name><surname>Rueckl</surname>, <given-names>J. G.</given-names></string-name> (<year>2020</year>). <article-title>EARSHOT: A minimal neural network model of incremental human speech recognition</article-title>. <source>Cognitive Science</source>, <volume>44</volume>(<issue>4</issue>), <elocation-id>e12823</elocation-id>. <pub-id pub-id-type="doi">10.1111/cogs.12823</pub-id></mixed-citation></ref>
<ref id="B39"><mixed-citation publication-type="journal"><string-name><surname>Massaro</surname>, <given-names>D. W.</given-names></string-name>, &amp; <string-name><surname>Friedman</surname>, <given-names>D.</given-names></string-name> (<year>1990</year>). <article-title>Models of integration given multiple sources of information</article-title>. <source>Psychological Review</source>, <volume>97</volume>(<issue>2</issue>), <elocation-id>225</elocation-id>. <pub-id pub-id-type="doi">10.1037//0033-295x.97.2.225</pub-id></mixed-citation></ref>
<ref id="B40"><mixed-citation publication-type="journal"><string-name><surname>McClelland</surname>, <given-names>J. L.</given-names></string-name>, &amp; <string-name><surname>Elman</surname>, <given-names>J. L.</given-names></string-name> (<year>1986</year>). <article-title>The TRACE model of speech perception</article-title>. <source>Cognitive Psychology</source>, <volume>18</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>86</lpage>. <pub-id pub-id-type="doi">10.1016/0010-0285(86)90015-0</pub-id></mixed-citation></ref>
<ref id="B41"><mixed-citation publication-type="journal"><string-name><surname>McMurray</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Baxelbaum</surname>, <given-names>K. S.</given-names></string-name>, <string-name><surname>Colby</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Bruce Tomblin</surname>, <given-names>J.</given-names></string-name> (<year>2023</year>). <article-title>Understanding language processing in variable populations on their own terms: Towards a functionalist psycholinguistics of individual differences, development, and disorders</article-title>. <source>Applied Psycholinguistics</source>, <volume>44</volume>(<issue>4</issue>), <fpage>565</fpage>&#8211;<lpage>592</lpage>. <pub-id pub-id-type="doi">10.1017/s0142716423000255</pub-id></mixed-citation></ref>
<ref id="B42"><mixed-citation publication-type="journal"><string-name><surname>McMurray</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Tanenhaus</surname>, <given-names>M. K.</given-names></string-name>, &amp; <string-name><surname>Aslin</surname>, <given-names>R. N.</given-names></string-name> (<year>2002</year>). <article-title>Gradient effects of within-category phonetic variation on lexical access</article-title>. <source>Cognition</source>, <volume>86</volume>(<issue>2</issue>), <fpage>B33</fpage>&#8211;<lpage>B42</lpage>. <pub-id pub-id-type="doi">10.1016/S0010-0277(02)00157-9</pub-id></mixed-citation></ref>
<ref id="B43"><mixed-citation publication-type="journal"><string-name><surname>McMurray</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Tanenhaus</surname>, <given-names>M. K.</given-names></string-name>, &amp; <string-name><surname>Aslin</surname>, <given-names>R. N.</given-names></string-name> (<year>2009</year>). <article-title>Within-category VOT affects recovery from &#8220;lexical&#8221; garden-paths: Evidence against phoneme-level inhibition</article-title>. <source>Journal of Memory and Language</source>, <volume>60</volume>(<issue>1</issue>), <fpage>65</fpage>&#8211;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2008.07.002</pub-id></mixed-citation></ref>
<ref id="B44"><mixed-citation publication-type="journal"><string-name><surname>Norris</surname>, <given-names>D.</given-names></string-name> (<year>1994</year>). <article-title>Shortlist: A connectionist model of continuous speech recognition</article-title>. <source>Cognition</source>, <volume>52</volume>(<issue>3</issue>), <fpage>189</fpage>&#8211;<lpage>234</lpage>. <pub-id pub-id-type="doi">10.1016/0010-0277(94)90043-4</pub-id></mixed-citation></ref>
<ref id="B45"><mixed-citation publication-type="journal"><string-name><surname>Norris</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name><surname>McQueen</surname>, <given-names>J. M.</given-names></string-name> (<year>2008</year>). <article-title>Shortlist B: A Bayesian model of continuous speech recognition</article-title>. <source>Psychological Review</source>, <volume>115</volume>(<issue>2</issue>), <fpage>357</fpage>&#8211;<lpage>395</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.115.2.357</pub-id></mixed-citation></ref>
<ref id="B46"><mixed-citation publication-type="journal"><string-name><surname>Oden</surname>, <given-names>G. C.</given-names></string-name>, &amp; <string-name><surname>Massaro</surname>, <given-names>D. W.</given-names></string-name> (<year>1978</year>). <article-title>Integration of featural information in speech perception</article-title>. <source>Psychological Review</source>, <volume>85</volume>(<issue>3</issue>), <fpage>172</fpage>&#8211;<lpage>191</lpage>. <pub-id pub-id-type="doi">10.1037//0033-295x.85.3.172</pub-id></mixed-citation></ref>
<ref id="B47"><mixed-citation publication-type="journal"><string-name><surname>Port</surname>, <given-names>R. F.</given-names></string-name> (<year>1979</year>). <article-title>The influence of tempo on stop closure duration as a cue for voicing and place</article-title>. <source>Journal of Phonetics</source>, <volume>7</volume>(<issue>1</issue>), <fpage>45</fpage>&#8211;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1016/s0095-4470(19)31032-0</pub-id></mixed-citation></ref>
<ref id="B48"><mixed-citation publication-type="webpage"><collab>R Core Team</collab>. (<year>2016</year>). <source>R: A language and environment for statistical computing</source>. <publisher-name>R Foundation for Statistical Computing</publisher-name>. <uri>https://www.R-project.org/</uri></mixed-citation></ref>
<ref id="B49"><mixed-citation publication-type="journal"><string-name><surname>Szostak</surname>, <given-names>C. M.</given-names></string-name>, &amp; <string-name><surname>Pitt</surname>, <given-names>M. A.</given-names></string-name> (<year>2013</year>). <article-title>The prolonged influence of subsequent context on spoken word recognition</article-title>. <source>Attention, Perception, &amp; Psychophysics</source>, <volume>75</volume>(<issue>7</issue>), <fpage>1533</fpage>&#8211;<lpage>1546</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-013-0492-3</pub-id></mixed-citation></ref>
<ref id="B50"><mixed-citation publication-type="journal"><string-name><surname>Toscano</surname>, <given-names>J. C.</given-names></string-name>, <string-name><surname>McMurray</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Dennhardt</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Luck</surname>, <given-names>S. J.</given-names></string-name> (<year>2010</year>). <article-title>Continuous perception and graded categorization: Electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech</article-title>. <source>Psychological Science</source>, <volume>21</volume>(<issue>10</issue>), <fpage>1532</fpage>&#8211;<lpage>1540</lpage>. <pub-id pub-id-type="doi">10.1177/0956797610384142</pub-id></mixed-citation></ref>
<ref id="B51"><mixed-citation publication-type="journal"><string-name><surname>Watanabe</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Opper</surname>, <given-names>M.</given-names></string-name> (<year>2010</year>). <article-title>Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory</article-title>. <source>Journal of Machine Learning Research</source>, <volume>11</volume>(<issue>12</issue>), <fpage>3571</fpage>&#8211;<lpage>3594</lpage>.</mixed-citation></ref>
<ref id="B52"><mixed-citation publication-type="journal"><string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Dahan</surname>, <given-names>D.</given-names></string-name> (<year>2019</year>). <article-title>Listeners maintain phonological uncertainty over time and across words: The case of vowel nasality in English</article-title>. <source>Journal of Phonetics</source>, <volume>76</volume>, <elocation-id>100910</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.wocn.2019.06.001</pub-id></mixed-citation></ref>
</ref-list>
</back>
</article>