<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2767-0279</journal-id>
<journal-title-group>
<journal-title>Glossa Psycholinguistics</journal-title>
</journal-title-group>
<issn pub-type="epub">2767-0279</issn>
<publisher>
<publisher-name>eScholarship Publishing</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5070/G601121217</article-id>
<article-categories>
<subj-group>
<subject>Regular article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>When multiple talker exposure is necessary for cross-talker generalization: Insights into the emergence of sociolinguistic perception</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Aoki</surname>
<given-names>Nicholas B.</given-names>
</name>
<email>nbaoki@ucdavis.edu</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zellou</surname>
<given-names>Georgia</given-names>
</name>
<email>gzellou@ucdavis.edu</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Department of Linguistics, University of California, Davis, 469 Kerr Hall, One Shields Avenue, Davis, California, USA 95616</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-02-11">
<day>11</day>
<month>02</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>4</volume>
<issue>1</issue>
<elocation-id>10</elocation-id>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2025 The Author(s)</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://glossapsycholinguistics.journalpub.escholarship.org/articles/10.5070/G601121217/"/>
<abstract>
<p>Sociolinguistic research finds that: (i) the speech signal contains talker-specific and socio-indexical structure, with talkers varying idiosyncratically within the same social category and systematically across categories; (ii) both talker-specific and socio-indexical variation influence speech perception. What is unclear is how sociolinguistic perception arises &#8211; following exposure to an unfamiliar, socially-mediated variant, how do listeners learn that this feature is characteristic of a broader social group and can generalize to other group members? The current study exposed listeners to an unattested variant in L1-English (a /p/ to [b] phonetic shift), investigating how the number of exposure talkers mediates cross-talker generalization. All participants completed an exposure phase (phrase-final keyword identification) followed by a test phase (categorization along a <italic>buy-pie</italic> continuum for a novel female and male talker in separate blocks). Experiment 1 exposed listeners to a single shifted female talker (&#8220;The novel is now in <italic>brint</italic>&#8221;) and a single unshifted male talker. Experiment 2 presented two shifted female and two shifted male talkers. We find: (i) no generalization in Experiment 1 (no difference in <italic>buy-pie</italic> response between the novel talkers); (ii) robust generalization in Experiment 2 (greater <italic>pie</italic> response for the novel female than the novel male talker), but only when the novel female block is presented first (i.e., generalization is short-lived). Taken together, the results support a <italic>numerosity account</italic>: when a previously unheard social variant is presented, multiple talkers per social group seem to be necessary for socially-mediated, cross-talker generalization. 
This study highlights a critical role of the listener&#8217;s social experiences on generalization &#8211; multi-talker exposure might be unnecessary when exposed to more familiar types of speech (e.g., L2-accented English) and necessary when exposed to completely unfamiliar variants. Overall, the present experiments enhance our theoretical understanding of cross-talker generalization and offer insights into the emergence of sociolinguistic perception.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>1. Introduction</title>
<p>Imagine that you are traveling to present a poster in a country you have never been to before. You get off the plane, and as you walk through the airport terminal, you trip and drop your belongings. A woman quickly approaches you and says, &#8220;Let me get your <italic>boster</italic> for you&#8221;. Interestingly, this stranger has produced the initial consonant of &#8216;poster&#8217; with a token that is acoustically [b]-like in your own dialect of English. You have a short conversation with this woman and realize that her dialect is essentially the same as yours, except that all /p/-initial words begin with an acoustically [b]-like segment (&#8216;paint&#8217; = <italic>baint</italic>, &#8216;pulse&#8217; = <italic>bulse</italic>, etc.). Given that this is the first person you have met in this country, you might wonder about the source of this /p/ to [b] phonetic shift. Is the shift limited to this particular individual, or is it a systematic characteristic of a broader social group, such as all women within this country?</p>
<p>Real-life registers and dialects do not consist of singular phonetic shifts (<xref ref-type="bibr" rid="B83">Wolfram &amp; Schilling, 2015</xref>). Yet even this simplified scenario alludes to two important observations that highlight the complexity of adapting to novel talkers. First, the speech signal simultaneously contains both <italic>talker-specific covariation</italic> and <italic>socio-indexical covariation</italic> &#8211; linguistic variants can correlate with both individual talkers (<xref ref-type="bibr" rid="B23">Chodroff &amp; Wilson, 2017</xref>; <xref ref-type="bibr" rid="B91">Yu &amp; Zellou, 2019</xref>) and broader social categories (<xref ref-type="bibr" rid="B21">Carignan &amp; Zellou, 2023</xref>; <xref ref-type="bibr" rid="B46">Labov, 1966</xref>; <xref ref-type="bibr" rid="B94">Zellou &amp; Tamminga, 2014</xref>). Second, the speech signal often contains ambiguity (<xref ref-type="bibr" rid="B42">Kleinschmidt &amp; Jaeger, 2015</xref>). In particular, it may initially be unclear whether an unfamiliar variant is an idiosyncrasy associated with an individual talker or a socially meaningful pattern associated with a group of talkers.</p>
<p>Extensive research suggests that, given sufficient experience, listeners learn which variants are correlated with social cues, and they can leverage their knowledge of socio-indexical covariation to guide perception of the speech signal (<xref ref-type="bibr" rid="B20">Campbell-Kibler, 2010</xref>). In a classic study, Strand and Johnson (<xref ref-type="bibr" rid="B70">1996</xref>) presented L1-English listeners with an acoustically gender-ambiguous voice and showed an image of either a stereotypically male or female face. Listeners were asked to categorize tokens along a <italic>sod-shod</italic> continuum, and they provided more <italic>sod</italic> responses when the voice was paired with a male face than with a female face. Participants therefore recognize that /s/- and /&#643;/-variation is gender-mediated, not purely idiosyncratic, and they <italic>generalize</italic> their prior knowledge to a novel speaker based on (visual) cues to apparent gender. Similar types of perceptual effects have been found for many other social categories, such as age (<xref ref-type="bibr" rid="B80">Walker &amp; Hay, 2011</xref>), nationality (<xref ref-type="bibr" rid="B56">Niedzielski, 1999</xref>), dialect (<xref ref-type="bibr" rid="B28">D&#8217;Onofrio, 2015</xref>), ethnicity (<xref ref-type="bibr" rid="B8">Babel &amp; Russell, 2015</xref>), and apparent humanity (<xref ref-type="bibr" rid="B2">Aoki et al., 2022</xref>).</p>
<p>Studies of sociolinguistic perception often presume that participants already possess ample experience listening to the linguistic variants that are presented. An understudied question, however, is how knowledge of socio-indexical covariation emerges in the first place. Returning to the anecdote outlined earlier, imagine that a /p/ to [b] phonetic shift is truly gender-mediated in a hypothetical English dialect, not just produced by a single talker. How do listeners learn that this acoustic feature represents socio-indexical covariation, and when do they <italic>generalize</italic> this shift to their expectations about novel speakers? Although extensive research has been conducted on perceptual adaptation of speaker idiosyncrasies (<xref ref-type="bibr" rid="B57">Norris et al., 2003</xref>; <xref ref-type="bibr" rid="B76">Tzeng et al., 2021</xref>) and L2-accented speech (<xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>), relatively little work explicitly frames perceptual adaptation as a question of sociolinguistic perception (cf. <xref ref-type="bibr" rid="B3">Aoki &amp; Zellou, 2023a</xref>; <xref ref-type="bibr" rid="B44">Kleinschmidt et al., 2018</xref>; <xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>). The current study begins to address this gap by examining how cross-talker generalization is affected by the number of exposure talkers and the socio-indexical structure of exposure.</p>
<p>The rest of the introduction is structured as follows. 1.1 reviews lexically guided perceptual learning and the ideal adapter framework, which serve as the experimental approach and theoretical framing of the current study, respectively. 1.2 revisits the debate about how the number of exposure talkers affects generalization, offering a sociolinguistic explanation to account for conflicting findings. 1.3 explains why this work specifically examines covariation between speaker gender and stop consonant production in American English. Finally, 1.4 delineates the study design and hypotheses.</p>
<sec>
<title>1.1 Review of lexically guided perceptual learning and the ideal adapter framework</title>
<p>Work on lexically guided perceptual learning has demonstrated that listeners can readily adapt to idiosyncratic phonetic shifts produced by individual talkers (<xref ref-type="bibr" rid="B64">Samuel &amp; Kraljic, 2009</xref>). Participants in a typical adaptation experiment complete an exposure phase (often a lexical decision task) followed by a test phase (usually a categorization task). When listeners are placed in a talker-specific condition (i.e., the same speaker is presented at both exposure and test), test phase categorization is altered based on the lexical bias in exposure. Consistently replacing /d/ with an ambiguous segment between /d/ and /t/ in exposure (e.g., &#8216;croco?ile&#8217;) results in more /ada/ response along an /ada/-/ata/ continuum at test, while replacing /t/ with the same ambiguous segment (e.g., &#8216;fron?ier&#8217;) leads to a greater /ata/ response. Talker-specific adaptation is a highly replicated phenomenon that underscores the flexible nature of the perceptual system (<xref ref-type="bibr" rid="B27">Cummings &amp; Theodore, 2023</xref>; <xref ref-type="bibr" rid="B57">Norris et al., 2003</xref>; <xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>).</p>
<p>Perceptual learning can be neatly accounted for through the ideal adapter framework (<xref ref-type="bibr" rid="B42">Kleinschmidt &amp; Jaeger, 2015</xref>). An ideal adapter leverages their prior knowledge about acoustic cue distributions to make the most accurate predictions possible about the incoming speech signal. Importantly, these predictions can be updated through experience, allowing the listener to adjust to novel input in the face of high acoustic variability. For example, after exposure to a talker who repeatedly produces /d/ as an ambiguous segment between /d/ and /t/ (e.g., &#8216;croco?ile&#8217;, &#8216;legen?ary&#8217;, etc.), listeners can then develop a talker-specific mental model. The model would predict that segments with ambiguous acoustic cues (e.g., voice onset time between prototypical /d/ and /t/) are more likely to be classified as /d/ for this talker compared to the general population. Listeners can then utilize this talker-specific model to help them make future predictions (e.g., that an ambiguous token between /ada/ and /ata/ is more likely to be /ada/ when produced by this particular talker).</p>
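<p>As a concrete sketch of this belief-updating process (stated here in our own simplified notation, not Kleinschmidt and Jaeger&#8217;s full formalization), the ideal adapter can be cast in Bayesian terms. Given a talker <italic>T</italic> and an ambiguous cue value <italic>x</italic> (e.g., a voice onset time between prototypical /d/ and /t/), the listener categorizes the sound by weighing the talker-specific likelihood of that cue under each category <italic>c</italic> against the prior probability of the category:</p>
<disp-formula><tex-math>\[ p(c \mid x, T) \propto p(x \mid c, T)\, p(c \mid T) \]</tex-math></disp-formula>
<p>Exposure updates the beliefs behind the likelihood term: after hearing cue tokens <italic>X</italic> from talker <italic>T</italic>, the parameters &#x03B8;<sub><italic>T</italic></sub> of that talker&#8217;s cue distributions are revised as</p>
<disp-formula><tex-math>\[ p(\theta_T \mid X) \propto p(X \mid \theta_T)\, p(\theta_T) \]</tex-math></disp-formula>
<p>so that, for this talker, the /d/&#8211;/t/ boundary shifts toward classifying ambiguous tokens as /d/, relative to the default expected of the general population.</p>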
<p>Unlike talker-specific adaptation, cross-talker generalization is much more inconsistent (i.e., after exposure to a phonetic shift or accent, listeners may or may not be willing to apply what they have learned to a <italic>novel</italic> talker; <xref ref-type="bibr" rid="B81">Weatherholtz &amp; Jaeger, 2016</xref>). The ideal adapter framework can explain this relative restraint as striking a balance between being efficient while also maintaining accurate predictions (<xref ref-type="bibr" rid="B42">Kleinschmidt &amp; Jaeger, 2015</xref>). On the one hand, if listeners are initially exposed to a talker with a phonetic shift and then hear a novel test talker who likely produces the same shift, it would be more efficient to generalize &#8211; with little decrease in prediction accuracy, the same mental model can be applied to both talkers without expending any extra effort in creating a new model. Overgeneralizing, however, can lead to incorrect predictions if the test talker is unlikely to produce the same phonetic shift. Taken together, the ideal adapter framework predicts that listeners somehow formulate expectations about the relationship between exposure talkers and novel talkers, and that these beliefs then modulate cross-talker generalization.</p>
<p>Although it is uncontroversial that cross-talker generalization is constrained by the relationship between the properties of the exposure and test phases, the specific constraints are still under debate (<xref ref-type="bibr" rid="B9">Baese-Berk et al., 2020</xref>). The current study manipulates one type of well-studied constraint (the number of exposure talkers), but re-examines prior work from a sociolinguistic perspective.</p>
</sec>
<sec>
<title>1.2 Taking a sociolinguistic perspective: How does the number of exposure talkers affect cross-talker generalization?</title>
<p>Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>) proposed that multiple exposure talkers are necessary for cross-talker generalization in perceptual adaptation to L2-accents. Relative to a control condition (exposure to L1-English speakers), transcription accuracy in noise for a novel Mandarin-accented English speaker was only facilitated for listeners with recent exposure to <italic>multiple</italic> Mandarin-accented talkers, not for participants who had previously heard a <italic>single</italic> Mandarin-accented talker. Kleinschmidt and Jaeger (<xref ref-type="bibr" rid="B42">2015</xref>) attributed this effect to the distinct mental models that listeners develop following single-talker and multi-talker exposure. On the one hand, participants who are exposed to just one talker might assume that the particularities of the accent are idiosyncrasies, thereby resulting in a talker-specific model that does not generalize to novel talkers. On the other hand, listeners who are exposed to multiple talkers with the same L2-accent can more readily recognize the covariation in the speech signal between category membership (Mandarin-accented English speaker) and acoustic variation (e.g., devoicing of word-final stop consonants). This recognition ostensibly leads to the development of a talker-general mental model for Mandarin-accented English, which then gives rise to successful generalization (i.e., enhanced comprehension of a novel Mandarin-accented English talker).</p>
<p>In contrast to Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>), however, more recent work has found no effect of the number of exposure talkers on generalization (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>). Participants in Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) completed an exposure phase (auditory lexical decision task) followed by a test phase (cross-modal priming task). Everyone in the experiment was tested on the same (novel) Mandarin-accented English speaker. In the critical conditions, listeners were exposed either to multiple Mandarin-accented English speakers, to a single Mandarin-accented English speaker that was acoustically similar to the novel test talker, or to a single Mandarin-accented English speaker that was acoustically different from the test talker. Two major findings emerged: (i) both the multi-talker exposure and the &#8216;single-talker, acoustically similar&#8217; conditions resulted in cross-talker generalization, with no meaningful difference observed between the two conditions; (ii) generalization was blocked in the &#8216;single-talker, acoustically different&#8217; condition. Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) concluded that acoustic similarity between exposure and test talkers mediates generalization, not simply the number of exposure talkers. Multi-talker exposure only leads to generalization when one of the exposure talkers happens to be acoustically similar to the novel speaker.</p>
<p>Providing further evidence against the primacy of the number of exposure talkers, Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) conducted a replication study of Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>) and did not find a meaningful difference in transcription accuracy between the critical single- and multi-talker exposure conditions. Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) attribute this lack of replication to &#8220;the removal of the design confound [in Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>)] coupled with increased statistical power&#8221; (p. e37), where the &#8220;design confound&#8221; refers to &#8220;the single- and multitalker conditions employ[ing] different L2 exposure talkers&#8221; (p. e36). In other words, the lack of generalization in the single-talker exposure condition of Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>) may have merely occurred because the authors coincidentally selected exposure talkers that were all acoustically different from the test talker (and conversely, the generalization in the multi-talker exposure condition might have been blocked if none of the exposure talkers were acoustically similar to the test talker).</p>
<p>While not downplaying methodological considerations, there could additionally be a social explanation for why the number of exposure talkers has not affected generalization in recent work. Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) both examine Mandarin-accented English, and in general, the proportion of L2-accented English speakers is rising in the United States (<xref ref-type="bibr" rid="B35">Graddol, 2003</xref>; <xref ref-type="bibr" rid="B66">ShareAmerica, 2023</xref>). Although Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) recruited &#8220;monolingual English speakers with&#8230;no or minimal prior experience with Mandarin-accented English or the Mandarin language&#8221; (p. 33) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) &#8220;excluded participants from analysis who reported a high degree of familiarity with Chinese or Chinese-accented English&#8221; (p. e27), many listeners likely had some experience with other L2-English accents.</p>
<p>L2-English accents often share acoustic properties, such as a slower speaking rate (<xref ref-type="bibr" rid="B11">Baese-Berk &amp; Morrill, 2015</xref>) and reduced usage of spectral cues when producing tense/lax vowel contrasts (<xref ref-type="bibr" rid="B32">Feng &amp; Wang, 2024</xref>; <xref ref-type="bibr" rid="B67">Sidaras et al., 2009</xref>). Therefore, despite limited self-reported experience with Mandarin-accented English, listeners in Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) could have initially engaged in accent-independent adaptation (<xref ref-type="bibr" rid="B10">Baese-Berk et al., 2013</xref>), having recognized certain features in the exposure phase (e.g., devoiced word-final stop consonants are features of both Mandarin-accented and Dutch-accented English; <xref ref-type="bibr" rid="B31">Eisner et al., 2013</xref>; <xref ref-type="bibr" rid="B87">Xie &amp; Fowler, 2013</xref>). After this head start, participants may have attuned more to the specific acoustic properties of the exposure talker, with generalization only occurring when the exposure and test talkers were sufficiently similar acoustically.</p>
<p>Given their likely prior experience with L2-English accents, listeners in Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) presumably did not consider the acoustic properties in exposure to be entirely idiosyncratic, which could make the number of exposure talkers irrelevant. The distinction between single- and multi-talker exposure conditions might only be important when listeners are truly unfamiliar with a variant (e.g., the /p/ to [b] shift discussed in Section 1, where a novel L1-English speaker produces words like &#8216;poster&#8217; as <italic>boster</italic>). A novel phonetic shift might be initially thought of as idiosyncratic if it is heard in only one talker. To prove that a variant is socially-mediated and generalizable to novel talkers, it could be necessary to hear the shift from multiple talkers within the same social group.</p>
<p>Thus, it is still unclear whether multiple exposure talkers are <italic>necessary</italic> for cross-talker generalization. L2-accented speech may not be the most appropriate test case for addressing this question, given the ubiquity of L2-accented speakers. An ideal test case could be a variant that is likely to be treated as an idiosyncrasy when produced by one speaker and as a socially-mediated feature when produced by more than one speaker from an identifiable social group.</p>
</sec>
<sec>
<title>1.3 Motivation of covarying gender and stop consonant production</title>
<p>The current study examines the covariation of (binary) gender and stop consonant production in L1-accented English, with females producing /p/ as [b] and males producing prototypical /p/.<xref ref-type="fn" rid="n1">1</xref> There are several reasons for employing this specific test case. For one, gender is a highly salient and recognizable socio-indexical cue (<xref ref-type="bibr" rid="B14">Barreda &amp; Predeck, 2024</xref>; <xref ref-type="bibr" rid="B29">Eckert, 1989</xref>). Male and female voices are often acoustically distinct (<xref ref-type="bibr" rid="B13">Barreda, 2021</xref>), and listeners can use the (on average) lower f0 and formant frequencies of males to identify speaker gender with high confidence and near-ceiling accuracy (<xref ref-type="bibr" rid="B36">Hillenbrand &amp; Clark, 2009</xref>). Listeners also know, consciously or not, that gender can be relevant for speech perception in certain cases (e.g., an ambiguous sound between /s/ and /&#643;/ is more likely to be categorized as /s/ if the talker is male; <xref ref-type="bibr" rid="B90">Yu, 2010</xref>).</p>
<p>Critically, however, compared to other phonological contrasts, such as fricatives and vowels, relatively little covariation currently exists between gender and American English stop consonant production (<xref ref-type="bibr" rid="B41">Kleinschmidt, 2019</xref>; <xref ref-type="bibr" rid="B52">Morris et al., 2008</xref>). Among studies that do find gender effects (e.g., <xref ref-type="bibr" rid="B63">Robb et al., 2005</xref>; <xref ref-type="bibr" rid="B72">Swartz, 1992</xref>), the result is almost always the exact opposite of the covariation we have constructed for the current study &#8211; male speakers usually produce voiceless consonants with a shorter voice onset time than female speakers (i.e., it is <italic>male</italic> speakers who produce /p/ as a more acoustically [b]-like sound). A further distinctive feature of the current test case is the use of a &#8220;bad map&#8221;, or a sound that &#8220;fall[s] unambiguously into an unintended category&#8221; (<xref ref-type="bibr" rid="B71">Sumner, 2011, p. 132</xref>). Shifts from /p/ to [b] can be heard in English, but usually in L2-accented English (<xref ref-type="bibr" rid="B33">Flege &amp; Eefting, 1987</xref>; <xref ref-type="bibr" rid="B68">Sol&#233;, 2018</xref>), not L1-accented English.</p>
<p>In summary, the current study correlates a highly salient variant (a &#8220;bad map&#8221; /p/ to [b] shift) with a highly salient social cue (speaker gender) in a way that listeners have likely never experienced in L1-English. Unlike L2-accented English, which has been the focus of many prior experiments on adaptation (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>), there is a greater chance that the phonetic shift in this study will be truly regarded as an idiosyncrasy upon initial exposure. The current study can therefore more effectively investigate how the number of exposure talkers affects cross-talker generalization, while additionally lending insight into the <italic>emergence</italic> of sociolinguistic perception.</p>
</sec>
<sec>
<title>1.4 The current study and hypotheses</title>
<p>Two experiments were conducted to examine how the number of exposure talkers and the socio-indexical structure of exposure influence cross-talker generalization. All participants completed an exposure phase (identification of phrase-final keywords; e.g., &#8220;The wall needed a new coat of <italic>paint</italic>&#8221;) followed by a test phase (categorization of stimuli along a <italic>buy-pie</italic> continuum for both a novel female and novel male speaker). Within each experiment, there were two types of exposure conditions that varied in socio-indexical structure: (i) a critical condition with a gender-mediated phonetic shift (/p/ produced as [b]; e.g., &#8216;paint&#8217; as <italic>baint</italic>); (ii) a control condition with no shifted keywords. Across experiments, the critical conditions differed in the number of exposure talkers &#8211; whereas Experiment 1 only presented one female and one male talker, Experiment 2 exposed listeners to two female and two male talkers. The key question is whether the experimental manipulations induce cross-talker generalization. When comparing the control and critical conditions, is there a difference in <italic>buy-pie</italic> categorization for the novel test talkers?<xref ref-type="fn" rid="n2">2</xref></p>
<p>The present study adjudicates among three possible accounts of perceptual adaptation: (i) talker normalization; (ii) sufficient similarity; (iii) numerosity. The predictions of each account are summarized in <xref ref-type="table" rid="T1">Table 1</xref> and further explained in 1.4.1&#8211;1.4.3.</p>
<table-wrap id="T1">
<caption>
<p><bold>Table 1:</bold> Summary of predictions for each account. Each prediction (&#8216;No difference&#8217; or &#8216;More <italic>pie</italic> response&#8217;) compares the novel female talker to the novel male talker in the test phase of the critical condition. (In the control condition, neither exposure talker produces the /p/ to [b] phonetic shift, so <italic>buy-pie</italic> categorization is expected to be the same for both test talkers.) The cells immediately below the experiment titles refer to the structure of exposure in the critical condition, where F and M refer to &#8216;female&#8217; and &#8216;male&#8217;, respectively.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top" rowspan="2"><bold>Theoretical Account</bold></td>
<td align="left" valign="top"><bold>Experiment 1</bold></td>
<td align="left" valign="top"><bold>Experiment 2</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Shifted: 1 F</bold><break/><bold>Unshifted: 1 M</bold></td>
<td align="left" valign="top"><bold>Shifted: 2 F</bold><break/><bold>Unshifted: 2 M</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Talker Normalization</bold></td>
<td align="left" valign="top">No difference</td>
<td align="left" valign="top">No difference</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Sufficient Similarity</bold></td>
<td align="left" valign="top">More <italic>pie</italic> response</td>
<td align="left" valign="top">More <italic>pie</italic> response</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Numerosity</bold></td>
<td align="left" valign="top">No difference</td>
<td align="left" valign="top">More <italic>pie</italic> response</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title>1.4.1 Talker normalization</title>
<p>A <italic>talker normalization account</italic> proposes that listeners will &#8216;normalize&#8217;, or disregard, speaker gender differences in the exposure phase, resulting in similar responses for the novel female and male test talkers across all conditions (<xref ref-type="bibr" rid="B39">Joos, 1948</xref>; <xref ref-type="bibr" rid="B48">Liberman &amp; Mattingly, 1985</xref>).<xref ref-type="fn" rid="n3">3</xref> Given that the /p/ to [b] phonetic shift involves stop consonants, and stop consonant production does not strongly covary with gender in American English (<xref ref-type="bibr" rid="B41">Kleinschmidt, 2019</xref>), participants might ignore gender altogether (cf. Experiment 1 of <xref ref-type="bibr" rid="B45">Kraljic &amp; Samuel, 2007</xref>). Besides gender, all of the critical exposure conditions are the same regarding the phonetic shift, since all conditions contain an equal number of shifted and prototypical tokens (cf. Experiment 2 of <xref ref-type="bibr" rid="B76">Tzeng et al., 2021</xref>). If talker gender is disregarded and, thus, the experiences hearing shifted and prototypical tokens are all weighted equally, then test phase categorization should be the same across all experiments. In other words, there should be no meaningful differences in the proportion of <italic>pie</italic> responses between the novel female and male talkers in the critical conditions.</p>
</sec>
<sec>
<title>1.4.2 Sufficient similarity</title>
<p>Unlike a talker normalization account, sufficient similarity and numerosity accounts both assume that speaker gender will impact adaptation. However, the latter two accounts diverge in their predictions about how the number of exposure talkers might affect the results. A <italic>sufficient similarity account</italic> claims that generalization should be triggered for any test talker who is sufficiently similar to the phonetically shifted exposure talkers, regardless of how many exposure talkers are heard (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>). In the critical conditions, the /p/ to [b] phonetic shift is always associated with female speech in the exposure phase &#8211; thus, listeners in both experiments are expected to generalize the shift to the novel female test talker, given similarity in social category (gender) and in acoustics (e.g., higher f0 and formant frequencies; <xref ref-type="bibr" rid="B36">Hillenbrand &amp; Clark, 2009</xref>). Generalization would specifically be realized as a greater proportion of <italic>pie</italic> responses for the novel female test talker than for the novel male test talker (i.e., if a speaker is expected to produce a /p/ to [b] shift, then more acoustically [b]-like tokens should be categorized as /p/ along a <italic>buy-pie</italic> continuum; see <xref ref-type="bibr" rid="B71">Sumner, 2011</xref>).</p>
</sec>
<sec>
<title>1.4.3 Numerosity</title>
<p>A <italic>numerosity account</italic> predicts that the number of exposure talkers should have an impact on the results, with generalization only occurring when listeners hear multiple talkers per gender (<xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>). Given their lack of experience with the particular /p/ to [b] shift under investigation (see 1.3 for details), participants might treat the shift as a talker-specific idiosyncrasy when produced by only one talker, thus blocking generalization in Experiment 1 and leading to no differences in <italic>pie</italic> response between the novel test talkers. Listeners should only generalize in the critical condition of Experiment 2, since there is evidence that the shift is produced by a group of similar talkers (two female speakers) and is not a talker-specific idiosyncrasy. Generalization should specifically occur towards the novel female test talker, who aligns more closely with the phonetically shifted exposure speakers in terms of social and acoustic similarity.</p>
</sec>
</sec>
</sec>
<sec>
<title>2. Experiment 1: Exposure to a single talker per gender</title>
<p>Experiment 1 first exposed listeners to one male and one female speaker, and then presented a categorization task at test for a novel male and a novel female speaker. Listeners were either assigned to a critical condition (where the exposure female produced /p/ as [b] and the male remained unshifted) or to a control condition (where both exposure talkers were unshifted). Whereas a sufficient similarity account predicts a greater proportion of <italic>pie</italic> responses for the novel female talker than the novel male talker in the test phase, both talker normalization and numerosity accounts predict no difference in <italic>pie</italic> response between the novel talkers (see 1.4 for a more thorough explanation).</p>
<sec>
<title>2.1 Methods</title>
<sec>
<title>2.1.1 Stimuli</title>
<p>48 semantically predictable sentences were constructed for the exposure phase (see Appendix 1 for the full list). There were 32 critical stimuli and 16 filler stimuli (i.e., two-thirds critical and one-third filler, following the ratio used in Lai and Tamminga (<xref ref-type="bibr" rid="B47">2024</xref>)), with all sentences containing a monosyllabic target word in phrase-final position. The filler sentences did not have any words with /b/ or /p/ (e.g., &#8220;She came out to the lake for a <italic>swim</italic>&#8221;). The critical sentences were also designed to avoid words with /b/ or /p/, except for the target words.<xref ref-type="fn" rid="n4">4</xref> The critical target words contained one instance of /p/ (all word-initial), did not contain /b/, and were not part of a /b/-/p/ minimal pair (e.g., <italic>paint</italic>/*<italic>baint</italic>; &#8220;The wall needed a new coat of <italic>paint</italic>&#8221;).</p>
<p>All of the exposure sentences were automatically generated using neural text-to-speech (TTS) synthesis (<xref ref-type="bibr" rid="B2">Aoki et al., 2022</xref>; <xref ref-type="bibr" rid="B4">Aoki &amp; Zellou, 2023b</xref>; <xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>), a highly naturalistic generation method (as opposed to concatenative TTS; <xref ref-type="bibr" rid="B26">Cohn &amp; Zellou, 2020</xref>).<xref ref-type="fn" rid="n5">5</xref> TTS voices were used in the current study because, relative to recording speech in a laboratory setting, generating synthetic speech offers a high degree of control over the acoustic properties of each stimulus (e.g., every time &#8216;paint&#8217; is generated with the Joanna voice, the voice onset time of the initial /p/ is always around 53 milliseconds). Extensive intra-speaker acoustic variation is well-documented with traditional recording methods (<xref ref-type="bibr" rid="B1">Aoki &amp; Zellou, 2023c</xref>; <xref ref-type="bibr" rid="B79">Vonessen et al., 2024</xref>), and given that stimulus acoustics can affect perceptual adaptation (<xref ref-type="bibr" rid="B47">Lai &amp; Tamminga, 2024</xref>; <xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>), using controlled synthetic speech should make the results more reliable and replicable with the same methodology.</p>
<p>The exposure sentences were generated in three female (Joanna, Ruth, Salli) and three male (Joey, Matthew, Stephen) US-English voices from Amazon Polly. For each of the six voices, the 16 filler sentences and two versions of the 32 critical sentences (shifted and unshifted) were typed directly into the Amazon Web Services console and then downloaded (i.e., 6 voices * 16 filler + 6 voices * 32 critical * 2 versions = 96 + 384 = 480 stimuli total). The only difference between the two versions of the critical sentences was that in the shifted version, the /p/-initial target words were changed orthographically to begin with <italic>b</italic> (e.g., &#8216;paint&#8217; was typed as <italic>baint</italic>). All downloaded sentences were converted from .mp3 to .wav on the command line using FFmpeg (<xref ref-type="bibr" rid="B74">Tomar, 2006</xref>) and set to a presentation level of 60 dB SPL in Praat (<xref ref-type="bibr" rid="B16">Boersma &amp; Weenink, 2021</xref>). The test stimuli consisted of a 5-step <italic>buy-pie</italic> continuum for each of the six voices (see Appendix 2 for details about how the test stimuli were developed and normed).</p>
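<p>As a quick sanity check, the stimulus inventory described above can be reproduced by enumeration. The following is a minimal sketch (the voice names come from the text; the enumeration itself is purely illustrative):</p>

```python
# Enumerate the synthesized exposure stimuli:
# 6 voices x 16 fillers, plus 6 voices x 32 critical sentences x 2 versions.
from itertools import product

voices = ["Joanna", "Ruth", "Salli", "Joey", "Matthew", "Stephen"]
fillers = range(16)
criticals = range(32)
versions = ["shifted", "unshifted"]

filler_stimuli = list(product(voices, fillers))
critical_stimuli = list(product(voices, criticals, versions))
total = len(filler_stimuli) + len(critical_stimuli)
print(total)  # 96 + 384 = 480
```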
<p>A brief acoustic analysis was conducted on both the exposure and test stimuli using Bonferroni-corrected paired t-tests. The average voice onset time (VOT) of word-initial stop consonants is displayed in <xref ref-type="table" rid="T2">Table 2</xref> for each speaker. Consistent with prior studies (<xref ref-type="bibr" rid="B63">Robb et al., 2005</xref>; <xref ref-type="bibr" rid="B72">Swartz, 1992</xref>), the VOT of the unshifted /p/-initial stimuli was higher for each female speaker than for each male speaker (<italic>p</italic> &lt; 0.001 for nearly all comparisons; the only exception was the comparison between Joanna and Matthew, where <italic>p</italic> = 0.01). The VOT of the unshifted /p/-initial stimuli was also higher than that of the shifted /p/ to [b] stimuli for all female speakers in the exposure phase (all <italic>p</italic> &lt; 0.001), in alignment with past work (<xref ref-type="bibr" rid="B23">Chodroff &amp; Wilson, 2017</xref>). No statistically significant VOT differences were observed between any of the speakers for the test phase continua after applying the Bonferroni correction (all <italic>p</italic> &gt; 0.006).</p>
<table-wrap id="T2">
<caption>
<p><bold>Table 2:</bold> Average voice onset time (ms) of each speaker for word-initial stop consonants in unshifted exposure stimuli (e.g., &#8216;paint&#8217;), shifted /p/ to [b] exposure stimuli (e.g., &#8216;paint&#8217; as <italic>baint</italic>), and test phase stimuli (a 5-step <italic>buy-pie</italic> continuum). Dashes indicate speakers who did not produce shifted stimuli in the current study. The second, third, and fourth columns correspond to the female speakers (Joanna, Ruth, Salli), while the last three columns correspond to the male speakers (Joey, Matthew, Stephen).</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joey</bold></td>
<td align="left" valign="top"><bold>Matthew</bold></td>
<td align="left" valign="top"><bold>Stephen</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Exposure (Unshifted /p/)</bold></td>
<td align="left" valign="top">59.66</td>
<td align="left" valign="top">67.60</td>
<td align="left" valign="top">68.05</td>
<td align="left" valign="top">53.45</td>
<td align="left" valign="top">54.29</td>
<td align="left" valign="top">44.72</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Exposure (/p/ to [b])</bold></td>
<td align="left" valign="top">21.31</td>
<td align="left" valign="top">20.68</td>
<td align="left" valign="top">19.93</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Test</bold></td>
<td align="left" valign="top">37.71</td>
<td align="left" valign="top">39.78</td>
<td align="left" valign="top">36.75</td>
<td align="left" valign="top">36.11</td>
<td align="left" valign="top">39.80</td>
<td align="left" valign="top">37.45</td>
</tr>
</tbody>
</table>
</table-wrap>
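<p>The comparison logic behind these t-tests can be sketched in a few lines. The following is a pure-Python illustration with made-up VOT values, not the measurements reported in Table 2, and the number of pairwise comparisons entering the Bonferroni correction is likewise an assumption:</p>

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: mean of the pairwise differences divided
    by the standard error of those differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Made-up VOT values (ms) for one female-male comparison
female_vot = [60.0, 68.0, 70.0, 65.0]
male_vot = [50.0, 55.0, 52.0, 54.0]
t = paired_t(female_vot, male_vot)

# Bonferroni correction: each test is evaluated against
# alpha / n_comparisons instead of alpha. The value 9 here is a
# hypothetical count (e.g., all female-male pairings among six
# speakers would give 3 * 3 = 9 comparisons).
alpha_corrected = 0.05 / 9
```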
<p>A custom-made Praat script measured the mean f0 over the entire duration of all exposure and test phase productions (<xref ref-type="bibr" rid="B25">Cohn et al., 2021</xref>). The average f0 for each speaker is shown in <xref ref-type="table" rid="T3">Table 3</xref> and confirms that all three female speakers have a higher f0 than each of the male speakers in both the exposure and test phases (all <italic>p</italic> &lt; 0.001).</p>
<table-wrap id="T3">
<caption>
<p><bold>Table 3:</bold> Average f0 (Hz) of each speaker for both the exposure and test phase stimuli. The second, third, and fourth columns correspond to the female speakers (Joanna, Ruth, Salli), while the last three columns correspond to the male speakers (Joey, Matthew, Stephen).</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joey</bold></td>
<td align="left" valign="top"><bold>Matthew</bold></td>
<td align="left" valign="top"><bold>Stephen</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Exposure</bold></td>
<td align="left" valign="top">184.36</td>
<td align="left" valign="top">195.75</td>
<td align="left" valign="top">191.24</td>
<td align="left" valign="top">102.43</td>
<td align="left" valign="top">103.49</td>
<td align="left" valign="top">113.89</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Test</bold></td>
<td align="left" valign="top">172.93</td>
<td align="left" valign="top">217.80</td>
<td align="left" valign="top">220.27</td>
<td align="left" valign="top">102.12</td>
<td align="left" valign="top">107.84</td>
<td align="left" valign="top">95.17</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>2.1.2 Participants</title>
<p>All participants were recruited from Prolific, an online crowdsourcing platform. The Prolific demographic filters were used to narrow down the subject pool to individuals who were living in the United States, between 18 and 35 years old (inclusive), and whose reported first language was English. 419 participants from this restricted pool were recruited and compensated with $9 per hour ($1.80 for a 12-minute study). All subjects provided informed consent at the outset of the study, which received approval from the Institutional Review Board at the University of California, Davis.</p>
<p>Anyone who self-reported a hearing difficulty (n = 8), whose strongest self-reported language was not solely English (n = 10), or whose exposure phase accuracy was more than three standard deviations below the mean (n = 3; cutoff = 44/48) was removed from the statistical analysis. Additional participants were excluded from the data set due to atypical responses in the test phase. As discussed in 2.1.3, the test phase consisted of a categorization task, where listeners heard two blocks of a 5-step <italic>buy-pie</italic> continuum. For each participant, the responses for both blocks were combined, and a &#8216;difference score&#8217; was calculated by subtracting the percentage of <italic>pie</italic> responses at Step 1 (expected to be near 0%) from the percentage of <italic>pie</italic> responses at Step 5 (expected to be near 100%). Anyone whose difference score was more than three standard deviations below the mean was excluded (n = 15; cutoff = 72.71%).</p>
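<p>The exclusion rule can be sketched as follows. This is a minimal illustration with hypothetical participant scores, not the actual data:</p>

```python
from statistics import mean, stdev

def difference_score(pct_pie_step5, pct_pie_step1):
    """Step 5 %pie minus Step 1 %pie (responses pooled over both blocks)."""
    return pct_pie_step5 - pct_pie_step1

# Hypothetical per-participant difference scores (percentage points)
scores = {"p01": 98.0, "p02": 95.5, "p03": 100.0, "p04": 88.0, "p05": 97.0}

vals = list(scores.values())
cutoff = mean(vals) - 3 * stdev(vals)  # three SDs below the mean
kept = {p: s for p, s in scores.items() if s >= cutoff}
```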
<p>The final analysis included responses from 383 subjects (231 women, 144 men, 8 non-binary; mean age = 27.68 years, sd = 4.58; self-reported ethnicity: Asian = 34, Black = 59, Latino = 11, Mixed = 55, Native Hawaiian or Pacific Islander = 1, White = 223). Given that the current experiment had two between-subjects variables with two levels each (i.e., four cells total; see 2.1.3 and 2.1.4 below for more details), there were approximately 96 listeners per cell, double the sample size that achieved 80% power in a comparable perceptual adaptation study (<xref ref-type="bibr" rid="B27">Cummings &amp; Theodore, 2023</xref>).</p>
</sec>
<sec>
<title>2.1.3 Procedure</title>
<p>The study was conducted through a self-paced, online Qualtrics survey. After providing informed consent, participants were asked to wear headphones and to take the study in a quiet room with no background noise. These initial instructions were followed by a sound calibration procedure (for details, refer to the Procedure portion of 2.1 in <xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>). Subjects then completed the main task (an exposure phase followed by a test phase) and ended the study by filling out a demographic questionnaire. Prior to each phase, participants were explicitly told that they would hear &#8220;a male speaker and a female speaker&#8221;.</p>
<p>Both the exposure and test phases generally adhered to the procedure of Zellou et al. (<xref ref-type="bibr" rid="B93">2023</xref>). During exposure, listeners heard a semantically predictable sentence on each trial (e.g., &#8220;The dog had a furry paw&#8221;) and were asked to identify the final keyword from one of two options: (i) the target (&#8216;paw&#8217;); (ii) a phonologically similar competitor (&#8216;pawns&#8217;). Both the target and competitor items for the critical stimuli were always real words with initial /p/, to promote the mapping from [b] to /p/ in the Female Shifted condition (see Appendix 1 for the list of competitor items).</p>
<p>There were 48 trials in the exposure phase (32 critical, 16 filler), which were presented in a pseudo-randomized order. The exposure stimuli were evenly divided among one male speaker and one female speaker (16 critical and eight filler each), with sentence content and talker being evenly counterbalanced. Participants were randomly assigned to either a Female Shifted or No Shift condition, which varied the production of critical items across exposure talkers (the production of filler items remained the same across conditions). In the Female Shifted condition, the female speaker produced /p/ in the critical items as [b] (e.g., &#8220;The novel is now in <italic>brint</italic>&#8221;), while the male speaker produced a canonical /p/ for all critical items. Both exposure speakers in the No Shift condition produced a canonical /p/ across all critical items.</p>
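<p>The structure of the exposure list can be sketched as follows. This is an illustrative reconstruction rather than the actual presentation script, and the seeded shuffle merely stands in for the pseudo-randomization described above:</p>

```python
import random

def build_exposure_list(female, male, condition, seed=0):
    """48 exposure trials: 16 critical + 8 filler per talker. In the
    Female Shifted condition, the female talker's critical items use
    the shifted /p/-to-[b] version; all other items are unshifted."""
    trials = []
    for talker, gender in [(female, "F"), (male, "M")]:
        shifted = condition == "Female Shifted" and gender == "F"
        version = "shifted" if shifted else "unshifted"
        trials += [(talker, "critical", version)] * 16
        trials += [(talker, "filler", "unshifted")] * 8
    random.Random(seed).shuffle(trials)  # stand-in for pseudo-randomization
    return trials

trials = build_exposure_list("Joanna", "Joey", "Female Shifted")
```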
<p>Note that presenting a keyword identification task in the exposure phase, as opposed to the more commonly used lexical decision task (<xref ref-type="bibr" rid="B45">Kraljic &amp; Samuel, 2007</xref>; <xref ref-type="bibr" rid="B73">Tamminga et al., 2020</xref>), is an important methodological choice. Lexical decision is appropriate when the critical items contain an <italic>ambiguous</italic> sound (e.g., &#8216;croco?ile&#8217;, where ? is between /d/ and /t/) and are thus still interpreted as real words for the most part (<xref ref-type="bibr" rid="B45">Kraljic &amp; Samuel, 2007</xref>). However, the critical items in the current study have entirely <italic>remapped</italic> sounds (e.g., &#8216;print&#8217; as <italic>brint</italic>), meaning that they would ordinarily be considered nonwords when presented in isolation (as in a typical lexical decision task). Given that perceptual adaptation is absent or reduced when critical items are interpreted as non-words (<xref ref-type="bibr" rid="B7">Babel et al., 2019</xref>; <xref ref-type="bibr" rid="B57">Norris et al., 2003</xref>), an alternative task is needed to bias listeners into perceiving bad map stimuli as real words (<xref ref-type="bibr" rid="B22">Charoy &amp; Samuel, 2023</xref>; <xref ref-type="bibr" rid="B71">Sumner, 2011</xref>). The identification task in the current study is effective because listeners can leverage prior semantic context to deduce the identity of any shifted word (e.g., given the sentence &#8220;The novel is now in <italic>brint</italic>&#8221;, listeners can use the prior word &#8216;novel&#8217; to deduce that <italic>brint</italic> is intended as &#8216;print&#8217;; <xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>). 
Moreover, given the focus of the current study on a gender-mediated phonetic shift, an added benefit of using sentence-length exposure stimuli (as opposed to isolated words) is that participants can potentially hear even more acoustic cues to speaker gender (e.g., through greater exposure to suprasegmental information; <xref ref-type="bibr" rid="B37">Holliday, 2021</xref>).</p>
<p>After the exposure phase, participants completed a test phase. Listeners heard a single stimulus on each trial and were asked whether they heard <italic>buy</italic> or <italic>pie</italic>. The test phase consisted of two blocks that each presented 45 trials in a pseudo-randomized order. The test stimuli came from a 5-step <italic>buy-pie</italic> continuum, with each step being heard nine times per block. One block presented a novel female speaker, while the other block presented a novel male speaker. Test block order was evenly counterbalanced, such that participants either heard the novel female speaker first (Female &#8594; Male) or the novel male speaker first (Male &#8594; Female). The three female and three male speakers referenced in 2.1.1 were always evenly selected across listeners as either an exposure or test talker.</p>
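<p>One test block can be sketched in the same illustrative style (again, the seeded shuffle stands in for pseudo-randomization, and the talker names are only examples):</p>

```python
import random

def build_test_block(talker, steps=5, reps_per_step=9, seed=0):
    """One 45-trial test block: each step of the buy-pie continuum is
    heard reps_per_step times for a single novel talker."""
    block = [(talker, step) for step in range(1, steps + 1)] * reps_per_step
    random.Random(seed).shuffle(block)
    return block

female_block = build_test_block("Ruth")
male_block = build_test_block("Matthew")
```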
</sec>
<sec>
<title>2.1.4 Analysis</title>
<p>Test phase responses were coded binomially as <italic>pie</italic> (= 1) or <italic>buy</italic> (= 0) and analyzed with Bayesian mixed-effects logistic regression in R (<xref ref-type="bibr" rid="B61">R Core Team, 2021</xref>) using the <italic>brms</italic> package (<xref ref-type="bibr" rid="B18">B&#252;rkner, 2017</xref>) and <italic>Stan</italic> (<xref ref-type="bibr" rid="B69">Stan Development Team, 2023</xref>). No statistical analysis was conducted on the exposure phase in any experiment, given that accuracy was essentially at ceiling (99.72% across all conditions).</p>
<p>The model contained main effects of Step (within-subjects; scaled and centered), Speaker Gender (within-subjects; Female, Male), Block Order (between-subjects; Female &#8594; Male, Male &#8594; Female), and Exposure Condition (between-subjects; Female Shifted, No Shift). Step was treated as a numeric variable, while the other main effects were sum-coded, categorical variables. All possible interactions were included. The random effects structure consisted of by-speaker and by-listener random intercepts, as well as by-listener random slopes for Step, Speaker Gender, and their interaction. For clarity, the model structure in R syntax is shown in Equation (1).</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(1)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Response &#126; Step * Speaker Gender * Block Order * Exposure Condition + (1 + Step * Speaker Gender &#124; Listener) + (1 &#124; Speaker)</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<p>Effects are interpreted as meaningful if 95% credible intervals do not contain zero (<xref ref-type="bibr" rid="B78">Vasishth et al., 2018</xref>). Following recent work on Bayesian mixed-effects logistic regression (<xref ref-type="bibr" rid="B5">Aoki &amp; Zellou, 2024</xref>; <xref ref-type="bibr" rid="B15">Barreda &amp; Silbert, 2023</xref>), the prior distributions in R syntax for the intercept, non-intercept fixed effects (b), and the standard deviation of the random intercepts (sd) were all set to: student_t(3, 0, 3).</p>
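<p>The decision rule can be made concrete: given posterior samples for a coefficient, compute the 2.5% and 97.5% quantiles and check whether zero falls between them. The following is a minimal Python sketch of that rule, independent of the <italic>brms</italic> machinery used here, with synthetic posterior samples:</p>

```python
from statistics import quantiles

def excludes_zero(posterior_samples):
    """True if the central 95% credible interval excludes zero."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%
    qs = quantiles(posterior_samples, n=40)
    lower, upper = qs[0], qs[-1]
    return lower > 0 or upper < 0

# A posterior concentrated away from zero -> meaningful effect
step_like = [4.5 + 0.001 * i for i in range(1000)]
# A posterior centered on zero -> not meaningful
null_like = [-1.0 + 0.002 * i for i in range(1001)]
```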
</sec>
</sec>
<sec>
<title>2.2 Results</title>
<p><xref ref-type="table" rid="T4">Table 4</xref> and <xref ref-type="fig" rid="F1">Figure 1</xref> present the model summary statistics and the aggregated test phase results, respectively. There was a consistent effect of Step, which demonstrates that as continuum step increased, listeners selected <italic>pie</italic> more often. There were no other meaningful main effects or interactions. Notably, no interactions between Speaker Gender and Exposure Condition surfaced, implying that the exposure conditions (Female Shifted and No Shift) did not differ in <italic>pie</italic> responses by speaker gender.</p>
<table-wrap id="T4">
<caption>
<p><bold>Table 4:</bold> Summary statistics for the statistical model in Experiment 1. Meaningful effects are in bold.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Estimate</bold></td>
<td align="left" valign="top"><bold>Est. Error</bold></td>
<td align="left" valign="top"><bold>l-95% CI</bold></td>
<td align="left" valign="top"><bold>u-95% CI</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Intercept</bold></td>
<td align="left" valign="top"><bold>1.49</bold></td>
<td align="left" valign="top"><bold>0.72</bold></td>
<td align="left" valign="top"><bold>0.04</bold></td>
<td align="left" valign="top"><bold>2.94</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Step</bold></td>
<td align="left" valign="top"><bold>4.92</bold></td>
<td align="left" valign="top"><bold>0.12</bold></td>
<td align="left" valign="top"><bold>4.67</bold></td>
<td align="left" valign="top"><bold>5.16</bold></td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender (Female)</td>
<td align="left" valign="top">0.20</td>
<td align="left" valign="top">0.73</td>
<td align="left" valign="top">&#8211;1.34</td>
<td align="left" valign="top">1.67</td>
</tr>
<tr>
<td align="left" valign="top">Block Order (Female &#8594; Male)</td>
<td align="left" valign="top">&#8211;0.14</td>
<td align="left" valign="top">0.12</td>
<td align="left" valign="top">&#8211;0.38</td>
<td align="left" valign="top">0.08</td>
</tr>
<tr>
<td align="left" valign="top">Exposure Condition (Female Shifted)</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">0.12</td>
<td align="left" valign="top">&#8211;0.14</td>
<td align="left" valign="top">0.31</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender</td>
<td align="left" valign="top">&#8211;0.01</td>
<td align="left" valign="top">0.09</td>
<td align="left" valign="top">&#8211;0.19</td>
<td align="left" valign="top">0.16</td>
</tr>
<tr>
<td align="left" valign="top">Step : Block Order</td>
<td align="left" valign="top">&#8211;0.14</td>
<td align="left" valign="top">0.11</td>
<td align="left" valign="top">&#8211;0.35</td>
<td align="left" valign="top">0.07</td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender : Block Order</td>
<td align="left" valign="top">0.00</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.14</td>
<td align="left" valign="top">0.14</td>
</tr>
<tr>
<td align="left" valign="top">Step : Exposure Condition</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">0.11</td>
<td align="left" valign="top">&#8211;0.16</td>
<td align="left" valign="top">0.26</td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender : Exposure Condition</td>
<td align="left" valign="top">&#8211;0.02</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.16</td>
<td align="left" valign="top">0.13</td>
</tr>
<tr>
<td align="left" valign="top">Block Order : Exposure Condition</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">0.12</td>
<td align="left" valign="top">&#8211;0.17</td>
<td align="left" valign="top">0.28</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender : Block Order</td>
<td align="left" valign="top">&#8211;0.09</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.22</td>
<td align="left" valign="top">0.04</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender : Exposure Condition</td>
<td align="left" valign="top">0.02</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.11</td>
<td align="left" valign="top">0.15</td>
</tr>
<tr>
<td align="left" valign="top">Step : Block Order : Exposure Condition</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">0.11</td>
<td align="left" valign="top">&#8211;0.17</td>
<td align="left" valign="top">0.27</td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender : Block Order : Exposure Condition</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.07</td>
<td align="left" valign="top">0.21</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender : Block Order : Exposure Condition</td>
<td align="left" valign="top">0.04</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.09</td>
<td align="left" valign="top">0.16</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F1">
<caption>
<p><bold>Figure 1:</bold> Aggregated results by Step, Speaker Gender (female = red, male = blue), Exposure Condition (Female Shifted, No Shift), and Test Block Order (Female &#8594; Male, Male &#8594; Female) in Experiment 1.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-21217-g1.png"/>
</fig>
</sec>
<sec>
<title>2.3 Experiment 1: Interim discussion</title>
<p>Experiment 1 placed listeners in either a critical or control condition (labelled as the Female Shifted and No Shift conditions, respectively). Both conditions exposed participants to a single male and a single female talker, but differed in whether the female talker produced a /p/ to [b] phonetic shift (the critical condition) or remained unshifted (the control condition). The male exposure talker was unshifted for all listeners. In the test phases for both the critical and control conditions, no difference in <italic>buy-pie</italic> categorization was observed between the novel female and novel male talkers, reflecting a lack of cross-talker generalization.</p>
<p>The clearest finding from Experiment 1 is that the results go against a sufficient similarity account (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>). Generalization did not occur from the phonetically shifted female exposure talker to the novel female test talker, even though both were similar in terms of acoustics (higher f0) and social category (same gender). The absence of generalization is especially notable given the wording of the instructions in the present study. Speaker gender was explicitly mentioned before both the exposure and test phases, which in theory could have biased listeners towards generalization based on shared social category membership.</p>
<p>The Experiment 1 results could be explained by one of two theoretical accounts. First, in accordance with a talker normalization account (<xref ref-type="bibr" rid="B39">Joos, 1948</xref>; <xref ref-type="bibr" rid="B48">Liberman &amp; Mattingly, 1985</xref>), perhaps listeners in the critical condition overlooked social cues in the speech signal, disregarding the systematic gender covariation in the exposure phase (i.e., that the /p/ to [b] phonetic shift was only produced by the female exposure talker, not the male exposure talker). If speaker gender is ignored, then the exposure phase would be perceived as containing conflicting information, where /p/ is produced as a canonical [p] half of the time and as [b] for the other half of trials. This type of scenario has resulted in a null effect in recent work (e.g., <xref ref-type="bibr" rid="B76">Tzeng et al., 2021</xref>), which would be realized in the current study as no difference in test phase categorization across talkers and conditions.</p>
<p>Alternatively, the absence of generalization in Experiment 1 could be explained by a numerosity account (e.g., <xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>). The /p/ to [b] shift was designed to be unattested in L1-English (see 1.3 for details), and since the production of /p/ as [b] was only heard in one exposure talker, listeners may have considered the shift to be a talker-specific trait that does not generalize to a novel talker. According to a numerosity account, generalization should only occur if listeners hear the /p/ to [b] shift in more than one talker, as it would confirm that the shift is not merely an idiosyncrasy.</p>
<p>Although the findings of Experiment 1 challenge a sufficient similarity account and match the predictions of talker normalization and numerosity accounts (see <xref ref-type="table" rid="T1">Table 1</xref>), a null effect is not sufficient evidence to make a firm conclusion (<xref ref-type="bibr" rid="B77">Vasishth &amp; Gelman, 2021</xref>). This issue is addressed in Experiment 2, which is designed to tease apart the talker normalization and numerosity accounts.</p>
</sec>
</sec>
<sec>
<title>3. Experiment 2: Exposure to multiple talkers per gender</title>
<p>Experiment 2 had the exact same design as the previous experiment, except that participants were exposed to two female and two male talkers (instead of just one female and one male talker). Hearing <italic>multiple</italic> female talkers produce /p/ as [b] constitutes stronger evidence that, rather than being a talker-specific trait, the phonetic shift is generally associated with female talkers (i.e., there is covariation between a linguistic feature and a broader category of speakers). If a numerosity account is supported, robust cross-talker generalization should be observed, with listeners providing more <italic>pie</italic> responses for the female test talker than for the male talker in the critical exposure condition. A talker normalization account, meanwhile, predicts that listeners should ignore speaker gender, leading to no difference in <italic>buy-pie</italic> categorization between the novel test talkers.</p>
<sec>
<title>3.1 Methods</title>
<sec>
<title>3.1.1 Stimuli</title>
<p>The stimuli were exactly the same as in Experiment 1.</p>
</sec>
<sec>
<title>3.1.2 Participants</title>
<p>420 participants, none of whom completed Experiment 1, were recruited on Prolific and provided informed consent. The amount of compensation and the demographic filters were the same as in Experiment 1. Participants were excluded from the analysis if they either self-reported a hearing difficulty (n = 8), self-reported that their strongest language was not solely English (n = 13), self-reported being older than 35 years old (i.e., a mismatch from their official Prolific profile; n = 1), had an exposure phase accuracy more than three standard deviations below the mean (n = 3, cutoff = 44/48), or had a test phase difference score more than three standard deviations below the mean (n = 9, cutoff = 70.42%; for details on this measure, see 2.1.2). After exclusions, the Experiment 2 data set consisted of responses from 386 participants (191 women, 184 men, 11 non-binary; mean age = 28.61 years, sd = 4.43; self-reported ethnicity: Asian = 59, Black = 58, Latino = 19, Mixed = 45, Native American or Alaska Native = 3, Native Hawaiian or Pacific Islander = 2, White = 200).</p>
</sec>
<sec>
<title>3.1.3 Procedure</title>
<p>The procedure largely mirrored that in Experiment 1. The only difference was that, instead of presenting one male and one female speaker in the exposure phase, Experiment 2 presented two male and two female exposure speakers (the number and gender of the exposure talkers were again explicitly mentioned in the instructions). All four exposure talkers in the No Shift control condition produced /p/-initial words with a canonical [p]. By contrast, both female exposure talkers in the Female Shifted condition produced /p/-initial words with [b], while both male exposure speakers were unshifted. A critical point is that the relative amount of exposure to /p/-shifted words remained the same across listeners in the critical conditions of Experiments 1 and 2 &#8211; in the latter experiment, the /p/-shifted stimuli were simply distributed evenly across two female talkers, rather than produced by a single female talker.</p>
</sec>
<sec>
<title>3.1.4 Analysis</title>
<p>The statistical model was identical to that in Experiment 1.</p>
</sec>
</sec>
<sec>
<title>3.2 Results</title>
<p>Exposure phase accuracy was nearly at ceiling (99.70% across all conditions). The model summary statistics and aggregated test phase results for Experiment 2 are shown in <xref ref-type="table" rid="T5">Table 5</xref> and <xref ref-type="fig" rid="F2">Figure 2</xref>, respectively.</p>
<table-wrap id="T5">
<caption>
<p><bold>Table 5:</bold> Summary statistics for the statistical model in Experiment 2. Meaningful effects are in bold.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Estimate</bold></td>
<td align="left" valign="top"><bold>Est. Error</bold></td>
<td align="left" valign="top"><bold>l-95% CI</bold></td>
<td align="left" valign="top"><bold>u-95% CI</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Intercept</bold></td>
<td align="left" valign="top"><bold>1.29</bold></td>
<td align="left" valign="top"><bold>0.62</bold></td>
<td align="left" valign="top"><bold>0.04</bold></td>
<td align="left" valign="top"><bold>2.50</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Step</bold></td>
<td align="left" valign="top"><bold>4.56</bold></td>
<td align="left" valign="top"><bold>0.11</bold></td>
<td align="left" valign="top"><bold>4.35</bold></td>
<td align="left" valign="top"><bold>4.77</bold></td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender (Female)</td>
<td align="left" valign="top">0.17</td>
<td align="left" valign="top">0.62</td>
<td align="left" valign="top">&#8211;1.06</td>
<td align="left" valign="top">1.39</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Block Order (Female</bold> &#8594; <bold>Male)</bold></td>
<td align="left" valign="top"><bold>&#8211;0.26</bold></td>
<td align="left" valign="top"><bold>0.11</bold></td>
<td align="left" valign="top"><bold>&#8211;0.48</bold></td>
<td align="left" valign="top"><bold>&#8211;0.05</bold></td>
</tr>
<tr>
<td align="left" valign="top">Exposure Condition (Female Shifted)</td>
<td align="left" valign="top">&#8211;0.02</td>
<td align="left" valign="top">0.11</td>
<td align="left" valign="top">&#8211;0.25</td>
<td align="left" valign="top">0.19</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender</td>
<td align="left" valign="top">&#8211;0.13</td>
<td align="left" valign="top">0.09</td>
<td align="left" valign="top">&#8211;0.30</td>
<td align="left" valign="top">0.04</td>
</tr>
<tr>
<td align="left" valign="top">Step : Block Order</td>
<td align="left" valign="top">&#8211;0.01</td>
<td align="left" valign="top">0.10</td>
<td align="left" valign="top">&#8211;0.20</td>
<td align="left" valign="top">0.18</td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender : Block Order</td>
<td align="left" valign="top">0.13</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">&#8211;0.02</td>
<td align="left" valign="top">0.28</td>
</tr>
<tr>
<td align="left" valign="top">Step : Exposure Condition</td>
<td align="left" valign="top">&#8211;0.08</td>
<td align="left" valign="top">0.10</td>
<td align="left" valign="top">&#8211;0.27</td>
<td align="left" valign="top">0.12</td>
</tr>
<tr>
<td align="left" valign="top">Speaker Gender : Exposure Condition</td>
<td align="left" valign="top">0.09</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.05</td>
<td align="left" valign="top">0.23</td>
</tr>
<tr>
<td align="left" valign="top">Block Order : Exposure Condition</td>
<td align="left" valign="top">&#8211;0.05</td>
<td align="left" valign="top">0.11</td>
<td align="left" valign="top">&#8211;0.26</td>
<td align="left" valign="top">0.16</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender : Block Order</td>
<td align="left" valign="top">&#8211;0.12</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.25</td>
<td align="left" valign="top">0.01</td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender : Exposure Condition</td>
<td align="left" valign="top">&#8211;0.03</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.15</td>
<td align="left" valign="top">0.11</td>
</tr>
<tr>
<td align="left" valign="top">Step : Block Order : Exposure Condition</td>
<td align="left" valign="top">&#8211;0.12</td>
<td align="left" valign="top">0.10</td>
<td align="left" valign="top">&#8211;0.31</td>
<td align="left" valign="top">0.07</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Speaker Gender : Block Order : Exposure Condition</bold></td>
<td align="left" valign="top"><bold>0.18</bold></td>
<td align="left" valign="top"><bold>0.07</bold></td>
<td align="left" valign="top"><bold>0.03</bold></td>
<td align="left" valign="top"><bold>0.32</bold></td>
</tr>
<tr>
<td align="left" valign="top">Step : Speaker Gender : Block Order : Exposure Condition</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.05</td>
<td align="left" valign="top">0.21</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F2">
<caption>
<p><bold>Figure 2:</bold> Aggregated results by Step, Speaker Gender (female = red, male = blue), Exposure Condition (Female Shifted, No Shift), and Test Block Order (Female &#8594; Male, Male &#8594; Female) in Experiment 2.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-21217-g2.png"/>
</fig>
<p>A meaningful main effect of Step emerged, indicating that listeners provided more <italic>pie</italic> responses as step number increased. There was also a meaningful main effect of Block Order, such that on average, listeners who heard the female block first made fewer <italic>pie</italic> responses. Critically, however, the effect of Block Order was modulated by a 3-way interaction between Speaker Gender, Block Order, and Exposure Condition.</p>
<p>To examine this 3-way interaction, the <italic>hypothesis</italic> function from <italic>brms</italic> was employed (a method of inspecting interactions without running additional post-hoc models, as in a frequentist analysis; for details, see Chapter 7 of <xref ref-type="bibr" rid="B15">Barreda &amp; Silbert, 2023</xref>). The hypothesis compared the 2-way interaction between Speaker Gender and Exposure Condition for each block order. A meaningful interaction with a positive coefficient was found for listeners who heard the female block first [&#946;: 0.26, SE: 0.11, 95% HDI = (0.05, 0.47)], but not for listeners who heard the male block first [&#946;: &#8211;0.09, SE: 0.10, 95% HDI = (&#8211;0.29, 0.11)]. In other words, relative to the No Shift control condition, participants in the Female Shifted exposure condition gave more <italic>pie</italic> responses for the female speaker than for the male speaker, but only if they heard the female block first (i.e., Female &#8594; Male, not Male &#8594; Female).</p>
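<p>Conceptually, the <italic>hypothesis</italic> function evaluates a linear combination of the model&#8217;s posterior draws rather than refitting the model. A minimal sketch of this idea uses simulated draws with hypothetical coefficient values and assumes &#8211;0.5/+0.5 contrast coding of Block Order; it is not the actual fitted posterior.</p>

```python
# Testing an interaction contrast directly on posterior draws, in the
# spirit of brms's hypothesis() function. Draws are simulated for
# illustration; they are NOT the posterior from the reported model.
import random
random.seed(1)

N = 4000  # number of simulated posterior draws
gender_x_cond = [random.gauss(0.09, 0.07) for _ in range(N)]
gender_x_order_x_cond = [random.gauss(0.18, 0.07) for _ in range(N)]

def interval(draws, level=0.95):
    """Equal-tailed credible interval computed from a list of draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[int((1 - level) / 2 * n)]
    hi = s[int((1 + level) / 2 * n) - 1]
    return (lo, hi)

# Speaker Gender x Exposure Condition contrast within each block order,
# assuming -0.5/+0.5 coding of Block Order:
female_first = [a + 0.5 * b for a, b in zip(gender_x_cond, gender_x_order_x_cond)]
male_first = [a - 0.5 * b for a, b in zip(gender_x_cond, gender_x_order_x_cond)]
```

<p>With these simulated values, the interval for the contrast in one block order should exclude zero while the other includes it, mirroring the structure (though not the exact numbers) of the comparison reported above.</p>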
<p>Looking just at the top-left panel of <xref ref-type="fig" rid="F2">Figure 2</xref>, it is visually evident that listeners assigned to the Female Shifted exposure condition and the Female &#8594; Male test block order provided a greater proportion of <italic>pie</italic> responses for the female speaker than for the male speaker. However, when this top-left panel is compared to the other three panels in <xref ref-type="fig" rid="F2">Figure 2</xref>, the adaptation effect might initially appear more nuanced &#8211; the three-way interaction seems to be driven by a reduced <italic>pie</italic> response for the novel male speaker, rather than a greater <italic>pie</italic> response for the novel female speaker. This was investigated statistically through an additional hypothesis examining the 2-way interaction between Block Order and Exposure Condition for each gender. Ultimately, there is no clear evidence to support this impressionistic observation of <xref ref-type="fig" rid="F2">Figure 2</xref>, as no meaningful interactions were found for either the male test speaker [&#946;: &#8211;0.22, SE: 0.14, 95% HDI = (&#8211;0.49, 0.05)] or the female test speaker [&#946;: 0.13, SE: 0.12, 95% HDI = (&#8211;0.11, 0.36)]. (That said, there is a marginal interaction for the male test speaker, which accounts for why the meaningful effect of Block Order in the main model has a negative coefficient.)</p>
</sec>
<sec>
<title>3.3 Experiment 2: Interim discussion</title>
<p>Unlike Experiment 1, Experiment 2 found evidence for cross-talker generalization, with listeners in the Female &#8594; Male test block order providing more <italic>pie</italic> responses for the novel female test talker than for the novel male test talker. The only difference between the two experiments was the number of exposure talkers &#8211; Experiment 2 presented two female and two male exposure talkers, while Experiment 1 only exposed participants to one female and one male talker. It therefore seems that exposure to multiple talkers of each gender is necessary for generalization in this case, contrary to findings reported in recent work (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>).</p>
<p>Taken together, the results of Experiments 1 and 2 provide the clearest support for a numerosity account of cross-talker generalization. Since the phonetic shift is heard from <italic>multiple</italic> female speakers in Experiment 2, listeners might generate a mental model that groups the female exposure speakers together (either through shared acoustic features or shared social category membership) and designate a /p/ to [b] shift as a property of female speech in the context of the experiment. The talker-general nature of this model then results in generalization of the shift to a novel female talker, who is more similar (acoustically and socially) to the female exposure speakers. Overall, the current study finds a critical role of the number of exposure talkers in generalization, bolstering numerosity accounts and challenging both talker normalization and sufficient similarity accounts.</p>
<p>There are two caveats to the analysis and results in Experiments 1 and 2. First, generalization was somewhat short-lived in Experiment 2. A test phase categorization shift occurred only among listeners assigned to the Female &#8594; Male block order (i.e., when the female talker was presented first in the test phase), not among those assigned to the Male &#8594; Female order (i.e., when there was a delay in presenting the novel (female) talker who is more similar to the shifted (female) exposure talkers). There are at least three different, non-mutually exclusive explanations for this order effect.</p>
<p>One explanation involves the particular test case used in the critical condition of the current study. As discussed in 1.3, there is no attested covariation in L1-English between a /p/-to-[b] stop consonant shift and speaker gender, such that female speakers produce /p/ as [b] and male speakers remain unshifted. The lack of familiarity with this type of phonetic shift may have made it more challenging to learn, resulting in more ephemeral generalization. A more robust effect might be observed if one or more aspects of the present test case were adjusted to align with attested L1-English phenomena (e.g., if the <italic>male</italic> speakers produced /p/ as [b] and the female speakers were unshifted; <xref ref-type="bibr" rid="B63">Robb et al., 2005</xref>).</p>
<p>Another possibility is that there was not enough exposure to produce prolonged generalization. Listeners in the critical condition of Experiment 2 heard 16 phonetically shifted stimuli from only two female talkers during an exposure phase lasting several minutes. Given the novelty of the phonetic shift, greater exposure may be needed for the shift to be retained in memory. A more stable effect could be observed if listeners were presented with more stimuli<xref ref-type="fn" rid="n6">6</xref> (<xref ref-type="bibr" rid="B27">Cummings &amp; Theodore, 2023</xref>), completed several iterations of the task over multiple sessions (<xref ref-type="bibr" rid="B88">Xie &amp; Kurumada, 2024</xref>), and/or heard more than two shifted talkers (<xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>).</p>
<p>Finally, the transitory nature of generalized adaptation in Experiment 2 could also be explained by the structure of the exposure phase. Listeners heard a mixture of both filler and critical stimuli from two different male and two different female talkers in a randomized order, rather than in blocks. Perhaps presenting the exposure sentences in a blocked order (either by talker or by gender) would have made it easier to keep track of the speakers and to learn that only the females were phonetically shifted. This account accords with work showing detrimental effects of talker-switching on speech perception (<xref ref-type="bibr" rid="B50">Magnuson et al., 2021</xref>) and with the intuition that blocked training may be more helpful at the incipient stages of learning (for discussion, see <xref ref-type="bibr" rid="B62">Raviv et al., 2022, p. 476</xref>).<xref ref-type="fn" rid="n7">7</xref></p>
<p>Besides the order effect in Experiment 2, a second caveat is that so far, only the aggregated results have been shown, not the results by individual test talker. As mentioned in 2.1.3, the three female and three male talkers were fully counterbalanced so that each was presented as a test talker equally often. There is a minor possibility that the results genuinely support a sufficient similarity account (rather than a numerosity account), but that the aggregated analysis is masking this finding. The following section investigates this idea in greater depth and then presents a post-hoc analysis of both experiments.</p>
</sec>
</sec>
<sec>
<title>4. Post-hoc analysis of Experiments 1 and 2</title>
<p>A numerosity account claims that listeners must hear multiple talkers of each gender in order to generalize a gender-mediated phonetic shift to novel speakers (<xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>). In this case, as depicted in the hypothetical set of results in <xref ref-type="table" rid="T6">Table 6</xref>, there should be no evidence of cross-talker generalization for any exposure-test talker pair in Experiment 1, where only one talker of each gender was presented in exposure. By contrast, all possible exposure-test talker combinations in Experiment 2 should lead to generalization.</p>
<table-wrap id="T6">
<caption>
<p><bold>Table 6:</bold> Hypothetical set of results supporting a numerosity account. Each cell represents a particular exposure-test talker combination, where the column labels reflect the possible exposure talker(s) and the rows stand for the possible test talker. The second, third, and fourth columns (Joanna, Ruth, Salli) refer to Experiment 1 (one exposure talker), while the fifth, sixth, and seventh columns (Joanna + Ruth, Joanna + Salli, Ruth + Salli) refer to Experiment 2 (two exposure talkers). The values within each cell refer to presence (&#8216;Yes&#8217;) or absence (&#8216;No&#8217;) of cross-talker generalization in this hypothetical scenario. The dashes reflect the lack of talker-specific conditions in the current study (i.e., the exposure talkers were never presented in the test phase).</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joanna + Ruth</bold></td>
<td align="left" valign="top"><bold>Joanna + Salli</bold></td>
<td align="left" valign="top"><bold>Ruth + Salli</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">-----</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>According to a sufficient similarity account, the degree of similarity between exposure and test talkers mediates cross-talker generalization, not the number of talkers in exposure (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>). Multi-talker exposure should only enhance generalization when there happens to be at least one exposure talker that is sufficiently similar to the test talker (and by the same token, if none of the exposure talkers are similar to the test talker, then multi-talker exposure is not expected to facilitate generalization). Critically, the test talker does not need to be similar to all of the exposure talkers for generalization to take place (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>).</p>
<p>One possibility is that both the null effect in Experiment 1 and the generalization effect in Experiment 2 could be explained by a sufficient similarity account, not by a numerosity account. Recall that there are three female talkers (named Joanna, Ruth, and Salli; see 2.1.1 for details). Experiment 1 presents one female talker in exposure and one novel female talker at test, meaning that there are six possible exposure-test combinations (Joanna &#8594; Ruth, Joanna &#8594; Salli, Ruth &#8594; Joanna, Ruth &#8594; Salli, Salli &#8594; Joanna, Salli &#8594; Ruth). Experiment 2 exposes listeners to two female talkers in the exposure phase and one novel female talker in the test phase, resulting in three possible exposure-test combinations (Joanna + Ruth &#8594; Salli, Joanna + Salli &#8594; Ruth, Ruth + Salli &#8594; Joanna).</p>
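<p>The pairing counts described above follow directly from the combinatorics of three female talkers. A small sketch (the talker names are from the study; the variable names are illustrative):</p>

```python
# Enumerate the female exposure-test talker pairings: ordered pairs for
# Experiment 1 (one exposure talker), and two-talker exposure sets with
# the remaining talker at test for Experiment 2.
from itertools import combinations, permutations

talkers = ["Joanna", "Ruth", "Salli"]

# Experiment 1: 3 x 2 = 6 ordered (exposure, test) pairs.
exp1 = list(permutations(talkers, 2))

# Experiment 2: choose 2 exposure talkers; the third is the test talker.
exp2 = [(pair, (set(talkers) - set(pair)).pop())
        for pair in combinations(talkers, 2)]
```

<p>This yields the six Experiment 1 combinations and three Experiment 2 combinations enumerated in the preceding paragraph.</p>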
<p>If two of the female talkers are similar to each other and both are different from the third female talker, then the results in Experiments 1 and 2 could be accounted for by a sufficient similarity account. This idea is visually depicted within the hypothetical set of results in <xref ref-type="table" rid="T7">Table 7</xref>. If Joanna and Ruth are similar to each other, but both differ from Salli, then Experiment 1 should only show generalization for two of six exposure-test combinations (Joanna &#8594; Ruth, Ruth &#8594; Joanna). However, given the same similarity relations, two of three exposure-test combinations in Experiment 2 should result in generalization (Joanna + Salli &#8594; Ruth, Ruth + Salli &#8594; Joanna), since at least one exposure talker is similar to the test talker. In an analysis that glosses over the individual exposure-test groupings, as in 2.2 and 3.2 of the current study, Experiment 2 should show a test phase categorization shift, since the majority of participants (two-thirds) hear at least one exposure talker that is similar to the test talker. By contrast, only a minority of participants in Experiment 1 (one-third) are presented with sufficiently similar exposure and test talkers, which could lead to a null aggregate effect.</p>
<table-wrap id="T7">
<caption>
<p><bold>Table 7:</bold> Hypothetical set of results supporting a sufficient similarity account that would still lead to the aggregated results presented in 2.2 and 3.2. The structure of the table is the same as in <xref ref-type="table" rid="T6">Table 6</xref>, such that the row labels represent the test talkers and the column labels represent the exposure talkers. &#8216;Yes&#8217; and &#8216;No&#8217; refer to the presence or absence of generalization, respectively.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joanna + Ruth</bold></td>
<td align="left" valign="top"><bold>Joanna + Salli</bold></td>
<td align="left" valign="top"><bold>Ruth + Salli</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">-----</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Although a cursory acoustic analysis presented in 2.1.1 confirmed that all of the female speakers have a higher f0 than the male speakers, this is not necessarily enough evidence to conclude confidently that the female speakers are sufficiently similar to each other. Speech contrasts are highly multidimensional (<xref ref-type="bibr" rid="B65">Schertz &amp; Clare, 2020</xref>), meaning that the female speakers in the current study could be similar to or different from each other in additional variables besides f0. Since there are individual differences among listeners in cue weighting (<xref ref-type="bibr" rid="B40">Kapnoula et al., 2017</xref>), it is difficult in practice to determine which specific acoustic cues are being leveraged in judgments of cross-talker &#8220;similarity&#8221;. Therefore, the current post-hoc analysis takes a perceptual approach, assuming that two talkers must be sufficiently similar to each other if exposure to one leads to cross-talker generalization for the other.</p>
<sec>
<title>4.1 Analysis and results</title>
<p>The aggregated data analysis in 3.2 showed evidence of cross-talker generalization in Experiment 2 (i.e., a greater <italic>pie</italic> response for the novel female than for the novel male speaker), but only among listeners who heard the novel female block first in the test phase. Given this order effect, the post-hoc analysis included only participants assigned to the Female &#8594; Male block order. Furthermore, since there were no meaningful interactions involving continuum step in either experiment, Step was removed as a predictor variable from the post-hoc models.</p>
<p>As depicted in Equation 2, all post-hoc models contained fixed effects of Speaker Gender (within-subjects; Female, Male), Exposure Condition (between-subjects; Female Shifted, No Shift), and their interaction, by-speaker and by-listener random intercepts, and by-listener random slopes for Speaker Gender. A separate model was fitted for each of the nine possible exposure-test talker combinations across both experiments (six models for Experiment 1, three models for Experiment 2). Although this post-hoc analysis decreases statistical power, the number of subjects analyzed in each model (Experiment 1 Models: n = 33, on average; Experiment 2 Models: n = 62, on average) is still comparable to the sample sizes of prior work with 80% power (<xref ref-type="bibr" rid="B27">Cummings &amp; Theodore, 2023</xref>).</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(2)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Response &#126; Speaker Gender * Exposure Condition + (1 + Speaker Gender &#124; Listener) + (1 &#124; Speaker)</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<p>If cross-talker generalization has occurred, then there should be a meaningful interaction between Speaker Gender and Exposure Condition with a positive coefficient (i.e., indicating that a greater proportion of <italic>pie</italic> responses was provided for the novel female talker than for the novel male talker, but only in the Female Shifted exposure condition). The credible intervals for this interaction in each post-hoc model are listed in <xref ref-type="table" rid="T8">Table 8</xref>. The other fixed effects are not directly relevant to the current research question and are thus omitted for clarity (but see the Data accessibility statement to examine all of the statistical models).</p>
<table-wrap id="T8">
<caption>
<p><bold>Table 8:</bold> Summary statistics for the interaction between Speaker Gender and Exposure Condition in each post-hoc model. The first six rows are from the data from Experiment 1, while the last three rows are derived from the data from Experiment 2. Meaningful effects are in bold.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Experiment</bold></td>
<td align="left" valign="top"><bold>Exposure Talker</bold></td>
<td align="left" valign="top"><bold>Test Talker</bold></td>
<td align="left" valign="top"><bold>Estimate</bold></td>
<td align="left" valign="top"><bold>Est. Error</bold></td>
<td align="left" valign="top"><bold>l-95% CI</bold></td>
<td align="left" valign="top"><bold>u-95% CI</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Joanna</td>
<td align="left" valign="top">Ruth</td>
<td align="left" valign="top">0.03</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.11</td>
<td align="left" valign="top">0.17</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Joanna</td>
<td align="left" valign="top">Salli</td>
<td align="left" valign="top">0.02</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">&#8211;0.11</td>
<td align="left" valign="top">0.13</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Ruth</td>
<td align="left" valign="top">Joanna</td>
<td align="left" valign="top">&#8211;0.04</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.19</td>
<td align="left" valign="top">0.11</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Ruth</td>
<td align="left" valign="top">Salli</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">&#8211;0.03</td>
<td align="left" valign="top">0.15</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Salli</td>
<td align="left" valign="top">Joanna</td>
<td align="left" valign="top">&#8211;0.03</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.17</td>
<td align="left" valign="top">0.11</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 1</td>
<td align="left" valign="top">Salli</td>
<td align="left" valign="top">Ruth</td>
<td align="left" valign="top">0.03</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">&#8211;0.10</td>
<td align="left" valign="top">0.16</td>
</tr>
<tr>
<td align="left" valign="top">Experiment 2</td>
<td align="left" valign="top"><bold>Joanna + Salli</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>0.13</bold></td>
<td align="left" valign="top"><bold>0.04</bold></td>
<td align="left" valign="top"><bold>0.05</bold></td>
<td align="left" valign="top"><bold>0.22</bold></td>
</tr>
<tr>
<td align="left" valign="top">Experiment 2</td>
<td align="left" valign="top"><bold>Joanna + Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>0.10</bold></td>
<td align="left" valign="top"><bold>0.05</bold></td>
<td align="left" valign="top"><bold>0.002</bold></td>
<td align="left" valign="top"><bold>0.19</bold></td>
</tr>
<tr>
<td align="left" valign="top">Experiment 2</td>
<td align="left" valign="top">Ruth + Salli</td>
<td align="left" valign="top">Joanna</td>
<td align="left" valign="top">&#8211;0.06</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">&#8211;0.16</td>
<td align="left" valign="top">0.04</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Meaningful interactions were present for only two of the three multi-talker exposure conditions in Experiment 2 (Joanna + Salli &#8594; Ruth, Joanna + Ruth &#8594; Salli). By contrast, when there was only one exposure talker in Experiment 1, none of the individual exposure-test talker pairs facilitated cross-talker generalization.</p>
</sec>
<sec>
<title>4.2 Post-hoc analysis: Interim discussion</title>
<p>The results of the post-hoc analysis are summarized in <xref ref-type="table" rid="T9">Table 9</xref>. Overall, they provide clear support for a numerosity account over a sufficient similarity account. No evidence of generalization was observed for any of the six female exposure-test talker combinations in Experiment 1, where only one talker of each gender was presented in the exposure phase. A sufficient similarity account would attribute this null effect to a lack of sufficient similarity among the three female talkers in the current study (i.e., the Joanna voice is dissimilar to both the Ruth and Salli voices, and the Ruth voice is dissimilar to the Salli voice). In theory, then, if the exposure phase presents two female talkers, as in Experiment 2, and both are dissimilar to the test talker, generalization should be blocked (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>).</p>
<table-wrap id="T9">
<caption>
<p><bold>Table 9:</bold> Actual results of the post-hoc analysis. The structure of this table is the same as <xref ref-type="table" rid="T6">Tables 6</xref> and <xref ref-type="table" rid="T7">7</xref>, such that the row labels represent the test talkers and the column labels represent the exposure talkers. &#8216;Yes&#8217; and &#8216;No&#8217; refer to the presence or absence of generalization, respectively.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joanna + Ruth</bold></td>
<td align="left" valign="top"><bold>Joanna + Salli</bold></td>
<td align="left" valign="top"><bold>Ruth + Salli</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">-----</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">Yes</td>
<td align="left" valign="top">-----</td>
<td align="left" valign="top">-----</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>However, this prediction is not borne out. Experiment 2 found robust evidence of cross-talker generalization, at least for two of the possible exposure-test talker combinations (Joanna + Salli &#8594; Ruth, Joanna + Ruth &#8594; Salli). This suggests that, as predicted by a numerosity account (<xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>), exposure to multiple talkers per gender is necessary to generalize an unfamiliar phonetic shift to a novel talker.</p>
<p>Why was no generalization observed for the Experiment 2 participants who were exposed to Ruth and Salli and tested on Joanna? One tentative possibility is that effects of acoustic similarity are at play. Even though Joanna has a higher f0 than all three male speakers in the current study (see <xref ref-type="table" rid="T3">Table 3</xref>), t-tests demonstrate that the test stimuli of Joanna have a lower f0 than both the exposure stimuli of Ruth (mean difference = 22.88 Hz, <italic>t</italic> = 23.953, <italic>p</italic> &lt; 0.001) and the exposure stimuli of Salli (mean difference = 18.37 Hz, <italic>t</italic> = 20.585, <italic>p</italic> &lt; 0.001). The exposure stimuli of Ruth and Salli, meanwhile, are more similar to each other acoustically. Although Salli does have a consistently greater f0 than Ruth (mean difference = 4.50 Hz, <italic>t</italic> = 5.58, <italic>p</italic> &lt; 0.001), the mean difference is reduced compared to the pairwise comparisons between Joanna/Ruth and Joanna/Salli. Perhaps Joanna was not sufficiently similar acoustically to Ruth and Salli, which blocked generalization among listeners in Experiment 2 who were exposed to Ruth and Salli and later tested on Joanna.</p>
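The pairwise f0 comparisons above can be sketched as a two-sample t-test on per-token mean f0. This is a minimal illustration, not the authors' analysis script: the f0 values are synthetic stand-ins (not the study's measurements), and the use of Welch's unequal-variance correction is an assumption, since the article does not specify the t-test variant.

```python
# Illustrative sketch of the pairwise f0 comparisons reported above.
# The f0 samples are synthetic, generated to mimic a ~23 Hz mean
# difference; they are NOT the study's measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-token mean f0 (Hz) for two sets of stimuli
f0_joanna_test = rng.normal(loc=175.0, scale=5.0, size=64)
f0_ruth_exposure = rng.normal(loc=198.0, scale=5.0, size=64)

# Welch's t-test (equal_var=False is an assumption, see lead-in)
t, p = stats.ttest_ind(f0_ruth_exposure, f0_joanna_test, equal_var=False)
mean_diff = f0_ruth_exposure.mean() - f0_joanna_test.mean()
print(f"mean difference = {mean_diff:.2f} Hz, t = {t:.3f}, p = {p:.3g}")
```

With samples this far apart relative to their spread, the test yields a large t-statistic and a p-value far below 0.001, matching the shape of the results reported for the Joanna/Ruth and Joanna/Salli comparisons.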
<p>However, a sufficient similarity account is not entirely satisfactory. Assuming that Ruth and Salli are indeed more acoustically similar to each other, it becomes unclear why no generalization took place in Experiment 1 for listeners who were exposed to Ruth and tested on Salli (Ruth &#8594; Salli) or for participants exposed to Salli and tested on Ruth (Salli &#8594; Ruth). Additionally, a sufficient similarity account would predict generalization to be <italic>weaker</italic> among listeners in Experiment 2 who were either exposed to Joanna and Salli and tested on Ruth (Joanna + Salli &#8594; Ruth) or exposed to Joanna and Ruth and tested on Salli (Joanna + Ruth &#8594; Salli). All participants in the critical conditions across Experiments 1 and 2 heard the same number of phonetically shifted stimuli (n = 16). Whereas all of these /p/ to [b] shifted tokens were produced by a single female exposure talker in Experiment 1, they were evenly distributed across two female exposure talkers in Experiment 2 (n = 8 each). If Ruth and Salli are truly the only two female speakers who share sufficient similarity, then the Experiment 2 listeners in the Joanna + Salli &#8594; Ruth and Joanna + Ruth &#8594; Salli conditions were exposed to fewer sufficiently similar tokens than the Experiment 1 listeners in the Salli &#8594; Ruth and Ruth &#8594; Salli conditions (n = 8 in Experiment 2 versus n = 16 in Experiment 1). Reduced exposure generally leads to less robust adaptation (<xref ref-type="bibr" rid="B27">Cummings &amp; Theodore, 2023</xref>), so according to a sufficient similarity account, generalization should have been facilitated less strongly in the Joanna + Salli &#8594; Ruth and Joanna + Ruth &#8594; Salli conditions of Experiment 2 than in the Salli &#8594; Ruth and Ruth &#8594; Salli conditions of Experiment 1. Yet, <xref ref-type="table" rid="T8">Table 8</xref> depicts the opposite pattern &#8211; generalization was facilitated only in the Joanna + Salli &#8594; Ruth and Joanna + Ruth &#8594; Salli conditions of Experiment 2 and blocked in the Salli &#8594; Ruth and Ruth &#8594; Salli conditions of Experiment 1. Overall, the acoustic differences between Joanna and Ruth/Salli do not appear to adequately explain the current results.</p>
<p>An alternative explanation could involve the properties of the test continua themselves, which are depicted in <xref ref-type="fig" rid="F3">Figure 3</xref>. Although efforts were made to match the continua as closely as possible (see Appendix 2), <xref ref-type="fig" rid="F3">Figure 3</xref> shows that listeners in the No Shift condition (where no phonetic shift was presented in the exposure phase) are already heavily biased towards a <italic>pie</italic> response for Joanna compared to Ruth and Salli (Joanna: 67.86% <italic>pie</italic> response; Ruth: 51.93% <italic>pie</italic> response; Salli: 51.89% <italic>pie</italic> response).<xref ref-type="fn" rid="n8">8</xref> Because generalization is also indexed by a greater proportion of <italic>pie</italic> responses, it is possible that there was a ceiling effect &#8211; among the Experiment 2 listeners who were tested on Joanna, those in the Female Shifted condition could not produce a greater proportion of <italic>pie</italic> responses than those in the No Shift condition, resulting in a null effect. The test phase stimuli for Ruth and Salli are not nearly as biased towards a <italic>pie</italic> response, so participants tested on Ruth and Salli in Experiment 2 showed meaningful generalization.</p>
<fig id="F3">
<caption>
<p><bold>Figure 3:</bold> Aggregated results by Step and Speaker (JOA = Joanna; RUT = Ruth; SAL = Salli) in Experiment 2 for participants assigned to the No Shift condition (i.e., where no phonetic shift was presented in the exposure phase).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-21217-g3.png"/>
</fig>
<p>Although it is true that there are certain acoustic disparities among the female speakers (e.g., Joanna has a lower f0 than Ruth and Salli), the differences in f0 between the female and male speakers are much starker (see <xref ref-type="table" rid="T2">Table 2</xref>). It therefore seems more likely that listeners grouped Joanna with one of the other female speakers than with the male speakers, based on (relative) acoustic and social similarity. All participants in the critical condition of Experiment 2 heard two shifted female and two unshifted male talkers in the exposure phase, and in the Joanna + Salli &#8594; Ruth and Joanna + Ruth &#8594; Salli conditions, listeners likely placed Joanna within the same mental model as Salli/Ruth. In accordance with a numerosity account, listeners attuned to the number of talkers with the /p/ to [b] shift and, having heard the shift in more than one exposure speaker, generalized it to the most similar test talker (i.e., the novel female speaker).</p>
<p>Taken together, the results of the post-hoc analysis are more consistent with a numerosity account than with a sufficient similarity account. A numerosity account more satisfactorily explains why cross-talker generalization occurred only with multi-talker exposure in Experiment 2 and never with single-talker exposure in Experiment 1. The absence of generalization in one of the three possible exposure-test talker combinations in Experiment 2 seems to reflect idiosyncratic variation in the test phase continua, not evidence against a numerosity account.</p>
</sec>
</sec>
<sec>
<title>5. General discussion</title>
<p>A substantial body of work has documented extensive talker-specific and socio-indexical structure within the speech signal, with productions varying systematically based on individual talkers (<xref ref-type="bibr" rid="B23">Chodroff &amp; Wilson, 2017</xref>; <xref ref-type="bibr" rid="B55">Newman et al., 2001</xref>) and based on social categories, such as gender, dialect, age, and many other variables (<xref ref-type="bibr" rid="B41">Kleinschmidt, 2019</xref>; <xref ref-type="bibr" rid="B46">Labov, 1966</xref>). Listeners have been consistently shown to apply their knowledge of this structure in speech perception (e.g., <xref ref-type="bibr" rid="B28">D&#8217;Onofrio, 2015</xref>; <xref ref-type="bibr" rid="B56">Niedzielski, 1999</xref>). What is not clear is how sociolinguistic perception emerges. When a listener hears a socially-conditioned variant for the first time, how do they learn that rather than being a talker-specific trait, this feature is characteristic of a broader social group that can generalize to other members of the same group?</p>
<p>The current study explored this question by presenting listeners with an unattested variant in L1-English (a gender-mediated /p/ to [b] phonetic shift). Experiment 1 exposed listeners to a single shifted female talker and a single unshifted male talker, while Experiment 2 exposed listeners to two shifted female talkers and two unshifted male talkers. In contrast to Experiment 1, cross-talker generalization (measured as a shift in /p/-/b/ categorization for a novel female talker) was only found in Experiment 2. Taken together, these results support a <italic>numerosity account</italic>. The number of exposure talkers can be critical in perceptual adaptation, and when listeners are presented with a previously unheard variant, multiple talkers per social group seem to be necessary for cross-talker generalization.</p>
<p>The rest of the general discussion is structured as follows. Section 5.1 leverages a sociolinguistic explanation to account for seeming discrepancies between the current study and prior work on how the number of exposure talkers impacts generalization. Section 5.2 then discusses a key step for future work.</p>
<sec>
<title>5.1 Taking a sociolinguistic perspective: When is multi-talker exposure necessary for cross-talker generalization?</title>
<p>The present findings address the ongoing debate about whether multi-talker exposure is necessary for cross-talker generalization. In a seminal paper, Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>) offered evidence in support of this claim: comprehension in noise of a novel Mandarin-accented English talker only improved for listeners exposed to multiple talkers of the same accent, not for listeners exposed to a single talker. In a replication of Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>), however, Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) did not observe any difference in perceptual performance between single and multi-talker exposure conditions. This later finding corroborated Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>), supporting the hypothesis that the degree of acoustic similarity between the exposure and test talkers mediates generalization, not the number of talkers in exposure. Now, seemingly in contrast to Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>), the current study indicates that multi-talker exposure may be required for generalized adaptation in some instances.</p>
<p>Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) account for the lack of replicability of the multi-talker exposure benefit by pointing to three methodological issues of Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>): (i) relatively low statistical power (n = 87 across five between-subjects conditions); (ii) the use of different exposure talkers in the single- and multi-talker exposure conditions; (iii) the presentation of only one test talker for all participants. These factors cannot explain the current findings because a large number of listeners were recruited (n = 383 in Experiment 1, n = 386 in Experiment 2), the exposure talkers were drawn from the same pool of talkers for all conditions, and test talker identity was evenly counterbalanced. Therefore, the presence of cross-talker generalization in only Experiment 2 of the current study, when listeners heard multiple talkers per gender in exposure, cannot merely be attributed to the confounds present in Bradlow and Bent (<xref ref-type="bibr" rid="B17">2008</xref>).</p>
<p>Even though the empirical result of this work differs from Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>), these apparent contradictions can be reconciled by appealing to a <italic>sociolinguistic</italic> framework. More specifically, the types of exposure conditions that induce generalization might vary based on the amount of experience listeners have with the presented variants/accents. Both the current work and prior work might be situated along the following timeline: (i) when listeners have little or no prior experience with a variant (e.g., the current study), then multi-talker exposure is necessary for generalization; (ii) when listeners have some prior experience with a variant (e.g., work on L2-accent adaptation; <xref ref-type="bibr" rid="B84">Xie et al., 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>), then some relevant exposure is necessary for generalization, but either a single talker or multiple talkers can be heard in exposure; (iii) when listeners have extensive prior experience with a variant (e.g., work on sociolinguistic perception; <xref ref-type="bibr" rid="B56">Niedzielski, 1999</xref>; <xref ref-type="bibr" rid="B70">Strand &amp; Johnson, 1996</xref>), then no exposure is needed for generalization.</p>
<p>On one end of the spectrum, the current study presented a phonetic variant that was intentionally designed to be unattested in L1-English (a &#8220;bad map&#8221; /p/ to [b] shift covarying with gender; see 1.3 for details). Given their unfamiliarity with this variant, listeners in Experiment 1 likely considered it to be a talker-specific idiosyncrasy when heard in a single female exposure talker, which blocked generalization towards a novel female talker. Generalization was only observed in Experiment 2, when participants heard the /p/ to [b] shift in multiple female exposure talkers and could thus presume that the shift was not merely a talker-specific trait. Even with multi-talker exposure, however, the generalization effect dissipates within just a few minutes (see 3.2 and 3.3 for details), suggesting that listeners are reluctant to generalize when they have had little to no prior experience with a variant.</p>
<p>Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>), meanwhile, presented Mandarin-accented English to either undergraduates at a diverse American school (University of Connecticut; <xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>) or L1 speakers of American English (<xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>). Even if participants self-reported a lack of familiarity with Mandarin-accented English, many likely had at least some previous exposure to other accents or dialects, considering the prevalence of L2-accented speakers in the United States (<xref ref-type="bibr" rid="B35">Graddol, 2003</xref>). Participants could have therefore discerned accent-independent properties within the speech signal (e.g., slower speaking rate, difficulty producing the tense/lax vowel contrast in English; <xref ref-type="bibr" rid="B10">Baese-Berk et al., 2013</xref>), deducing that the acoustic features of the exposure talkers were likely part of some broader L2-accent, not entirely idiosyncratic. Listeners exposed to L2-accented speech are thus much less conservative than participants in the current experiments &#8211; given the appropriate conditions (e.g., sufficient acoustic similarity between the exposure and novel talkers), generalization can be facilitated with either single- or multi-talker exposure (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>) and can even occur with a 12-hour delay between the exposure and test phases (<xref ref-type="bibr" rid="B84">Xie et al., 2017</xref>).</p>
<p>On the opposite end of the spectrum from the current study are experiments on sociolinguistic perception, in which listeners hear variants that they already have ample experience hearing in their everyday lives (<xref ref-type="bibr" rid="B28">D&#8217;Onofrio, 2015</xref>; <xref ref-type="bibr" rid="B56">Niedzielski, 1999</xref>; <xref ref-type="bibr" rid="B70">Strand &amp; Johnson, 1996</xref>). These cases do not require any exposure phase at all for generalization to be triggered. By merely presenting a single cue to social identity, listeners can generalize their prior knowledge of sociolinguistic variation to a novel speaker. For example, participants&#8217; mental models of speaker gender and sibilant production are so robust that, without any prior exposure phase during an experiment, their categorization of tokens along an /s/-/&#643;/ continuum is altered based on whether a gender-ambiguous voice is presented with a stereotypically male or female face (<xref ref-type="bibr" rid="B53">Munson, 2011</xref>; <xref ref-type="bibr" rid="B70">Strand &amp; Johnson, 1996</xref>).</p>
<p>In summary, highlighting the role of listeners&#8217; social experiences not only unifies the current study with Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>), but also ties perceptual adaptation to sociolinguistic perception. This explanation also adds nuance to the literature, shifting from a binary debate about <italic>whether</italic> multi-talker exposure is necessary for cross-talker generalization (<xref ref-type="bibr" rid="B17">Bradlow &amp; Bent, 2008</xref>; <xref ref-type="bibr" rid="B84">Xie et al., 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>) to a broader discussion about <italic>when</italic> multi-talker exposure is necessary for cross-talker generalization.</p>
</sec>
<sec>
<title>5.2 Acoustic versus social similarity?</title>
<p>According to a numerosity account, listeners in the critical condition of Experiment 2 grouped the two phonetically shifted female exposure talkers together and generalized the shift towards the test talker who was the most similar (i.e., the novel female test talker, not the novel male test talker). A key question is whether this judgment of similarity is determined by acoustics or by social category membership. The current study cannot tease apart these two possibilities because female and male speakers are acoustically distinct (e.g., through differences in f0; see <xref ref-type="table" rid="T3">Table 3</xref>) and can also trigger differing categorical judgments about apparent gender (<xref ref-type="bibr" rid="B36">Hillenbrand &amp; Clark, 2009</xref>).</p>
<p>Prior work on cross-talker generalization has found evidence for both types of mechanisms. As an example of a socially-mediated effect, Aoki and Zellou (<xref ref-type="bibr" rid="B3">2023a</xref>) exposed listeners to Mandarin-accented English and observed that generalization towards a novel Mandarin-accented English speaker was facilitated when both the exposure and novel talkers were presented with an image of an East Asian face (i.e., similarity in apparent ethnicity). Meanwhile, in the absence of visual cues, Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) suggested that generalization of Mandarin-accented English stop consonants was based primarily on acoustic similarity, considering that listeners could not identify the accent of the speakers. Both acoustic and social similarity can lead to generalization, and it is thus possible that both are motivating the effects in the current study.</p>
<p>Future work on cross-talker generalization could explore what factors mediate the relative weighting of acoustic versus social similarity. For instance, there might be a critical role played by the types of voices that are presented. The current work deliberately focused on gender as a social category because it is highly salient perceptually and easily recognized by adult speakers (<xref ref-type="bibr" rid="B36">Hillenbrand &amp; Clark, 2009</xref>). By contrast, Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) and Xie et al. (<xref ref-type="bibr" rid="B86">2021</xref>) examined cross-talker generalization of Mandarin-accented speech. While naive listeners generally demonstrate above-chance performance in identification and discrimination tasks for L2-accents, accurate categorization of a speaker&#8217;s accent is often lower than for gender (<xref ref-type="bibr" rid="B6">Atagi &amp; Bent, 2017</xref>). Therefore, generalization effects for gender and L2-accented speech could be inherently different, with the former recruiting the influence of social categories (since listeners can easily match exposure and test talkers by gender) and the latter relying more heavily on acoustic mechanisms (since listeners are unsure whether the exposure and test talkers actually share the same accent).</p>
<p>Building upon Aoki and Zellou (<xref ref-type="bibr" rid="B3">2023a</xref>), however, presenting socially relevant visual cues might modulate effects of acoustic similarity. Generalization between two acoustically different Mandarin-accented English talkers in Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>) could be enhanced if both speakers share the same apparent ethnicity (and by the same token, a diminished generalization effect might be observed if two acoustically similar talkers do not have the same apparent ethnicity). Socially-mediated perceptual effects also need not depend on preconceived social categories &#8211; if a phonetically shifted or accented speaker has a pen in their mouth (<xref ref-type="bibr" rid="B49">Liu &amp; Jaeger, 2018</xref>), for example, it might trigger generalization towards another individual who similarly has a pen in their mouth.</p>
<p>Another important factor could be the experimental paradigm that is employed. The instructions of the current study mentioned the number and gender of the speakers within each phase, which could have brought more attention to social category membership. Indeed, at the end of the demographic survey following the main task, a handful of listeners in the critical conditions (n = 24 across both experiments) explicitly commented on the /p/ to [b] phonetic shift (e.g., &#8220;The female voices in part one sounded like they were saying all of the &#8216;P&#8217; words with a B instead and I thought that might&#8217;ve been intentional&#8221;). The phonetic shift in the current study was intended to be clearly noticeable, and the exposure phase task (keyword identification) was designed to draw attention to the phonetic shift. Stating whether stimuli are <italic>buy</italic> or <italic>pie</italic> in the test phase patently relates to the exposure phase manipulation, so listeners in the current study may have been influenced by strategic, decision-making biases based on social similarity (<xref ref-type="bibr" rid="B85">Xie et al., 2023</xref>). It seems plausible for listeners to take a different approach towards generalization when given a more implicit task with a less obvious phonological contrast (e.g., as in Xie and Myers (<xref ref-type="bibr" rid="B89">2017</xref>), who examined devoiced word-final stop consonants in Mandarin-accented speech through auditory lexical decision and cross-modal priming tasks).</p>
<p>Overall, as recent work has noted (<xref ref-type="bibr" rid="B85">Xie et al., 2023</xref>), the precise mechanisms of cross-talker generalization are still unclear, and additional work is needed to unpack effects of acoustic versus social similarity.</p>
</sec>
</sec>
<sec>
<title>6. Conclusion</title>
<p>Recent work on L2-accented English has claimed that multi-talker exposure is not necessary for cross-talker generalization (<xref ref-type="bibr" rid="B89">Xie &amp; Myers, 2017</xref>; <xref ref-type="bibr" rid="B86">Xie et al., 2021</xref>). However, the findings from the current study suggest that in certain cases, multi-talker exposure does appear to be necessary for generalization &#8211; listeners only apply a gender-mediated phonetic shift to novel talkers when they are previously exposed to multiple talkers per gender, not just a single talker per gender.</p>
<p>This seeming contradiction can be resolved by appealing to the social experiences of the listener. In particular, whereas many listeners have encountered some type of L2-accented English, the phonetic shift in the present experiments was designed to be unattested in L1-English (i.e., covariation of gender and stop consonant production, where female speakers produce /p/ as [b] and male speakers produce prototypical /p/). Multi-talker exposure might be unnecessary when listeners encounter more familiar types of speech and necessary when they encounter completely unfamiliar variants.</p>
<p>More broadly, relatively little work explicitly integrates perceptual adaptation and sociolinguistics &#8211; the present experiments begin to fill this gap, offering insights into cross-talker generalization and the emergence of sociolinguistic perception.</p>
</sec>
</body>
<back>
<sec>
<title>Appendix 1: Exposure stimuli</title>
<table-wrap id="T10">
<caption>
<p><bold>Table 10:</bold> The list of exposure stimuli, along with the number of words in each sentence, the target word, and the competitor item. The first 32 rows contain the critical stimuli, while the last 16 rows contain the filler stimuli.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Sentence</bold></td>
<td align="left" valign="top"><bold>N of words</bold></td>
<td align="left" valign="top"><bold>Competitor</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">The wall needed a new coat of paint.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">part</td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top">Instead of eating the entire cake, Joey only ate a part.</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">past</td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top">Martha focuses on the future without dwelling on the past.</td>
<td align="left" valign="top">10</td>
<td align="left" valign="top">pause</td>
</tr>
<tr>
<td align="left" valign="top">4</td>
<td align="left" valign="top">She started talking after a long pause.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">paw</td>
</tr>
<tr>
<td align="left" valign="top">5</td>
<td align="left" valign="top">The dog had a furry paw.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">pawns</td>
</tr>
<tr>
<td align="left" valign="top">6</td>
<td align="left" valign="top">Chess games start with sixteen pawns.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">peace</td>
</tr>
<tr>
<td align="left" valign="top">7</td>
<td align="left" valign="top">The minister vowed to achieve world peace.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">peel</td>
</tr>
<tr>
<td align="left" valign="top">8</td>
<td align="left" valign="top">A key ingredient in the dessert is the zest from an orange peel.</td>
<td align="left" valign="top">13</td>
<td align="left" valign="top">perk</td>
</tr>
<tr>
<td align="left" valign="top">9</td>
<td align="left" valign="top">Free food was a great perk.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">pinch</td>
</tr>
<tr>
<td align="left" valign="top">10</td>
<td align="left" valign="top">Do not add too much salt to the stew, just add a pinch.</td>
<td align="left" valign="top">13</td>
<td align="left" valign="top">pine</td>
</tr>
<tr>
<td align="left" valign="top">11</td>
<td align="left" valign="top">The travelers walked through forests of oak and pine.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">pink</td>
</tr>
<tr>
<td align="left" valign="top">12</td>
<td align="left" valign="top">Flamingos are animals that are pink.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">place</td>
</tr>
<tr>
<td align="left" valign="top">13</td>
<td align="left" valign="top">New York City is a crowded place.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">plain</td>
</tr>
<tr>
<td align="left" valign="top">14</td>
<td align="left" valign="top">The dessert was unsatisfactory as the flavor was dull and plain.</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">plan</td>
</tr>
<tr>
<td align="left" valign="top">15</td>
<td align="left" valign="top">To save the organization, the CEO had a plan.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">plant</td>
</tr>
<tr>
<td align="left" valign="top">16</td>
<td align="left" valign="top">The seedling grew into a healthy plant.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">plate</td>
</tr>
<tr>
<td align="left" valign="top">17</td>
<td align="left" valign="top">The naughty girl had no veggies on her plate.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">plea</td>
</tr>
<tr>
<td align="left" valign="top">18</td>
<td align="left" valign="top">The crying man made an emotional plea.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">plug</td>
</tr>
<tr>
<td align="left" valign="top">19</td>
<td align="left" valign="top">Charging a cell phone often requires a plug.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">plunge</td>
</tr>
<tr>
<td align="left" valign="top">20</td>
<td align="left" valign="top">With her swimsuit on, Rachel took a plunge.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">point</td>
</tr>
<tr>
<td align="left" valign="top">21</td>
<td align="left" valign="top">The soccer team won the match by one point.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">pork</td>
</tr>
<tr>
<td align="left" valign="top">22</td>
<td align="left" valign="top">The tacos contained chicken and pork.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">port</td>
</tr>
<tr>
<td align="left" valign="top">23</td>
<td align="left" valign="top">A horn announced that the ferry had left the port.</td>
<td align="left" valign="top">10</td>
<td align="left" valign="top">pose</td>
</tr>
<tr>
<td align="left" valign="top">24</td>
<td align="left" valign="top">At the end of the runway, the model struck a pose.</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">pouch</td>
</tr>
<tr>
<td align="left" valign="top">25</td>
<td align="left" valign="top">The kangaroos were nursed in the mother&#8217;s pouch.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">prank</td>
</tr>
<tr>
<td align="left" valign="top">26</td>
<td align="left" valign="top">Danny was a jokester who was known to love a good prank.</td>
<td align="left" valign="top">12</td>
<td align="left" valign="top">priest</td>
</tr>
<tr>
<td align="left" valign="top">27</td>
<td align="left" valign="top">After Sunday mass, the churchgoer chatted with the priest.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">prince</td>
</tr>
<tr>
<td align="left" valign="top">28</td>
<td align="left" valign="top">The resident of the castle is a young prince.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">print</td>
</tr>
<tr>
<td align="left" valign="top">29</td>
<td align="left" valign="top">The novel is now in print.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">prize</td>
</tr>
<tr>
<td align="left" valign="top">30</td>
<td align="left" valign="top">In the athletic contest, the fastest runner won the grand prize.</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">pulse</td>
</tr>
<tr>
<td align="left" valign="top">31</td>
<td align="left" valign="top">The doctor tried to revive the girl who had no pulse.</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">purse</td>
</tr>
<tr>
<td align="left" valign="top">32</td>
<td align="left" valign="top">Sandra always stored her wallet in her purse.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">paint</td>
</tr>
<tr>
<td align="left" valign="top">33</td>
<td align="left" valign="top">Lisa had no shovel so she could not dig.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">sick</td>
</tr>
<tr>
<td align="left" valign="top">34</td>
<td align="left" valign="top">A large aquarium holds many fish.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">switch</td>
</tr>
<tr>
<td align="left" valign="top">35</td>
<td align="left" valign="top">The shoes were so small that her feet would not fit.</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">wig</td>
</tr>
<tr>
<td align="left" valign="top">36</td>
<td align="left" valign="top">Molly received the doll as a gift.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">dish</td>
</tr>
<tr>
<td align="left" valign="top">37</td>
<td align="left" valign="top">She smiled with a cheeky grin.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">twin</td>
</tr>
<tr>
<td align="left" valign="top">38</td>
<td align="left" valign="top">The mother gave her child a kiss.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">fin</td>
</tr>
<tr>
<td align="left" valign="top">39</td>
<td align="left" valign="top">The man who owned the mansion was rich.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">kid</td>
</tr>
<tr>
<td align="left" valign="top">40</td>
<td align="left" valign="top">Gordon tried to climb over the ridge.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">rim</td>
</tr>
<tr>
<td align="left" valign="top">41</td>
<td align="left" valign="top">He had many dishes to rinse.</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">dense</td>
</tr>
<tr>
<td align="left" valign="top">42</td>
<td align="left" valign="top">He would not skydive due to the risk.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">desk</td>
</tr>
<tr>
<td align="left" valign="top">43</td>
<td align="left" valign="top">At the soccer game, Kayla hurt her shin.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">thin</td>
</tr>
<tr>
<td align="left" valign="top">44</td>
<td align="left" valign="top">She came out to the lake for a swim.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">swag</td>
</tr>
<tr>
<td align="left" valign="top">45</td>
<td align="left" valign="top">My wool socks are warm and thick.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">fix</td>
</tr>
<tr>
<td align="left" valign="top">46</td>
<td align="left" valign="top">The storm reduced the large tree into a twig.</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">fig</td>
</tr>
<tr>
<td align="left" valign="top">47</td>
<td align="left" valign="top">The kite was lost to the wind.</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">hint</td>
</tr>
<tr>
<td align="left" valign="top">48</td>
<td align="left" valign="top">The genie announced he would grant one wish.</td>
<td align="left" valign="top">8</td>
<td align="left" valign="top">kick</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Appendix 2: Test stimuli creation and norming</title>
<p>For each of the six voices used in the current experiments (female: Joanna, Ruth, Salli; male: Joey, Matthew, Stephen), the words &#8216;buy&#8217; and &#8216;pie&#8217; were separately typed into the Amazon Web Services console and downloaded. Similar to the exposure stimuli (see Section 2.1.1), all productions of <italic>buy</italic> and <italic>pie</italic> were generated with neural text-to-speech synthesis, converted from .mp3 to .wav files (<xref ref-type="bibr" rid="B74">Tomar, 2006</xref>), and set to 60 dB SPL (<xref ref-type="bibr" rid="B16">Boersma &amp; Weenink, 2021</xref>). A 9-step <italic>buy-pie</italic> continuum was created in Praat for each voice, using the original <italic>buy</italic> and <italic>pie</italic> stimuli as endpoints (<xref ref-type="bibr" rid="B82">Winn, 2022</xref>). To create the most naturalistic stimuli possible, voice onset time (VOT) and f0 were covaried at each step, meaning that both VOT and f0 increased as the stimuli became more <italic>pie</italic>-like (<xref ref-type="bibr" rid="B24">Clayards, 2017</xref>). The VOT and f0 values for each speaker at each continuum step are shown in <xref ref-type="table" rid="T11">Table 11</xref> and <xref ref-type="table" rid="T12">Table 12</xref>, respectively.</p>
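The continua themselves were generated with Winn's (2022) Praat script; purely as an illustration of the covariation scheme (not the actual script), the sketch below linearly interpolates VOT and f0 between the <italic>buy</italic> and <italic>pie</italic> endpoints. The function name is hypothetical, the assumption of evenly spaced steps is a simplification (the measured values in Tables 11 and 12 are not exactly linear), and the endpoint values are Joanna's from those tables.

```python
# Illustrative sketch of a 9-step buy-pie continuum in which VOT and f0
# are covaried: both increase together as the stimuli become more pie-like.
# Endpoints are Joanna's values from Tables 11-12; linear spacing is an
# assumption for illustration only.

def make_continuum(vot_buy, vot_pie, f0_buy, f0_pie, n_steps=9):
    """Return a list of (step, VOT in ms, onset f0 in Hz) tuples."""
    steps = []
    for i in range(n_steps):
        frac = i / (n_steps - 1)  # 0.0 at step 1 (buy), 1.0 at step 9 (pie)
        vot = vot_buy + frac * (vot_pie - vot_buy)
        f0 = f0_buy + frac * (f0_pie - f0_buy)
        steps.append((i + 1, round(vot, 2), round(f0, 2)))
    return steps

continuum = make_continuum(9.81, 67.77, 173.92, 195.90)
```

Because both cues rise monotonically across steps, every step is unambiguously ordered from most <italic>buy</italic>-like to most <italic>pie</italic>-like.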
<table-wrap id="T11">
<caption>
<p><bold>Table 11:</bold> Voice onset time (ms) of each speaker at each continuum step. The second, third, and fourth columns correspond to the female speakers (Joanna, Ruth, Salli), while the last three columns correspond to the male speakers (Joey, Matthew, Stephen).</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Step</bold></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joey</bold></td>
<td align="left" valign="top"><bold>Matthew</bold></td>
<td align="left" valign="top"><bold>Stephen</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">9.81</td>
<td align="left" valign="top">14.88</td>
<td align="left" valign="top">10.66</td>
<td align="left" valign="top">14.65</td>
<td align="left" valign="top">8.34</td>
<td align="left" valign="top">12.10</td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top">14.17</td>
<td align="left" valign="top">18.99</td>
<td align="left" valign="top">15.59</td>
<td align="left" valign="top">17.76</td>
<td align="left" valign="top">13.23</td>
<td align="left" valign="top">14.28</td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top">22.20</td>
<td align="left" valign="top">25.68</td>
<td align="left" valign="top">24.21</td>
<td align="left" valign="top">24.50</td>
<td align="left" valign="top">20.52</td>
<td align="left" valign="top">22.39</td>
</tr>
<tr>
<td align="left" valign="top">4</td>
<td align="left" valign="top">31.37</td>
<td align="left" valign="top">30.51</td>
<td align="left" valign="top">30.78</td>
<td align="left" valign="top">32.12</td>
<td align="left" valign="top">28.77</td>
<td align="left" valign="top">29.26</td>
</tr>
<tr>
<td align="left" valign="top">5</td>
<td align="left" valign="top">35.10</td>
<td align="left" valign="top">38.77</td>
<td align="left" valign="top">39.11</td>
<td align="left" valign="top">32.54</td>
<td align="left" valign="top">35.44</td>
<td align="left" valign="top">33.83</td>
</tr>
<tr>
<td align="left" valign="top">6</td>
<td align="left" valign="top">44.49</td>
<td align="left" valign="top">45.92</td>
<td align="left" valign="top">46.78</td>
<td align="left" valign="top">44.65</td>
<td align="left" valign="top">43.27</td>
<td align="left" valign="top">43.84</td>
</tr>
<tr>
<td align="left" valign="top">7</td>
<td align="left" valign="top">52.02</td>
<td align="left" valign="top">54.30</td>
<td align="left" valign="top">56.16</td>
<td align="left" valign="top">52.08</td>
<td align="left" valign="top">51.94</td>
<td align="left" valign="top">52.91</td>
</tr>
<tr>
<td align="left" valign="top">8</td>
<td align="left" valign="top">60.97</td>
<td align="left" valign="top">56.75</td>
<td align="left" valign="top">62.72</td>
<td align="left" valign="top">57.38</td>
<td align="left" valign="top">58.55</td>
<td align="left" valign="top">61.42</td>
</tr>
<tr>
<td align="left" valign="top">9</td>
<td align="left" valign="top">67.77</td>
<td align="left" valign="top">68.83</td>
<td align="left" valign="top">71.31</td>
<td align="left" valign="top">64.23</td>
<td align="left" valign="top">66.69</td>
<td align="left" valign="top">68.22</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T12">
<caption>
<p><bold>Table 12:</bold> Onset f0 (Hz) of each speaker at each continuum step. The second, third, and fourth columns correspond to the female speakers (Joanna, Ruth, Salli), while the last three columns correspond to the male speakers (Joey, Matthew, Stephen).</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Step</bold></td>
<td align="left" valign="top"><bold>Joanna</bold></td>
<td align="left" valign="top"><bold>Ruth</bold></td>
<td align="left" valign="top"><bold>Salli</bold></td>
<td align="left" valign="top"><bold>Joey</bold></td>
<td align="left" valign="top"><bold>Matthew</bold></td>
<td align="left" valign="top"><bold>Stephen</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">173.92</td>
<td align="left" valign="top">227.11</td>
<td align="left" valign="top">199.62</td>
<td align="left" valign="top">104.71</td>
<td align="left" valign="top">135.59</td>
<td align="left" valign="top">139.48</td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top">176.29</td>
<td align="left" valign="top">228.81</td>
<td align="left" valign="top">204.68</td>
<td align="left" valign="top">107.29</td>
<td align="left" valign="top">136.49</td>
<td align="left" valign="top">141.14</td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top">179.72</td>
<td align="left" valign="top">232.07</td>
<td align="left" valign="top">206.43</td>
<td align="left" valign="top">110.18</td>
<td align="left" valign="top">140.27</td>
<td align="left" valign="top">142.61</td>
</tr>
<tr>
<td align="left" valign="top">4</td>
<td align="left" valign="top">183.23</td>
<td align="left" valign="top">235.08</td>
<td align="left" valign="top">209.25</td>
<td align="left" valign="top">113.48</td>
<td align="left" valign="top">142.07</td>
<td align="left" valign="top">148.73</td>
</tr>
<tr>
<td align="left" valign="top">5</td>
<td align="left" valign="top">186.25</td>
<td align="left" valign="top">238.66</td>
<td align="left" valign="top">212.69</td>
<td align="left" valign="top">116.08</td>
<td align="left" valign="top">144.90</td>
<td align="left" valign="top">148.89</td>
</tr>
<tr>
<td align="left" valign="top">6</td>
<td align="left" valign="top">188.10</td>
<td align="left" valign="top">241.63</td>
<td align="left" valign="top">214.87</td>
<td align="left" valign="top">119.37</td>
<td align="left" valign="top">148.46</td>
<td align="left" valign="top">153.37</td>
</tr>
<tr>
<td align="left" valign="top">7</td>
<td align="left" valign="top">190.37</td>
<td align="left" valign="top">244.95</td>
<td align="left" valign="top">217.28</td>
<td align="left" valign="top">121.45</td>
<td align="left" valign="top">150.30</td>
<td align="left" valign="top">157.53</td>
</tr>
<tr>
<td align="left" valign="top">8</td>
<td align="left" valign="top">194.69</td>
<td align="left" valign="top">247.63</td>
<td align="left" valign="top">221.03</td>
<td align="left" valign="top">124.00</td>
<td align="left" valign="top">152.73</td>
<td align="left" valign="top">159.85</td>
</tr>
<tr>
<td align="left" valign="top">9</td>
<td align="left" valign="top">195.90</td>
<td align="left" valign="top">251.62</td>
<td align="left" valign="top">222.93</td>
<td align="left" valign="top">126.57</td>
<td align="left" valign="top">154.38</td>
<td align="left" valign="top">164.16</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Using the same demographic filters as in Experiments 1 and 2 (see 2.1.2 for details), 120 listeners were recruited from Prolific and paid $0.80 (approximately $9 per hour) to norm the stimuli. None of the listeners in the norming study participated in either of the main experiments. After giving informed consent, participants completed a categorization task in which they listened to a stimulus on each trial and stated whether they heard <italic>buy</italic> or <italic>pie</italic>. Each listener was randomly assigned to one of the six speakers and categorized each continuum step five times in a randomized order (9 steps * 5 repetitions = 45 total trials). Data were removed from five subjects who either self-reported a hearing difficulty (n = 2), self-reported being older than 35 years (i.e., a mismatch with their official Prolific profile; n = 1), or reported a first language other than solely English (n = 2). The remaining 115 participants were included in the final analysis of the norming data (53 women, 60 men, 2 non-binary; mean age = 29.27 years, sd = 4.12; self-reported ethnicity: Asian = 27, Black = 15, Latino = 3, Mixed = 11, White = 59).</p>
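A single listener's session in the norming task can be sketched as follows; this is an illustration of the design described above (9 steps &#215; 5 repetitions, randomized order), not the actual experiment code, and the function name and seed parameter are hypothetical.

```python
import random

# Illustrative sketch of one listener's trial list in the norming task:
# 9 continuum steps x 5 repetitions = 45 trials, presented in random order.

def build_trial_list(n_steps=9, n_reps=5, seed=None):
    """Return a shuffled list of continuum step numbers (1..n_steps)."""
    trials = [step for step in range(1, n_steps + 1) for _ in range(n_reps)]
    rng = random.Random(seed)  # seeded generator for reproducible shuffling
    rng.shuffle(trials)
    return trials
```

Each listener saw only one speaker's continuum, so the 45 trials exhaust that speaker's stimulus set exactly five times over.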
<p>The aggregated results by speaker are shown in <xref ref-type="fig" rid="F4">Figure 4</xref> and by speaker gender in the left panel of <xref ref-type="fig" rid="F5">Figure 5</xref>. Impressionistically, the left panel of <xref ref-type="fig" rid="F5">Figure 5</xref> indicates that while the results for Steps 1&#8211;3 and 7&#8211;9 are evenly matched by speaker gender, Steps 4&#8211;6 induce a greater <italic>pie</italic> response for the female speakers than for the male speakers overall. Note that in Experiments 1 and 2, evidence for adaptation consists of a greater <italic>pie</italic> response for novel female speakers compared to novel male speakers (see 1.4.2 for details) &#8211; this implies that the responses at each continuum step should ideally be matched by speaker gender in the norming phase, so that any gender difference in <italic>pie</italic> responses in the actual experiments can be attributed to an adaptation effect rather than to an inherent property of the stimuli.</p>
<fig id="F4">
<caption>
<p><bold>Figure 4:</bold> Aggregated results by Step and Speaker (female = top row; male = bottom row) in the norming experiment.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-21217-g4.png"/>
</fig>
<fig id="F5">
<caption>
<p><bold>Figure 5:</bold> Aggregated results by Step and Speaker Gender (female = red; male = blue) for the original 9-step continuum (left panel) and the adjusted 5-step continuum used in Experiments 1 and 2 (right panel).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-21217-g5.png"/>
</fig>
<p>To match the proportion of <italic>pie</italic> responses by gender as closely as possible, five specific steps were selected for each speaker (Joanna, Ruth, Stephen: 1, 4, 5, 6, 9; Salli: 1, 3, 4, 6, 9; Joey: 1, 3, 5, 6, 9; Matthew: 1, 4, 6, 7, 9). The aggregated 5-step continua are shown by speaker gender in the right panel of <xref ref-type="fig" rid="F5">Figure 5</xref>. To confirm the impressionistic observation that the continua are gender-matched, a Bayesian mixed-effects logistic regression model was fitted (<italic>pie</italic> = 1, <italic>buy</italic> = 0) through <italic>brms</italic> (<xref ref-type="bibr" rid="B18">B&#252;rkner, 2017</xref>) and Stan (<xref ref-type="bibr" rid="B69">Stan Development Team, 2023</xref>) in R (<xref ref-type="bibr" rid="B61">R Core Team, 2021</xref>) using the same priors as in the main experiments (see 2.1.4). As shown in Equation 3, the model had fixed effects of Step (within-subjects; numeric, scaled and centered), Speaker Gender (between-subjects; sum-coded; Female, Male) and their interaction, as well as by-speaker and by-listener random intercepts and by-listener random slopes for Step. The model revealed a meaningful effect of Step [&#946;: 4.32, SE: 0.29, 95% highest density interval (HDI) = (3.79, 4.91)], indicating that <italic>pie</italic> responses increased alongside continuum step. Critically, there was no main effect of Speaker Gender [&#946;: 0.05, SE: 0.54, 95% HDI = (&#8211;1.02, 1.09)] and no interaction [&#946;: 0.01, SE: 0.22, 95% HDI = (&#8211;0.43, 0.44)]. Given the lack of gender effect, the 5-step continua were deemed to be suitable and used in the test phase across all experiments.</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(3)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Response &#126; Step * Speaker Gender + (1 + Step &#124; Listener) + (1 &#124; Speaker)</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
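To make the reported coefficients concrete, the fixed-effects portion of Equation 3 can be sketched as a logistic function. The coefficient values below are the posterior means reported above; the intercept is not reported in the text and is assumed to be 0 here purely for illustration, and gender is sum-coded (+1 = Female, &#8211;1 = Male).

```python
import math

# Illustrative sketch of the fixed-effects part of Equation 3.
# Coefficients are the reported posterior means (Step: 4.32,
# Speaker Gender: 0.05, interaction: 0.01); the intercept b0 is
# not reported in the text and is assumed to be 0 here.

def p_pie(step_scaled, gender_code,
          b_step=4.32, b_gender=0.05, b_int=0.01, b0=0.0):
    """Predicted probability of a 'pie' response.

    step_scaled: scaled and centered continuum step.
    gender_code: sum-coded speaker gender (+1 female, -1 male).
    """
    logit = (b0 + b_step * step_scaled + b_gender * gender_code
             + b_int * step_scaled * gender_code)
    return 1 / (1 + math.exp(-logit))
```

Plugging in values shows why the continua were deemed gender-matched: Step moves the predicted response from near 0 to near 1 across the continuum, while flipping the gender code barely changes the prediction at any step.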
</sec>
<sec>
<title>Data accessibility statement</title>
<p>The data and R scripts used to perform the analyses and generate the graphs are available on the Open Science Framework (OSF) at the following link: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.17605/osf.io/x9jgw">https://doi.org/10.17605/osf.io/x9jgw</ext-link>.</p>
</sec>
<sec>
<title>Ethics and consent</title>
<p>The procedures used in this study adhere to the tenets of the Declaration of Helsinki. Approval was obtained from the Institutional Review Board of the University of California, Davis (reference number: 1328085-2). The identity of all research subjects has been anonymized. All participants provided informed consent prior to entering the study.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors have no competing interests to declare.</p>
</sec>
<sec>
<title>Author contributions</title>
<p>NBA: Conceptualization, Data curation, Methodology, Investigation, Software, Formal analysis, Visualization, Writing &#8211; original draft, Writing &#8211; review &amp; editing.</p>
<p>GZ: Conceptualization, Methodology, Writing &#8211; review &amp; editing, Supervision.</p>
</sec>
<sec>
<title>ORCID iDs</title>
<p>NBA: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://orcid.org/0000-0002-6267-281X">https://orcid.org/0000-0002-6267-281X</ext-link></p>
<p>GZ: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://orcid.org/0000-0001-9167-0744">https://orcid.org/0000-0001-9167-0744</ext-link></p>
</sec>
<fn-group>
<fn id="n1"><p>The current test case revolves around binary gender. However, gender is much more complex than a male-female dichotomy &#8211; many individuals identify as non-binary (<xref ref-type="bibr" rid="B30">Eckert, 2014</xref>), and gender intersects with other social categories (<xref ref-type="bibr" rid="B19">Calder &amp; King, 2022</xref>). Expanding the present study beyond binary gender is an important step for future work.</p></fn>
<fn id="n2"><p>Xie et al. (<xref ref-type="bibr" rid="B85">2023</xref>) have recently pointed out that lexically guided perceptual learning can potentially be accounted for by at least three different mechanisms: (i) low-level normalization; (ii) shifts in category representations; (iii) changes in decision-making processes. The current study does not tease apart these possibilities and, thus, remains agnostic as to the underlying mechanisms behind any observed test phase categorization shift.</p></fn>
<fn id="n3"><p>Note that the term &#8216;normalization&#8217; has been employed in many different ways in the speech perception literature. Here, normalization is intended to hearken back to more traditional theories which claimed that &#8220;variants of a particular type of linguistic unit are <italic>normalized</italic> [emphasis added] to arrive at an abstract, prototypical representation&#8221; (<xref ref-type="bibr" rid="B58">Nygaard, 2005, p. 391</xref>). More modern theories of normalization do not necessarily assume that social information is stripped away from the speech signal (<xref ref-type="bibr" rid="B12">Barreda, 2020</xref>).</p></fn>
<fn id="n4"><p>It should be mentioned that in one of the stimuli, there is a word containing /b/ (&#8220;The soccer team won the match <italic>by</italic> one point&#8221;; refer to the 21st row of <xref ref-type="table" rid="T10">Table 10</xref> in Appendix 1). However, there are three reasons why this erroneous word is considered a relatively minor issue: (i) all listeners heard this instance of /b/, which does not explain the difference in results across experiments; (ii) the word <italic>by</italic> represents only 0.2% of the words listeners heard in the exposure phase (1/401); (iii) hearing canonical /b/ reinforces the mapping from [b] to /p/ in the shifted speakers (a more grave error would have been including a non-target word with canonical /p/, as in &#8220;<italic>Paris</italic> is a crowded place&#8221;, since this could have blocked adaptation; <xref ref-type="bibr" rid="B76">Tzeng et al., 2021</xref>). Nevertheless, the intention was to avoid any instances of /b/ in the exposure stimuli, following prior work (<xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>), and the erroneous stimulus is noted here for transparency.</p></fn>
<fn id="n5"><p>Note that the use of synthetic voices, as opposed to naturally produced voices, is not expected to block effects of gender or other social attributes on perceptual adaptation. According to the Computers Are Social Actors (CASA) framework, if human-like qualities are observed (e.g., the use of language, as in the exposure phase of all experiments in the current study), then users often treat technological agents as humans, applying the same social heuristics and stereotypes (<xref ref-type="bibr" rid="B54">Nass &amp; Moon, 2000</xref>; cf. <xref ref-type="bibr" rid="B34">Gambino et al., 2020</xref>). Indeed, synthetic speech has been successfully used in perceptual adaptation experiments (<xref ref-type="bibr" rid="B51">Maye et al., 2008</xref>; <xref ref-type="bibr" rid="B93">Zellou et al., 2023</xref>). Socially mediated linguistic behavior has also been observed with synthetic voices for a variety of other phenomena and tasks, including phonetic imitation (<xref ref-type="bibr" rid="B92">Zellou et al., 2021</xref>), apparent race judgments (<xref ref-type="bibr" rid="B38">Holliday, 2023</xref>), and judgments of perceived credibility (<xref ref-type="bibr" rid="B59">Pycha &amp; Zellou, 2024</xref>). Overall, it is predicted that the results of the current study are replicable with naturally-produced voices, although future work should explicitly test this claim.</p></fn>
<fn id="n6"><p>Although increasing the number of critical stimuli should facilitate adaptation to a certain point (<xref ref-type="bibr" rid="B27">Cummings &amp; Theodore, 2023</xref>), <italic>too many</italic> critical stimuli can lead to a negative after-effect, resulting in diminished adaptation (for details, see <xref ref-type="bibr" rid="B43">Kleinschmidt &amp; Jaeger, 2016, p. 683</xref>).</p></fn>
<fn id="n7"><p>The effect of variability on speech perception is highly complex (<xref ref-type="bibr" rid="B60">Quam &amp; Creel, 2021</xref>), and in contrast to the proposal in 3.3, Tzeng et al. (<xref ref-type="bibr" rid="B75">2016</xref>) found that a randomized exposure stimulus order facilitated cross-talker generalization, not a blocked order. However, the experimental design of Tzeng et al. (<xref ref-type="bibr" rid="B75">2016</xref>) is not the same as that of the current study. The latter authors focus on a different phenomenon (adaptation towards Spanish-accented English), using a different task (sentence-transcription-in-noise) with a different method of measuring generalization (transcription accuracy of a novel test talker). Given the differences present in Tzeng et al. (<xref ref-type="bibr" rid="B75">2016</xref>), it is unclear whether the same results should be expected in the current study. Future work examining the effect of training structure on perceptual adaptation is warranted.</p></fn>
<fn id="n8"><p>As discussed in Appendix 2, a norming study was conducted and efforts were made to make the continua as even as possible across speakers. These efforts were successful overall, given that there was no effect of Speaker Gender in the No Shift condition in the aggregated, group-level analysis (see <xref ref-type="fig" rid="F1">Figures 1</xref> and <xref ref-type="fig" rid="F2">2</xref>). However, as in Experiment 2, participants in the No Shift condition of Experiment 1 who heard Joanna in the test phase were more biased towards a <italic>pie</italic> response, relative to listeners who heard either Ruth or Salli at test (Joanna: 71.93% <italic>pie</italic> response; Ruth: 51.67% <italic>pie</italic> response; Salli: 50.38% <italic>pie</italic> response). It should be noted that the norming study had fewer participants than the main experiments (n = 120 versus n = 383 in Experiment 1 and n = 386 in Experiment 2) and a between-subjects design, with participants categorizing test phase stimuli from only one of the six speakers (i.e., &#126;20 listeners assigned to each speaker). The comparatively lower statistical power of the norming study could explain why the <italic>buy-pie</italic> continuum of Joanna seemed more evenly matched to the other female speakers in the norming study, but not in the main experiments.</p></fn>
</fn-group>
<ref-list>
<ref id="B1"><mixed-citation publication-type="book"><string-name><surname>Aoki</surname>, <given-names>N.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2023c</year>). <chapter-title>Speakers talk more clearly when they see an East Asian face: Effects of visual guise on speech production</chapter-title>. In <string-name><given-names>R.</given-names> <surname>Skarnitzl</surname></string-name> &amp; <string-name><given-names>J.</given-names> <surname>Vol&#237;n</surname></string-name> (Eds.), <source>Proceedings of the 20th International Congress of Phonetic Sciences</source> (pp. <fpage>2294</fpage>&#8211;<lpage>2298</lpage>). <publisher-name>Guarant International</publisher-name>.</mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="journal"><string-name><surname>Aoki</surname>, <given-names>N. B.</given-names></string-name>, <string-name><surname>Cohn</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2022</year>). <article-title>The clear speech intelligibility benefit for text-to-speech voices: Effects of speaking style and visual guise</article-title>. <source>JASA Express Letters</source>, <volume>2</volume>(<issue>4</issue>), <fpage>045204</fpage>. <pub-id pub-id-type="doi">10.1121/10.0010274</pub-id></mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="journal"><string-name><surname>Aoki</surname>, <given-names>N. B.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2023a</year>). <article-title>Visual information affects adaptation to novel talkers: Ethnicity-specific and -independent learning of L2-accented speech</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>154</volume>(<issue>4</issue>), <fpage>2290</fpage>&#8211;<lpage>2304</lpage>. <pub-id pub-id-type="doi">10.1121/10.0021289</pub-id></mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="journal"><string-name><surname>Aoki</surname>, <given-names>N. B.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2023b</year>). <article-title>When clear speech does not enhance memory: Effects of speaking style, voice naturalness, and listener age</article-title>. <source>Proceedings of Meetings on Acoustics</source>, <volume>51</volume>(<issue>1</issue>), <fpage>060002</fpage>. <pub-id pub-id-type="doi">10.1121/2.0001766</pub-id></mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="journal"><string-name><surname>Aoki</surname>, <given-names>N. B.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2024</year>). <article-title>Being clear about clear speech: Intelligibility of hard-of-hearing-directed, non-native-directed, and casual speech for L1- and L2-English listeners</article-title>. <source>Journal of Phonetics</source>, <volume>104</volume>, <elocation-id>101328</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.wocn.2024.101328</pub-id></mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="journal"><string-name><surname>Atagi</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Bent</surname>, <given-names>T.</given-names></string-name> (<year>2017</year>). <article-title>Nonnative accent discrimination with words and sentences</article-title>. <source>Phonetica</source>, <volume>74</volume>(<issue>3</issue>), <fpage>173</fpage>&#8211;<lpage>191</lpage>. <pub-id pub-id-type="doi">10.1159/000452956</pub-id></mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="journal"><string-name><surname>Babel</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>McAuliffe</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Norton</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Senior</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>Vaughn</surname>, <given-names>C.</given-names></string-name> (<year>2019</year>). <article-title>The Goldilocks zone of perceptual learning</article-title>. <source>Phonetica</source>, <volume>76</volume>(<issue>2&#8211;3</issue>), <fpage>179</fpage>&#8211;<lpage>200</lpage>. <pub-id pub-id-type="doi">10.1159/000494929</pub-id></mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="journal"><string-name><surname>Babel</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Russell</surname>, <given-names>J.</given-names></string-name> (<year>2015</year>). <article-title>Expectations and speech intelligibility</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>137</volume>(<issue>5</issue>), <fpage>2823</fpage>&#8211;<lpage>2833</lpage>. <pub-id pub-id-type="doi">10.1121/1.4919317</pub-id></mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="journal"><string-name><surname>Baese-Berk</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>McLaughlin</surname>, <given-names>D. J.</given-names></string-name>, &amp; <string-name><surname>McGowan</surname>, <given-names>K. B.</given-names></string-name> (<year>2020</year>). <article-title>Perception of non-native speech</article-title>. <source>Language and Linguistics Compass</source>, <volume>14</volume>(<issue>7</issue>), <elocation-id>e12375</elocation-id>. <pub-id pub-id-type="doi">10.1111/lnc3.12375</pub-id></mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="journal"><string-name><surname>Baese-Berk</surname>, <given-names>M. M.</given-names></string-name>, <string-name><surname>Bradlow</surname>, <given-names>A. R.</given-names></string-name>, &amp; <string-name><surname>Wright</surname>, <given-names>B. A.</given-names></string-name> (<year>2013</year>). <article-title>Accent-independent adaptation to foreign accented speech</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>133</volume>(<issue>3</issue>), <fpage>EL174</fpage>&#8211;<lpage>EL180</lpage>. <pub-id pub-id-type="doi">10.1121/1.4789864</pub-id></mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="journal"><string-name><surname>Baese-Berk</surname>, <given-names>M. M.</given-names></string-name>, &amp; <string-name><surname>Morrill</surname>, <given-names>T. H.</given-names></string-name> (<year>2015</year>). <article-title>Speaking rate consistency in native and non-native speakers of English</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>138</volume>(<issue>3</issue>), <fpage>EL223</fpage>&#8211;<lpage>EL228</lpage>. <pub-id pub-id-type="doi">10.1121/1.4929622</pub-id></mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="journal"><string-name><surname>Barreda</surname>, <given-names>S.</given-names></string-name> (<year>2020</year>). <article-title>Vowel normalization as perceptual constancy</article-title>. <source>Language</source>, <volume>96</volume>(<issue>2</issue>), <fpage>224</fpage>&#8211;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1353/lan.2020.0018</pub-id></mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="journal"><string-name><surname>Barreda</surname>, <given-names>S.</given-names></string-name> (<year>2021</year>). <article-title>Perceptual validation of vowel normalization methods for variationist research</article-title>. <source>Language Variation and Change</source>, <volume>33</volume>(<issue>1</issue>), <fpage>27</fpage>&#8211;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1017/S0954394521000016</pub-id></mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="journal"><string-name><surname>Barreda</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Predeck</surname>, <given-names>K.</given-names></string-name> (<year>2024</year>). <article-title>Inaccurate but predictable: Vocal-tract length estimation and gender stereotypes in height perception</article-title>. <source>Journal of Phonetics</source>, <volume>102</volume>, <elocation-id>101290</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.wocn.2023.101290</pub-id></mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="book"><string-name><surname>Barreda</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Silbert</surname>, <given-names>N.</given-names></string-name> (<year>2023</year>). <source>Bayesian multilevel models for repeated measures data: A conceptual and practical introduction in R</source>. <publisher-name>Routledge</publisher-name>. <pub-id pub-id-type="doi">10.4324/9781003285878</pub-id></mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="webpage"><string-name><surname>Boersma</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Weenink</surname>, <given-names>D.</given-names></string-name> (<year>2021</year>). <source>Praat: Doing phonetics by computer (Version 6.1.40)</source> [Computer program]. <uri>https://www.fon.hum.uva.nl/praat/</uri></mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="journal"><string-name><surname>Bradlow</surname>, <given-names>A. R.</given-names></string-name>, &amp; <string-name><surname>Bent</surname>, <given-names>T.</given-names></string-name> (<year>2008</year>). <article-title>Perceptual adaptation to non-native speech</article-title>. <source>Cognition</source>, <volume>106</volume>(<issue>2</issue>), <fpage>707</fpage>&#8211;<lpage>729</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2007.04.005</pub-id></mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="journal"><string-name><surname>B&#252;rkner</surname>, <given-names>P.-C.</given-names></string-name> (<year>2017</year>). <article-title>brms: An R package for Bayesian multilevel models using Stan</article-title>. <source>Journal of Statistical Software</source>, <volume>80</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v080.i01</pub-id></mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="journal"><string-name><surname>Calder</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>King</surname>, <given-names>S.</given-names></string-name> (<year>2022</year>). <article-title>Whose gendered voices matter?: Race and gender in the articulation of /s/ in Bakersfield, California</article-title>. <source>Journal of Sociolinguistics</source>, <volume>26</volume>(<issue>5</issue>), <fpage>604</fpage>&#8211;<lpage>623</lpage>. <pub-id pub-id-type="doi">10.1111/josl.12584</pub-id></mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="journal"><string-name><surname>Campbell-Kibler</surname>, <given-names>K.</given-names></string-name> (<year>2010</year>). <article-title>Sociolinguistics and perception</article-title>. <source>Language and Linguistics Compass</source>, <volume>4</volume>(<issue>6</issue>), <fpage>377</fpage>&#8211;<lpage>389</lpage>. <pub-id pub-id-type="doi">10.1111/j.1749-818X.2010.00201.x</pub-id></mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="book"><string-name><surname>Carignan</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2023</year>). <chapter-title>Sociophonetics and vowel nasality</chapter-title>. In <string-name><given-names>C.</given-names> <surname>Strelluf</surname></string-name> (Ed.), <source>The Routledge handbook of sociophonetics</source> (pp. <fpage>237</fpage>&#8211;<lpage>260</lpage>). <publisher-name>Routledge</publisher-name>. <pub-id pub-id-type="doi">10.4324/9781003034636-12</pub-id></mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="journal"><string-name><surname>Charoy</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Samuel</surname>, <given-names>A. G.</given-names></string-name> (<year>2023</year>). <article-title>Bad maps may not always get you lost: Lexically driven perceptual recalibration for substituted phonemes</article-title>. <source>Attention, Perception &amp; Psychophysics</source>, <volume>85</volume>, <fpage>2437</fpage>&#8211;<lpage>2458</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-023-02725-1</pub-id></mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="journal"><string-name><surname>Chodroff</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Wilson</surname>, <given-names>C.</given-names></string-name> (<year>2017</year>). <article-title>Structure in talker-specific phonetic realization: Covariation of stop consonant VOT in American English</article-title>. <source>Journal of Phonetics</source>, <volume>61</volume>, <fpage>30</fpage>&#8211;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2017.01.001</pub-id></mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><string-name><surname>Clayards</surname>, <given-names>M.</given-names></string-name> (<year>2017</year>). <article-title>Individual talker and token covariation in the production of multiple cues to stop voicing</article-title>. <source>Phonetica</source>, <volume>75</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1159/000448809</pub-id></mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="journal"><string-name><surname>Cohn</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Pycha</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2021</year>). <article-title>Intelligibility of face-masked speech depends on speaking style: Comparing casual, clear, and emotional speech</article-title>. <source>Cognition</source>, <volume>210</volume>, <elocation-id>104570</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cognition.2020.104570</pub-id></mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="journal"><string-name><surname>Cohn</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2020</year>). <article-title>Perception of concatenative vs. neural text-to-speech (TTS): Differences in intelligibility in noise and language attitudes</article-title>. <source>Proceedings of Interspeech</source>, <fpage>1733</fpage>&#8211;<lpage>1737</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2020-1336</pub-id></mixed-citation></ref>
<ref id="B27"><mixed-citation publication-type="journal"><string-name><surname>Cummings</surname>, <given-names>S. N.</given-names></string-name>, &amp; <string-name><surname>Theodore</surname>, <given-names>R. M.</given-names></string-name> (<year>2023</year>). <article-title>Hearing is believing: Lexically guided perceptual learning is graded to reflect the quantity of evidence in speech input</article-title>. <source>Cognition</source>, <volume>235</volume>, <elocation-id>105404</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cognition.2023.105404</pub-id></mixed-citation></ref>
<ref id="B28"><mixed-citation publication-type="journal"><string-name><surname>D&#8217;Onofrio</surname>, <given-names>A.</given-names></string-name> (<year>2015</year>). <article-title>Persona-based information shapes linguistic perception: Valley Girls and California vowels</article-title>. <source>Journal of Sociolinguistics</source>, <volume>19</volume>(<issue>2</issue>), <fpage>241</fpage>&#8211;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1111/josl.12115</pub-id></mixed-citation></ref>
<ref id="B29"><mixed-citation publication-type="journal"><string-name><surname>Eckert</surname>, <given-names>P.</given-names></string-name> (<year>1989</year>). <article-title>The whole woman: Sex and gender differences in variation</article-title>. <source>Language Variation and Change</source>, <volume>1</volume>(<issue>3</issue>), <fpage>245</fpage>&#8211;<lpage>267</lpage>. <pub-id pub-id-type="doi">10.1017/S095439450000017X</pub-id></mixed-citation></ref>
<ref id="B30"><mixed-citation publication-type="journal"><string-name><surname>Eckert</surname>, <given-names>P.</given-names></string-name> (<year>2014</year>). <article-title>The problem with binaries: Coding for gender and sexuality</article-title>. <source>Language and Linguistics Compass</source>, <volume>8</volume>(<issue>11</issue>), <fpage>529</fpage>&#8211;<lpage>535</lpage>. <pub-id pub-id-type="doi">10.1111/lnc3.12113</pub-id></mixed-citation></ref>
<ref id="B31"><mixed-citation publication-type="journal"><string-name><surname>Eisner</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Melinger</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Weber</surname>, <given-names>A.</given-names></string-name> (<year>2013</year>). <article-title>Constraints on the transfer of perceptual learning in accented speech</article-title>. <source>Frontiers in Psychology</source>, <volume>4</volume>, <elocation-id>148</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2013.00148</pub-id></mixed-citation></ref>
<ref id="B32"><mixed-citation publication-type="journal"><string-name><surname>Feng</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name> (<year>2024</year>). <article-title>Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>155</volume>(<issue>5</issue>), <fpage>3071</fpage>&#8211;<lpage>3089</lpage>. <pub-id pub-id-type="doi">10.1121/10.0025931</pub-id></mixed-citation></ref>
<ref id="B33"><mixed-citation publication-type="journal"><string-name><surname>Flege</surname>, <given-names>J. E.</given-names></string-name>, &amp; <string-name><surname>Eefting</surname>, <given-names>W.</given-names></string-name> (<year>1987</year>). <article-title>Production and perception of English stops by native Spanish speakers</article-title>. <source>Journal of Phonetics</source>, <volume>15</volume>(<issue>1</issue>), <fpage>67</fpage>&#8211;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1016/S0095-4470(19)30538-8</pub-id></mixed-citation></ref>
<ref id="B34"><mixed-citation publication-type="journal"><string-name><surname>Gambino</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Fox</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Ratan</surname>, <given-names>R. A.</given-names></string-name> (<year>2020</year>). <article-title>Building a stronger CASA: Extending the computers are social actors paradigm</article-title>. <source>Human-Machine Communication</source>, <volume>1</volume>, <fpage>71</fpage>&#8211;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.30658/hmc.1.5</pub-id></mixed-citation></ref>
<ref id="B35"><mixed-citation publication-type="book"><string-name><surname>Graddol</surname>, <given-names>D.</given-names></string-name> (<year>2003</year>). <chapter-title>The decline of the native speaker</chapter-title>. In <string-name><given-names>G.</given-names> <surname>Anderman</surname></string-name> &amp; <string-name><given-names>M.</given-names> <surname>Rogers</surname></string-name> (Eds.), <source>Translation today: Trends and perspectives</source> (pp. <fpage>152</fpage>&#8211;<lpage>167</lpage>). <publisher-name>Multilingual Matters</publisher-name>. <pub-id pub-id-type="doi">10.21832/9781853596179-013</pub-id></mixed-citation></ref>
<ref id="B36"><mixed-citation publication-type="journal"><string-name><surname>Hillenbrand</surname>, <given-names>J. M.</given-names></string-name>, &amp; <string-name><surname>Clark</surname>, <given-names>M. J.</given-names></string-name> (<year>2009</year>). <article-title>The role of f0 and formant frequencies in distinguishing the voices of men and women</article-title>. <source>Attention, Perception, &amp; Psychophysics</source>, <volume>71</volume>, <fpage>1150</fpage>&#8211;<lpage>1166</lpage>. <pub-id pub-id-type="doi">10.3758/APP.71.5.1150</pub-id></mixed-citation></ref>
<ref id="B37"><mixed-citation publication-type="journal"><string-name><surname>Holliday</surname>, <given-names>N.</given-names></string-name> (<year>2021</year>). <article-title>Prosody and sociolinguistic variation in American Englishes</article-title>. <source>Annual Review of Linguistics</source>, <volume>7</volume>, <fpage>55</fpage>&#8211;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-linguistics-031220-093728</pub-id></mixed-citation></ref>
<ref id="B38"><mixed-citation publication-type="journal"><string-name><surname>Holliday</surname>, <given-names>N.</given-names></string-name> (<year>2023</year>). <article-title>Siri, you&#8217;ve changed! Acoustic properties and racialized judgments of voice assistants</article-title>. <source>Frontiers in Communication</source>, <volume>8</volume>, <elocation-id>1116955</elocation-id>. <pub-id pub-id-type="doi">10.3389/fcomm.2023.1116955</pub-id></mixed-citation></ref>
<ref id="B39"><mixed-citation publication-type="journal"><string-name><surname>Joos</surname>, <given-names>M.</given-names></string-name> (<year>1948</year>). <article-title>Acoustic phonetics</article-title>. <source>Language</source>, <volume>24</volume>(<issue>2</issue>), <fpage>5</fpage>&#8211;<lpage>136</lpage>. <pub-id pub-id-type="doi">10.2307/522229</pub-id></mixed-citation></ref>
<ref id="B40"><mixed-citation publication-type="journal"><string-name><surname>Kapnoula</surname>, <given-names>E. C.</given-names></string-name>, <string-name><surname>Winn</surname>, <given-names>M. B.</given-names></string-name>, <string-name><surname>Kong</surname>, <given-names>E. J.</given-names></string-name>, <string-name><surname>Edwards</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>McMurray</surname>, <given-names>B.</given-names></string-name> (<year>2017</year>). <article-title>Evaluating the sources and functions of gradiency in phoneme categorization: An individual differences approach</article-title>. <source>Journal of Experimental Psychology: Human Perception and Performance</source>, <volume>43</volume>(<issue>9</issue>), <fpage>1594</fpage>&#8211;<lpage>1611</lpage>. <pub-id pub-id-type="doi">10.1037/xhp0000410</pub-id></mixed-citation></ref>
<ref id="B41"><mixed-citation publication-type="journal"><string-name><surname>Kleinschmidt</surname>, <given-names>D. F.</given-names></string-name> (<year>2019</year>). <article-title>Structure in talker variability: How much is there and how much can it help?</article-title> <source>Language, Cognition and Neuroscience</source>, <volume>34</volume>(<issue>1</issue>), <fpage>43</fpage>&#8211;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2018.1500698</pub-id></mixed-citation></ref>
<ref id="B42"><mixed-citation publication-type="journal"><string-name><surname>Kleinschmidt</surname>, <given-names>D. F.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2015</year>). <article-title>Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel</article-title>. <source>Psychological Review</source>, <volume>122</volume>(<issue>2</issue>), <fpage>148</fpage>&#8211;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.1037/a0038695</pub-id></mixed-citation></ref>
<ref id="B43"><mixed-citation publication-type="journal"><string-name><surname>Kleinschmidt</surname>, <given-names>D. F.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2016</year>). <article-title>Re-examining selective adaptation: Fatiguing feature detectors, or distributional learning?</article-title> <source>Psychonomic Bulletin &amp; Review</source>, <volume>23</volume>, <fpage>678</fpage>&#8211;<lpage>691</lpage>. <pub-id pub-id-type="doi">10.3758/s13423-015-0943-z</pub-id></mixed-citation></ref>
<ref id="B44"><mixed-citation publication-type="journal"><string-name><surname>Kleinschmidt</surname>, <given-names>D. F.</given-names></string-name>, <string-name><surname>Weatherholtz</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2018</year>). <article-title>Sociolinguistic perception as inference under uncertainty</article-title>. <source>Topics in Cognitive Science</source>, <volume>10</volume>(<issue>4</issue>), <fpage>818</fpage>&#8211;<lpage>834</lpage>. <pub-id pub-id-type="doi">10.1111/tops.12331</pub-id></mixed-citation></ref>
<ref id="B45"><mixed-citation publication-type="journal"><string-name><surname>Kraljic</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name><surname>Samuel</surname>, <given-names>A. G.</given-names></string-name> (<year>2007</year>). <article-title>Perceptual adjustments to multiple speakers</article-title>. <source>Journal of Memory and Language</source>, <volume>56</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2006.07.010</pub-id></mixed-citation></ref>
<ref id="B46"><mixed-citation publication-type="book"><string-name><surname>Labov</surname>, <given-names>W.</given-names></string-name> (<year>1966</year>). <source>The social stratification of English in New York City</source>. <publisher-name>Center for Applied Linguistics</publisher-name>.</mixed-citation></ref>
<ref id="B47"><mixed-citation publication-type="journal"><string-name><surname>Lai</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name><surname>Tamminga</surname>, <given-names>M.</given-names></string-name> (<year>2024</year>). <article-title>Phonetics-phonology mapping in the generalization of perceptual learning</article-title>. <source>Journal of Phonetics</source>, <volume>103</volume>, <elocation-id>101295</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.wocn.2024.101295</pub-id></mixed-citation></ref>
<ref id="B48"><mixed-citation publication-type="journal"><string-name><surname>Liberman</surname>, <given-names>A. M.</given-names></string-name>, &amp; <string-name><surname>Mattingly</surname>, <given-names>I. G.</given-names></string-name> (<year>1985</year>). <article-title>The motor theory of speech perception revised</article-title>. <source>Cognition</source>, <volume>21</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1016/0010-0277(85)90021-6</pub-id></mixed-citation></ref>
<ref id="B49"><mixed-citation publication-type="journal"><string-name><surname>Liu</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2018</year>). <article-title>Inferring causes during speech perception</article-title>. <source>Cognition</source>, <volume>174</volume>, <fpage>55</fpage>&#8211;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2018.01.003</pub-id></mixed-citation></ref>
<ref id="B50"><mixed-citation publication-type="journal"><string-name><surname>Magnuson</surname>, <given-names>J. S.</given-names></string-name>, <string-name><surname>Nusbaum</surname>, <given-names>H. C.</given-names></string-name>, <string-name><surname>Akahane-Yamada</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name><surname>Saltzman</surname>, <given-names>D.</given-names></string-name> (<year>2021</year>). <article-title>Talker familiarity and the accommodation of talker variability</article-title>. <source>Attention, Perception, &amp; Psychophysics</source>, <volume>83</volume>, <fpage>1842</fpage>&#8211;<lpage>1860</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-020-02203-y</pub-id></mixed-citation></ref>
<ref id="B51"><mixed-citation publication-type="journal"><string-name><surname>Maye</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Aslin</surname>, <given-names>R. N.</given-names></string-name>, &amp; <string-name><surname>Tanenhaus</surname>, <given-names>M. K.</given-names></string-name> (<year>2008</year>). <article-title>The Weckud Wetch of the Wast: Lexical adaptation to a novel accent</article-title>. <source>Cognitive Science</source>, <volume>32</volume>(<issue>3</issue>), <fpage>543</fpage>&#8211;<lpage>562</lpage>. <pub-id pub-id-type="doi">10.1080/03640210802035357</pub-id></mixed-citation></ref>
<ref id="B52"><mixed-citation publication-type="journal"><string-name><surname>Morris</surname>, <given-names>R. J.</given-names></string-name>, <string-name><surname>McCrea</surname>, <given-names>C. R.</given-names></string-name>, &amp; <string-name><surname>Herring</surname>, <given-names>K. D.</given-names></string-name> (<year>2008</year>). <article-title>Voice onset time differences between adult males and females: Isolated syllables</article-title>. <source>Journal of Phonetics</source>, <volume>36</volume>(<issue>2</issue>), <fpage>308</fpage>&#8211;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2007.06.003</pub-id></mixed-citation></ref>
<ref id="B53"><mixed-citation publication-type="journal"><string-name><surname>Munson</surname>, <given-names>B.</given-names></string-name> (<year>2011</year>). <article-title>The influence of actual and imputed talker gender on fricative perception, revisited (L)</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>130</volume>(<issue>5</issue>), <fpage>2631</fpage>&#8211;<lpage>2634</lpage>. <pub-id pub-id-type="doi">10.1121/1.3641410</pub-id></mixed-citation></ref>
<ref id="B54"><mixed-citation publication-type="journal"><string-name><surname>Nass</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name><surname>Moon</surname>, <given-names>Y.</given-names></string-name> (<year>2000</year>). <article-title>Machines and mindlessness: Social responses to computers</article-title>. <source>Journal of Social Issues</source>, <volume>56</volume>(<issue>1</issue>), <fpage>81</fpage>&#8211;<lpage>103</lpage>. <pub-id pub-id-type="doi">10.1111/0022-4537.00153</pub-id></mixed-citation></ref>
<ref id="B55"><mixed-citation publication-type="journal"><string-name><surname>Newman</surname>, <given-names>R. S.</given-names></string-name>, <string-name><surname>Clouse</surname>, <given-names>S. A.</given-names></string-name>, &amp; <string-name><surname>Burnham</surname>, <given-names>J. L.</given-names></string-name> (<year>2001</year>). <article-title>The perceptual consequences of within-talker variability in fricative production</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>109</volume>(<issue>3</issue>), <fpage>1181</fpage>&#8211;<lpage>1196</lpage>. <pub-id pub-id-type="doi">10.1121/1.1348009</pub-id></mixed-citation></ref>
<ref id="B56"><mixed-citation publication-type="journal"><string-name><surname>Niedzielski</surname>, <given-names>N.</given-names></string-name> (<year>1999</year>). <article-title>The effect of social information on the perception of sociolinguistic variables</article-title>. <source>Journal of Language and Social Psychology</source>, <volume>18</volume>(<issue>1</issue>), <fpage>62</fpage>&#8211;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1177/0261927X99018001005</pub-id></mixed-citation></ref>
<ref id="B57"><mixed-citation publication-type="journal"><string-name><surname>Norris</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>McQueen</surname>, <given-names>J. M.</given-names></string-name>, &amp; <string-name><surname>Cutler</surname>, <given-names>A.</given-names></string-name> (<year>2003</year>). <article-title>Perceptual learning in speech</article-title>. <source>Cognitive Psychology</source>, <volume>47</volume>(<issue>2</issue>), <fpage>204</fpage>&#8211;<lpage>238</lpage>. <pub-id pub-id-type="doi">10.1016/S0010-0285(03)00006-9</pub-id></mixed-citation></ref>
<ref id="B58"><mixed-citation publication-type="book"><string-name><surname>Nygaard</surname>, <given-names>L. C.</given-names></string-name> (<year>2005</year>). <chapter-title>Perceptual integration of linguistic and nonlinguistic properties of speech</chapter-title>. In <string-name><given-names>D. B.</given-names> <surname>Pisoni</surname></string-name> &amp; <string-name><given-names>R. E.</given-names> <surname>Remez</surname></string-name> (Eds.), <source>The handbook of speech perception</source> (pp. <fpage>390</fpage>&#8211;<lpage>413</lpage>). <publisher-name>Blackwell Publishing Ltd</publisher-name>. <pub-id pub-id-type="doi">10.1002/9780470757024.ch16</pub-id></mixed-citation></ref>
<ref id="B59"><mixed-citation publication-type="journal"><string-name><surname>Pycha</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2024</year>). <article-title>The influence of accent and device usage on perceived credibility during interactions with voice-AI assistants</article-title>. <source>Frontiers in Computer Science</source>, <volume>6</volume>, <elocation-id>1411414</elocation-id>. <pub-id pub-id-type="doi">10.3389/fcomp.2024.1411414</pub-id></mixed-citation></ref>
<ref id="B60"><mixed-citation publication-type="journal"><string-name><surname>Quam</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name><surname>Creel</surname>, <given-names>S. C.</given-names></string-name> (<year>2021</year>). <article-title>Impacts of acoustic-phonetic variability on perceptual development for spoken language: A review</article-title>. <source>WIREs Cognitive Science</source>, <volume>12</volume>(<issue>5</issue>), <elocation-id>e1558</elocation-id>. <pub-id pub-id-type="doi">10.1002/wcs.1558</pub-id></mixed-citation></ref>
<ref id="B61"><mixed-citation publication-type="webpage"><collab>R Core Team</collab>. (<year>2021</year>). <source>R: A language and environment for statistical computing</source>. <publisher-name>R Foundation for Statistical Computing</publisher-name>. <uri>https://www.R-project.org/</uri></mixed-citation></ref>
<ref id="B62"><mixed-citation publication-type="journal"><string-name><surname>Raviv</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Lupyan</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Green</surname>, <given-names>S. C.</given-names></string-name> (<year>2022</year>). <article-title>How variability shapes learning and generalization</article-title>. <source>Trends in Cognitive Science</source>, <volume>26</volume>(<issue>6</issue>), <fpage>462</fpage>&#8211;<lpage>483</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2022.03.007</pub-id></mixed-citation></ref>
<ref id="B63"><mixed-citation publication-type="journal"><string-name><surname>Robb</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Gilbert</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name><surname>Lerman</surname>, <given-names>J.</given-names></string-name> (<year>2005</year>). <article-title>Influence of gender and environmental setting on voice onset time</article-title>. <source>Folia Phoniatrica et Logopaedica</source>, <volume>57</volume>(<issue>3</issue>), <fpage>125</fpage>&#8211;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1159/000084133</pub-id></mixed-citation></ref>
<ref id="B64"><mixed-citation publication-type="journal"><string-name><surname>Samuel</surname>, <given-names>A. G.</given-names></string-name>, &amp; <string-name><surname>Kraljic</surname>, <given-names>T.</given-names></string-name> (<year>2009</year>). <article-title>Perceptual learning for speech</article-title>. <source>Attention, Perception, &amp; Psychophysics</source>, <volume>71</volume>, <fpage>1207</fpage>&#8211;<lpage>1218</lpage>. <pub-id pub-id-type="doi">10.3758/APP.71.6.1207</pub-id></mixed-citation></ref>
<ref id="B65"><mixed-citation publication-type="journal"><string-name><surname>Schertz</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Clare</surname>, <given-names>E. J.</given-names></string-name> (<year>2020</year>). <article-title>Phonetic cue weighting in perception and production</article-title>. <source>WIREs Cognitive Science</source>, <volume>11</volume>(<issue>2</issue>), <elocation-id>e1521</elocation-id>. <pub-id pub-id-type="doi">10.1002/wcs.1521</pub-id></mixed-citation></ref>
<ref id="B66"><mixed-citation publication-type="webpage"><collab>ShareAmerica</collab>. (<year>2023</year>, <month>December</month> <day>14</day>). <source>The United States is rich in languages</source>. <uri>https://share.america.gov/united-states-is-rich-in-languages/</uri>.</mixed-citation></ref>
<ref id="B67"><mixed-citation publication-type="journal"><string-name><surname>Sidaras</surname>, <given-names>S. K.</given-names></string-name>, <string-name><surname>Alexander</surname>, <given-names>J. E.</given-names></string-name>, &amp; <string-name><surname>Nygaard</surname>, <given-names>L. C.</given-names></string-name> (<year>2009</year>). <article-title>Perceptual learning of systematic variation in Spanish-accented speech</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>125</volume>(<issue>5</issue>), <fpage>3306</fpage>&#8211;<lpage>3316</lpage>. <pub-id pub-id-type="doi">10.1121/1.3101452</pub-id></mixed-citation></ref>
<ref id="B68"><mixed-citation publication-type="journal"><string-name><surname>Sol&#233;</surname>, <given-names>M.-J.</given-names></string-name> (<year>2018</year>). <article-title>Articulatory adjustments in initial voiced stops in Spanish, French and English</article-title>. <source>Journal of Phonetics</source>, <volume>66</volume>, <fpage>217</fpage>&#8211;<lpage>241</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2017.10.002</pub-id></mixed-citation></ref>
<ref id="B69"><mixed-citation publication-type="webpage"><collab>Stan Development Team</collab>. (<year>2023</year>). <source>Stan modeling language users guide and reference manual, Version</source>. Available online at: <uri>https://mc-stan.org</uri> (accessed January 3, 2024).</mixed-citation></ref>
<ref id="B70"><mixed-citation publication-type="book"><string-name><surname>Strand</surname>, <given-names>E. A.</given-names></string-name>, &amp; <string-name><surname>Johnson</surname>, <given-names>K.</given-names></string-name> (<year>1996</year>). <chapter-title>Gradient and visual speaker normalization in the perception of fricatives</chapter-title>. In <string-name><given-names>D.</given-names> <surname>Gibbon</surname></string-name> (Ed.), <source>Natural language processing and speech technology: Results of the 3rd KONVENS conference</source> (pp. <fpage>14</fpage>&#8211;<lpage>26</lpage>). <publisher-name>De Gruyter Mouton</publisher-name>. <pub-id pub-id-type="doi">10.1515/9783110821895-003</pub-id></mixed-citation></ref>
<ref id="B71"><mixed-citation publication-type="journal"><string-name><surname>Sumner</surname>, <given-names>M.</given-names></string-name> (<year>2011</year>). <article-title>The role of variation in the perception of accented speech</article-title>. <source>Cognition</source>, <volume>119</volume>(<issue>1</issue>), <fpage>131</fpage>&#8211;<lpage>136</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2010.10.018</pub-id></mixed-citation></ref>
<ref id="B72"><mixed-citation publication-type="journal"><string-name><surname>Swartz</surname>, <given-names>B. L.</given-names></string-name> (<year>1992</year>). <article-title>Gender difference in Voice Onset Time</article-title>. <source>Perceptual and Motor Skills</source>, <volume>75</volume>(<issue>3</issue>), <fpage>983</fpage>&#8211;<lpage>992</lpage>. <pub-id pub-id-type="doi">10.2466/pms.1992.75.3.983</pub-id></mixed-citation></ref>
<ref id="B73"><mixed-citation publication-type="journal"><string-name><surname>Tamminga</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Wilder</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Lai</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name><surname>Wade</surname>, <given-names>L.</given-names></string-name> (<year>2020</year>). <article-title>Perceptual learning, talker specificity, and sound change</article-title>. <source>Papers in Historical Phonology</source>, <volume>5</volume>, <fpage>90</fpage>&#8211;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.2218/pihph.5.2020.4439</pub-id></mixed-citation></ref>
<ref id="B74"><mixed-citation publication-type="journal"><string-name><surname>Tomar</surname>, <given-names>S.</given-names></string-name> (<year>2006</year>). <article-title>Converting video formats with FFmpeg</article-title>. <source>Linux Journal</source>, <volume>2006</volume>(<issue>146</issue>), <fpage>1</fpage>&#8211;<lpage>10</lpage>. <uri>https://dl.acm.org/doi/abs/10.5555/1134782.1134792</uri></mixed-citation></ref>
<ref id="B75"><mixed-citation publication-type="journal"><string-name><surname>Tzeng</surname>, <given-names>C. Y.</given-names></string-name>, <string-name><surname>Alexander</surname>, <given-names>J. E. D.</given-names></string-name>, <string-name><surname>Sidaras</surname>, <given-names>S. K.</given-names></string-name>, &amp; <string-name><surname>Nygaard</surname>, <given-names>L. C.</given-names></string-name> (<year>2016</year>). <article-title>The role of training structure in perceptual learning of accented speech</article-title>. <source>Journal of Experimental Psychology: Human Perception and Performance</source>, <volume>42</volume>(<issue>11</issue>), <fpage>1793</fpage>&#8211;<lpage>1805</lpage>. <pub-id pub-id-type="doi">10.1037/xhp0000260</pub-id></mixed-citation></ref>
<ref id="B76"><mixed-citation publication-type="journal"><string-name><surname>Tzeng</surname>, <given-names>C. Y.</given-names></string-name>, <string-name><surname>Nygaard</surname>, <given-names>L. C.</given-names></string-name>, &amp; <string-name><surname>Theodore</surname>, <given-names>R. M.</given-names></string-name> (<year>2021</year>). <article-title>A second chance for a first impression: Sensitivity to cumulative input statistics for lexically guided perceptual learning</article-title>. <source>Psychonomic Bulletin &amp; Review</source>, <volume>28</volume>, <fpage>1003</fpage>&#8211;<lpage>1014</lpage>. <pub-id pub-id-type="doi">10.3758/s13423-020-01840-6</pub-id></mixed-citation></ref>
<ref id="B77"><mixed-citation publication-type="journal"><string-name><surname>Vasishth</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Gelman</surname>, <given-names>A.</given-names></string-name> (<year>2021</year>). <article-title>How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis</article-title>. <source>Linguistics</source>, <volume>59</volume>(<issue>5</issue>), <fpage>1311</fpage>&#8211;<lpage>1342</lpage>. <pub-id pub-id-type="doi">10.1515/ling-2019-0051</pub-id></mixed-citation></ref>
<ref id="B78"><mixed-citation publication-type="journal"><string-name><surname>Vasishth</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Nicenboim</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Beckman</surname>, <given-names>M. E.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name><surname>Kong</surname>, <given-names>E. J.</given-names></string-name> (<year>2018</year>). <article-title>Bayesian data analysis in the phonetic sciences: A tutorial introduction</article-title>. <source>Journal of Phonetics</source>, <volume>71</volume>, <fpage>147</fpage>&#8211;<lpage>161</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2018.07.008</pub-id></mixed-citation></ref>
<ref id="B79"><mixed-citation publication-type="journal"><string-name><surname>Vonessen</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Aoki</surname>, <given-names>N. B.</given-names></string-name>, <string-name><surname>Cohn</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2024</year>). <article-title>Comparing perception of L1 and L2 English by human listeners and machines: Effect of interlocutor adaptations</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>155</volume>(<issue>5</issue>), <fpage>3060</fpage>&#8211;<lpage>3070</lpage>. <pub-id pub-id-type="doi">10.1121/10.0025930</pub-id></mixed-citation></ref>
<ref id="B80"><mixed-citation publication-type="journal"><string-name><surname>Walker</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Hay</surname>, <given-names>J.</given-names></string-name> (<year>2011</year>). <article-title>Congruence between &#8216;word age&#8217; and &#8216;voice age&#8217; facilitates lexical access</article-title>. <source>Laboratory Phonology</source>, <volume>2</volume>(<issue>1</issue>), <fpage>219</fpage>&#8211;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1515/labphon.2011.007</pub-id></mixed-citation></ref>
<ref id="B81"><mixed-citation publication-type="journal"><string-name><surname>Weatherholtz</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2016</year>). <article-title>Speech perception and generalization across talkers and accents</article-title>. <source>Oxford Research Encyclopedia of Linguistics</source>. <pub-id pub-id-type="doi">10.1093/acrefore/9780199384655.013.95</pub-id></mixed-citation></ref>
<ref id="B82"><mixed-citation publication-type="webpage"><string-name><surname>Winn</surname>, <given-names>M.</given-names></string-name> (<year>2022</year>). <article-title>Make Voice Onset Time (VOT)/F0 continuum [Praat script]</article-title>. <uri>http://www.mattwinn.com/praat/Make_VOT_Continuum_v33.txt</uri></mixed-citation></ref>
<ref id="B83"><mixed-citation publication-type="book"><string-name><surname>Wolfram</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name><surname>Schilling</surname>, <given-names>N.</given-names></string-name> (<year>2015</year>). <source>American English: Dialects and variation</source>. <publisher-name>Wiley-Blackwell</publisher-name>.</mixed-citation></ref>
<ref id="B84"><mixed-citation publication-type="journal"><string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Earle</surname>, <given-names>F. S.</given-names></string-name>, &amp; <string-name><surname>Myers</surname>, <given-names>E. B.</given-names></string-name> (<year>2017</year>). <article-title>Sleep facilitates generalisation of accent adaptation to a new talker</article-title>. <source>Language, Cognition and Neuroscience</source>, <volume>33</volume>(<issue>2</issue>), <fpage>196</fpage>&#8211;<lpage>210</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2017.1369551</pub-id></mixed-citation></ref>
<ref id="B85"><mixed-citation publication-type="journal"><string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name>, &amp; <string-name><surname>Kurumada</surname>, <given-names>C.</given-names></string-name> (<year>2023</year>). <article-title>What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review</article-title>. <source>Cortex</source>, <volume>166</volume>, <fpage>377</fpage>&#8211;<lpage>424</lpage>. <pub-id pub-id-type="doi">10.1016/j.cortex.2023.05.003</pub-id></mixed-citation></ref>
<ref id="B86"><mixed-citation publication-type="journal"><string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Jaeger</surname>, <given-names>T. F.</given-names></string-name> (<year>2021</year>). <article-title>Cross-talker generalization in the perception of nonnative speech: A large-scale replication</article-title>. <source>Journal of Experimental Psychology: General</source>, <volume>150</volume>(<issue>11</issue>), <fpage>e22</fpage>&#8211;<lpage>e56</lpage>. <pub-id pub-id-type="doi">10.1037/xge0001039</pub-id></mixed-citation></ref>
<ref id="B87"><mixed-citation publication-type="journal"><string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>, &amp; <string-name><surname>Fowler</surname>, <given-names>C. A.</given-names></string-name> (<year>2013</year>). <article-title>Listening with a foreign accent: The interlanguage speech intelligibility benefit in Mandarin speakers of English</article-title>. <source>Journal of Phonetics</source>, <volume>41</volume>(<issue>5</issue>), <fpage>369</fpage>&#8211;<lpage>378</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2013.06.003</pub-id></mixed-citation></ref>
<ref id="B88"><mixed-citation publication-type="journal"><string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>, &amp; <string-name><surname>Kurumada</surname>, <given-names>C.</given-names></string-name> (<year>2024</year>). <article-title>From first encounters to longitudinal exposure: A repeated exposure-test paradigm for monitoring speech adaptation</article-title>. <source>Frontiers in Psychology</source>, <volume>15</volume>, <elocation-id>1383904</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2024.1383904</pub-id></mixed-citation></ref>
<ref id="B89"><mixed-citation publication-type="journal"><string-name><surname>Xie</surname>, <given-names>X.</given-names></string-name>, &amp; <string-name><surname>Myers</surname>, <given-names>E. B.</given-names></string-name> (<year>2017</year>). <article-title>Learning a talker or learning an accent: Acoustic similarity constrains generalization of foreign accent adaptation to new talkers</article-title>. <source>Journal of Memory and Language</source>, <volume>97</volume>, <fpage>30</fpage>&#8211;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2017.07.005</pub-id></mixed-citation></ref>
<ref id="B90"><mixed-citation publication-type="journal"><string-name><surname>Yu</surname>, <given-names>A. C. L.</given-names></string-name> (<year>2010</year>). <article-title>Perceptual compensation is correlated with individuals&#8217; &#8220;autistic&#8221; traits: Implications for models of sound change</article-title>. <source>PLoS ONE</source>, <volume>5</volume>(<issue>8</issue>), <elocation-id>e11950</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0011950</pub-id></mixed-citation></ref>
<ref id="B91"><mixed-citation publication-type="journal"><string-name><surname>Yu</surname>, <given-names>A. C. L.</given-names></string-name>, &amp; <string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name> (<year>2019</year>). <article-title>Individual differences in language processing: Phonology</article-title>. <source>Annual Review of Linguistics</source>, <volume>5</volume>, <fpage>131</fpage>&#8211;<lpage>150</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-linguistics-011516-033815</pub-id></mixed-citation></ref>
<ref id="B92"><mixed-citation publication-type="journal"><string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Cohn</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Kline</surname>, <given-names>T.</given-names></string-name> (<year>2021</year>). <article-title>The influence of conversational role on phonetic alignment toward voice-AI and human interlocutors</article-title>. <source>Language, Cognition and Neuroscience</source>, <volume>36</volume>(<issue>10</issue>), <fpage>1298</fpage>&#8211;<lpage>1312</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2021.1931372</pub-id></mixed-citation></ref>
<ref id="B93"><mixed-citation publication-type="journal"><string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Cohn</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Pycha</surname>, <given-names>A.</given-names></string-name> (<year>2023</year>). <article-title>Listener beliefs and perceptual learning</article-title>. <source>Language</source>, <volume>99</volume>(<issue>4</issue>), <fpage>692</fpage>&#8211;<lpage>725</lpage>. <pub-id pub-id-type="doi">10.1353/lan.2023.a914191</pub-id></mixed-citation></ref>
<ref id="B94"><mixed-citation publication-type="journal"><string-name><surname>Zellou</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Tamminga</surname>, <given-names>M.</given-names></string-name> (<year>2014</year>). <article-title>Nasal coarticulation changes over time in Philadelphia English</article-title>. <source>Journal of Phonetics</source>, <volume>47</volume>, <fpage>18</fpage>&#8211;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2014.09.002</pub-id></mixed-citation></ref>
</ref-list>
</back>
</article>