<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2767-0279</journal-id>
<journal-title-group>
<journal-title>Glossa Psycholinguistics</journal-title>
</journal-title-group>
<issn pub-type="epub">2767-0279</issn>
<publisher>
<publisher-name>eScholarship Publishing</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5070/G601147896</article-id>
<article-categories>
<subj-group>
<subject>Brief article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Statistical reporting inconsistencies in experimental linguistics</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0006-0083-3944</contrib-id>
<name>
<surname>Etemady</surname>
<given-names>Dara Leonard Jenssen</given-names>
</name>
<email>daraetemady@gmail.com</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-1400-2739</contrib-id>
<name>
<surname>Roettger</surname>
<given-names>Timo B.</given-names>
</name>
<email>timo.b.roettger@gmail.com</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Department of Linguistics &amp; Scandinavian Studies, University of Oslo, Norway</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-10-10">
<day>10</day>
<month>10</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>4</volume>
<issue>1</issue>
<elocation-id>21</elocation-id>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2025 The Author(s)</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://glossapsycholinguistics.journalpub.escholarship.org/articles/10.5070/G601147896/"/>
<abstract>
<p>The present article investigates the prevalence of statistical reporting inconsistencies across articles in thirteen experimental linguistics journals published between 2000 and 2023. Using the R package Statcheck, we retrieved 82,991 statistical tests from 13,065 articles and assessed whether p-values were consistent with their test statistic and degrees of freedom. Almost half of the articles (49%) that used null-hypothesis significance testing contained at least one inconsistent p-value. Around one in eight articles (12%) contained an inconsistency that may have affected the statistical conclusion. The inconsistency rates were comparable across journals and seem stable over publication years. We discuss possible reasons for this high rate and offer actionable steps for authors, reviewers, and editors to remedy this state of affairs.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>1. Introduction</title>
<p>What we know about human language and its cognitive underpinnings is often informed by experimental data. Researchers test theoretical predictions against their data using statistical tests and, depending on the results, make claims for or against theoretical assumptions. Since these tests play such an integral part in experimentalists&#8217; argumentation, both the data the tests are based on and the computational procedures of the tests themselves should be error-free. But humans are fallible; they make mistakes. We cannot avoid making errors, but we can at least make them detectable. Transparent sharing allows others to detect and correct human error.</p>
<p>In recent years, quantitative linguistics has seen repeated calls to become more transparent and reproducible through sharing data and statistical protocols, often under the banner of &#8220;open science&#8221; (<xref ref-type="bibr" rid="B2">Arvan et al., 2022</xref>; <xref ref-type="bibr" rid="B30">Laurinavichyute et al., 2022</xref>; <xref ref-type="bibr" rid="B40">Roettger, 2019</xref>). Despite these calls, the sharing of statistical protocols is still rather rare across the language sciences (<xref ref-type="bibr" rid="B7">Bochynska et al., 2023</xref>). If statistical procedures cannot be critically evaluated, human errors might go undetected and thus remain uncorrected in the publication record. And if undetected errors affect the decision procedure of the analysis, i.e. whether a hypothesis is rejected or accepted, they might lead to overconfident theoretical conclusions at best and to false ones at worst. The present article presents evidence that the published literature in experimental linguistics contains a concerning number of such statistical inconsistencies, a state of affairs that warrants more rigorous data sharing practices.</p>
</sec>
<sec>
<title>2. Statistical reporting inconsistencies</title>
<p>The null-hypothesis significance testing (NHST) framework is, to date, the dominant statistical framework that researchers use to test hypotheses in the language sciences (<xref ref-type="bibr" rid="B46">Sonderegger &amp; S&#243;skuthy, 2024</xref>). NHST results are commonly reported in a specific format, which usually contains the name of the test (e.g. F, t, &#967;<sup>2</sup>), a test statistic, the degrees of freedom of that test (if applicable), and the p-value, which represents the probability of observing the data (or more extreme data) given the null hypothesis (e.g. given that the true effect is zero) (see (1)):</p>
<fig><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-47896-g4.png"/></fig>
<p>Without access to the data and scripts, interested readers have to trust that the authors have run and reported the statistical analysis correctly. However, the test statistic, degrees of freedom, and p-value in (1) stand in clearly defined mathematical relationships and can thus easily be checked for consistency. For example, an F test with the degrees of freedom specified in (1) and a test statistic of 3.88 should result in a p-value of 0.053, which is larger, not smaller, than 0.05. Possible reasons for this inconsistency are manifold: It could be a typo in the comparison sign, i.e. the authors meant to use = or &gt; rather than &lt;. Alternatively, any of the numbers could contain a typo, or the p-value could have been rounded incorrectly (e.g. 0.053 rounded down to 0.05). Without access to data and scripts, the reader cannot tell what caused the inconsistency. Such inconsistencies are particularly concerning if the recalculated p-value (here, 0.053) and the reported p-value (here, &lt; 0.05) fall on opposite sides of the alpha threshold. In NHST, p-values below a conventional alpha threshold, most commonly 0.05, are interpreted as evidence that the data are sufficiently inconsistent with the null hypothesis (&#8220;significant&#8221;). P-values above that threshold are considered consistent with the null hypothesis and, in practice, often lead researchers to reject the alternative hypothesis (&#8220;non-significant&#8221;). In (1) above, the reported p-value suggests a significant result, but the p-value derived from the degrees of freedom and the test statistic suggests a non-significant result. In the following, we refer to such inconsistencies as <italic>decision inconsistencies</italic>.</p>
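<p>The recalculation behind (1) can be sketched in a few lines. The following Python example is a minimal, hedged sketch and not Statcheck&#8217;s implementation: it recomputes a two-sided p-value for a t statistic and an upper-tail p-value for an F statistic from the regularized incomplete beta function (evaluated with the standard continued-fraction method), and flags a decision inconsistency when the reported and recalculated p-values fall on opposite sides of alpha. The helper names are hypothetical, and the degrees of freedom in the usage lines, F(1, 40), are chosen for illustration only; they are not the values shown in (1).</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-14) -> float:
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300  # guards against division by zero
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        num = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def p_from_t(t: float, df: float) -> float:
    """Hypothetical helper: two-sided p-value of a t statistic."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def p_from_f(f: float, df1: float, df2: float) -> float:
    """Hypothetical helper: upper-tail p-value of an F statistic."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def decision_inconsistent(reported_sig: bool, recomputed_p: float, alpha: float = 0.05) -> bool:
    """True if the reported and recalculated p-values fall on opposite sides of alpha."""
    return reported_sig != (recomputed_p < alpha)


# Hypothetical report "F(1, 40) = 3.88, p < .05"; the df are illustrative only.
p = p_from_f(3.88, 1, 40)
print(f"recalculated p = {p:.3f}")
print("decision inconsistency:", decision_inconsistent(True, p))
```
]]></preformat>
<p>With these hypothetical degrees of freedom, the recalculated p-value for F = 3.88 lies above 0.05, so a report of a significant result would be flagged as a decision inconsistency. In practice, a statistics library such as SciPy would replace the hand-written beta function; the standard-library version only keeps the sketch self-contained.</p>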
<p>The consistency of these values can be assessed automatically if statistical tests are reported in an unambiguous format. Recently, a series of studies used such automatic assessments to evaluate the prevalence of inconsistent statistical reporting in psychology (<xref ref-type="bibr" rid="B3">Bakker &amp; Wicherts, 2011</xref>, <xref ref-type="bibr" rid="B4">2014</xref>; <xref ref-type="bibr" rid="B10">Caperos &amp; Pardo, 2013</xref>; <xref ref-type="bibr" rid="B12">Claesen et al., 2023</xref>; <xref ref-type="bibr" rid="B21">Green et al., 2018</xref>; <xref ref-type="bibr" rid="B37">Nuijten et al., 2016</xref>; <xref ref-type="bibr" rid="B38">Nuijten &amp; Polanin, 2020</xref>; <xref ref-type="bibr" rid="B51">Veldkamp et al., 2014</xref>; <xref ref-type="bibr" rid="B52">Wicherts et al., 2011</xref>), medical sciences (<xref ref-type="bibr" rid="B20">Garc&#237;a-Berthou &amp; Alcaraz, 2004</xref>; <xref ref-type="bibr" rid="B48">Van Aert et al., 2023</xref>), psychiatry (<xref ref-type="bibr" rid="B6">Berle &amp; Starcevic, 2007</xref>), cybersecurity studies (<xref ref-type="bibr" rid="B22">Gro&#223;, 2021</xref>), technology education research (<xref ref-type="bibr" rid="B9">Buckley et al., 2023</xref>), and experimental philosophy (<xref ref-type="bibr" rid="B14">Colombo et al., 2018</xref>). For example, looking at over 250,000 p-values published in major psychology journals, Nuijten et al. (<xref ref-type="bibr" rid="B37">2016</xref>) found that around 50% of the articles with statistical results contained at least one inconsistency, and around 13% contained at least one decision inconsistency. Other studies report inconsistency rates between 4% and 14% of tests, with between 10% and 63% of articles containing at least one inconsistency and between 3% and 21% containing at least one decision inconsistency.
These high inconsistency rates in other disciplines are concerning and have led to a constructive dialog, resulting in recommendations for best practices to either avoid inconsistencies or make them more detectable.</p>
<p>To assess the prevalence of statistical reporting inconsistencies in experimental linguistics, the present article conceptually replicates Nuijten et al. (<xref ref-type="bibr" rid="B37">2016</xref>) and assesses p-values reported in articles from thirteen experimental linguistics journals published between 2000 and 2023. We explore whether inconsistency rates differ across journals, whether they have changed over this 24-year period, and whether there is evidence for bias in these inconsistencies. We discuss the results and offer concrete recommendations for authors, reviewers, and editors to tackle this problem.</p>
</sec>
<sec>
<title>3. Method</title>
<sec>
<title>3.1 Quantitative analyses</title>
<p>All quantitative analyses were conducted in R (<xref ref-type="bibr" rid="B39">R Core Team, 2025</xref>).</p>
</sec>
<sec>
<title>3.2 Sample</title>
<p>Focusing on experimental linguistic research, we used Kobrock and Roettger (<xref ref-type="bibr" rid="B29">2023</xref>) as a point of departure. They list 100 linguistic journals that had published at least 100 articles at the time of assessment (2021) and had a high ratio of articles containing the search string &#8220;experiment*&#8221; in title, abstract and/or keywords. From these 100 journals, we selected all journals in which at least 10% of articles contained the search string &#8220;experiment*&#8221;. From the resulting 37 journals, we selected those whose author guidelines encouraged the use of APA style (<xref ref-type="bibr" rid="B1">American Psychological Association, 2020</xref>), either for the text in general or specifically for reporting statistics, which left nine journals. Moreover, because we needed to access the articles in .pdf format, the articles had to be either available to us through our library license or open access, resulting in a final list of eight journals: <italic>Applied Psycholinguistics</italic> (APS), <italic>Bilingualism: Language and Cognition</italic> (BLC), <italic>Linguistic Approaches to Bilingualism</italic> (LAB), <italic>Language and Speech</italic> (LaS), <italic>Language Learning and Technology</italic> (LLT), <italic>Journal of Language and Social Psychology</italic> (LSP), <italic>Journal of Child Language</italic> (JCL), and <italic>Studies in Second Language Acquisition</italic> (SLA). An anonymous reviewer raised the justified concern that the resulting sample might not be representative of much of the work in experimental linguistics.
To address this concern, we additionally included the five journals that have published the highest absolute number of experimental articles (according to <xref ref-type="bibr" rid="B29">Kobrock &amp; Roettger, 2023</xref>), regardless of the above-mentioned constraints, resulting in the inclusion of the following psycholinguistic outlets: <italic>Journal of Memory and Language</italic> (JML), <italic>Language, Cognition and Neuroscience</italic> (LCN, formerly <italic>Language and Cognitive Processes</italic>), <italic>Journal of Psycholinguistic Research</italic> (JPR), <italic>Journal of Speech, Language, and Hearing Research</italic> (SLH), and <italic>Brain and Language</italic> (BAL).</p>
<p>We included only original research articles within the publication years of 2000&#8211;2023, excluding any book reviews, response articles, commentaries, editorials, corrigenda, errata, advertisements, etc.</p>
</sec>
<sec>
<title>3.3 Statcheck</title>
<p>We used the R package Statcheck (Version 1.5.0; <xref ref-type="bibr" rid="B36">Nuijten &amp; Epskamp, 2024</xref>) to automatically detect statistical reporting inconsistencies. Statcheck works as follows: After converting articles in .pdf or .html format to plain text, Statcheck uses regular expressions to search for strings that correspond to an NHST result. In this way, Statcheck can detect the results of t-tests, F-tests, Z-tests, &#967;<sup>2</sup>-tests, correlation tests, and Q-tests, as long as the reported result fulfills three conditions: (a) it is reported completely, including the test statistic, the degrees of freedom (if applicable), and the p-value; (b) it appears in the body of the text (Statcheck usually misses results reported in tables); and (c) it is reported in APA style. Given these constraints, Statcheck is estimated to detect roughly 60% of all reported NHST results (<xref ref-type="bibr" rid="B37">Nuijten et al., 2016</xref>). Statcheck uses the reported test statistic and degrees of freedom to recalculate the p-value, compares the reported and recalculated p-values and, if they do not match, flags the test as containing an &#8220;inconsistency.&#8221; The algorithm takes into account that tests might have been performed as one-tailed by searching for the strings &#8220;one-tailed,&#8221; &#8220;one-sided,&#8221; or &#8220;directional&#8221; in the body of the text. Moreover, Statcheck treats p = .000 and p &lt; .000 as inconsistent, because p-values of exactly zero are mathematically impossible and the APA manual (<xref ref-type="bibr" rid="B1">American Psychological Association, 2020</xref>) advises reporting very small p-values as p &lt; .001. Validity checks suggest that inter-rater reliability between manual coding and Statcheck is high, at 0.76 for inconsistencies and 0.89 for decision inconsistencies (<xref ref-type="bibr" rid="B37">Nuijten et al., 2016</xref>).
The overall accuracy of Statcheck is estimated to be between 96.2% and 99.9% (<xref ref-type="bibr" rid="B35">Nuijten et al., 2017</xref>; but see <xref ref-type="bibr" rid="B44">Schmidt, 2017</xref>). We thus consider Statcheck a valid proxy for the prevalence of statistical reporting inconsistencies.</p>
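<p>To illustrate the general principle (extract a result with a regular expression, recalculate the p-value, compare), the following Python sketch checks APA-style z-test results only. It is a simplified, hypothetical illustration rather than Statcheck&#8217;s code: the pattern, the helper names (<monospace>scan</monospace>, <monospace>check_result</monospace>), and the rounding rule (the recalculated p-value is rounded to the number of reported decimals) are our assumptions, and the sketch ignores one-tailed tests and the other test types Statcheck handles. It does mirror the rule described above that a reported p-value of exactly zero counts as inconsistent.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import re

# Hypothetical, simplified pattern for APA-style z results such as "z = 2.10, p = .036".
# Statcheck's actual patterns cover more test types and reporting variants.
Z_RESULT = re.compile(
    r"\bz\s*=\s*(?P<stat>-?\d*\.?\d+)\s*,\s*p\s*(?P<cmp>[<>=])\s*(?P<p>\d*\.\d+)",
    re.IGNORECASE,
)


def two_sided_p_z(z: float) -> float:
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def check_result(stat: float, cmp: str, p_str: str, alpha: float = 0.05) -> dict:
    """Compare a reported p-value with the one recalculated from a z statistic."""
    reported = float(p_str)
    decimals = len(p_str.split(".")[1])
    computed = two_sided_p_z(stat)
    if cmp == "=":
        # Assumed rule: exact reports must match after rounding to the reported decimals;
        # a reported p of exactly zero is always inconsistent.
        consistent = reported > 0 and round(computed, decimals) == reported
    elif cmp == "<":
        consistent = reported > 0 and computed < reported
    else:  # cmp == ">"
        consistent = computed > reported
    reported_sig = (cmp == "<" and reported <= alpha) or (cmp == "=" and reported < alpha)
    # A decision inconsistency is an inconsistency that flips the significance decision.
    decision = (not consistent) and reported_sig != (computed < alpha)
    return {"computed_p": computed, "inconsistent": not consistent, "decision": decision}


def scan(text: str) -> list:
    """Extract z results from plain text and check each one."""
    rows = []
    for match in Z_RESULT.finditer(text):
        row = check_result(float(match.group("stat")), match.group("cmp"), match.group("p"))
        row["raw"] = match.group(0)
        rows.append(row)
    return rows


sample = "Accuracy differed, z = 1.90, p < .05, and so did RTs, z = 2.60, p = .009."
for row in scan(sample):
    print(row["raw"], "->", row)
```
]]></preformat>
<p>In the example sentence, the first result (z = 1.90) is flagged as a decision inconsistency because its recalculated two-sided p-value is about 0.057, whereas the second result (z = 2.60, p = .009) is consistent.</p>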
<p>Articles from <italic>Linguistic Approaches to Bilingualism</italic> span only the years 2011&#8211;2023, and articles from <italic>Language, Cognition and Neuroscience</italic> span only 2015&#8211;2023. Statcheck could not parse 291 articles, likely because of problems rendering the &#967;<sup>2</sup> symbol during the conversion from .pdf to .txt, which leaves some gaps in coverage. The final sample comprised 13,065 research articles, all of which were submitted to analysis.</p>
</sec>
<sec>
<title>3.4 Analysis and research questions</title>
<p>Because the aims of this study were explicitly exploratory and hypothesis-generating, our analyses are purely descriptive, reporting the proportions of articles and tests that are statistically (in)consistent. We set out to explore the following research questions: How prevalent are statistical inconsistencies (and decision inconsistencies) in our sample (Section 4.1)? Do inconsistency rates vary across journals and/or publication years (Section 4.2)? Is there evidence for bias, i.e. are the processes that result in inconsistencies more likely to produce lower p-values (Section 4.3)? And do decision inconsistencies matter, i.e. are they merely typos, or do they have the potential to lead to erroneous theoretical interpretations (Section 4.4)?</p>
</sec>
</sec>
<sec>
<title>4. Results</title>
<sec>
<title>4.1 Inconsistencies are highly prevalent</title>
<p>The results are summarized in <xref ref-type="table" rid="T1">Table 1</xref>. Of the 13,065 articles, 6,268 (48%) contained statistical tests that Statcheck could assess, amounting to 82,991 assessable p-values. Interestingly, the five journals that did not explicitly encourage APA style had rates of assessable articles very similar to those of the original sample of eight journals. Overall, 10,421 p-values (12.6%) were flagged as inconsistent, of which 1,226 (1.5% of all assessable p-values) were decision inconsistencies.</p>
<table-wrap id="T1">
<caption>
<p><bold>Table 1:</bold> Number of eligible articles, assessable articles, assessable results (p-values), inconsistencies, and decision inconsistencies per journal and in total. (<italic>Applied Psycholinguistics</italic> (APS), <italic>Brain and Language</italic> (BAL), <italic>Bilingualism: Language and Cognition</italic> (BLC), <italic>Journal of Memory and Language</italic> (JML), <italic>Journal of Psycholinguistic Research</italic> (JPR), <italic>Linguistic Approaches to Bilingualism</italic> (LAB), <italic>Language and Speech</italic> (LaS), <italic>Language, Cognition and Neuroscience</italic> (LCN, formerly <italic>Language and Cognitive Processes</italic>), <italic>Language Learning and Technology</italic> (LLT), <italic>Journal of Language and Social Psychology</italic> (LSP), <italic>Journal of Child Language</italic> (JCL), <italic>Studies in Second Language Acquisition</italic> (SLA), and <italic>Journal of Speech, Language, and Hearing Research</italic> (SLH)).</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Journal</bold></td>
<td align="left" valign="top"><bold>Eligible articles</bold></td>
<td align="left" valign="top"><bold>Assessable articles</bold></td>
<td align="left" valign="top"><bold>Assessable results</bold></td>
<td align="left" valign="top"><bold>Inconsistencies</bold></td>
<td align="left" valign="top"><bold>Decision inconsistencies</bold></td>
</tr>
<tr>
<td align="left" valign="top">APS</td>
<td align="left" valign="top">953</td>
<td align="left" valign="top">690</td>
<td align="left" valign="top">9570</td>
<td align="left" valign="top">1368</td>
<td align="left" valign="top">170</td>
</tr>
<tr>
<td align="left" valign="top">BAL</td>
<td align="left" valign="top">2253</td>
<td align="left" valign="top">780</td>
<td align="left" valign="top">10344</td>
<td align="left" valign="top">1414</td>
<td align="left" valign="top">153</td>
</tr>
<tr>
<td align="left" valign="top">BLC</td>
<td align="left" valign="top">964</td>
<td align="left" valign="top">610</td>
<td align="left" valign="top">9093</td>
<td align="left" valign="top">1161</td>
<td align="left" valign="top">120</td>
</tr>
<tr>
<td align="left" valign="top">JCL</td>
<td align="left" valign="top">1109</td>
<td align="left" valign="top">529</td>
<td align="left" valign="top">6240</td>
<td align="left" valign="top">750</td>
<td align="left" valign="top">69</td>
</tr>
<tr>
<td align="left" valign="top">JML</td>
<td align="left" valign="top">1507</td>
<td align="left" valign="top">768</td>
<td align="left" valign="top">13236</td>
<td align="left" valign="top">921</td>
<td align="left" valign="top">107</td>
</tr>
<tr>
<td align="left" valign="top">JPR</td>
<td align="left" valign="top">1137</td>
<td align="left" valign="top">534</td>
<td align="left" valign="top">5576</td>
<td align="left" valign="top">791</td>
<td align="left" valign="top">85</td>
</tr>
<tr>
<td align="left" valign="top">LAB</td>
<td align="left" valign="top">471</td>
<td align="left" valign="top">133</td>
<td align="left" valign="top">1719</td>
<td align="left" valign="top">234</td>
<td align="left" valign="top">25</td>
</tr>
<tr>
<td align="left" valign="top">LCN</td>
<td align="left" valign="top">751</td>
<td align="left" valign="top">397</td>
<td align="left" valign="top">5979</td>
<td align="left" valign="top">894</td>
<td align="left" valign="top">89</td>
</tr>
<tr>
<td align="left" valign="top">LLT</td>
<td align="left" valign="top">421</td>
<td align="left" valign="top">111</td>
<td align="left" valign="top">919</td>
<td align="left" valign="top">201</td>
<td align="left" valign="top">61</td>
</tr>
<tr>
<td align="left" valign="top">LSP</td>
<td align="left" valign="top">695</td>
<td align="left" valign="top">376</td>
<td align="left" valign="top">4320</td>
<td align="left" valign="top">429</td>
<td align="left" valign="top">60</td>
</tr>
<tr>
<td align="left" valign="top">LaS</td>
<td align="left" valign="top">597</td>
<td align="left" valign="top">160</td>
<td align="left" valign="top">2433</td>
<td align="left" valign="top">286</td>
<td align="left" valign="top">45</td>
</tr>
<tr>
<td align="left" valign="top">SLA</td>
<td align="left" valign="top">593</td>
<td align="left" valign="top">247</td>
<td align="left" valign="top">2954</td>
<td align="left" valign="top">471</td>
<td align="left" valign="top">64</td>
</tr>
<tr>
<td align="left" valign="top">SLH</td>
<td align="left" valign="top">1614</td>
<td align="left" valign="top">933</td>
<td align="left" valign="top">10608</td>
<td align="left" valign="top">1501</td>
<td align="left" valign="top">178</td>
</tr>
<tr>
<td align="left" valign="top">Total</td>
<td align="left" valign="top">13065</td>
<td align="left" valign="top">6268</td>
<td align="left" valign="top">82991</td>
<td align="left" valign="top">10421</td>
<td align="left" valign="top">1226</td>
</tr>
</tbody>
</table>
</table-wrap>
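<p>The summary percentages in this section can be recomputed directly from the counts in <xref ref-type="table" rid="T1">Table 1</xref>. The Python sketch below transcribes the table and derives the overall rates, the range of per-journal inconsistency rates reported in Section 4.2, and the share of assessable articles for the original eight journals versus the five added journals. The variable and function names are ours, chosen for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Counts transcribed from Table 1, per journal:
# (eligible articles, assessable articles, assessable results, inconsistencies, decision inconsistencies)
TABLE_1 = {
    "APS": (953, 690, 9570, 1368, 170),
    "BAL": (2253, 780, 10344, 1414, 153),
    "BLC": (964, 610, 9093, 1161, 120),
    "JCL": (1109, 529, 6240, 750, 69),
    "JML": (1507, 768, 13236, 921, 107),
    "JPR": (1137, 534, 5576, 791, 85),
    "LAB": (471, 133, 1719, 234, 25),
    "LCN": (751, 397, 5979, 894, 89),
    "LLT": (421, 111, 919, 201, 61),
    "LSP": (695, 376, 4320, 429, 60),
    "LaS": (597, 160, 2433, 286, 45),
    "SLA": (593, 247, 2954, 471, 64),
    "SLH": (1614, 933, 10608, 1501, 178),
}
# The five journals added regardless of the APA-style criterion (Section 3.2).
ADDED = {"JML", "LCN", "JPR", "SLH", "BAL"}

# Column totals; these should match the "Total" row of Table 1.
totals = [sum(row[i] for row in TABLE_1.values()) for i in range(5)]
eligible, assessable, results, incons, decision = totals

print(f"assessable articles: {assessable / eligible:.0%}")
print(f"inconsistent results: {incons / results:.1%}")
print(f"decision inconsistencies: {decision / results:.1%}")

# Per-journal rates: (inconsistencies, decision inconsistencies) per assessable result.
rates = {j: (r[3] / r[2], r[4] / r[2]) for j, r in TABLE_1.items()}
lo_j = min(rates, key=lambda j: rates[j][0])
hi_j = max(rates, key=lambda j: rates[j][0])
print(f"inconsistency rate range: {rates[lo_j][0]:.0%} ({lo_j}) to {rates[hi_j][0]:.0%} ({hi_j})")


def assessable_share(journals):
    """Share of eligible articles that Statcheck could assess, for a set of journals."""
    elig = sum(TABLE_1[j][0] for j in journals)
    return sum(TABLE_1[j][1] for j in journals) / elig


original = set(TABLE_1) - ADDED
print(f"original eight: {assessable_share(original):.1%}, added five: {assessable_share(ADDED):.1%}")
```
]]></preformat>
<p>Run as is, the script reproduces the 48%, 12.6%, and 1.5% figures above and the 7% (JML) to 22% (LLT) range of per-journal inconsistency rates; the share of assessable articles comes out at roughly 49% for the original eight journals and 47% for the five added ones.</p>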
</sec>
<sec>
<title>4.2 Inconsistencies are comparable across journals and publication years</title>
<p>On average, 49% of assessable articles contained one or more inconsistencies (range across journals: 42&#8211;56%) and 12% contained one or more decision inconsistencies (range across journals: 10&#8211;15%) (see <xref ref-type="fig" rid="F1">Figure 1</xref>). The proportion of inconsistent results ranged from 7 to 22% across journals (1 to 7% for decision inconsistencies). These rates appear to be stable across publication years (see <xref ref-type="fig" rid="F2">Figure 2</xref>).</p>
<fig id="F1">
<caption>
<p><bold>Figure 1:</bold> Proportion of articles containing at least one inconsistency / decision inconsistency. (<italic>Applied Psycholinguistics</italic> (APS), <italic>Brain and Language</italic> (BAL), <italic>Bilingualism: Language and Cognition</italic> (BLC), <italic>Journal of Memory and Language</italic> (JML), <italic>Journal of Psycholinguistic Research</italic> (JPR), <italic>Linguistic Approaches to Bilingualism</italic> (LAB), <italic>Language and Speech</italic> (LaS), <italic>Language, Cognition and Neuroscience</italic> (LCN, formerly <italic>Language and Cognitive Processes</italic>), <italic>Language Learning and Technology</italic> (LLT), <italic>Journal of Language and Social Psychology</italic> (LSP), <italic>Journal of Child Language</italic> (JCL), <italic>Studies in Second Language Acquisition</italic> (SLA), and <italic>Journal of Speech, Language, and Hearing Research</italic> (SLH)).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-47896-g1.png"/>
</fig>
<fig id="F2">
<caption>
<p><bold>Figure 2:</bold> Proportion of inconsistencies / decision inconsistencies across time overall (left panel) and split by journal (right panel). (<italic>Applied Psycholinguistics</italic> (APS), <italic>Brain and Language</italic> (BAL), <italic>Bilingualism: Language and Cognition</italic> (BLC), <italic>Journal of Memory and Language</italic> (JML), <italic>Journal of Psycholinguistic Research</italic> (JPR), <italic>Linguistic Approaches to Bilingualism</italic> (LAB), <italic>Language and Speech</italic> (LaS), <italic>Language, Cognition and Neuroscience</italic> (LCN, formerly <italic>Language and Cognitive Processes</italic>), <italic>Language Learning and Technology</italic> (LLT), <italic>Journal of Language and Social Psychology</italic> (LSP), <italic>Journal of Child Language</italic> (JCL), <italic>Studies in Second Language Acquisition</italic> (SLA), and <italic>Journal of Speech, Language, and Hearing Research</italic> (SLH)).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-47896-g2.png"/>
</fig>
</sec>
<sec>
<title>4.3 Inconsistencies appear biased, but bias has decreased over time</title>
<p>If inconsistencies were unbiased, we would expect inconsistencies in either direction to be roughly equally frequent. However, this is not the case. Inconsistencies in which the p-value is reported as larger than a reference value (e.g. p &gt; 0.05) and those in which it is reported as smaller than a reference value (e.g. p &lt; 0.05) are not equally prevalent in the sample: Inconsistent tests with p reported as larger than a reference value made up 4.4%, whereas inconsistent tests with p reported as smaller than a reference value made up 6.7%. So even if these inconsistencies were merely typos in the comparison sign (e.g. &lt; instead of &gt;), they would still be biased towards smaller p-values.</p>
<p>These biases are also reflected in decision inconsistencies. Of all decision inconsistencies (n = 1,226), 68% are cases in which a result reported as significant (p &lt; 0.05) turns out to be non-significant when recalculated (p &gt; 0.05), i.e. non-significant results are more than twice as likely to be erroneously reported as significant as the other way around. This asymmetry, however, seems to have decreased over time. Following Nuijten et al. (<xref ref-type="bibr" rid="B37">2016</xref>), <xref ref-type="fig" rid="F3">Figure 3</xref> plots the development of the bias in decision inconsistencies over time. The prevalence of decision inconsistencies among p-values reported as significant seems to have decreased slightly over the years, while the prevalence of decision inconsistencies among p-values reported as non-significant seems to have increased slightly.</p>
<fig id="F3">
<caption>
<p><bold>Figure 3:</bold> Percentage of tests flagged for decision inconsistencies falsely reporting significance (black) or non-significance (orange), plotted across publication years.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="glossapx-4-1-47896-g3.png"/>
</fig>
</sec>
<sec>
<title>4.4 Do decision inconsistencies have the potential to matter?</title>
<p>The question arises as to whether the detected inconsistencies have the potential to matter theoretically. This is difficult to address empirically, because experimental behavioral research is often characterized by weak links between theory and predictions (<xref ref-type="bibr" rid="B43">Scheel et al., 2021</xref>), making it difficult even to extract hypotheses from published articles and to determine whether they have been corroborated (<xref ref-type="bibr" rid="B42">Scheel, 2022</xref>). What we can assess is whether a given test result has the potential to be theoretically relevant, i.e. whether theoretical claims can be made based on it. We approximated an answer to this question as follows: First, we extracted a random subsample (15%, n = 164) of articles which contain at least one decision inconsistency, distributed as equally as possible across publication years. We restricted the subsample to decision inconsistencies that involve either a larger-than (&gt;) or smaller-than (&lt;) comparison of the p-value, to allow an unambiguous interpretation of the results. We then manually extracted the sentence in which the test statistic is embedded and assessed whether the authors&#8217; interpretation matched the significance of the reported p-value. This allowed us to separate likely typos (the interpretation follows the recalculated p-value: mismatch) from true decision inconsistencies (the interpretation follows the erroneously reported p-value: match), and the ratio of the two lets us approximate the true decision inconsistency rate. We considered true decision inconsistencies theoretically relevant: For example, if an article claimed that two groups differ (or do not) based on an erroneously reported test, we considered this claim to potentially have theoretical consequences. Of the 164 cases assessed, most were straightforwardly interpretable (155, 95%). The remaining 9 cases were coded as unclear, i.e.
even after assessing the manuscript in detail, it remained unclear to us how the p-value was interpreted. Unclear cases occurred mostly due to tables or footnotes listing test results without a clear reference. Of the interpretable test reports, 132 (85%) were interpreted in line with the erroneously reported p-value, representing true decision inconsistencies that potentially have theoretical consequences. Conversely, in 23 cases (15%) the interpretation did not match the erroneously reported p-value, possibly representing genuine typos, i.e. the article interpreted the results in line with the recalculated p-value rather than the erroneously reported one.</p>
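<p>The proportions in the manual validation follow from simple arithmetic on the coded cases. The short Python sketch below tallies the three coding outcomes reported above (the labels are ours) and recovers the 95%, 85%, and 15% figures; note that the latter two are computed over the 155 interpretable cases, not over all 164.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Coding outcomes of the manually validated subsample, as reported in Section 4.4:
# "match"    = authors interpret the erroneously reported p-value (true decision inconsistency)
# "mismatch" = authors interpret the recalculated p-value (likely a typo)
# "unclear"  = interpretation could not be determined
codes = Counter(match=132, mismatch=23, unclear=9)

total = sum(codes.values())
interpretable = codes["match"] + codes["mismatch"]

print(f"assessed: {total}")
print(f"interpretable: {interpretable} ({interpretable / total:.0%})")
print(f"match: {codes['match'] / interpretable:.0%}, mismatch: {codes['mismatch'] / interpretable:.0%}")
```
]]></preformat>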
</sec>
</sec>
<sec>
<title>5. Discussion and recommendations</title>
<sec>
<title>5.1 Statistical reporting inconsistencies are prevalent</title>
<p>The present study found a large number of statistical reporting inconsistencies across a sample of 13,065 experimental linguistic articles containing 82,991 assessable p-values. Of all p-values, 12.6% were flagged as inconsistent and 1.5% as decision inconsistencies, i.e. the reported p-value was on the opposite side of the alpha threshold from the recalculated p-value. A manual validation of a subset of decision inconsistencies revealed that 85% of the interpretable cases were interpreted in line with the inconsistent p-value, i.e. an article claims, for example, that two groups differ (or do not) based on an inconsistently reported test. On average, 49% of assessable articles contained at least one inconsistency and 12% contained at least one decision inconsistency. We found no strong differences in inconsistency rates across journals or publication years.</p>
<p>The present study can be considered a conceptual replication of previous studies investigating statistical reporting inconsistency across different disciplines (<xref ref-type="bibr" rid="B3">Bakker &amp; Wicherts, 2011</xref>, <xref ref-type="bibr" rid="B4">2014</xref>; <xref ref-type="bibr" rid="B10">Caperos &amp; Pardo, 2013</xref>; <xref ref-type="bibr" rid="B51">Veldkamp et al., 2014</xref>; <xref ref-type="bibr" rid="B52">Wicherts et al., 2011</xref>) and of more recent assessments using the automatic tool Statcheck (<xref ref-type="bibr" rid="B9">Buckley et al., 2023</xref>; <xref ref-type="bibr" rid="B14">Colombo et al., 2018</xref>; <xref ref-type="bibr" rid="B22">Gro&#223;, 2021</xref>; <xref ref-type="bibr" rid="B37">Nuijten et al., 2016</xref>). The observed inconsistency rates are in line with these studies, which report inconsistency rates between 4% and 14% of tests, with between 10% and 63% of articles containing at least one inconsistency. Moreover, the observed rates of inconsistencies and decision inconsistencies are virtually identical to those reported by Nuijten et al. (<xref ref-type="bibr" rid="B37">2016</xref>) for the psychological science literature.</p>
<p>Even if these inconsistencies could largely be attributed to inconsequential typos or rounding errors (an assumption we cannot test without access to the data), the sheer number of inconsistencies that have made it through peer review should concern us. They are human errors. If so many errors are found in plain sight, the question naturally arises how many errors made during the data analysis itself remain undetected. If the tip of the iceberg is already this large, how large is the part below the surface?</p>
<p>Our results also suggest biases. The rate of decision inconsistencies was twice as high for p-values reported as significant as for those reported as non-significant. Similar biases have been reported for other disciplines (<xref ref-type="bibr" rid="B3">Bakker &amp; Wicherts, 2011</xref>; <xref ref-type="bibr" rid="B37">Nuijten et al., 2016</xref>) and could indicate a systematic bias in favor of lower p-values in general, and towards significant results in particular. Our data do not speak to the causes of these biases, but possible reasons include the following:</p>
<p>First, researchers might intentionally round down p-values because they think lower p-values are more convincing to reviewers and readers. About one in five surveyed psychological researchers admitted to this practice (<xref ref-type="bibr" rid="B26">John et al., 2012</xref>). Given that a non-trivial number of quantitative linguists have admitted to questionable research practices, and even to fraud (<xref ref-type="bibr" rid="B24">Isbell et al., 2022</xref>), we cannot exclude the possibility that some of the inconsistencies in our sample were indeed intentional. We strongly believe, however, that the majority of inconsistencies are unintentional and arise from other mechanisms.</p>
<p>Second, researchers might scrutinize non-significant results more closely than significant ones, which they are less likely to double-check, because results that confirm their hypotheses feed into confirmation bias (<xref ref-type="bibr" rid="B34">Nickerson, 1998</xref>). For example, Fugelsang et al. (<xref ref-type="bibr" rid="B19">2004</xref>) asked researchers to evaluate data that were either consistent or inconsistent with their prior expectations. When results were not in line with their expectations, researchers were likely to blame the methodology, whereas results that confirmed their expectations were rarely critically scrutinized.</p>
<p>Third, the observed bias might merely reflect publication bias (<xref ref-type="bibr" rid="B18">Franco et al., 2014</xref>; <xref ref-type="bibr" rid="B47">Sterling, 1959</xref>), with (erroneously) reported significant p-values being more likely to be published than (erroneously) reported non-significant ones. Publication bias is a well-documented concern in experimental linguistic research, and many recent meta-analyses discuss possible evidence for it (<xref ref-type="bibr" rid="B25">Isbilen &amp; Christiansen, 2022</xref>; <xref ref-type="bibr" rid="B31">Lehtonen et al., 2018</xref>; <xref ref-type="bibr" rid="B33">Lu et al., 2024</xref>). For example, De Bruin et al. (<xref ref-type="bibr" rid="B17">2015</xref>) showed that studies with results supporting the bilingual-advantage theory were more likely to be published, while studies with results challenging the theory were significantly less likely to be published.</p>
<p>Regardless of what causes these biases, our data also suggest a positive development: the disproportionate occurrence of decision inconsistencies for p-values reported as significant has decreased over time.</p>
</sec>
<sec>
<title>5.2 Limitations of our study</title>
<p>While we believe our work offers an important contribution to improving statistical reporting practices in experimental linguistics, the present assessment, and the conclusions we can draw from it, have limitations. First, our sample covers only a subset of experimental linguistic journals. However, given the selection of journals and their standing in the field, and given that our inconsistency rates are not only comparable to those of similar studies in other disciplines but also relatively stable across journals and time, our findings should be considered relevant for experimental linguistics at large.</p>
<p>Second, because of the constraints on automatically detecting test statistics, Statcheck misses reported values that either diverge from APA reporting standards or are reported in tables. However, previous studies have shown that inconsistency rates are similar for results in APA format and for results that diverge from it (<xref ref-type="bibr" rid="B3">Bakker &amp; Wicherts, 2011</xref>; <xref ref-type="bibr" rid="B37">Nuijten et al., 2016</xref>).</p>
<p>Third, Statcheck might overestimate inconsistency rates because it may not accurately detect corrections for multiple comparisons (<xref ref-type="bibr" rid="B44">Schmidt, 2017</xref>). Nuijten et al. (<xref ref-type="bibr" rid="B35">2017</xref>), however, show that only a small proportion of flagged inconsistencies were related to multiple comparisons, and that these tests were themselves often reported erroneously. They conclude that &#8220;[a]ny reporting inconsistencies associated with these tests and corrections could not explain the high prevalence of reporting inconsistencies&#8221; (<xref ref-type="bibr" rid="B35">Nuijten et al., 2017, p. 27</xref>).</p>
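<p>The mechanism behind such flags can be illustrated with a short Python sketch. It assumes, hypothetically, that authors report a Bonferroni-corrected p-value (the uncorrected p-value multiplied by the number of comparisons m, capped at 1) next to the uncorrected test statistic, so that a check recomputing p from the test statistic alone finds a mismatch. The z value, m, and function names are illustrative assumptions, and the sketch does not describe how Statcheck itself handles corrections.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_raw, m):
    """Bonferroni-adjusted p-value for m comparisons, capped at 1."""
    return min(1.0, p_raw * m)


z = 2.50  # hypothetical reported test statistic
m = 3     # hypothetical number of comparisons

p_recomputed = two_tailed_p_from_z(z)      # what a checker derives from z
p_reported = bonferroni(p_recomputed, m)   # what the authors would report

print(round(p_recomputed, 3))  # 0.012
print(round(p_reported, 3))    # 0.037
# The two values differ, so a check that ignores the correction flags the
# result, although both are below .05 and the decision is unchanged.
```
</preformat>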
<p>More elaborate automatic tools for the extraction of statistical information might allow for a more detailed and more accurate assessment of statistical reporting in the future (e.g. <xref ref-type="bibr" rid="B27">Kalmbach et al., 2023</xref>). Despite its limitations, Statcheck provides a rough proxy for true inconsistency rates in the published literature, and we hope the reader agrees that the prevalence of inconsistencies warrants reflection.</p>
</sec>
<sec>
<title>5.3 Recommendations for the field</title>
<p>The field of experimental linguistics can take concrete, actionable steps to reduce statistical reporting inconsistencies. To avoid copy-and-paste errors that arise when the manuscript is written in one program and the statistical analysis is conducted in another, authors should consider <italic>literate programming</italic>, i.e. an integration of analysis code and prose into a single, dynamic document (<xref ref-type="bibr" rid="B11">Casillas et al., 2023</xref>; <xref ref-type="bibr" rid="B28">Knuth, 1984</xref>). Several implementations of literate programming are freely available to researchers, including R Markdown (Rmd) and Quarto (qmd) files. Literate programming can ensure that values derived from the statistical analysis are automatically integrated into the manuscript, avoiding errors that might occur during manual transfer from one program to the other.</p>
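<p>The core idea of literate programming, that reported numbers are generated from the analysis rather than typed by hand, can be sketched in a few lines of Python. This is a hypothetical illustration, not R Markdown or Quarto syntax: the helper name <monospace>apa_z</monospace>, the z test, and the APA-like formatting (two decimals for the statistic, three for p, no leading zero before p) are assumptions, and the sketch omits the &#8220;p &lt; .001&#8221; convention for very small p-values.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def apa_z(z):
    """Hypothetical helper: format a z test in an APA-like style.

    The p-value is computed from z, never typed in, so the two cannot disagree.
    """
    p = two_tailed_p_from_z(z)
    p_text = f"{p:.3f}".lstrip("0")  # APA style drops the leading zero of p
    return f"z = {z:.2f}, p = {p_text}"


# In a dynamic document, the statistic would come straight from the model output.
z_from_analysis = 2.5
sentence = f"The effect of condition was reliable ({apa_z(z_from_analysis)})."
print(sentence)  # The effect of condition was reliable (z = 2.50, p = .012).
```
</preformat>
<p>Because the p-value in the sentence is derived from the same object as the test statistic, rerunning the analysis on corrected data updates both together, which is the property that prevents copy-and-paste inconsistencies.</p>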
<p>More generally, authors should adopt transparent and reproducible practices that reduce human errors, or at least make them detectable, such as sharing their derived data (i.e. the anonymized data table that was analyzed) together with a detailed description of their statistical protocol, ideally in the form of reproducible scripts. Sharing reproducible analyses with reviewers allows them to reproduce the authors&#8217; analyses and possibly to detect errors, or even inappropriate statistical choices, before publication, thus improving the quality and robustness of the final product. Moreover, publicly sharing analyses benefits the authors themselves beyond error detection: open data and materials can facilitate collaboration (<xref ref-type="bibr" rid="B8">Boland et al., 2017</xref>) and increase efficiency and sustainability (<xref ref-type="bibr" rid="B32">Lowndes et al., 2017</xref>), and articles linked to their data are cited more often (<xref ref-type="bibr" rid="B13">Colavizza et al., 2020</xref>).</p>
<p>Reviewers can additionally check the statistical reporting consistency of a manuscript using tools such as Statcheck (<xref ref-type="bibr" rid="B36">Nuijten &amp; Epskamp, 2024</xref>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://statcheck.io">http://statcheck.io</ext-link>) or p-checker (<xref ref-type="bibr" rid="B45">Sch&#246;nbrodt, 2015</xref>, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://shinyapps.org/apps/p-checker/">http://shinyapps.org/apps/p-checker/</ext-link>). Reviewers could also consider requesting data and scripts during peer review; such requests might be particularly justified when inconsistencies are apparent. Explicitly requesting that authors share data might prompt them to take additional care and to run quality checks when preparing their materials, and would also allow reviewers to reproduce the results and critically evaluate all choices made in the statistical analysis (<xref ref-type="bibr" rid="B41">Sakaluk et al., 2014</xref>). Recent evidence suggests that experimental linguistics is still characterized by a pluralism of statistical approaches, even when researchers are trying to answer the same research question (<xref ref-type="bibr" rid="B15">Coretta et al., 2023</xref>). Some of these approaches might be more appropriate than others (<xref ref-type="bibr" rid="B46">Sonderegger &amp; S&#243;skuthy, 2024</xref>; <xref ref-type="bibr" rid="B49">Vasishth, 2023</xref>), so more thorough evaluations of how researchers arrive at their statistical conclusions might improve analytical robustness. Moreover, a turn towards inferential frameworks that do not focus on binary decision procedures might alleviate some of the biases we observed in the direction of inconsistencies (<xref ref-type="bibr" rid="B16">Cumming, 2014</xref>; <xref ref-type="bibr" rid="B50">Vasishth et al., 2018</xref>).</p>
<p>Journal editors could explicitly recommend consistency checks with tools such as Statcheck during peer review, a practice that has been taken up by several journals in neighboring disciplines (<italic>Psychological Science</italic>,<xref ref-type="fn" rid="n1">1</xref>&#160;<italic>Advances in Methods and Practices in Psychological Science</italic>,<xref ref-type="fn" rid="n2">2</xref>&#160;<italic>Stress &amp; Health</italic> (<xref ref-type="bibr" rid="B5">Barber, 2017</xref>)). Editors could also require, recommend, or at least encourage data sharing for publication in their journal. Data sharing policies have been shown to substantially increase the reproducibility of analyses (e.g. <xref ref-type="bibr" rid="B23">Hardwicke et al., 2018</xref>; <xref ref-type="bibr" rid="B30">Laurinavichyute et al., 2022</xref>), and a number of linguistic journals, including some in our sample, have already implemented such policies. That said, the open data policy that the <italic>Journal of Memory and Language</italic> (JML) introduced in 2018, for example, does not seem to have affected the proportion of statistical inconsistencies. The inconsistency rates for JML (and other journals) were rather stable across time, so open data alone might not resolve the issue without further changes to the research ecosystem.</p>
<p>Researchers make errors. Researchers have biases. This is who we are as humans, and there is not much we can do about our nature. Being aware of this, and of how it might affect research, can help us make its negative consequences detectable and preventable.</p>
</sec>
</sec>
</body>
<back>
<sec>
<title>Data Accessibility Statement</title>
<p>All derived data and corresponding R scripts are available here: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://osf.io/gx3ub/">https://osf.io/gx3ub/</ext-link>.</p>
</sec>
<sec>
<title>Acknowledgements</title>
<p>We are thankful for comments and suggestions by two anonymous reviewers and Brian Dillon. We also greatly appreciated comments on an earlier draft by Thomas Schmidt. All remaining errors are our own.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors have no conflict of interest to declare.</p>
</sec>
<sec>
<title>Author contributions</title>
<p>Conceptualization, Methodology, Validation, Formal Analysis, Review &amp; Editing of Manuscript, Data Curation &#8211; TBR &amp; DLJE; Software, Investigation &#8211; DLJE; Writing of Original Draft, Visualization, Supervision &#8211; TBR.</p>
</sec>
<sec>
<title>ORCiD IDs</title>
<p><bold>Dara Leonard Jenssen Etemady:</bold>&#160;<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://orcid.org/0009-0006-0083-3944">https://orcid.org/0009-0006-0083-3944</ext-link></p>
<p><bold>Timo B. Roettger:</bold>&#160;<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://orcid.org/0000-0003-1400-2739">http://orcid.org/0000-0003-1400-2739</ext-link></p>
</sec>
<fn-group>
<fn id="n1"><p><ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.psychologicalscience.org/publications/psychological_science/ps-submissions">http://www.psychologicalscience.org/publications/psychological_science/ps-submissions</ext-link>; accessed on July 15, 2024.</p></fn>
<fn id="n2"><p><ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.psychologicalscience.org/publications/ampps/ampps-submission-guidelines">https://www.psychologicalscience.org/publications/ampps/ampps-submission-guidelines</ext-link>; accessed on July 15, 2024.</p></fn>
</fn-group>
<ref-list>
<ref id="B1"><mixed-citation publication-type="book"><collab>American Psychological Association</collab>. (<year>2020</year>). <source>Publication manual of the American Psychological Association</source> (<edition>7th</edition> ed.). <pub-id pub-id-type="doi">10.1037/0000173-000</pub-id></mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="journal"><string-name><surname>Arvan</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Pina</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Parde</surname>, <given-names>N.</given-names></string-name> (<year>2022</year>). <article-title>Reproducibility in computational linguistics: Is source code enough?</article-title> <source>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>, <fpage>2350</fpage>&#8211;<lpage>2361</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2022.emnlp-main.150</pub-id></mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="journal"><string-name><surname>Bakker</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name> (<year>2011</year>). <article-title>The (mis)reporting of statistical results in psychology journals</article-title>. <source>Behavior Research Methods</source>, <volume>43</volume>, <fpage>666</fpage>&#8211;<lpage>678</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-011-0089-5</pub-id></mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="journal"><string-name><surname>Bakker</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name> (<year>2014</year>). <article-title>Outlier removal and the relation with reporting errors and quality of psychological research</article-title>. <source>PLoS One</source>, <volume>9</volume>(<issue>7</issue>), <elocation-id>e103360</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0103360</pub-id></mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="journal"><string-name><surname>Barber</surname>, <given-names>L. K.</given-names></string-name> (<year>2017</year>). <article-title>Meticulous manuscripts, messy results: Working together for robust science reporting</article-title>. <source>Stress &amp; Health</source>, <volume>33</volume>(<issue>2</issue>), <fpage>89</fpage>&#8211;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1002/smi.2756</pub-id></mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="journal"><string-name><surname>Berle</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name><surname>Starcevic</surname>, <given-names>V.</given-names></string-name> (<year>2007</year>). <article-title>Inconsistencies between reported test statistics and p-values in two psychiatry journals</article-title>. <source>International Journal of Methods in Psychiatric Research</source>, <volume>16</volume>(<issue>4</issue>), <fpage>202</fpage>&#8211;<lpage>207</lpage>. <pub-id pub-id-type="doi">10.1002/mpr.225</pub-id></mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="journal"><string-name><surname>Bochynska</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Keeble</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Halfacre</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Casillas</surname>, <given-names>J. V.</given-names></string-name>, <string-name><surname>Champagne</surname>, <given-names>I.-A.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>R&#246;thlisberger</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Buchanan</surname>, <given-names>E. M.</given-names></string-name>, &amp; <string-name><surname>Roettger</surname>, <given-names>T.</given-names></string-name> (<year>2023</year>). <article-title>Reproducible research practices and transparency across linguistics</article-title>. <source>Glossa Psycholinguistics</source>, <volume>2</volume>(<issue>1</issue>). <pub-id pub-id-type="doi">10.5070/G6011239</pub-id></mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="journal"><string-name><surname>Boland</surname>, <given-names>M. R.</given-names></string-name>, <string-name><surname>Karczewski</surname>, <given-names>K. J.</given-names></string-name>, &amp; <string-name><surname>Tatonetti</surname>, <given-names>N. P.</given-names></string-name> (<year>2017</year>). <article-title>Ten simple rules to enable multi-site collaborations through data sharing</article-title>. <source>PLoS Computational Biology</source>, <volume>13</volume>(<issue>1</issue>), <elocation-id>e1005278</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005278</pub-id></mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="journal"><string-name><surname>Buckley</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Hyland</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name><surname>Seery</surname>, <given-names>N.</given-names></string-name> (<year>2023</year>). <article-title>Estimating the replicability of technology education research</article-title>. <source>International Journal of Technology and Design Education</source>, <volume>33</volume>(<issue>4</issue>), <fpage>1243</fpage>&#8211;<lpage>1264</lpage>. <pub-id pub-id-type="doi">10.1007/s10798-022-09787-6</pub-id></mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="journal"><string-name><surname>Caperos</surname>, <given-names>J. M.</given-names></string-name>, &amp; <string-name><surname>Pardo</surname>, <given-names>A.</given-names></string-name> (<year>2013</year>). <article-title>Consistency errors in p-values reported in Spanish psychology journals</article-title>. <source>Psicothema</source>, <volume>25</volume>(<issue>3</issue>), <fpage>408</fpage>&#8211;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.7334/psicothema2012.207</pub-id></mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="journal"><string-name><surname>Casillas</surname>, <given-names>J. V.</given-names></string-name>, <string-name><surname>Constantin-Dureci</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Rasc&#243;n</surname>, <given-names>I. A.</given-names></string-name>, <string-name><surname>Shao</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Rodr&#237;guez</surname>, <given-names>S. A.</given-names></string-name>, <string-name><surname>Gadamsetty</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Minetti</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Laungani</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Thatcher</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Gardere</surname>, <given-names>R.-T.</given-names></string-name>, et al. (<year>2023</year>). <source>Opening open science to all: Demystifying reproducibility and transparency practices in linguistic research</source>. <pub-id pub-id-type="doi">10.31234/osf.io/spz4w</pub-id></mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="journal"><string-name><surname>Claesen</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Vanpaemel</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Maerten</surname>, <given-names>A.-S.</given-names></string-name>, <string-name><surname>Verliefde</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Tuerlinckx</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name><surname>Heyman</surname>, <given-names>T.</given-names></string-name> (<year>2023</year>). <article-title>Data sharing upon request and statistical consistency errors in psychology: A replication of Wicherts, Bakker and Molenaar (2011)</article-title>. <source>PLoS One</source>, <volume>18</volume>(<issue>4</issue>), <elocation-id>e0284243</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0284243</pub-id></mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="journal"><string-name><surname>Colavizza</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Hrynaszkiewicz</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Staden</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Whitaker</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name><surname>McGillivray</surname>, <given-names>B.</given-names></string-name> (<year>2020</year>). <article-title>The citation advantage of linking publications to research data</article-title>. <source>PLoS One</source>, <volume>15</volume>(<issue>4</issue>), <elocation-id>e0230416</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0230416</pub-id></mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="journal"><string-name><surname>Colombo</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Duev</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, &amp; <string-name><surname>Sprenger</surname>, <given-names>J.</given-names></string-name> (<year>2018</year>). <article-title>Statistical reporting inconsistencies in experimental philosophy</article-title>. <source>PLoS One</source>, <volume>13</volume>(<issue>4</issue>), <elocation-id>e0194360</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0194360</pub-id></mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="journal"><string-name><surname>Coretta</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Casillas</surname>, <given-names>J. V.</given-names></string-name>, <string-name><surname>Roessig</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Franke</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Ahn</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Al-Hoorie</surname>, <given-names>A. H.</given-names></string-name>, <string-name><surname>Al-Tamimi</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Alotaibi</surname>, <given-names>N. E.</given-names></string-name>, <string-name><surname>AlShakhori</surname>, <given-names>M. K.</given-names></string-name>, <string-name><surname>Altmiller</surname>, <given-names>R. M.</given-names></string-name>, <string-name><surname>Arantes</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Athanasopoulou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Baese-Berk</surname>, <given-names>M. M.</given-names></string-name>, <string-name><surname>Bailey</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Sangma</surname>, <given-names>C. B. A.</given-names></string-name>, <string-name><surname>Beier</surname>, <given-names>E. J.</given-names></string-name>, <string-name><surname>Benavides</surname>, <given-names>G. M.</given-names></string-name>, <string-name><surname>Benker</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>BensonMeyer</surname>, <given-names>E. P.</given-names></string-name>, &#8230; <string-name><surname>Roettger</surname>, <given-names>T. B.</given-names></string-name> (<year>2023</year>). <article-title>Multidimensional signals and analytic flexibility: Estimating degrees of freedom in human-speech analyses</article-title>. 
<source>Advances in Methods and Practices in Psychological Science</source>, <volume>6</volume>(<issue>3</issue>), <elocation-id>25152459231162567</elocation-id>. <pub-id pub-id-type="doi">10.1177/25152459231162567</pub-id></mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="journal"><string-name><surname>Cumming</surname>, <given-names>G.</given-names></string-name> (<year>2014</year>). <article-title>The new statistics: Why and how</article-title>. <source>Psychological Science</source>, <volume>25</volume>(<issue>1</issue>), <fpage>7</fpage>&#8211;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1177/0956797613504966</pub-id></mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="journal"><string-name><surname>De Bruin</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Treccani</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>Della Sala</surname>, <given-names>S.</given-names></string-name> (<year>2015</year>). <article-title>Cognitive advantage in bilingualism: An example of publication bias?</article-title> <source>Psychological Science</source>, <volume>26</volume>(<issue>1</issue>), <fpage>99</fpage>&#8211;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1177/0956797614557866</pub-id></mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="journal"><string-name><surname>Franco</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Malhotra</surname>, <given-names>N.</given-names></string-name>, &amp; <string-name><surname>Simonovits</surname>, <given-names>G.</given-names></string-name> (<year>2014</year>). <article-title>Publication bias in the social sciences: Unlocking the file drawer</article-title>. <source>Science</source>, <volume>345</volume>(<issue>6203</issue>), <fpage>1502</fpage>&#8211;<lpage>1505</lpage>. <pub-id pub-id-type="doi">10.1126/science.1255484</pub-id></mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="journal"><string-name><surname>Fugelsang</surname>, <given-names>J. A.</given-names></string-name>, <string-name><surname>Stein</surname>, <given-names>C. B.</given-names></string-name>, <string-name><surname>Green</surname>, <given-names>A. E.</given-names></string-name>, &amp; <string-name><surname>Dunbar</surname>, <given-names>K. N.</given-names></string-name> (<year>2004</year>). <article-title>Theory and data interactions of the scientific mind: Evidence from the molecular and the cognitive laboratory</article-title>. <source>Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Exp&#233;rimentale</source>, <volume>58</volume>(<issue>2</issue>), <elocation-id>86</elocation-id>. <pub-id pub-id-type="doi">10.1037/h0085799</pub-id></mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="journal"><string-name><surname>Garc&#237;a-Berthou</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Alcaraz</surname>, <given-names>C.</given-names></string-name> (<year>2004</year>). <article-title>Incongruence between test statistics and p values in medical papers</article-title>. <source>BMC Medical Research Methodology</source>, <volume>4</volume>, <fpage>1</fpage>&#8211;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2288-4-13</pub-id></mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="journal"><string-name><surname>Green</surname>, <given-names>C. D.</given-names></string-name>, <string-name><surname>Abbas</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Belliveau</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Beribisky</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Davidson</surname>, <given-names>I. J.</given-names></string-name>, <string-name><surname>DiGiovanni</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Heidari</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Martin</surname>, <given-names>S. M.</given-names></string-name>, <string-name><surname>Oosenbrug</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Wainewright</surname>, <given-names>L. M.</given-names></string-name> (<year>2018</year>). <article-title>Statcheck in Canada: What proportion of CPA journal articles contain errors in the reporting of p-values?</article-title> <source>Canadian Psychology/Psychologie Canadienne</source>, <volume>59</volume>(<issue>3</issue>), <elocation-id>203</elocation-id>. <pub-id pub-id-type="doi">10.1037/cap0000139</pub-id></mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="journal"><string-name><surname>Gro&#223;</surname>, <given-names>T.</given-names></string-name> (<year>2021</year>). <article-title>Fidelity of statistical reporting in 10 years of cyber security user studies</article-title>. <source>Socio-Technical Aspects in Security and Trust: 9th International Workshop, STAST 2019, Luxembourg City, Luxembourg, September 26, 2019, Revised Selected Papers</source> <volume>9</volume>, <fpage>3</fpage>&#8211;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-55958-8_1</pub-id></mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="journal"><string-name><surname>Hardwicke</surname>, <given-names>T. E.</given-names></string-name>, <string-name><surname>Mathur</surname>, <given-names>M. B.</given-names></string-name>, <string-name><surname>MacDonald</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Nilsonne</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Banks</surname>, <given-names>G. C.</given-names></string-name>, <string-name><surname>Kidwell</surname>, <given-names>M. C.</given-names></string-name>, <string-name><surname>Hofelich Mohr</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Clayton</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Yoon</surname>, <given-names>E. J.</given-names></string-name>, <string-name><surname>Henry Tessler</surname>, <given-names>M.</given-names></string-name>, et al. (<year>2018</year>). <article-title>Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal <italic>Cognition</italic></article-title>. <source>Royal Society Open Science</source>, <volume>5</volume>(<issue>8</issue>), <elocation-id>180448</elocation-id>. <pub-id pub-id-type="doi">10.1098/rsos.180448</pub-id></mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><string-name><surname>Isbell</surname>, <given-names>D. R.</given-names></string-name>, <string-name><surname>Brown</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Derrick</surname>, <given-names>D. J.</given-names></string-name>, <string-name><surname>Ghanem</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Arvizu</surname>, <given-names>M. N. G.</given-names></string-name>, <string-name><surname>Schnur</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Plonsky</surname>, <given-names>L.</given-names></string-name> (<year>2022</year>). <article-title>Misconduct and questionable research practices: The ethics of quantitative data handling and reporting in applied linguistics</article-title>. <source>The Modern Language Journal</source>, <volume>106</volume>(<issue>1</issue>), <fpage>172</fpage>&#8211;<lpage>195</lpage>. <pub-id pub-id-type="doi">10.1111/modl.12760</pub-id></mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="journal"><string-name><surname>Isbilen</surname>, <given-names>E. S.</given-names></string-name>, &amp; <string-name><surname>Christiansen</surname>, <given-names>M. H.</given-names></string-name> (<year>2022</year>). <article-title>Statistical learning of language: A meta-analysis into 25 years of research</article-title>. <source>Cognitive Science</source>, <volume>46</volume>(<issue>9</issue>), <elocation-id>e13198</elocation-id>. <pub-id pub-id-type="doi">10.1111/cogs.13198</pub-id></mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="journal"><string-name><surname>John</surname>, <given-names>L. K.</given-names></string-name>, <string-name><surname>Loewenstein</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Prelec</surname>, <given-names>D.</given-names></string-name> (<year>2012</year>). <article-title>Measuring the prevalence of questionable research practices with incentives for truth telling</article-title>. <source>Psychological Science</source>, <volume>23</volume>(<issue>5</issue>), <fpage>524</fpage>&#8211;<lpage>532</lpage>. <pub-id pub-id-type="doi">10.1177/0956797611430953</pub-id></mixed-citation></ref>
<ref id="B27"><mixed-citation publication-type="journal"><string-name><surname>Kalmbach</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Hoffmann</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Lell</surname>, <given-names>N.</given-names></string-name>, &amp; <string-name><surname>Scherp</surname>, <given-names>A.</given-names></string-name> (<year>2023</year>). <article-title>On the rule-based extraction of statistics reported in scientific papers</article-title>. <source>International Conference on Applications of Natural Language to Information Systems</source>, <fpage>326</fpage>&#8211;<lpage>338</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-35320-8_23</pub-id></mixed-citation></ref>
<ref id="B28"><mixed-citation publication-type="journal"><string-name><surname>Knuth</surname>, <given-names>D. E.</given-names></string-name> (<year>1984</year>). <article-title>Literate programming</article-title>. <source>The Computer Journal</source>, <volume>27</volume>(<issue>2</issue>), <fpage>97</fpage>&#8211;<lpage>111</lpage>. <pub-id pub-id-type="doi">10.1093/comjnl/27.2.97</pub-id></mixed-citation></ref>
<ref id="B29"><mixed-citation publication-type="journal"><string-name><surname>Kobrock</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name><surname>Roettger</surname>, <given-names>T.</given-names></string-name> (<year>2023</year>). <article-title>Assessing the replication landscape in experimental linguistics</article-title>. <source>Glossa Psycholinguistics</source>, <volume>2</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.5070/G6011135</pub-id></mixed-citation></ref>
<ref id="B30"><mixed-citation publication-type="journal"><string-name><surname>Laurinavichyute</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Yadav</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name><surname>Vasishth</surname>, <given-names>S.</given-names></string-name> (<year>2022</year>). <article-title>Share the code, not just the data: A case study of the reproducibility of articles published in the <italic>Journal of Memory and Language</italic> under the Open Data Policy</article-title>. <source>Journal of Memory and Language</source>, <volume>125</volume>, <elocation-id>104332</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jml.2022.104332</pub-id></mixed-citation></ref>
<ref id="B31"><mixed-citation publication-type="journal"><string-name><surname>Lehtonen</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Soveri</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Laine</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>J&#228;rvenp&#228;&#228;</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>De Bruin</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Antfolk</surname>, <given-names>J.</given-names></string-name> (<year>2018</year>). <article-title>Is bilingualism associated with enhanced executive functioning in adults? A meta-analytic review</article-title>. <source>Psychological Bulletin</source>, <volume>144</volume>(<issue>4</issue>), <elocation-id>394</elocation-id>. <pub-id pub-id-type="doi">10.1037/bul0000142</pub-id></mixed-citation></ref>
<ref id="B32"><mixed-citation publication-type="journal"><string-name><surname>Lowndes</surname>, <given-names>J. S. S.</given-names></string-name>, <string-name><surname>Best</surname>, <given-names>B. D.</given-names></string-name>, <string-name><surname>Scarborough</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Afflerbach</surname>, <given-names>J. C.</given-names></string-name>, <string-name><surname>Frazier</surname>, <given-names>M. R.</given-names></string-name>, <string-name><surname>O&#8217;Hara</surname>, <given-names>C. C.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>N.</given-names></string-name>, &amp; <string-name><surname>Halpern</surname>, <given-names>B. S.</given-names></string-name> (<year>2017</year>). <article-title>Our path to better science in less time using open data science tools</article-title>. <source>Nature Ecology &amp; Evolution</source>, <volume>1</volume>(<issue>6</issue>), <elocation-id>0160</elocation-id>. <pub-id pub-id-type="doi">10.1038/s41559-017-0160</pub-id></mixed-citation></ref>
<ref id="B33"><mixed-citation publication-type="journal"><string-name><surname>Lu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Frank</surname>, <given-names>M. C.</given-names></string-name>, &amp; <string-name><surname>Degen</surname>, <given-names>J.</given-names></string-name> (<year>2024</year>). <article-title>A meta-analysis of syntactic satiation in extraction from islands</article-title>. <source>Glossa Psycholinguistics</source>, <volume>3</volume>(<issue>1</issue>). <pub-id pub-id-type="doi">10.5070/G60111425</pub-id></mixed-citation></ref>
<ref id="B34"><mixed-citation publication-type="journal"><string-name><surname>Nickerson</surname>, <given-names>R. S.</given-names></string-name> (<year>1998</year>). <article-title>Confirmation bias: A ubiquitous phenomenon in many guises</article-title>. <source>Review of General Psychology</source>, <volume>2</volume>(<issue>2</issue>), <fpage>175</fpage>&#8211;<lpage>220</lpage>. <pub-id pub-id-type="doi">10.1037/1089-2680.2.2.175</pub-id></mixed-citation></ref>
<ref id="B35"><mixed-citation publication-type="journal"><string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, <string-name><surname>Van Assen</surname>, <given-names>M. A.</given-names></string-name>, <string-name><surname>Hartgerink</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Epskamp</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name> (<year>2017</year>). <source>The validity of the tool &#8220;statcheck&#8221; in discovering statistical reporting inconsistencies</source>. <pub-id pub-id-type="doi">10.31234/osf.io/tcxaj</pub-id></mixed-citation></ref>
<ref id="B36"><mixed-citation publication-type="webpage"><string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, &amp; <string-name><surname>Epskamp</surname>, <given-names>S.</given-names></string-name> (<year>2024</year>). <source>Statcheck: Extract statistics from articles and recompute p-values (Version 1.5.0) [R package]</source>. <uri>https://github.com/MicheleNuijten/statcheck</uri></mixed-citation></ref>
<ref id="B37"><mixed-citation publication-type="journal"><string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, <string-name><surname>Hartgerink</surname>, <given-names>C. H.</given-names></string-name>, <string-name><surname>Van Assen</surname>, <given-names>M. A.</given-names></string-name>, <string-name><surname>Epskamp</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name> (<year>2016</year>). <article-title>The prevalence of statistical reporting errors in psychology (1985&#8211;2013)</article-title>. <source>Behavior Research Methods</source>, <volume>48</volume>, <fpage>1205</fpage>&#8211;<lpage>1226</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-015-0664-2</pub-id></mixed-citation></ref>
<ref id="B38"><mixed-citation publication-type="journal"><string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, &amp; <string-name><surname>Polanin</surname>, <given-names>J. R.</given-names></string-name> (<year>2020</year>). <article-title>&#8220;Statcheck&#8221;: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses</article-title>. <source>Research Synthesis Methods</source>, <volume>11</volume>(<issue>5</issue>), <fpage>574</fpage>&#8211;<lpage>579</lpage>. <pub-id pub-id-type="doi">10.1002/jrsm.1408</pub-id></mixed-citation></ref>
<ref id="B39"><mixed-citation publication-type="webpage"><collab>R Core Team</collab>. (<year>2025</year>). <source>R: A language and environment for statistical computing</source>. <publisher-name>R Foundation for Statistical Computing</publisher-name>. <uri>https://www.R-project.org/</uri></mixed-citation></ref>
<ref id="B40"><mixed-citation publication-type="journal"><string-name><surname>Roettger</surname>, <given-names>T. B.</given-names></string-name> (<year>2019</year>). <article-title>Researcher degrees of freedom in phonetic research</article-title>. <source>Laboratory Phonology</source>, <volume>10</volume>(<issue>1</issue>). <pub-id pub-id-type="doi">10.5334/labphon.147</pub-id></mixed-citation></ref>
<ref id="B41"><mixed-citation publication-type="journal"><string-name><surname>Sakaluk</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Williams</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Biernat</surname>, <given-names>M.</given-names></string-name> (<year>2014</year>). <article-title>Analytic review as a solution to the misreporting of statistical results in psychological science</article-title>. <source>Perspectives on Psychological Science</source>, <volume>9</volume>(<issue>6</issue>), <fpage>652</fpage>&#8211;<lpage>660</lpage>. <pub-id pub-id-type="doi">10.1177/1745691614549257</pub-id></mixed-citation></ref>
<ref id="B42"><mixed-citation publication-type="journal"><string-name><surname>Scheel</surname>, <given-names>A. M.</given-names></string-name> (<year>2022</year>). <article-title>Why most psychological research findings are not even wrong</article-title>. <source>Infant and Child Development</source>, <volume>31</volume>(<issue>1</issue>), <elocation-id>e2295</elocation-id>. <pub-id pub-id-type="doi">10.1002/icd.2295</pub-id></mixed-citation></ref>
<ref id="B43"><mixed-citation publication-type="journal"><string-name><surname>Scheel</surname>, <given-names>A. M.</given-names></string-name>, <string-name><surname>Tiokhin</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Isager</surname>, <given-names>P. M.</given-names></string-name>, &amp; <string-name><surname>Lakens</surname>, <given-names>D.</given-names></string-name> (<year>2021</year>). <article-title>Why hypothesis testers should spend less time testing hypotheses</article-title>. <source>Perspectives on Psychological Science</source>, <volume>16</volume>(<issue>4</issue>), <fpage>744</fpage>&#8211;<lpage>755</lpage>. <pub-id pub-id-type="doi">10.1177/1745691620966795</pub-id></mixed-citation></ref>
<ref id="B44"><mixed-citation publication-type="journal"><string-name><surname>Schmidt</surname>, <given-names>T.</given-names></string-name> (<year>2017</year>). <source>Statcheck does not work: All the numbers. Reply to Nuijten et al. (2017)</source>. <pub-id pub-id-type="doi">10.31234/osf.io/hr6qy</pub-id></mixed-citation></ref>
<ref id="B45"><mixed-citation publication-type="webpage"><string-name><surname>Sch&#246;nbrodt</surname>, <given-names>F. D.</given-names></string-name> (<year>2015</year>). <source>P-checker: One-for-all p-value analyzer</source>. <uri>http://shinyapps.org/apps/p-checker/</uri>.</mixed-citation></ref>
<ref id="B46"><mixed-citation publication-type="journal"><string-name><surname>Sonderegger</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>S&#243;skuthy</surname>, <given-names>M.</given-names></string-name> (<year>2024</year>). <source>Advancements of phonetics in the 21st century: Quantitative data analysis</source>. <pub-id pub-id-type="doi">10.31234/osf.io/mc6a9</pub-id></mixed-citation></ref>
<ref id="B47"><mixed-citation publication-type="journal"><string-name><surname>Sterling</surname>, <given-names>T. D.</given-names></string-name> (<year>1959</year>). <article-title>Publication decisions and their possible effects on inferences drawn from tests of significance &#8211; or vice versa</article-title>. <source>Journal of the American Statistical Association</source>, <volume>54</volume>(<issue>285</issue>), <fpage>30</fpage>&#8211;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.2307/2282137</pub-id></mixed-citation></ref>
<ref id="B48"><mixed-citation publication-type="journal"><string-name><surname>Van Aert</surname>, <given-names>R. C.</given-names></string-name>, <string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, <string-name><surname>Olsson-Collentine</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Stoevenbelt</surname>, <given-names>A. H.</given-names></string-name>, <string-name><surname>Van Den Akker</surname>, <given-names>O. R.</given-names></string-name>, <string-name><surname>Klein</surname>, <given-names>R. A.</given-names></string-name>, &amp; <string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name> (<year>2023</year>). <article-title>Comparing the prevalence of statistical reporting inconsistencies in COVID-19 preprints and matched controls: A registered report</article-title>. <source>Royal Society Open Science</source>, <volume>10</volume>(<issue>8</issue>), <elocation-id>202326</elocation-id>. <pub-id pub-id-type="doi">10.1098/rsos.202326</pub-id></mixed-citation></ref>
<ref id="B49"><mixed-citation publication-type="journal"><string-name><surname>Vasishth</surname>, <given-names>S.</given-names></string-name> (<year>2023</year>). <article-title>Some right ways to analyze (psycho)linguistic data</article-title>. <source>Annual Review of Linguistics</source>, <volume>9</volume>(<issue>1</issue>), <fpage>273</fpage>&#8211;<lpage>291</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-linguistics-031220-010345</pub-id></mixed-citation></ref>
<ref id="B50"><mixed-citation publication-type="journal"><string-name><surname>Vasishth</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mertzen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>J&#228;ger</surname>, <given-names>L. A.</given-names></string-name>, &amp; <string-name><surname>Gelman</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <article-title>The statistical significance filter leads to overoptimistic expectations of replicability</article-title>. <source>Journal of Memory and Language</source>, <volume>103</volume>, <fpage>151</fpage>&#8211;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2018.07.004</pub-id></mixed-citation></ref>
<ref id="B51"><mixed-citation publication-type="journal"><string-name><surname>Veldkamp</surname>, <given-names>C. L.</given-names></string-name>, <string-name><surname>Nuijten</surname>, <given-names>M. B.</given-names></string-name>, <string-name><surname>Dominguez-Alvarez</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Van Assen</surname>, <given-names>M. A.</given-names></string-name>, &amp; <string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name> (<year>2014</year>). <article-title>Statistical reporting errors and collaboration on statistical analyses in psychological science</article-title>. <source>PLoS One</source>, <volume>9</volume>(<issue>12</issue>), <elocation-id>e114876</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0114876</pub-id></mixed-citation></ref>
<ref id="B52"><mixed-citation publication-type="journal"><string-name><surname>Wicherts</surname>, <given-names>J. M.</given-names></string-name>, <string-name><surname>Bakker</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Molenaar</surname>, <given-names>D.</given-names></string-name> (<year>2011</year>). <article-title>Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results</article-title>. <source>PLoS One</source>, <volume>6</volume>(<issue>11</issue>), <elocation-id>e26828</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0026828</pub-id></mixed-citation></ref>
</ref-list>
</back>
</article>