Help us improve speech science.
Be part of a large collaborative project.
We do it constantly; it is part of every empirical project: data analysis. It is a complex process, and there are often many possible analytic strategies for one and the same dataset. This is true for the type of statistical analyses one can choose from, but also for which kind of measurements one extracts from the data in the first place. Speech scientists know this all too well. Comparing the phonetic signal of two utterances allows thousands of different analytical paths. Which aspect of the signal do we measure? How do we measure it? In what part of the utterance do we measure it?
The Many Speech Analyses project is looking for speech researchers to independently analyze the same dataset in order to answer the same research question. You can participate as a solo analyst or form a team of researchers. Each team will then serve as reviewers, critically assessing the analyses of other teams.
How does this work? We will provide each team with the same set of recordings and annotated TextGrids and one research question to answer. You can find more information on the data, research question, and expected project timeline below. We will give all teams a full debriefing as the first step after joining the project. Then, it will be up to you to decide how to tackle the research question, from choosing which phonetic measurements or other types of data to obtain from the recordings to how to statistically analyze them. Each team will submit a report of their analysis and will then review the analyses of four of the other teams. Finally, we (the project coordinators) will aggregate all of the individual analyses using meta-analytical techniques. Pooling different analyses this way, rather than depending on any single analysis, can help increase the robustness and generalisability of the research outcome, whatever that might be.
What is in it for you? By joining this project and completing the tasks of submitting a report of your analysis and reviewing four other reports, all the members of your team will become co-authors on the final publication. The good news is, the publication has already been in-principle accepted in AMPPS as a Registered Report. A Registered Report is a publishing format that emphasizes the importance of the research question and the quality of the methodology by conducting peer review prior to data collection. High-quality protocols are then provisionally accepted for publication if the authors follow through with the registered methodology. Concretely, this means that publication of the results of this study is in principle guaranteed, and your participation will lead to guaranteed co-authorship of a publication in AMPPS. To reduce the influence of the accepted meta-analytical protocol on the analyses of the individual teams, you will be given access to the Registered Report only after submitting your analysis reports and reviews to the project coordinators. Finally, you will be part of an international collaborative effort to understand and improve the robustness of our scientific discoveries.
Blue Banana? The data set
Watch this short video
The dataset used in this project investigates the acoustics of referring expressions like “the blue banana”. Referring is one of the most basic uses of language. How does a speaker choose a referential expression when they want to refer to a specific object like a banana? The object’s context plays a large role in that choice. Generally, speakers aim to be as informative as necessary to uniquely establish a referent (Grice 1975). Thus we expect them to use a modifier like “yellow” only if it is strictly necessary for disambiguation (e.g., when there is both a yellow and a less ripe green banana).
But there is much evidence against strictly rational speakers. Speakers are often overinformative, i.e. they use referring expressions that are more specific than necessary. This redundancy has been argued to facilitate object identification and more efficient communication (Arts et al. 2011, Paraboni et al. 2007, Rubio-Fernandez 2016). For example, Degen et al. (2020) show that modifiers that are less typical for a given referent (e.g. a blue banana) are more likely to be used in an overinformative scenario (e.g. when there is just one banana).
The literature has a strong focus on whether a certain referential expression is chosen or not. However, speech communication is much richer, allowing for more subtle enhancements of the communicative signal to make referential disambiguation easier. Spoken languages utilize suprasegmental aspects of speech to disambiguate referents and signal (un)predictable content. So we ask:
Do speakers phonetically modulate utterances to signal atypical word combinations (e.g. a blue banana vs. a yellow banana)?
To answer this question, we analyse recordings from an experimental study. We elicited sentences with noun-modifier pairs of varying typicality. Native German speakers were asked to instruct a confederate to select a specific referent among four objects presented on a screen. The instructions were presented to the subject in written form, and the subject read them out loud to the confederate. An example instruction is 'You should put the cube on the blue banana' (Du sollst den Würfel auf die blaue Banane ablegen). The target noun-modifier pair was either typical ('yellow banana'), medium-typical ('green banana'), or atypical ('blue banana'). Your task will be to investigate whether typicality affects the phonetics of the utterances.
Our road map
| Phase | Description | End date |
| --- | --- | --- |
| I – Recruitment | We are recruiting researchers in speech science now! Click on "Join the project" below. | 2021-12-15 |
| II – Analyses by analysis teams | Each analysis team performs their analysis on the given dataset. | 2022-04-15 |
| III – Peer review | Team members review other teams' analyses. | 2022-06-01 |
| IV – Analysis by project coordinators | The project coordinators run the analyses detailed in the In-Principle Accepted Registered Report. | 2022-07-01 |
| V – Collaborative writing | A paper draft is written by the project coordinators and circulated to all authors for feedback before the final submission. | 2022-08-15 |
Here is a visual overview of the timeline.
The corpus is in German. Do I need to know German to analyze the data?
You don't need to have an active knowledge of German to analyze the corpus. We will provide annotated TextGrids that help you navigate the speech signal, trial lists that contain all relevant target words (and translations), and a detailed description of the stimuli generation and experimental procedure.
Is the data already segmented?
We provide annotated TextGrids with three tiers: utterances, words, and segments. The utterance tier is time-aligned to the recordings, but the word and segment tiers are not (there are intervals with labels for words and segments within the utterances, but these are not aligned to the recording). If your analysis requires word or segment level time-alignment, this will have to be done by the team.
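To illustrate what deriving word-level timing yourself might involve, here is a minimal Python sketch of one naive placeholder strategy: splitting a time-aligned utterance interval evenly among its word labels. This is purely illustrative (the function name and numbers are invented here); a real analysis would more likely use a forced aligner such as the Montreal Forced Aligner to obtain accurate boundaries.

```python
# Naive illustration only: divide an utterance interval evenly across its
# word labels. Real word boundaries would come from forced alignment.

def uniform_word_intervals(utt_start, utt_end, words):
    """Return (start, end, label) tuples, splitting the interval evenly."""
    dur = (utt_end - utt_start) / len(words)
    return [
        (utt_start + i * dur, utt_start + (i + 1) * dur, word)
        for i, word in enumerate(words)
    ]

# Example: an utterance from 0.5 s to 2.0 s containing three words.
intervals = uniform_word_intervals(0.50, 2.00, ["die", "blaue", "Banane"])
```

Uniform splitting is of course too crude for phonetic measurements; it only shows the shape of the problem the word and segment tiers leave open.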
What are the reviews of the team analyses for?
After evaluating the amount of variability across teams, we want to assess the perceived quality of individual analyses by multiple reviewers. This will enable us to weight the individual analyses based on the reviews when we pool them together in the meta-analysis.
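To give a flavour of what such weighting could look like, here is a simplified sketch of inverse-variance weighting, optionally scaled by a quality score. This is an illustration of the general idea, not the registered protocol; the function and all numbers are invented for the example.

```python
# Illustrative sketch: pool per-team effect estimates with inverse-variance
# weights, optionally scaled by a review-based quality score.

def pooled_effect(effects, variances, quality=None):
    """Weighted mean of effect estimates (weights = quality / variance)."""
    if quality is None:
        quality = [1.0] * len(effects)
    weights = [q / v for q, v in zip(quality, variances)]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Three hypothetical team estimates with standard-error variances and
# quality scores from peer review.
est = pooled_effect([0.2, 0.5, 0.1], [0.04, 0.09, 0.01], [1.0, 0.8, 1.0])
```

Precisely estimated analyses (small variance) and well-reviewed analyses (high quality score) pull the pooled estimate more strongly; the actual meta-analytical model is specified in the Registered Report.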
How are reviews assigned?
We will randomly assign four analyses to review to each analysis team. In other words, after you have completed your analysis, you will receive analysis reports from four other teams to evaluate.
How will time be managed?
The time frame for each step of the project is set and can be found above. Analysts will have four months to finish their analysis (start date: 2021-12-15, deadline: 2022-04-14). After submitting the analyses, teams have about another six weeks to review the analysis reports of four other teams (start date: 2022-04-15, deadline: 2022-05-31). Co-authorship on the final paper is contingent on submitting both analyses and reviews on time.
Will you compare the analyses against each other?
No, we won't compare the individual analyses against each other. We will pool all the analyses and their scientific conclusions using meta-analytical tools.
What types of analyses will the teams have to do?
Every team will have to decide on both phonetic analysis and statistical analysis based on what they deem most appropriate to answer the research question. There are no constraints on how the data can be analysed.
Is it alright to focus on only a select number of variables (e.g., duration and f0) and leave out others that I'm less familiar with/less interested in (like vowel quality)? Or should every report be as comprehensive as possible?
Yes. Every team decides for themselves on the appropriate acoustic and statistical analyses, and variable selection is part of that process. There are no constraints on how the data are to be analysed.
Will our analysis be publicly available?
Yes, all analyses will be stored on a publicly available repository on the Open Science Framework.