Acoustic Analysis: Animal vs. Vowel Mimic Exercises

acoustic-analysis

voice-quality

vocal-exercises

MFA-2026

Author: Morgan Pay

Published: February 28, 2026

This analysis presents acoustic and perceptual data from a study comparing vocal exercises derived from animal imitations (owl, cat, cow) with exercises derived from the corresponding vowel imitations (/u/, /æ/, /ɑ/). Five participants each completed two sessions — one animal-based, one vowel-based — with acoustic measures extracted from mimics, exercises, and a target song (“Somewhere Over the Rainbow”) recorded at baseline and after each exercise block.

Acoustic measures: f0, jitter, shimmer, HNR, intensity (calibrated), spectral moments (CoG, spectral SD, skewness, kurtosis), CPPS, H1-H2, F1, F2, alpha ratio, L/H ratio, LTAS slope/tilt, AVQI.

Perceptual measure: Omni-VES (Voice Evaluation Scale), rated by researcher and participant self-assessment.

Statistical approach: Cohen’s d with bootstrapped 95% CIs, Spearman rank correlations. Emphasis on effect sizes and patterns over p-values given n = 5.
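The effect-size approach above can be sketched in a few lines. This is a minimal illustration, not the study's analysis script: it assumes a pooled-SD Cohen's d and a percentile bootstrap that resamples participants so that each singer's animal and vowel values stay paired. The function names and the HNR values are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for (a - b) using the pooled SD; assumes equal group sizes."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def bootstrap_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for d, resampling participants as paired units."""
    rng = random.Random(seed)
    n = len(a)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ra = [a[i] for i in idx]
        rb = [b[i] for i in idx]
        # At n = 5 a resample can contain one participant five times,
        # which gives zero variance; skip those draws.
        if statistics.variance(ra) + statistics.variance(rb) == 0:
            continue
        estimates.append(cohens_d(ra, rb))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical HNR values (dB) for five participants, for illustration only.
animal_hnr = [18.2, 20.1, 19.4, 21.0, 17.8]
vowel_hnr = [14.9, 16.3, 15.1, 17.2, 13.6]

d = cohens_d(animal_hnr, vowel_hnr)
ci_low, ci_high = bootstrap_ci(animal_hnr, vowel_hnr)
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

With so few participants the bootstrap distribution is coarse and the interval is wide, which is why the report emphasizes patterns over any single estimate.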

Formant caveat: Formant estimation becomes unreliable as f0 rises above ~350 Hz, because widely spaced harmonics sample the vocal-tract envelope too sparsely. Owl mimics (~570 Hz) and “where” targets (~520 Hz) are flagged as unreliable; cat mimics (~430 Hz) are borderline; cow mimics (~320 Hz) and “some” targets (~260 Hz) are reliable.

Study Design

Five participants each completed two recording sessions:

Session A (Animal): Imitate an animal sound (owl, cat, cow) → sing a derived exercise → sing “Somewhere Over the Rainbow”
Session B (Vowel): Imitate the corresponding vowel (/u/, /æ/, /ɑ/) → sing a derived exercise → sing “Somewhere Over the Rainbow”

Three animal-vowel pairs target different vowels and vocal behaviors:

Pair	Animal	Vowel	Approximate f0	Primary Domain
Owl / /u/	Owl hoo	/u/	~570 Hz	Intensity, stability
Cat / /æ/	Meow	/æ/	~430 Hz	Resonance, spectrum
Cow / /ɑ/	Moo	/ɑ/	~320 Hz	Phonation, registration

Each session also included AVQI recordings (sustained vowel + continuous speech) before and after the exercise protocol, and “Somewhere Over the Rainbow” at baseline and after each of the three exercise blocks.

Each Animal Mimic Invites a Different Vocal Adjustment

The strongest finding in this study: the three animal mimics do not all work the same way. Each one changes the voice on different acoustic dimensions compared to its vowel counterpart.

The heatmap below shows Cohen’s d effect sizes for animal vs. vowel mimics, broken out by pair.

Figure 1

Reading the heatmap: Each cell shows the standardized difference (Cohen’s d) between the animal mimic and its vowel counterpart. Positive values (blue) mean the animal mimic scored higher; negative values (red) mean the vowel mimic scored higher.

Three distinct acoustic signatures emerge:

Cow changes phonation: HNR d = +3.91, f0 SD d = −2.55, lower perturbation across the board. The cow sound invites a fundamentally different way of using the voice compared to singing /ɑ/ — more modal, more stable, lower pitch.
Cat changes resonance and spectrum: spectral SD d = +3.21, F2 d = +2.78, CoG d = +1.93. The meow concentrates energy in the upper spectrum in a way that singing /æ/ does not.
Owl changes intensity and stability: intensity d = +2.57, shimmer dB d = −2.13. The owl hoo is essentially a more projected, more stable version of the vowel.

This is the most robust finding in the study. The effect sizes are very large (d > 2 for multiple measures across all three pairs) and consistent within pairs.

Transfer from Mimic to Exercise

Do the acoustic differences established during mimicry carry over into the derived exercises? The side-by-side heatmaps below compare effect sizes during the mimic phase (left) and exercise phase (right) for each pair.

Figure 2

Most mimic-phase effects attenuate when participants move to the exercise — the exercise heatmap is “cooler” (closer to zero) than the mimic heatmap. But the degree of transfer varies by pair:

Cat pair preserves spectral features best: spectral SD and CoG maintain 65–82% of their mimic-phase effect size during exercises.
Cow pair preserves voice quality partially: HNR retains ~24% of its mimic effect; shimmer dB retains ~43%.
Owl pair shows the least transfer: the intensity effect shrinks by 84% (about 16% retained), and the shimmer dB effect reverses sign entirely.

Some effects emerge in the exercise phase that weren’t present in mimics — notably, Cow H1-H2 becomes meaningful only during exercises, suggesting the exercise develops a phonation-type difference that the mimic merely hints at.
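The retention percentages above can be read as the ratio of the exercise-phase effect size to the mimic-phase effect size. The sketch below assumes that definition (the report does not state the formula). The function name, the `min_effect` cut-off, and the example values are hypothetical.

```python
def retention(d_mimic, d_exercise, min_effect=0.2):
    """Percent of the mimic-phase effect retained in the exercise phase.

    Returns None when the mimic-phase effect is too small for a ratio to be
    meaningful (an effect that only emerges during exercises).
    A negative result means the effect reversed sign.
    """
    if abs(d_mimic) < min_effect:
        return None
    return 100.0 * d_exercise / d_mimic


# Hypothetical effect sizes, for illustration only.
print(retention(2.0, 0.5))    # attenuated effect
print(retention(-2.0, 0.4))   # reversed effect (negative retention)
print(retention(0.1, 0.9))    # effect emerges only in the exercise phase
```

Treating near-zero mimic effects as a separate case keeps emergent effects, such as the Cow H1-H2 difference, from showing up as misleadingly huge retention values.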

Rep-by-Rep Trajectories

Exercises are not static across repetitions. The plots below track five key measures across 3–6 reps for each pair, showing individual participant lines (thin) and group means (thick). The animal-derived exercise is shown in solid lines; the vowel-derived exercise in dashed lines.

Figure 3

The most striking trajectory is Cow H1-H2: the cow-derived exercise becomes breathier across repetitions (H1-H2 increases) while the /ɑ/-derived exercise becomes more pressed (H1-H2 decreases). This divergence suggests that the motor learning happening across repetitions is itself condition-dependent — the animal mimic seeds a vocal pattern that evolves as the singer repeats the exercise.

Other observations:

Cat H1-H2 shows parallel downward trends: both conditions become more pressed, but the cat-derived exercise starts from a higher baseline.
Alpha ratio diverges for Cow but not for Cat or Owl, consistent with Cow’s phonation-domain effect.
Jitter is generally stable across reps, with the animal condition lower (less perturbed) for Cow and Owl.
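One simple way to put a number on a rep-by-rep divergence is to compare least-squares slopes across repetitions for the two conditions. The sketch below is an illustration under that assumption, not the analysis used in this report. The function name and the H1-H2 values are hypothetical.

```python
def rep_slope(values):
    """Least-squares slope of a measure against repetition number (1, 2, ...)."""
    n = len(values)
    reps = range(1, n + 1)
    mean_rep = sum(reps) / n
    mean_val = sum(values) / n
    num = sum((r - mean_rep) * (v - mean_val) for r, v in zip(reps, values))
    den = sum((r - mean_rep) ** 2 for r in reps)
    return num / den


# Hypothetical group-mean H1-H2 values (dB) across four reps, illustration only.
animal_h1h2 = [4.0, 4.6, 5.1, 5.9]   # rising: breathier across reps
vowel_h1h2 = [4.2, 3.9, 3.5, 3.0]    # falling: more pressed across reps

# A positive divergence means the animal condition rises relative to the vowel.
divergence = rep_slope(animal_h1h2) - rep_slope(vowel_h1h2)
print(f"slope difference: {divergence:.2f} dB per rep")
```

Computing the slope per participant rather than on the group mean would show whether a divergence is shared or driven by one or two singers.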

Carry-Over into “Somewhere Over the Rainbow”

After each exercise block, participants sang “Somewhere Over the Rainbow.” The plots below track acoustic measures on two sustained pitch targets — “some” (~260 Hz, reliable formants) and “where” (~520 Hz, unreliable formants) — from baseline through three post-exercise measurements.

Individual participant lines are shown (thin, colored) with the group mean overlaid (thick black = animal, thick gray dashed = vowel).

Figure 4

Individual variability overwhelms any group-level carry-over pattern. This is a sample-size limitation (n = 5), not evidence of no carry-over. The individual trajectories show that some participants respond more consistently than others, and the direction of change is not uniform across participants or measures.

Despite the noisy group-level data, the researcher consistently perceived progressive improvement within sessions — reduced strain, better registration connection, more balanced resonance — in both protocols. The qualitative narrative tells a clearer story than the acoustic measures at this sample size.

What Predicts Perceived Voice Quality?

The heatmap below shows Spearman correlations between Omni-VES ratings and each acoustic measure, broken out by vowel group. The “All” column pools all vowel contexts; the individual columns show within-vowel relationships.

Figure 5

The pooled correlations (“All” column) appear counterintuitive: higher shimmer and jitter correlate with higher VES scores, while higher HNR correlates with lower VES scores. This is a between-vowel confound. Tokens from the cow pair have higher perturbation and lower HNR than the other pairs, and they also receive the highest VES ratings because the cow pair is the most demanding task.

Within vowels, the story changes:

For /ɑ/: Shimmer (ρ = +0.50) and HNR (ρ = −0.50) still correlate with VES, but CPPS and H1-H2 also appear. Breathier phonation and lower cepstral prominence associate with higher VES.
For /æ/: Skewness, kurtosis, and intensity are the strongest predictors. Less spectral complexity associates with higher VES.
For /u/: The pattern partially reverses. HNR is positively correlated with VES, opposite to the pooled direction. LTAS tilt is the strongest predictor.
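The gap between pooled and within-vowel correlations can be reproduced with toy data. In the sketch below every value is invented for illustration (none are the study's measurements), and Spearman's ρ is computed by hand as the Pearson correlation of average ranks. The toy groups are built so that the measure and the rating both rise from one vowel to the next while moving in opposite directions within each vowel.

```python
def ranks(values):
    """1-based ranks, with ties given the average rank of their block."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy (shimmer, rating) pairs per vowel group; invented values only.
groups = {
    "u": ([1.0, 1.2, 1.4, 1.6, 1.8], [2.4, 2.3, 2.2, 2.1, 2.0]),
    "ae": ([3.0, 3.2, 3.4, 3.6, 3.8], [4.4, 4.3, 4.2, 4.1, 4.0]),
    "a": ([5.0, 5.2, 5.4, 5.6, 5.8], [6.4, 6.3, 6.2, 6.1, 6.0]),
}

within_rho = {name: spearman(x, y) for name, (x, y) in groups.items()}
pooled_x = [v for x, _ in groups.values() for v in x]
pooled_y = [v for _, y in groups.values() for v in y]
pooled_rho = spearman(pooled_x, pooled_y)

print("within:", {k: round(v, 2) for k, v in within_rho.items()})
print("pooled:", round(pooled_rho, 2))
```

In this toy case the within-group ρ is −1 by construction while the pooled ρ is about +0.79, an exaggerated version of the sign change seen for HNR in the /u/ group. The real data are less extreme, but the mechanism is the same: differences between vowels can dominate the pooled correlation.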

The VES paradox: The qualitative notes explain the pooled pattern. Owl mimics receive consistently positive quality descriptions (“ease,” “lofted space,” “relaxed”) but the lowest VES scores (researcher VES = 2 for 4 of 5 participants), while cow mimics receive notes about strain and effort but the highest VES scores. In this dataset the VES appears to function as a task-demand scale rather than a voice quality scale: when a task is easy and well executed, there is little effort to rate, so the score is low.

Pair	Typical Researcher VES	Quality Language
Owl / /u/	2	“none to be heard” [strain], “good mimicry,” “release and ease”
Cat / /æ/	3–4	“twang/ping” alongside “pressing,” “constriction”
Cow / /ɑ/	4–6	“vocal fry,” “pressing,” “registration breaks”

AVQI Pre/Post

The Acoustic Voice Quality Index (AVQI) was measured from sustained vowel + continuous speech recordings before and after each session. Neither protocol harms overall voice quality.

Figure 6

AVQI values are stable or slightly improved pre-to-post in both conditions, and no participant shows a clinically meaningful worsening. This suggests that neither the animal nor the vowel exercise protocol introduces vocal strain or degradation at the whole-voice level. That is an important baseline finding, even if AVQI is too blunt an instrument to capture the fine-grained acoustic shifts documented above.

Note: P4 Session B (Vowel) post-AVQI was not available.

Qualitative Observations

The researcher (Morgan) recorded observations on strain/tension, breath support, resonance/placement, and general impressions for every segment across all 10 sessions. These observations ground the acoustic findings in the language voice teachers actually use.

Themes by Animal-Vowel Pair

Owl / /u/ — “Ease,” “lofted space,” “relaxed”

The owl mimic consistently elicited the most positive quality language. Descriptions include: “none to be heard” (re: strain), “a good amount of release and ease,” “good lofted, back mouth space,” and “all four reps are very similar in timbre, pitch, breath control.” The owl exercises continued this pattern: “still sounded easeful,” “relatively free, easy, relaxed vibrato and breath.” Acoustically, this maps to low perturbation, high intensity, and spectrally narrow output.

Cat / /æ/ — “Twang/ping,” “pressing,” “constriction”

The cat mimic produced mixed observations, with positive resonance features alongside effortful production: “some good ping starting to develop” but also “some tightness potentially around the larynx/pharynx” and “some pharyngeal constriction.” Multiple participants spontaneously produced an /m/ onset, mirroring the initial consonant of “meow.” Acoustically, this maps to high spectral SD, high CoG, and high F2: the meow engages M2 resonance, but sometimes through effort rather than ease.

Cow / /ɑ/ — “Vocal fry,” “pressing in fry,” “registration breaks”

The cow mimic is defined by registration traversal — chest voice through fry into head voice. Notes consistently flag: “some strain in the fry/chest,” “brightness in the chest/warmth in the head — not much connection throughout,” and “fry to M2 to fry.” The exercises show the pattern continuing: “seems to be some pressing at the vocal fold level here.” This maps to the HNR/f0 SD/perturbation pattern, and the H1-H2 divergence across reps.

“Somewhere” Trajectory — What the Listener Heard

Despite noisy group-level acoustic data, the researcher consistently perceived progressive improvement:

P1: Baseline “some adduction issues” → Post-Cow “less overall tension, better phrasing, more focus and ping, seems to be an improvement in overall function”
P2: Baseline “some slight strain on ascending leaps” → Post-Cow “much more ease in transition areas, the connection between the larger leaps were very successful”
P4: Baseline “slight strain on ascending leaps” → Post-Cow “the least amount of strain yet! Registration is connected, best take yet!”
P5: Baseline “excessive airy quality due to hypofunction” → Post-Cat “we can hear the mechanism making adjustments here in real time”

Both protocols appear to warm up the voice and build coordination. The kind of improvement may differ — the animal session developing more M2 access and registration connection, the vowel session developing more breath flow and release — but both trajectories point toward improved function.

Summary

Finding	Confidence	Evidence
The three animal-vowel pairs work through different acoustic mechanisms	High	Very large effect sizes (d > 2 for multiple measures across all three pairs), consistent within pairs
Animal mimics produce acoustically distinct vocalizations vs. vowel mimics	High	Every pair shows at least two measures with d > 1.5
Exercises partially inherit the mimic’s acoustic signature	Moderate	Transfer is real but attenuated and pair-dependent; Cat spectral features transfer best
Neither protocol harms overall voice quality	High	AVQI stable or improved pre-to-post in both conditions
Perceived voice quality (VES) is vowel-dependent	Moderate	Different acoustic dimensions predict VES for each vowel; VES functions as task demand, not quality
Cow H1-H2 diverges across exercise reps	Exploratory	Visible in trajectories but may be driven by 1–2 participants at n = 5
Carry-over into “Somewhere” is not clearly detectable at this sample size	Noted	Individual variability overwhelms the group signal; the researcher nonetheless perceived improvement within sessions

Limitations

Sample size (n = 5): Effect sizes are unstable, confidence intervals wide. Findings describe these five singers, not a generalizable population.
Order effects: Condition effects cannot be fully separated from session-order effects.
Formant reliability: Formant estimates are unreliable at f0 above ~350 Hz (owl mimics and “where” targets are flagged; cat mimics are borderline).
Single rater: Researcher VES scores come from a single rater, so inter-rater reliability could not be assessed.

This study is designed as a proof of concept and a pattern-identification exercise, not as a definitive test of the hypothesis. The acoustic signatures are large and consistent within this sample; their generalizability awaits replication with a larger sample.

Advisors

Kayla Gautereaux — Vocal Pedagogy Research Advisor
Ian Howell — Analysis Advisor
Morgan Stahl — Thesis Advisor
Josh Gilbert — Thesis Advisor