English vowel sounds
This page covers the 16 vowel sounds of American English: 13 single vowels and 3 diphthongs. A vowel is an open speech sound: air flows freely out of the mouth, shaped by the position of the tongue, jaw and lips, with no obstruction in the vocal tract. Below, each vowel is listed with an example word and a short video of how the sound is made.
Every vowel is voiced, which means it is produced with the vocal folds vibrating. The air passes from the lungs, up through the windpipe (trachea) and voice box (larynx), and out through the mouth. Vowels are commonly described by:
- The part of the tongue involved: front, central or back.
- The tongue’s height relative to the palate: high, mid or low.
- The shape of the lips: rounded or unrounded.
- The length of the sound: long or short.
The vowel chart shown here maps each vowel by tongue position: front to back across the top, and high to low down the side. It roughly represents where the tongue sits in the mouth when each vowel is produced.
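For readers who like to see the idea as data, the four descriptors and the chart layout can be sketched in a few lines of Python. This is only an illustrative sketch: the `Vowel` class, the `build_chart` and `render_chart` helpers, and the example words "boot" and "cat" are assumptions made for this example, and the list holds a small subset of vowels rather than the full set of 16. The three-way "high, mid, low" and "front, central, back" labels follow the simplified scheme used on this page.

```python
from dataclasses import dataclass

PARTS = ("front", "central", "back")   # columns of the chart
HEIGHTS = ("high", "mid", "low")       # rows of the chart


@dataclass(frozen=True)
class Vowel:
    """One vowel described by the four features listed above (hypothetical model)."""
    symbol: str
    example: str
    part: str      # part of the tongue: "front", "central" or "back"
    height: str    # tongue height: "high", "mid" or "low"
    rounded: bool  # lip shape
    long: bool     # length: True for long, False for short


# Illustrative subset only, not the full set of 16 vowels.
VOWELS = [
    Vowel("i", "cream", "front", "high", rounded=False, long=True),
    Vowel("ɪ", "bit", "front", "high", rounded=False, long=False),
    Vowel("u", "boot", "back", "high", rounded=True, long=True),
    Vowel("æ", "cat", "front", "low", rounded=False, long=False),
]


def build_chart(vowels):
    """Group vowel symbols into a grid keyed by tongue height, then tongue part."""
    chart = {h: {p: [] for p in PARTS} for h in HEIGHTS}
    for v in vowels:
        chart[v.height][v.part].append(v.symbol)
    return chart


def render_chart(chart):
    """Return the grid as text: front to back across the top, high to low down the side."""
    lines = ["      " + "".join(p.ljust(10) for p in PARTS)]
    for h in HEIGHTS:
        cells = "".join(" ".join(chart[h][p]).ljust(10) for p in PARTS)
        lines.append(h.ljust(6) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_chart(build_chart(VOWELS)))
```

Two vowels can share a cell, as /i/ and /ɪ/ do here, which is why length and lip shape are recorded alongside tongue position.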
The table below describes each English vowel sound, including tongue position, lip shape, length and an example word. Click any video link to watch a short animation showing how that vowel sound is made.
Diphthongs
A diphthong is a vowel sound that glides from one position to another within a single syllable, as in “bite”, “brown” and “boy”. Although each diphthong is treated as a single vowel sound, it is written with two symbols to show where the sound starts and where it ends.
Go beyond the vowel chart
This page explains each vowel sound, but it cannot show how sounds combine in words and sentences or tell you whether you are producing them correctly. Pronunciation Coach 3D lets you type any English word or sentence and see how the tongue, lips and airflow shape each sound, then record yourself and get an instant pronunciation score.
See how Pronunciation Coach 3D works →
Frequently asked questions
American English is commonly analysed as having around 16 vowel sounds: 13 single vowels and 3 diphthongs, or gliding vowels, as in “bite”, “brown” and “boy”. You may see the figure 20 quoted elsewhere; this usually refers to British English, where several vowels are analysed differently. English has only five vowel letters (a, e, i, o, u) but many more vowel sounds.
A vowel sound is an open speech sound made without blocking the flow of air through the mouth. The air passes out freely, shaped by the position of the tongue, jaw and lips. Vowels are voiced sounds, meaning the vocal folds vibrate as they are produced. Consonants, by contrast, involve some obstruction of the airflow.
A diphthong is a vowel sound that glides from one position to another within a single syllable, as in “bite”, “brown” and “boy”. Although each diphthong is treated as a single vowel sound, it is written with two symbols to show where the sound starts and where it ends. American English is usually described as having three diphthongs.
Long and short refer to how the vowel is held and shaped, not simply its duration. Long vowels, such as the /i/ sound in “cream”, are generally more tense and may be held longer. Short vowels, such as the /ɪ/ sound in “bit”, are quicker and more relaxed. The two often differ in tongue position as well as length.
The vowels on this page are for American English. Many vowel sounds are shared with British English, but some differ, especially r-coloured vowels such as the /ɚ/ sound in “burn”, and some of the symbols used to write them differ too. British English is often described with around 20 vowel sounds and a slightly different set of symbols, which is why you may see different vowel charts elsewhere.
Each vowel in the table above links to a short video showing how that vowel sound is produced. To go further, Pronunciation Coach 3D lets you type any English word or sentence, hear it spoken, and watch an animated 3D model of the mouth showing how the tongue, lips and airflow shape each sound.
The best way to improve is to practise and compare. Pronunciation Coach 3D records your speech and displays it as a waveform alongside an animated 3D model of the mouth, so you can compare your pronunciation with the model. It also gives a speech intelligibility score, showing how clearly your speech was understood.