Section A: Project Information
SpeakEasy is an innovative AI-powered pronunciation assessment tool designed to help students, teachers, and language enthusiasts improve their English pronunciation. By leveraging cutting-edge technologies like Whisper and Phonemizer, SpeakEasy provides instant feedback through phoneme breakdowns, making it easier to identify pronunciation errors and correct them efficiently. The tool offers a simple, interactive interface where users can upload audio files and see an immediate, visual representation of their pronunciation accuracy.
Design Concepts
The design of SpeakEasy prioritizes ease of use and accessibility. I wanted to create a tool that anyone could use, whether you’re a beginner or an advanced learner. The interface is intuitive—simply upload an audio file, enter the correct text, and get an immediate breakdown of how closely your speech matches the ideal pronunciation. The color-coded phoneme results (green for correct and red for errors) make it easy for learners to quickly identify and address areas for improvement.
Technical Principles
At the core of SpeakEasy is the combination of two technologies: Whisper, OpenAI's speech recognition model, and Phonemizer, a library that converts text into phonemes written in IPA. Whisper transcribes the spoken words in the audio file into text, and Phonemizer breaks both this transcript and the correct text entered by the user into their phonemic components. By comparing the two phoneme sequences, SpeakEasy shows visually how accurately the user pronounced the words. This feedback is clear, actionable, and easy to understand, even for those with no technical background.
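To make the comparison step concrete, here is a minimal Python sketch, assuming both the target text and the Whisper transcript have already been converted into lists of IPA phonemes. It uses the standard library's `difflib.SequenceMatcher` to align the two sequences and marks each target phoneme as matched or not. The function name `compare_phonemes` and the example phonemes are hypothetical, and this is not necessarily how SpeakEasy itself aligns phonemes.

```python
from difflib import SequenceMatcher


def compare_phonemes(target: list[str], heard: list[str]) -> list[tuple[str, bool]]:
    """Label each target phoneme as matched (True) or mismatched (False).

    target: phonemes of the text the learner intended to say.
    heard:  phonemes of the Whisper transcript of the recording.
    Both inputs are assumed to be pre-split IPA phoneme lists.
    """
    matched = [False] * len(target)
    # autojunk=False disables difflib's popularity heuristic, which could
    # otherwise ignore common phonemes in long recordings.
    matcher = SequenceMatcher(a=target, b=heard, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block says target[block.a : block.a + block.size] also occurs in `heard`.
        for i in range(block.a, block.a + block.size):
            matched[i] = True
    return list(zip(target, matched))


if __name__ == "__main__":
    # Hypothetical example: the learner meant "three" but Whisper heard "free".
    print(compare_phonemes(["θ", "ɹ", "iː"], ["f", "ɹ", "iː"]))
    # → [('θ', False), ('ɹ', True), ('iː', True)]
```

Because `SequenceMatcher` only reports matching runs, extra phonemes that appear only in the learner's speech are not flagged here; a fuller version might list those insertions separately.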
Potential Impact
The potential impact of SpeakEasy on language learners is profound. By providing immediate, visual feedback on pronunciation, SpeakEasy enables learners to self-correct and improve faster. For teachers, it’s an invaluable tool to assess students’ pronunciation without having to listen to every single audio recording manually. SpeakEasy also promotes more efficient language learning, helping learners gain confidence in their pronunciation and ultimately become more fluent speakers. Its simplicity, combined with powerful AI, has the potential to revolutionize how pronunciation is taught and assessed, making learning more engaging and effective.
Section B: Participant Information
Title | First Name | Last Name | Organisation/Institution | Faculty/Department/Unit | Email | Phone Number | Contact Person / Team Leader
---|---|---|---|---|---|---|---
Miss. | Jiayi | Qiu | Primary School | English Language Education | coni.qiu@gmail.com | 65805208 |
Section C: Project Details
Problem Identification and Relevance in Education
As a primary English teacher in Hong Kong, I’ve witnessed firsthand the challenges that many students face when learning English, particularly with pronunciation. A common issue I have noticed is that students often struggle with Received Pronunciation (RP) because of the influence of their Cantonese accent. This is not a minor issue; it is a significant barrier to clear communication and fluency in English. RP is seen as the "standard" pronunciation in many educational settings, but for Cantonese speakers, the differences between the Cantonese and English sound systems can make the nuances of English pronunciation difficult to master.
This struggle is compounded by a lack of resources tailored to Cantonese-speaking students. They often receive generic feedback or rely on traditional pronunciation methods that do not address their particular difficulty with English sounds absent from Cantonese, such as /θ/ in "think" and /v/ in "very". As a result, students develop habits that are hard to break, which affects their confidence and fluency in English. Having taught in both local and international settings, I have found that students who learn IPA symbols have a better command of pronunciation. This is why I developed a tool that lets students see their own speech alongside the correct pronunciation in IPA symbols.
SpeakEasy, an AI-powered pronunciation tool, grew out of these observations and aims to help students overcome these barriers. The hypothesis underlying the project is simple: if students receive immediate, visual feedback on their pronunciation, they can identify and correct their mistakes more effectively. Using Whisper and Phonemizer, SpeakEasy gives students immediate phoneme breakdowns so they can compare their speech with standard RP pronunciation.
I believe SpeakEasy will succeed because it directly addresses a pressing issue in language learning—pronunciation—through an innovative, user-friendly platform. It empowers students to correct mistakes on their own and fosters a deeper understanding of the subtleties of English pronunciation.
N/A
/ stream 3
Technical Implementation and Performance
Functional Architecture and Technical Workflow
The core of SpeakEasy is built around two primary technologies: Whisper and Phonemizer. The workflow begins when the user uploads an audio file, typically containing a word or sentence they wish to assess, and enters the text they intended to say. The Whisper model, a state-of-the-art automatic speech recognition (ASR) system, transcribes the audio into text. The transcript is then passed to Phonemizer, which breaks it down into phonemes, the smallest units of sound in spoken language. Finally, the tool compares these phonemes with the ideal International Phonetic Alphabet (IPA) phonemes for the intended text and provides visual feedback on pronunciation accuracy.
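The workflow above can be sketched as a small Python function. This is a hedged outline rather than SpeakEasy's code: the function `assess`, its parameter names, and the type aliases are hypothetical, and the speech-recognition and phonemization stages are passed in as plain functions so the sketch runs without any models installed. In practice, `transcribe` might wrap the open-source `openai-whisper` package (for example `whisper.load_model("base").transcribe(path)["text"]`), and `to_phonemes` might wrap `phonemizer.phonemize(text, language="en-gb", backend="espeak")` with a `separator` that spaces out individual phones; both wirings are assumptions.

```python
from typing import Callable

# Hypothetical type aliases for the two pluggable stages.
Transcriber = Callable[[str], str]        # audio file path -> transcript text
ToPhonemes = Callable[[str], list[str]]   # text -> list of IPA phonemes


def assess(audio_path: str, target_text: str,
           transcribe: Transcriber, to_phonemes: ToPhonemes) -> dict:
    """Run the transcribe -> phonemize steps for one uploaded recording."""
    # Step 1: speech recognition (Whisper in SpeakEasy).
    transcript = transcribe(audio_path)
    # Step 2: convert both the intended text and the transcript into phonemes,
    # ready for a phoneme-by-phoneme comparison.
    return {
        "transcript": transcript,
        "target_phonemes": to_phonemes(target_text),
        "heard_phonemes": to_phonemes(transcript),
    }


if __name__ == "__main__":
    # Stub stages standing in for Whisper and Phonemizer in this demo.
    lexicon = {"three": ["θ", "ɹ", "iː"], "free": ["f", "ɹ", "iː"]}
    result = assess("clip.wav", "three",
                    transcribe=lambda path: "free",
                    to_phonemes=lambda text: lexicon[text])
    print(result["heard_phonemes"])  # → ['f', 'ɹ', 'iː']
```

Injecting the two stages keeps the pipeline logic testable on its own and makes it easy to swap in a different ASR model or accent setting later.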
Implementation of Innovative Features
One of the key innovations of SpeakEasy is the use of phoneme breakdowns paired with color-coded feedback. This lets learners see at a glance whether their pronunciation matches standard RP English: correct phonemes are shown in green, and errors are highlighted in red. This clear, immediate feedback shows users exactly where they need to improve, and the phoneme-by-phoneme comparison may also help them notice and remember the subtleties of pronunciation.
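If the results are displayed in a web page, the color coding could be produced along these lines. This is a minimal sketch, assuming the comparison step yields a list of `(phoneme, matched)` pairs; `render_feedback` is a hypothetical name, and inline `style` attributes are only one way to apply the colors.

```python
from html import escape


def render_feedback(labelled: list[tuple[str, bool]]) -> str:
    """Wrap each phoneme in an HTML span: green if matched, red otherwise."""
    spans = []
    for phoneme, matched in labelled:
        colour = "green" if matched else "red"
        # escape() guards against markup characters in the phoneme text.
        spans.append(f'<span style="color:{colour}">{escape(phoneme)}</span>')
    return " ".join(spans)


if __name__ == "__main__":
    print(render_feedback([("θ", False), ("ɹ", True), ("iː", True)]))
```

Since red and green can be hard to tell apart for color-blind users, pairing the colors with a second cue such as underlining would fit the accessibility goals described in the design concepts.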
The project was developed iteratively, starting with the integration of Whisper for transcription, followed by the integration of Phonemizer for phonemic breakdowns. The final touch was adding a user-friendly interface that enables effortless audio uploads and results display.
Design and Development Timeline
Phase 1 (Week 1): Initial concept development, integration of Whisper for transcription.
Phase 2 (Week 3): Integration of Phonemizer and phoneme breakdown feature.
Phase 3 (Weeks 4-6): User interface design, ensuring accessibility and ease of use.
Phase 4 (Weeks 6-7): Testing and refinement based on feedback, and optimization for performance.
Performance Metrics
To evaluate effectiveness, SpeakEasy will be assessed through:
1. Accuracy of phoneme breakdown: ensuring that the tool consistently identifies the correct phonemes.
2. User engagement: monitoring the number of users, frequency of use, and user retention.
3. Feedback quality: user satisfaction with the feedback provided, assessed through surveys and user comments.
4. Pronunciation improvement: tracking students’ improvement in pronunciation accuracy across repeated assessments.
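The metrics above do not specify how accuracy would be computed. One widely used option in speech research is phoneme error rate (PER): the edit distance (substitutions, deletions, and insertions) between a reference phoneme sequence and a hypothesis, divided by the length of the reference. The sketch below assumes phonemes are already tokenized; for metric 1 the reference could be a teacher's transcription of the recording, and for metric 4 the target phonemes of each practice sentence. The function names are hypothetical.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev holds the previous row of the DP table: distances from the
    # shorter ref prefix to every prefix of hyp (initially the empty prefix).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Edit distance normalized by reference length (0.0 = perfect match)."""
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical attempts at "three" (/θ ɹ iː/) in two sessions.
    target = ["θ", "ɹ", "iː"]
    print(phoneme_error_rate(target, ["f", "ɹ", "iː"]))  # → 0.3333333333333333
    print(phoneme_error_rate(target, ["θ", "ɹ", "iː"]))  # → 0.0
```

A falling PER across repeated recordings of the same sentence would be one way to track metric 4. PER can exceed 1.0 when the hypothesis contains many insertions, so it is best read as an error rate rather than a percentage score.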
Implementation and Conversion Work Plan
Moving forward, I plan to continuously improve the model by incorporating more complex speech patterns and expanding language support. Additionally, I will develop a mobile version of the tool to reach more users. Conversion to other platforms will include integrating with existing learning management systems for wider accessibility.
Relationship Between Project Functions and Technologies
The relationship between the project’s functions and technologies is straightforward. Whisper handles transcription, while Phonemizer converts text into IPA phonemes. The feedback mechanism ties the two together by comparing the phonemes of the learner's transcribed speech with those of the target text, giving learners the insights they need to improve their pronunciation. Progress so far includes successful integration of both components and preliminary testing, with positive results for transcription accuracy and phoneme analysis.
SpeakEasy offers an innovative solution to a long-standing challenge for English learners, particularly in Hong Kong. As an educator, I’ve witnessed the struggles of students who, despite their hard work, find it difficult to master English pronunciation because of the strong influence of their Cantonese accent. Existing tools rarely address the specific needs of Cantonese-speaking learners, who are usually taught with a one-size-fits-all approach. SpeakEasy changes this by providing immediate, actionable feedback on pronunciation accuracy through AI-powered phoneme breakdowns.
What sets SpeakEasy apart is its simplicity and directness. Many pronunciation tools offer only general corrections, whereas SpeakEasy provides detailed, phoneme-level feedback. By using Whisper for transcription and Phonemizer for phonemic breakdown, the tool allows learners to see exactly where their pronunciation deviates from the ideal RP (Received Pronunciation) model. This granular feedback gives learners the clarity they need to make targeted improvements in their speech.
The innovation in SpeakEasy lies not just in the technology but in how it’s designed to be user-friendly and accessible. It doesn’t require any advanced knowledge of linguistics or technology to use—just a simple audio upload and instant feedback. The tool is not only an effective language-learning resource but also an engaging experience that fosters a sense of achievement as learners progress in their pronunciation skills.
In conclusion, SpeakEasy is a creative fusion of AI technology and language education, designed to bridge the gap between theory and practice. It allows learners to see their pronunciation mistakes visually and instantly, making language learning more effective, engaging, and accessible.
As the developer of SpeakEasy, my focus is not only on building a tool that addresses the current needs of English learners but also on ensuring that it is scalable and sustainable for future growth.
Scalability is crucial to meet the increasing demand for SpeakEasy, especially as more students and educators across different regions begin to utilize it. At this stage, the platform is designed with teachers in mind, as they typically have the necessary knowledge of IPA (International Phonetic Alphabet) to help interpret the feedback. However, I recognize that for students, understanding IPA is essential before they can fully benefit from the tool. I plan to improve the user experience for students by providing educational resources and tutorials to help them learn IPA alongside the tool’s functionality. In the future, I hope to enhance the platform’s accessibility by incorporating visual guides and audio prompts that can support students in understanding the phoneme breakdowns more intuitively.
To handle the increasing number of users and ensure a smooth experience, I have built the platform on cloud infrastructure, which can scale to accommodate more users as demand grows. This approach minimizes the risk of bottlenecks and ensures that the tool remains responsive, even during peak usage periods. Additionally, as the platform grows, I plan to continue refining the AI models to make them more accurate and adaptable to different accents, allowing for wider applicability.
Regarding sustainability, I aim to keep the platform environmentally conscious by using energy-efficient cloud services powered by renewable energy, which helps limit the project's environmental impact as usage grows.
In the coming months, I plan to refine SpeakEasy to make it more user-friendly for both teachers and students, while also focusing on continuous improvement to ensure the tool remains effective and relevant for the long term.
SpeakEasy is designed not only as a tool for improving English pronunciation but also as a resource that can have a significant social impact by addressing specific issues faced by learners in Hong Kong, particularly those who struggle with the influence of their Cantonese accent on their English pronunciation. For many students, this challenge can create barriers to academic success and job opportunities, limiting their potential. By providing an AI-powered, easy-to-use platform that helps learners visualize their pronunciation mistakes and correct them with precision, SpeakEasy has the potential to level the playing field and create more equitable opportunities for learners of all backgrounds.
One of the most important aspects of SpeakEasy is its alignment with broader social goals, such as equity and inclusion. Language proficiency is a fundamental skill in today’s globalized world, yet many students, especially those in underprivileged communities, lack the resources to receive the personalized instruction they need. SpeakEasy helps bridge this gap by offering an affordable, scalable solution that any learner with access to the internet can use. It makes language education more accessible, particularly for those who may not have the means to attend private lessons or expensive language courses.
In terms of social responsibility, I am committed to ensuring the tool evolves with the needs of its community. I will rely on user feedback, such as surveys and engagement metrics, to continually refine and improve SpeakEasy. Key metrics for measuring social impact will include user engagement rates, improvement in pronunciation accuracy, and the accessibility of the platform to users in different demographic groups.
By developing SpeakEasy, I aim to empower learners to overcome linguistic barriers, foster a more inclusive learning environment, and contribute to greater equity in education.
Personal Information Collection Statement (PICS):
1. The personal data collected in this form will be used for activity-organizing, record keeping and reporting only. The collected personal data will be purged within 6 years after the event.
2. Please note that it is obligatory to provide the personal data required.
3. Your personal data collected will be kept by the LTTC and will not be transferred to outside parties.
4. You have the right to request access to and correction of information held by us about you. If you wish to access or correct your personal data, please contact our staff at lttc@eduhk.hk.
5. The University’s Privacy Policy Statement can be accessed at https://www.eduhk.hk/en/privacy-policy.
- I have read and agree to the competition rules and privacy policy.