AIREA Competition 2025 – Open Category – All Submissions | AIREA - Artificial Intelligence Education and Research Alliance

Open Category

Entry ID

820

Participant Type

Team

Expected Stream

Stream 2: Identifying an educational problem and proposing a prototype solution.

Section A: Project Information

Project Title:

Augmenting Data Visualisation Pipeline: The Role of Large Language Models

Project Description (maximum 300 words):

This project pioneers the integration of large language models (LLMs) into the data visualization learning pipeline with a focus on educational transformation. It introduces a web-based, novice-friendly data visualization platform enhanced with LLMs, adopting a "novice-first" approach to overcome limitations found in current LLM-augmented tools like Julius AI and Microsoft’s LIDA.

Key innovations include an intuitive, web-based platform for data visualization learners that removes technical barriers such as environment setup. Further, it provides step-by-step guidance, actionable feedback, and help popups that aid in lowering the barrier to entry for learners of all skill levels.

The potential impact is transformative: the project makes learning of data visualization more accessible, personalized, and engaging, empowering students to experiment, iterate, and build critical data literacy skills. By democratizing access to visualization tools, it supports lifelong learning and fosters greater inclusivity in digital education.

File Upload

Initial-Submission-Slides.pdf

Section B: Participant Information

Personal Information (Team Member)

Title	First Name	Last Name	Organisation/Institution	Faculty/Department/Unit	Email	Phone Number	Contact Person / Team Leader
Mr.	Yong Jing	Goh	National University of Singapore	School of Computing	gohyongjing@u.nus.edu	84059110	YES
Dr.	Bimlesh	Wadhwa	National University of Singapore	School of Computing	bimlesh@nus.edu.sg	93856284

Section C: Project Details

Project Details

Please answer the questions from the perspectives below regarding your project.

1.Problem Identification and Relevance in Education (Maximum 300 words)

The complexity of the data visualization pipeline poses a significant hurdle for novices. Recent advances in LLMs, particularly their ability to understand natural language and generate immediate, context-aware outputs, present an unprecedented opportunity to bridge this gap.

This project was inspired by recognizing the mismatch between powerful but complex visualization frameworks and the needs of novice learners. Existing tools often assume technical proficiency, creating friction that discourages exploration.

Hypothesis: Integrating LLMs into a user-friendly, novice-centric pipeline with guided assistance will significantly enhance the accessibility, efficiency, and effectiveness of data visualization learning.

We believe this approach will succeed because it aligns with proven educational technology principles such as user assistance, exploratory learning, adaptive feedback, and accessible design, so it addresses core learning challenges directly.

2a. Feasibility and Functionality (for Streams 1&2 only) (Maximum 300 words)

The platform is built using ReactJS (frontend) and FastAPI (backend), which are robust technologies that allow for scalable, modular, and rapid development. A cloud server hosts the platform, interfacing with LLM APIs to generate natural-language feedback and code suggestions.

Core Functionalities:
- Dataset Upload: Easy dataset ingestion.
- Dataset Summary: LLM-generated insights into dataset characteristics.
- Goal Generation: AI-suggested visualization goals (e.g., trend analysis, outlier detection).
- Visualization Generation: Python (Seaborn) code automatically created and executed server-side.
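
As an illustration of the Dataset Summary step, the sketch below builds a compact per-column profile that could be passed to an LLM as context. It is a minimal example under stated assumptions, not the platform's actual code: the function names `profile_csv` and `_infer_type` and the profile fields are hypothetical, and only the Python standard library is used.

```python
import csv
import io
from collections import Counter


def _infer_type(values):
    """Classify a column as 'number', 'text', or 'empty' from its non-empty values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)
        return "number"
    except ValueError:
        return "text"


def profile_csv(csv_text, max_examples=3):
    """Return a small per-column profile (hypothetical format) for use in a prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profile = {"row_count": len(rows), "columns": []}
    for name in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat those as empty strings.
        values = [row.get(name) or "" for row in rows]
        counts = Counter(v for v in values if v.strip() != "")
        profile["columns"].append({
            "name": name,
            "type": _infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
            "unique": len(counts),
            "examples": [v for v, _ in counts.most_common(max_examples)],
        })
    return profile
```

Sending a profile like this rather than the raw file keeps the prompt short and avoids passing every data row to the LLM.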

User Experience Enhancements:
- Guided help popups and tooltips for each pipeline step.
- Actionable error messages.
- Modular design allowing seamless navigation across the pipeline.

Validation Strategy: Pilot testing with graduate students enrolled in a data visualization course at the university, to assess ease of use and perceived value and to gather qualitative feedback.

Evaluation Metrics:
- Visualisation Error Rate (VER): The proportion of LLM-generated code that fails to produce a valid visualisation, adapted from LIDA's research.
- Goal Diversity: Reflects the tool's ability to expose users to a wide variety of visualisation goals.
- User Perception Rating: Subjective user ratings capturing ease of use, perceived usefulness, and overall satisfaction.
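
The Visualisation Error Rate defined above is a simple proportion, and the sketch below shows one way it could be computed. The function name and the input format (one boolean per generation attempt, True when the code produced a valid chart) are assumptions made for illustration.

```python
def visualisation_error_rate(outcomes):
    """Return the share of generation attempts that failed to produce a valid chart.

    outcomes: iterable of booleans, True when the generated code rendered successfully.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one generation attempt is required")
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)
```

For example, one failure in four attempts gives a VER of 0.25.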

2b. Technical Implementation and Performance (for Stream 3&4 only) (Maximum 300 words)

3. Innovation and Creativity (Maximum 300 words)

The project reimagines the educational data visualization experience by adapting LIDA’s pipeline into a multi-user, web-based, guided learning environment, which is a significant departure from existing, setup-heavy frameworks.

Innovative Features:
- Installation-free environment for immediate engagement.
- Stateless API architecture supporting scalable multi-user access.
- Personalized assistance at every step to reduce cognitive overload.

By making technical learning environments more inviting and navigable, the project transforms intimidating learning processes into empowering experiences, aligning strongly with personalized learning principles.

Furthermore, by engaging learners in iterative dialogue with LLMs during the visualization creation process, the project fosters not only data literacy but also critical AI literacy. Users learn to interpret, validate, and refine AI-generated outputs, building essential skills for working responsibly with generative AI technologies in real-world contexts.

4. Scalability and Sustainability (Maximum 300 words)

The system’s stateless, API-driven architecture ensures that it can be horizontally scaled by simply adding backend instances.

Addressing Bottlenecks:
- Asynchronous request handling so that slow LLM calls do not block other requests.
- Input caching to reduce redundant computations.
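
The two measures above can be combined, as in the minimal sketch below: an asynchronous cache keyed on a hash of the request inputs, so identical requests trigger only one LLM call. The class `LLMCache`, the `fake_llm` stand-in, and the `step`/`payload` parameters are hypothetical. The sketch also assumes a single process; with several backend instances a shared cache would be needed.

```python
import asyncio
import hashlib
import json


class LLMCache:
    """Cache LLM results by input hash; concurrent identical requests share one call."""

    def __init__(self, llm_call):
        self._llm_call = llm_call  # async callable: (step, payload) -> str
        self._tasks = {}           # cache key -> asyncio.Task holding the result
        self.misses = 0            # number of real LLM calls made

    @staticmethod
    def _key(step, payload):
        # Canonical JSON so that equal inputs always hash to the same key.
        canonical = json.dumps({"step": step, "payload": payload}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    async def get(self, step, payload):
        key = self._key(step, payload)
        task = self._tasks.get(key)
        if task is None:
            self.misses += 1
            task = asyncio.create_task(self._llm_call(step, payload))
            self._tasks[key] = task
        # Note: a failed call stays cached in this sketch; real code would evict it.
        return await task


async def fake_llm(step, payload):
    """Stand-in for a real LLM API call (assumption for illustration only)."""
    await asyncio.sleep(0.01)
    return f"{step}: {len(payload)} fields"
```

Storing the task rather than the finished result means a second identical request that arrives while the first is still running waits for the same call instead of starting a new one.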

Sustainability Measures:
- Use of lightweight backend frameworks (FastAPI) to reduce server load and energy usage.
- Caching LLM outputs to minimize carbon footprint.
- Emphasis on low-friction, reusable design patterns.

Long-Term Engagement Strategies:
- Visualization Repository: a community-driven space, currently being designed, where learners can share their creations, receive peer feedback, and learn from diverse approaches. It is intended to promote collaborative learning and cross-pollination of ideas, fostering a global learning community centered around AI-assisted data exploration.
- Modular extensibility, allowing future addition of new visualization techniques and pedagogical features.
- Continuous feedback integration to refine platform usability and adapt to user needs.

5. Social Impact and Responsibility (Maximum 300 words)

Our project addresses the pressing educational inequality in data literacy and AI accessibility, particularly for students and novices who lack the technical skills to engage with data visualization tools. By transforming the LIDA pipeline into a web-based, beginner-friendly platform with interactive guidance, we democratize access to AI-assisted data analysis. This empowers individuals to explore and communicate data-driven insights effectively regardless of their technical background, fostering greater digital inclusion.

The solution is designed with accessibility and equity in mind. Help popups, tooltips, and simplified workflows are built into the interface to support users with varying levels of familiarity with data tools.

Responsible AI Practices:
- All LLM outputs are transparently labeled in the tutorial popup, allowing users to critically assess AI-generated content.
- User data, including uploaded datasets, is processed temporarily and not stored persistently, maintaining user privacy and data security.
- The platform's design emphasizes fairness and inclusivity by providing accessible, language-simplified guidance to users of diverse backgrounds and skill levels.
- We can further address ethical considerations around AI bias by encouraging users to critically review and validate AI suggestions rather than accepting outputs uncritically.
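
One way to realise the temporary, non-persistent handling of uploaded datasets described above is sketched below. The function name `process_upload` and the `handler` callback are hypothetical, and the sketch covers only files on disk, not logs or data sent to external LLM APIs.

```python
import os
import tempfile


def process_upload(data: bytes, handler):
    """Write an upload to a temporary directory, run handler(path), then delete it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "upload.csv")
        with open(path, "wb") as f:
            f.write(data)
        # The directory and its contents are removed when this block exits,
        # whether the handler returns normally or raises.
        return handler(path)
```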

We will measure social impact using a mix of quantitative and qualitative metrics, including:
- User adoption rates
- Surveys to gauge perceived ease of use and value provided by the tool
- Feedback and suggestions from participants

Responsiveness to evolving needs will be achieved through regular user feedback loops and the modular pipeline structure, which allows for easy updates and customization of features based on user demand. This allows us to ensure the solution remains inclusive and adaptive to the learning journey of all users.

Do you have additional materials to upload?

PIC

Personal Information Collection Statement (PICS):
1. The personal data collected in this form will be used for activity-organizing, record keeping and reporting only. The collected personal data will be purged within 6 years after the event.
2. Please note that it is obligatory to provide the personal data required.
3. Your personal data collected will be kept by the LTTC and will not be transferred to outside parties.
4. You have the right to request access to and correction of information held by us about you. If you wish to access or correct your personal data, please contact our staff at lttc@eduhk.hk.
5. The University’s Privacy Policy Statement can be accessed at https://www.eduhk.hk/en/privacy-policy.

Agreement

I have read and agree to the competition rules and privacy policy.
