Quoc-Huy Trinh

Deep Generative Model  ·  Computer Vision  ·  NLP  ·  Multimodal

Hi 👋, I'm Huy — a Vietnamese researcher and a Master's student in Computer Science at Aalto University. I'm passionate about exploring new ideas at the frontier of deep learning and computer vision, and I love building things just to see how they work. Outside of research, I enjoy reading, soccer, music, and films.

My research is supervised by Prof. Minh-Triet Tran, Prof. Ulas Bagci, Prof. Sebastian Szyller, Prof. Bo Zhao, Dr. Debesh Jha, and M.Sc. Hai-Dang Nguyen.

Research Interests

Deep Generative Model  ·  Computer Vision  ·  NLP  ·  Audio Generation  ·  Medical Image Analysis  ·  Trustworthy AI

Education

Aalto University, Espoo, Finland

M.Sc.  ·  Major: Computer Science  ·  Minor: Machine Learning, Data Science and Artificial Intelligence  ·  GPA: 4.82 / 5
Supervisors: Prof. Sebastian Szyller, Prof. Bo Zhao

University of Science, VNU-HCM, Vietnam

B.Sc. (Honor Program)  ·  Major: Information Technology  ·  GPA: 3.58 / 4.0
Supervisors: Prof. Minh-Triet Tran, M.Sc. Hai-Dang Nguyen

Le Hong Phong High School for the Gifted

Publications

2026
Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space
Quoc-Huy Trinh, Xi Ding, Yang Liu, Zhenyue Qin, Xingjian Li, Gorkem Durak, Halil Ertugrul Aktas, Elif Keles, Ulas Bagci*, Min Xu*
PRS-MED: Position Reasoning Segmentation in Medical Imaging
Quoc-Huy Trinh, Minh-Van Nguyen, Jung Zeng, Debesh Jha*, Ulas Bagci*
Firebolt-VL: Efficient Vision-Language Understanding with Cross-Modality Modulation
Quoc-Huy Trinh, Mustapha Abdullahi, Bo Zhao, Debesh Jha
2025
CMATalk: Cross Modality Alignment for Talking Head Generation
Xuan-Nam Cao, Quoc-Huy Trinh, Minh-Triet Tran
PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy
Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, Quoc-Huy Trinh, Koushik Biswas, Hongyi Pan, et al.
Sing-On-Your-Beat: Simple Text-Controllable Accompaniment Generations
Quoc-Huy Trinh, Minh-Van Nguyen, Trong-Hieu Nguyen Mau, Khoa Tran, Thanh Do
NeIn: Telling What You Don't Want
Nhat-Tan Bui, Dinh-Hieu Hoang, Quoc-Huy Trinh, Minh-Triet Tran, Truong Nguyen, Susan Gauch
2024
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges
Debesh Jha, Vanshali Sharma, Debapriya Banik, Quoc-Huy Trinh, et al.
SAM-EG: Segment Anything Model with Edge Guidance framework for efficient Polyp Segmentation
Quoc-Huy Trinh, Hai-Dang Nguyen, Bao-Tram Nguyen Ngoc, Debesh Jha, Ulas Bagci, Minh-Triet Tran
Pose-to-Human (P2H): A pose guidance framework via Gram matrix for Occluded Person Re-identification
Quoc-Huy Trinh, Phuoc-Thao Vo Thi, Minh-Triet Tran, Hai-Dang Nguyen
PDGS: Pose-Guided Deep Supervision for Mitigating Clothes-Changing in Person Re-Identification
Quoc-Huy Trinh, Nhat-Tan Bui, Dinh-Hieu Hoang, Phuoc-Thao Vo Thi, Hai-Dang Nguyen, Debesh Jha, Ulas Bagci, Ngan Le, Minh-Triet Tran
KDAS: Knowledge distillation Framework via Attention Supervision for Polyp Segmentation
Quoc-Huy Trinh, Minh-Van Nguyen, Phuoc-Thao Vo Thi
Pose Knowledge Distill Guidance: Effective Pose guide learning for Person Re-Identification
Quoc-Huy Trinh, Phuoc-Thao Vo Thi, Minh-Triet Tran, Hai-Dang Nguyen
ICMR 2024 — ACM Best Paper Oral
EAPC: Emotion and Audio Prior Control framework for the emotional and temporal Talking Face Generation
Xuan-Nam Cao, Quoc-Huy Trinh, Quoc-Anh Do-Nguyen, Van-Son Ho, Hoai-Thuong Dang, Minh-Triet Tran
2023
ALGNet: Attention Light Graph Memory Network for Medical Recommendation System
Minh-Van Nguyen, Duy-Thinh Nguyen, Quoc-Huy Trinh, Bac-Hoai Le
SpeechSyncNet: Speech to Talking Landmark via the fusion of prior frame landmark and the audio
Xuan-Nam Cao, Quoc-Huy Trinh, Van-Son Ho, Minh-Triet Tran
An objective validation of polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 transparency challenges
Debesh Jha, Vanshali Sharma, Debapriya Banik, Quoc-Huy Trinh, et al.
Graph for Transformer Feature: A New Approach for Face Anti-Spoofing
Quoc-Huy Trinh, Hieu Nguyen, Van Nguyen, Xuan-Mao Nguyen, Hai-Dang Nguyen
M2UNet: MetaFormer Multi-scale Upsampling Network for Polyp Segmentation
Quoc-Huy Trinh, Nhat-Tan Bui, Trong-Hieu Nguyen Mau, Minh-Van Nguyen, Hai-Minh Phan, Minh-Triet Tran, Hai-Dang Nguyen
Meta-Polyp: a baseline for efficient Polyp segmentation
Quoc-Huy Trinh
Tiny convolution contextual neural network: a lightweight model for skin lesion detection
Quoc-Huy Trinh, Trong-Hieu Nguyen Mau, Phuoc-Thao Vo Thi, et al.
PEFNet: Positional Embedding Feature for Polyp Segmentation
Trong-Hieu Nguyen-Mau, Quoc-Huy Trinh, Nhat-Tan Bui, Phuoc-Thao Vo Thi, Minh-Van Nguyen, Xuan-Nam Cao, Minh-Triet Tran, Hai-Dang Nguyen
2022
Multi Kernel Positional Embedding ConvNeXt for Polyp Segmentation
Trong-Hieu Nguyen-Mau, Quoc-Huy Trinh, Nhat-Tan Bui, Minh-Triet Tran, Hai-Dang Nguyen
Ensemble of Deep Neural Networks for Rice Leaf Disease Classification
Son Van Ho, Huy Gia Vuong, Binh Quang Nguyen, Quoc-Huy Trinh, Minh-Triet Tran
Res-Dense Net for 3D Covid Chest CT-Scan Classification
Quoc-Huy Trinh, Minh-Van Nguyen, Thien-Phuc Nguyen Dinh
EfficientNet for Brain-Lesion Classification — International Workshop BrainLes 2021
Quoc-Huy Trinh, Trong-Hieu Nguyen Mau, Radmir Zosimov, Minh-Van Nguyen
2021
SHREC 2021: Retrieval of Cultural Heritage Objects
Ivan Sipiran, Patrick Lazo, Cristian Lopez, ..., Quoc-Huy Trinh, et al.
Endoscopy Image Retrieval by Mixer Multi-Layer Perceptron
Quoc-Huy Trinh, Minh-Van Nguyen

Industry Experience

Lead Machine Learning Engineer — Spex A.I GmbH (Mar 2024 – May 2026)

Machine Learning Engineer — Spex A.I GmbH (Oct 2021 – Mar 2024)

Senior Research Scientist — SongGen (Mar 2024 – May 2025)

Data Scientist — VNG Corporation (2022 – 2024)

Founder — Aeyes – Smart Glasses for Blind (2020 – Present)

Software Developer — Microbox (2021 – 2022)

BaoData (2020)

Academic Experience

Research Intern — Xu Lab, Carnegie Mellon University (Jun 2025 – Present)

Large Vision Language Models

Research Assistant — TAC Lab, Aalto University (Present)

Trustworthiness and reliability of Large Language Models

Research Assistant — Bagci Lab, Northwestern University (Aug 2024 – Present)

Research Intern — Empathic Computing Lab (2022 – 2023)

Research Intern — AIOZ (2021 – 2022)

Academic Service

Reviewer

ACM Multimedia 2025 (ACM MM)
IEEE Transactions on Circuits and Systems for Video Technology — Q1
International Conference on Computer Vision 2025 (ICCV)
Conference on Computer Vision and Pattern Recognition (CVPR)
European Conference on Computer Vision (ECCV)
IEEE Transactions on Medical Imaging (TMI)
International Conference on Multimedia and Expo (ICME)
International Conference on Multimedia Modeling (MMM)
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
International Joint Conference on Neural Networks (IJCNN)
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Mentorship & Outreach

Partner — VinUni Entrepreneur Lab
Ambassador — H4TF Hackathon 2023
Mentor — STEM Petrus Ky / Le Hong Phong
Technical Mentor — LHPSC

Awards

2024  ·  Scholarship awarded by ICME 2024
2024  ·  Best Paper Award — AI-SIPM @ ICMR 2024
2023  ·  Third Prize — Vietnamese National Invention 2023
2022  ·  Second Prize — Vietnamese Invention for Society
2022  ·  Excellent Paper Award — ICMV 2022
2022  ·  Best Paper Award — RIVF 2022
2022  ·  First Prize — Innocity
2022  ·  Top 1 — Zoo Hackathon
2021  ·  Second Prize — Makethon (no first prize awarded)
2021  ·  Second Prize — Software for Student (no first prize awarded)
2021  ·  Top 2 Best Project — TensorFlow Community
2021  ·  Second Prize — Vietnamese Edtech Startup 2021
2021  ·  Top 2 Medical Track — MediaEval 2021
2021  ·  Top 3 — KO Hackathon
2020  ·  Top 3 Medical Track — MediaEval 2020
2020  ·  Top 10 / 800 — FGCV 2020 @ CVPR 2020
2020  ·  Third Prize — Vietnam National Talent Youth in Computer Science
2019  ·  Third Prize — Vietnam National Science and Engineering Fair
2018  ·  Bronze Medal — Robotacon

Open Source

SongGen-AI / LLambada

Open-source music generation project.