Yushi SUN

I am a third-year PhD candidate in Computer Science and Engineering at the Hong Kong University of Science and Technology (HKUST), supervised by Prof. Lei CHEN. I am fortunate to work with Prof. Nan TANG and Xin Luna Dong during my PhD journey.

I received my B.S. degree in Computer Science and Applied Mathematics from the Hong Kong University of Science and Technology (HKUST) in 2021. More details about me are available in my CV.

Email  /  Google Scholar  /  GitHub

News
I am looking for research visits and internships; please let me know if you have any relevant opportunities :).
We are hosting the KDD Cup 2024 competition: Meta Comprehensive RAG. If you are interested, please check this link.
Research

My research interests lie in the following database topics: data preparation and integration, data discovery, LLMs, and crowdsourcing. Representative papers are highlighted.

CRAG - Comprehensive RAG Benchmark
Xiao Yang*, Kai Sun*, Hao Xin*, Yushi Sun*, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Ethan Yifan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

Submitted to NeurIPS, 2024. * indicates equal contribution.

We constructed a comprehensive benchmark for retrieval-augmented generation (RAG) and hosted the KDD Cup 2024 competition based on it.


paper

TaxoGlimpse Are Large Language Models a Good Replacement of Taxonomies?
Yushi Sun, Hao Xin, Kai Sun, Ethan Yifan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

VLDB, 2024

We conducted an extensive evaluation of state-of-the-art (SOTA) LLMs on taxonomies to assess whether LLMs can serve as a replacement for them.


data & code; paper

Crowd4U Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation
Yushi Sun, Jiachuan Wang, Peng Cheng, Libin Zheng, Lei Chen, Jian Yin

ICDE, 2024

We proposed a novel cross-domain-aware approach to worker selection with training for crowdsourced annotation.


data & code; paper

RECA: Related Tables Enhanced Column Semantic Type Annotation Framework
Yushi Sun, Hao Xin, Lei Chen

VLDB, 2023

We defined a novel named entity schema for discovering and aligning related and sub-related tables, which enhances the annotation quality of column semantic types.


data & code; paper

Awards and Honors
HKUST Research Travel Grant (2024)
RedBird Academic Excellence Award for Continuing PhD Students (2023-2024)
RedBird Academic Excellence Award for Continuing PhD Students (2022-2023)
RedBird PhD Scholarship (2021)
HKUST Academic Achievement Medal (2021)
First Class Honors graduate from HKUST (2021)
Hong Kong PhD Fellowship Scheme (2021-2025)
Professional Services
Conference reviewer: CIKM 2023
Journal reviewer: TKDE 2024
Teaching Experience
Teaching Assistant for COMP 2711H Honors Discrete Mathematical Tools for Computer Science (2022 Fall)
Teaching Assistant for COMP 5712 Introduction to Combinatorial Optimization (2022 Spring)