I am looking for research visit or internship opportunities for the final year of my PhD (Spring 2025). I am also open to positions as a research scientist, postdoctoral fellow, or faculty member (available from Summer 2025). Please let me know if you have any relevant opportunities :)

Research
My research interests lie in data quality (DQ), with a focus on high-quality data labeling and preparation that targets the correctness and completeness of data, as well as large language models (LLMs) and retrieval-augmented generation (RAG).
My vision is to combine the strengths of the DQ and language model (LM) fields by conducting research on both DQ for LMs (DQ4LM) and LMs for DQ (LM4DQ). Feel free to email me if you are interested.
Representative papers on DQ and LMs are highlighted in yellow and blue, respectively.

CRAG - Comprehensive RAG Benchmark
Xiao Yang*, Kai Sun*, Hao Xin*, Yushi Sun*, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran,
Jiaqi Wang, Ethan Yifan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
NeurIPS, 2024 (* indicates equal contribution)
We constructed a comprehensive RAG benchmark and hosted a KDD Cup competition based on it.
paper; KDD Cup

Are Large Language Models a Good Replacement of Taxonomies?
Yushi Sun, Hao Xin, Kai Sun, Ethan Yifan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen
VLDB, 2024
We conducted an extensive evaluation of state-of-the-art (SOTA) LLMs on taxonomies to assess whether LLMs can replace them.
data & code; paper; slides

Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation
Yushi Sun, Jiachuan Wang, Peng Cheng, Libin Zheng, Lei Chen, Jian Yin
ICDE, 2024
We proposed a novel cross-domain-aware approach to selecting and training workers for crowdsourced data labeling.
data & code; paper; slides

RECA: Related Tables Enhanced Column Semantic Type Annotation Framework
Yushi Sun, Hao Xin, Lei Chen
VLDB, 2023
We defined a novel named entity schema for discovering and aligning related and sub-related tables, which enhances the quality of column semantic type annotation.
data & code; paper; slides

Awards
HKUST Research Travel Grant (2024)
RedBird Academic Excellence Award for Continuing PhD Students (2023-2024)
RedBird Academic Excellence Award for Continuing PhD Students (2022-2023)
RedBird PhD Scholarship (2021)
HKUST Academic Achievement Medal (2021)
First Class Honors graduate, HKUST (2021)
Hong Kong PhD Fellowship Scheme (2021-2025)

Service
Conference reviewer: CIKM 2023
Journal reviewer: TKDE 2024

Teaching
Teaching Assistant for COMP 1021 Introduction to Computer Science (Fall 2024)
Teaching Assistant for COMP 2711H Honors Discrete Mathematical Tools for Computer Science (Fall 2022)
Teaching Assistant for COMP 5712 Introduction to Combinatorial Optimization (Spring 2022)