[SCIS]
Shukang Yin#, Chaoyou Fu#*, Sirui Zhao#*, Tong Xu, Hao Wang, Dianbo Sui, Enhong Chen*.
"Woodpecker: Hallucination Correction for Multimodal Large Language Models",
SCIENCE CHINA Information Sciences (SCIS), 67(12): 220105, 2024. DOI: 10.1007/s11432-024-4251-x.
[IRAC'24]
Yifan Xu, Sirui Zhao*, Tong Xu, Enhong Chen*.
"AUD: AU-based Diffusion Model for Facial Expression Synthesis from a Single Image",
In 2024 International Conference on Intelligent Robotics and Automatic Control (IRAC),
2024: 631-636.
[PRAI'24]
Yudong Xia, Sirui Zhao*, Tong Wu, Huaying Tang, Tong Xu*.
"AIGLLM: An Action Instruction Generation Method with Visually Enhanced LLM",
In 2024 7th International Conference on Pattern Recognition and Artificial Intelligence (PRAI),
2024: 84-90.
[PRAI'24]
Mengduo Wu, Sirui Zhao*, Tong Wu, Yifan Xu, Tong Xu*, Enhong Chen.
"AVF-LIP: High-fidelity Talking Face Generation via Audio-visual Fusion",
In 2024 7th International Conference on Pattern Recognition and Artificial Intelligence (PRAI),
2024: 491-499.
[PRAI'24]
Tong Wu, Sirui Zhao*, Siyuan Jin, Tong Xu, Enhong Chen*.
"CMDM: A Control Motion Diffusion Model for 2D Digital Human Motion Video Generation",
In 2024 7th International Conference on Pattern Recognition and Artificial Intelligence (PRAI),
2024: 202-209.
[arXiv]
Tingjia Shen, Hao Wang*, Jiaqing Zhang, Sirui Zhao, Liangyue Li, Zulong Chen, Defu Lian, Enhong Chen.
"Exploring User Retrieval Integration Towards Large Language Models for Cross-domain Sequential Recommendation",
arXiv preprint arXiv:2406.03085, 2024.
[arXiv]
Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Zhi Li, Sirui Zhao, Zhen Wang, Defu Lian, Enhong Chen.
"Learning Partially Aligned Item Representation for Cross-domain Sequential Recommendation",
arXiv preprint arXiv:2405.12473, 2024.
[IRAC'24]
Guoqing Zhao, Tong Xu*, Sirui Zhao.
"Prompting LLM for Embodied Tasks with Expert Instruction and Dimension Separation",
In 2024 International Conference on Intelligent Robotics and Automatic Control (IRAC),
2024: 422-426.
[National Science Review]
Shukang Yin#, Chaoyou Fu#*, Sirui Zhao#*, Ke Li, Xing Sun, Tong Xu, Enhong Chen*.
"A Survey on Multimodal Large Language Models",
National Science Review, 11(12): nwae403, 2024. DOI: 10.1093/nsr/nwae403.
[arXiv]
Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun, Ziwei Liu, Liang Wang, Caifeng Shan, Ran He.
"MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs",
arXiv preprint arXiv:2411.15296, 2024.
[arXiv]
Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun.
"Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis",
arXiv preprint arXiv:2405.21075, 2024.
[ACM MM'24]
Zhengye Zhang#, Sirui Zhao#, Xinglong Mao, Shifeng Liu, Hao Wang, Tong Xu, Enhong Chen*.
"A Multi-scale Feature Learning Network with Optical Flow Correction for Micro- and Macro-expression Spotting",
In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM'24), Melbourne, Australia, 2024, pp. 11497-11502. DOI: 10.1145/3664647.3689143.
[ICME'24]
Shifeng Liu, Xinglong Mao, Sirui Zhao*, Chaoyou Fu, Ying Yu, Tong Xu, Enhong Chen*.
"TGMAE: Self-supervised Micro-Expression Recognition with Temporal Gaussian Masked Autoencoder",
In Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME'24), Niagara Falls, Canada, 2024. DOI: 10.1109/ICME57554.2024.10687556.
[ACM TOMM]
Shukang Yin, Sirui Zhao*, Hao Wang, Tong Xu, Enhong Chen*.
"Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval",
ACM Transactions on Multimedia Computing, Communications, and Applications, 20(10): 1-21, 2024. DOI: 10.1145/3663571.
[PRCV'24]
Xinglong Mao, Shifeng Liu, Sirui Zhao*, Hao Wang, Tong Xu, Enhong Chen*.
"H2LMER: A Cross Frame-Rate Representation Alignment Framework for Micro-Expression Recognition",
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024.
[ICMR'24]
Chenxiao Liu, Zheyong Xie, Sirui Zhao, Jin Zhou, Tong Xu*, Minglei Li, Enhong Chen.
"Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation",
In Proceedings of the 14th International Conference on Multimedia Retrieval (ICMR'24),
Dusit Thani Laguna Phuket, Thailand, 2024, pp. 533-542. DOI: 10.1145/3652583.3658104.
[ACM SIGKDD'24]
Mingjia Yin, Hao Wang*, Wei Guo, Yong Liu, Suojuan Zhang, Sirui Zhao, Defu Lian, Enhong Chen.
"Dataset Regeneration for Sequential Recommendation",
In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD'24),
2024, pp. 3954-3965. DOI: 10.1145/3637528.3671841.
[TOIS]
Hao Wang, Mingjia Yin, Luankang Zhang, Sirui Zhao, Enhong Chen.
"MF-GSLAE: A Multi-Factor User Representation Pre-training Framework for Dual-Target Cross-Domain Recommendation",
ACM Transactions on Information Systems,
43(2): Article 30, 1-28, 2025. DOI: 10.1145/3690382.