Zhuoyun Du 「杜卓耘」
Email | Google Scholar | GitHub | Blog | LinkedIn
I am a first-year graduate student majoring in Computer Science at
Zhejiang University. I currently work with Prof. Wei Chen at
ZJUVAI within the State Key Laboratory of CAD&CG.
I am also interning with the Taowise team at Alibaba.
Previously, I received my Bachelor's degree in Computer Science from Jinan University, Guangzhou. I also served as a research assistant on the
ChatDev project in the
Department of Computer Science and Technology, Tsinghua University,
advised by Prof. Zhiyuan Liu and
Prof. Chen Qian at
THUNLP within the State Key Laboratory of Intelligent Technology and Systems.
I am actively seeking Ph.D. positions starting in Fall 2027 in
Large Language Models, Multi-Agent Systems, and Reasoning.
I welcome opportunities, collaborations, and recommendations!
Research Focus: How to design LLM-based autonomous agents that
collaborate effectively to tackle complex, multi-step tasks.
My recent work spans software-development agents, creative applications, agent
self-evolution, and latent-space reasoning.
Email: xiaodu.flying [AT] gmail.com / duzy [AT] zju.edu.cn
News
- [2025.8.22] SSPO, our pluggable RL process-supervision framework that enables fine-grained optimization of each reasoning step, is now available on arXiv.
- [2025.5.16] Two papers accepted at ACL 2025 - exploring cross-team collaboration and medical education applications!
- [2024.12.16] Scaling Large-Language-Model-based Multi-Agent Collaboration accepted at ICLR 2025!
- [2024.12.16] Introducing EvoPatient: Transforming LLMs into Standardized Patients for medical education. arXiv:2412.11716
- [2024.9.26] Paper on collaborative agents under information asymmetry accepted at NeurIPS 2024!
- [2024.6.25] Launch of our Interactive E-book on Multi-Agent Collaboration
- [2024.5.16] Two papers accepted at ACL 2024 main conference
- [2023.8.29] ChatDev open-sourced on GitHub!
Publications
SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression
Yuyang Xu, Yi Cheng, Haochao Ying, Zhuoyun Du, Renjun Hu, Xing Shi, Wei Lin, Jian Wu
arXiv preprint
paper |
abstract |
bibtex
Test-time scaling has proven effective in further enhancing the performance of pretrained Large Language Models (LLMs). However, mainstream post-training methods (i.e., reinforcement learning (RL) with chain-of-thought (CoT) reasoning) often incur substantial computational overhead due to auxiliary models and overthinking. In this paper, we empirically reveal that incorrect answers partially stem from verbose reasoning processes that lack correct self-correction, where errors accumulate across multiple reasoning steps. To this end, we propose Self-traced Step-wise Preference Optimization (SSPO), a pluggable RL process supervision framework that enables fine-grained optimization of each reasoning step. Specifically, SSPO requires neither auxiliary models nor step-wise manual annotations. Instead, it leverages step-wise preference signals generated by the model itself to guide the optimization process for reasoning compression. Experiments demonstrate that the reasoning sequences generated by SSPO are both accurate and succinct, effectively mitigating overthinking behaviors without compromising model performance across diverse domains and languages.
@article{xu2025sspo,
title={SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression},
author={Xu, Yuyang and Cheng, Yi and Ying, Haochao and Du, Zhuoyun and Hu, Renjun and Shi, Xing and Lin, Wei and Wu, Jian},
journal={arXiv preprint arXiv:2508.12604},
year={2025}
}
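For intuition, here is a minimal sketch of a step-wise, DPO-style preference loss. It assumes reasoning traces are already segmented into steps and that per-step preference labels come from the model's own traces; the function stepwise_preference_loss and all shapes are hypothetical, not the official SSPO implementation.

# Minimal sketch of a per-step preference loss (DPO-style). Illustrative only:
# step segmentation and self-generated preference labels are assumed givens.
import torch
import torch.nn.functional as F

def stepwise_preference_loss(logp_chosen, logp_rejected,
                             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Each tensor holds per-step log-probs for one trace, shape (num_steps,).

    'chosen' marks the succinct/correct step continuations, 'rejected' the
    verbose/incorrect ones; ref_* come from a frozen reference model.
    """
    chosen_ratio = logp_chosen - ref_logp_chosen        # policy vs. reference
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Bradley-Terry preference loss per step, averaged over the trace.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random per-step log-probs for a 4-step trace.
loss = stepwise_preference_loss(torch.randn(4), torch.randn(4),
                                torch.randn(4), torch.randn(4))
print(loss.item())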
LLMs Can Simulate Standardized Patients via Agent Coevolution
Zhuoyun Du†, Lujie Zheng†, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haochao Ying
ACL 2025
paper |
abstract |
bibtex
Training medical personnel using standardized patients (SPs) remains a complex challenge, requiring extensive domain expertise and role-specific practice. Previous research on Large Language Model (LLM)-based SPs mostly focuses on improving data retrieval accuracy or adjusting prompts through human feedback. However, this focus has overlooked the critical need for patient agents to learn a standardized presentation pattern that transforms data into human-like patient responses through unsupervised simulations. To address this gap, we propose EvoPatient, a novel simulated patient framework in which a patient agent and doctor agents simulate the diagnostic process through multi-turn dialogues, simultaneously gathering experience to improve the quality of both questions and answers, ultimately enabling human doctor training. Extensive experiments on various cases demonstrate that, by providing only overall SP requirements, our framework improves over existing reasoning methods by more than 10% in requirement alignment and achieves better human preference, while reaching an optimal balance of resource consumption after evolving over 200 cases for 10 hours, with excellent generalizability.
@article{du2024evopatient,
title={LLMs Can Simulate Standardized Patients via Agent Coevolution},
author={Du, Zhuoyun and Zheng, Lujie and Hu, Renjun and Xu, Yuyang and Li, Xiawei and Sun, Ying and Chen, Wei and Wu, Jian and Cai, Haolei and Ying, Haochao},
journal={arXiv preprint arXiv:2412.11716},
year={2024}
}
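As a toy illustration of the coevolution idea (not the EvoPatient codebase), the sketch below runs simulated consultations in which a doctor agent and a patient agent accumulate shared "experience" that is fed back into later prompts; llm() and all prompt wording are placeholders.

def llm(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g., an API client).
    return f"<response to: {prompt[:40]}...>"

def simulate_case(case_record, doctor_exp, patient_exp, turns=5):
    dialogue = []
    question = llm("Ask an opening diagnostic question. Experience: "
                   + " ".join(doctor_exp))
    for _ in range(turns):
        answer = llm(f"Answer as a standardized patient for {case_record}. "
                     + "Experience: " + " ".join(patient_exp)
                     + " Question: " + question)
        dialogue.append((question, answer))
        question = llm("Ask a follow-up question given: " + answer)
    # Distill what worked into experience pools reused on future cases.
    doctor_exp.append(llm("Summarize effective questioning strategies: "
                          + str(dialogue)))
    patient_exp.append(llm("Summarize standardized answering patterns: "
                           + str(dialogue)))
    return dialogue

doctor_exp, patient_exp = [], []
for case in ["case-001", "case-002"]:
    simulate_case(case, doctor_exp, patient_exp)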
Multi-Agent Collaboration via Cross-Team Orchestration
Zhuoyun Du†, Chen Qian†, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang
ACL 2025 Findings
paper |
code |
abstract |
bibtex
Large Language Models (LLMs) have significantly impacted various domains, especially through organized LLM-driven autonomous agents. A representative scenario is software development, where agents can collaborate in a team like humans, following predefined phases to complete sub-tasks sequentially. However, for an agent team, each phase yields only one possible outcome. This results in the completion of only one development chain, losing the opportunity to explore multiple potential decision paths within the solution space and consequently leading to suboptimal results or extensive trial and error. To address this, we introduce Cross-Team Orchestration (Croto), a scalable multi-team framework that enables orchestrated teams to jointly propose various task-oriented solutions and exchange insights in an environment that preserves team independence while supporting cross-team collaboration, yielding superior solutions. Experiments reveal a notable increase in software quality compared to state-of-the-art baselines. We further tested our framework on story generation tasks, which demonstrated a promising generalization ability of our framework to other domains.
@article{du2024crossteam,
title={Multi-Agent Software Development through Cross-Team Collaboration},
author={Du, Zhuoyun and Qian, Chen and Liu, Wei and Xie, Zihao and Wang, Yifei and Dang, Yufan and Chen, Weize and Yang, Cheng},
journal={arXiv preprint arXiv:2406.08979},
year={2024}
}
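To make the orchestration pattern concrete, here is a hypothetical sketch (not the Croto code): teams draft solutions independently, exchange distilled insights, and revise before a final aggregation step; llm() and the prompts are placeholders.

def llm(prompt: str) -> str:
    return f"<response to: {prompt[:40]}...>"  # placeholder LLM call

def cross_team_orchestrate(task, num_teams=3, rounds=2):
    # Each team independently drafts a candidate solution.
    solutions = [llm(f"Team {i}: propose a solution for: {task}")
                 for i in range(num_teams)]
    for _ in range(rounds):
        # Distill insights from every team's current solution...
        insights = [llm("Extract key design insights from: " + s)
                    for s in solutions]
        # ...then each team revises independently, informed by peer insights.
        solutions = [llm(f"Team {i}: revise {solutions[i]} using peer insights: "
                         + " | ".join(ins for j, ins in enumerate(insights)
                                      if j != i))
                     for i in range(num_teams)]
    return llm("Aggregate the best final solution from: "
               + " || ".join(solutions))

print(cross_team_orchestrate("build a todo-list web app"))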
Scaling Large-Language-Model-based Multi-Agent Collaboration
Chen Qian†, Zihao Xie†, Yifei Wang†, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun
ICLR 2025
paper |
code |
abstract |
bibtex
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual agent through collective reasoning. Inspired by the neural scaling law (increasing neurons enhances performance), this study explores whether continuously adding collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law: the overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts.
@article{qian2024scaling,
title={Scaling Large-Language-Model-based Multi-Agent Collaboration},
author={Qian, Chen and Xie, Zihao and Wang, Yifei and Liu, Wei and Dang, Yufan and Du, Zhuoyun and Chen, Weize and Yang, Cheng and Liu, Zhiyuan and Sun, Maosong},
journal={arXiv preprint arXiv:2406.07155},
year={2024}
}
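For readers curious how a DAG-shaped agent network can be orchestrated topologically, here is an illustrative sketch under stated assumptions (llm() is a placeholder and run_macnet_like is a hypothetical name, not MacNet's API): agents are nodes, edges carry intermediate artifacts, and each agent refines what its predecessors produced.

from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def llm(prompt: str) -> str:
    return f"<response to: {prompt[:40]}...>"  # placeholder LLM call

def run_macnet_like(dag, task):
    # dag maps node -> set of predecessor nodes (a directed acyclic graph).
    artifacts = {}
    for node in TopologicalSorter(dag).static_order():
        # Each agent refines the artifacts produced by its predecessors.
        upstream = [artifacts[p] for p in dag[node]]
        context = " | ".join(upstream) if upstream else task
        artifacts[node] = llm(f"Agent {node}: refine the solution given: {context}")
    return artifacts

# A small irregular topology: two parallel branches merging into one node.
dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(run_macnet_like(dag, "write a sorting library")["d"])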
Autonomous Agents for Collaborative Task under Information Asymmetry
Wei Liu†, Chenxi Wang†, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian
NeurIPS 2024
paper |
code |
abstract |
bibtex
Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. Such systems perform communication among agents to collaboratively solve tasks, under the premise of shared information. However, when agents' collaborations are leveraged to perform multi-person tasks, a new challenge arises due to information asymmetry, since each agent can only access the information of its own human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange the human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication toward effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.
@article{liu2024autonomous,
title={Autonomous Agents for Collaborative Task under Information Asymmetry},
author={Liu, Wei and Wang, Chenxi and Wang, Yifei and Xie, Zihao and Qiu, Rennai and Dang, Yufan and Du, Zhuoyun and Chen, Weize and Yang, Cheng and Qian, Chen},
journal={arXiv preprint arXiv:2406.14928},
year={2024}
}
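The sketch below illustrates the information-asymmetry setting in toy form (the Agent class, memory format, and infonav_solve are hypothetical names, not the iAgents API): each agent holds only its own user's information and fills the unknown slots of a shared plan by querying the other agent.

class Agent:
    def __init__(self, name, private_memory):
        self.name = name
        self.memory = private_memory  # only this user's information

    def answer(self, query):
        return self.memory.get(query, "unknown")

def infonav_solve(task_slots, asker, peer):
    # Plan with explicit placeholders for information still unknown.
    plan = {slot: None for slot in task_slots}
    for slot in plan:
        value = asker.answer(slot)        # try local memory first...
        if value == "unknown":
            value = peer.answer(slot)     # ...then proactively ask the peer.
        plan[slot] = value
    return plan

alice = Agent("alice-agent", {"alice_free_time": "Sat 2pm"})
bob = Agent("bob-agent", {"bob_free_time": "Sat 2-4pm"})
print(infonav_solve(["alice_free_time", "bob_free_time"], alice, bob))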
Experience
Taowise @ Taotian (Alibaba Group)
Algorithm Engineer Intern | Jan 2025 – Present
Researching a new paradigm for multi-agent latent-space reasoning enhancement and supervised fine-tuning (SFT) methodologies for large language models, while also working on advanced AI applications in e-commerce.
THUNLP @ Tsinghua University
Research Intern | Nov 2023 – Aug 2024
Deeply involved in developing the ChatDev project and related projects, with a focus on multi-agent cross-team collaboration capabilities.
Awards
- First-Class Scholarship, 2024 - Outstanding Graduate Student Award
- Second-Class Scholarship, 2023 - Academic Excellence Recognition
- Second-Class Scholarship, 2022 - Academic Excellence Recognition
Talks
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
Academic Salon at ModelBest | January 5, 2024
slides |
abstract
Recently, Large Language Models (LLMs) have made significant progress on complex reasoning tasks through chain-of-thought techniques. However, a model's reasoning is often limited by its own understanding and its lack of external perspectives. To address this issue, the talk introduced Exchange-of-Thought (EoT), a new framework that enables cross-model communication during problem-solving. Drawing inspiration from network topology, EoT integrates four unique communication paradigms, and the research delves into the communication dynamics and volume associated with each paradigm. To mitigate the risk of erroneous reasoning chains, a confidence assessment mechanism is incorporated into the communication. Experiments on various complex reasoning tasks show that EoT significantly surpasses established baselines, highlighting the value of external perspectives in enhancing the performance of LLMs.
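As a toy illustration of the communication-and-confidence idea from the talk (not the paper's implementation; the models, confidences, and voting rule here are placeholders), the sketch below lets several models exchange answers for a few rounds and then picks a final answer by confidence-weighted vote.

from collections import defaultdict

def exchange_of_thought(models, question, rounds=2):
    # Each model is a callable: (question, peer_messages) -> (answer, confidence).
    messages = {name: "" for name in models}
    for _ in range(rounds):
        new_messages = {}
        for name, model in models.items():
            peers = [m for n, m in messages.items() if n != name and m]
            answer, conf = model(question, peers)
            new_messages[name] = f"{name} answered {answer} (confidence {conf:.2f})"
        messages = new_messages
    # Confidence-weighted vote over the final answers.
    votes = defaultdict(float)
    for name, model in models.items():
        answer, conf = model(question, list(messages.values()))
        votes[answer] += conf
    return max(votes, key=votes.get)

# Toy models with fixed answers and confidences.
toy = {"m1": lambda q, peers: ("42", 0.9), "m2": lambda q, peers: ("41", 0.4)}
print(exchange_of_thought(toy, "What is 6*7?"))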
Miscellaneous
Sports: Soccer, Fencing, Snowboarding, Billiards, Badminton
Music: Piano, Guitar (Beginner)
Hobbies: Photography, Reading, Physics
Visit my personal blog for research insights, photography, and life experiences. (I will try to update more frequently. :)