Hello

Zhuoyun Du | 杜卓耘

I am a second-year master's student in Computer Science at Zhejiang University. I currently work with Prof. Wei Chen at ZJUVAI within the State Key Laboratory of CAD&CG. I am also interning with the Taowise team at Alibaba.

Previously, I received my Bachelor's degree from Jinan University. I also served as a research assistant on the ChatDev project at Tsinghua University, advised by Prof. Zhiyuan Liu and Prof. Chen Qian.


Seeking Ph.D. positions (Fall 2027).
Open to collaborations & recommendations!

Seeking Research Interns.
LLMs & Autonomous Agents. Email me!

Research Focus: Designing LLM-based autonomous agents that collaborate effectively to tackle complex tasks (software development, self-evolution, and latent-space reasoning).

News & Timeline

Selected Publications

(† denotes equal contribution; * denotes corresponding author)

Online-PVLM

Online-PVLM: Advancing Personalized VLMs with Online Concept Learning

Huiyu Bai, Runze Wang, Zhuoyun Du, Yiyang Zhao, Fengji Zhang, Haoyu Chen, Xiaoyong Zhu, Bo Zheng, Xuejiao Zhao*
arXiv Preprint, 2025
Personalized vision-language models (VLMs) often struggle to adapt to new user-defined concepts in real time, requiring per-concept training that is neither scalable nor efficient. To address this, we propose Online-PVLM, a training-free framework for online concept learning in latent space. Our method leverages hyperbolic representations to generate compact, discriminative concept embeddings on the fly from a few reference images—without any test-time training. Furthermore, we introduce OP-Eval, a large-scale benchmark with 1,292 concepts and over 30K high-quality instances to rigorously evaluate online personalization. Experiments show that Online-PVLM achieves state-of-the-art performance across diverse tasks while enabling efficient caching and retrieval, paving the way for real-world personalized multimodal AI.
@article{bai2025onlinepvlm,
  title={Online-PVLM: Advancing Personalized VLMs with Online Concept Learning},
  author={Bai, Huiyu and Wang, Runze and Du, Zhuoyun and Zhao, Yiyang and Zhang, Fengji and Chen, Haoyu and Zhu, Xiaoyong and Zheng, Bo and Zhao, Xuejiao},
  journal={arXiv preprint arXiv:2511.20056},
  year={2025}
}
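
To make the hyperbolic-embedding idea concrete, here is a minimal sketch of training-free concept registration and retrieval on a Poincaré ball (curvature -1). The feature dimension, the averaging step, and the helper names (register_concept, retrieve) are my illustrative assumptions, not the paper's implementation:

import torch

def expmap0(v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Exponential map at the origin of the Poincare ball (curvature -1):
    # lifts a Euclidean feature vector onto the ball.
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * v / norm

def poincare_dist(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Geodesic distance on the Poincare ball.
    sq = ((x - y) ** 2).sum(-1)
    denom = ((1 - (x ** 2).sum(-1)) * (1 - (y ** 2).sum(-1))).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / denom)

concept_bank = {}  # concept name -> cached hyperbolic embedding

def register_concept(name: str, ref_feats: torch.Tensor) -> None:
    # Average a few reference-image features, then lift; no training involved.
    concept_bank[name] = expmap0(ref_feats.mean(dim=0))

def retrieve(query_feat: torch.Tensor) -> str:
    q = expmap0(query_feat)
    return min(concept_bank, key=lambda n: poincare_dist(q, concept_bank[n]).item())

register_concept("my_mug", torch.randn(3, 512))   # 3 reference images (toy features)
register_concept("my_bike", torch.randn(3, 512))
print(retrieve(torch.randn(512)))

Because each concept is just a cached tensor, registration and retrieval stay cheap at test time, which is what makes online personalization feasible.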
Interlat

Enabling Agents to Communicate Entirely in Latent Space

Zhuoyun Du, Runze Wang, Huiyu Bai, Zouying Cao, Xiaoyong Zhu, Bo Zheng, Wei Chen, Haochao Ying
arXiv Preprint, 2025
While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling rich, internal latent states into discrete tokens inherently limits the depth and nuance of information that can be transmitted, thereby hindering collaborative problem-solving. Inspired by human mind-reading, we propose Interlat (Inter-agent Latent Space Communication), a paradigm that leverages the last hidden states of an LLM as a representation of its mind for direct transmission (termed latent communication). An additional stage further compresses these latents via entirely latent-space reasoning. Experiments demonstrate that Interlat outperforms both fine-tuned chain-of-thought (CoT) prompting and single-agent baselines, promoting more exploratory behavior and enabling genuine utilization of latent information. The compression not only substantially accelerates inference but also maintains competitive performance through an efficient information-preserving mechanism. We position this work as a feasibility study of entirely latent-space inter-agent communication, and our results highlight its potential, offering valuable insights for future research.
@article{du2025interlat,
  title={Enabling Agents to Communicate Entirely in Latent Space},
  author={Du, Zhuoyun and Wang, Runze and Bai, Huiyu and Cao, Zouying and Zhu, Xiaoyong and Zheng, Bo and Chen, Wei and Ying, Haochao},
  journal={arXiv preprint arXiv:2511.09149},
  year={2025}
}
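
The transmission primitive can be sketched with Hugging Face Transformers: take the sender's last hidden states and prepend them to the receiver's prompt as soft prefix embeddings via inputs_embeds. This toy uses gpt2 as a stand-in and skips the fine-tuning and compression the paper relies on, so treat it as a shape-level illustration only:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model; any causal LM with matching hidden size works
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name)

# Agent A encodes its "mind": the last hidden states over its context.
ids_a = tok("The key is hidden under the red box.", return_tensors="pt").input_ids
with torch.no_grad():
    latents = lm(ids_a, output_hidden_states=True).hidden_states[-1]  # (1, T, d)

# Agent B receives those latents as soft prefix tokens, prepended to its own
# embedded prompt, instead of a natural-language message. Note: raw hidden
# states live in a different space than token embeddings; in the paper the
# receiving agent is trained to consume them, which this sketch omits.
ids_b = tok("Where is the key? Answer:", return_tensors="pt").input_ids
embeds_b = lm.get_input_embeddings()(ids_b)        # (1, T', d)
inputs = torch.cat([latents, embeds_b], dim=1)     # latent prefix + prompt

with torch.no_grad():
    out = lm.generate(inputs_embeds=inputs, max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))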
SSPO

SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression

Yuyang Xu, Yi Cheng, Haochao Ying, Zhuoyun Du, Renjun Hu, Xing Shi, Wei Lin, Jian Wu
arXiv Preprint, 2025
Test-time scaling has proven effective in further enhancing the performance of pretrained Large Language Models (LLMs). However, mainstream post-training methods (i.e., reinforcement learning (RL) with chain-of-thought (CoT) reasoning) often incur substantial computational overhead due to auxiliary models and overthinking. In this paper, we empirically show that incorrect answers partially stem from verbose reasoning processes that lack correct self-correction, allowing errors to accumulate across multiple reasoning steps. To this end, we propose Self-traced Step-wise Preference Optimization (SSPO), a pluggable RL process-supervision framework that enables fine-grained optimization of each reasoning step. SSPO requires neither auxiliary models nor step-wise manual annotations; instead, it leverages step-wise preference signals generated by the model itself to guide the optimization process for reasoning compression. Experiments demonstrate that the reasoning sequences generated by SSPO are both accurate and succinct, effectively mitigating overthinking behaviors without compromising model performance across diverse domains and languages.
@article{xu2025sspo,
  title={SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression},
  author={Xu, Yuyang and Cheng, Yi and Ying, Haochao and Du, Zhuoyun and Hu, Renjun and Shi, Xing and Lin, Wei and Wu, Jian},
  journal={arXiv preprint arXiv:2508.12604},
  year={2025}
}
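
As a rough sketch of what step-wise preference optimization can look like, here is a DPO-style loss applied per reasoning step rather than per full response. The tensors, the beta value, and the way preferences are traced are my assumptions for illustration, not SSPO's exact objective:

import torch
import torch.nn.functional as F

def stepwise_pref_loss(logp_chosen, logp_rejected,
                       ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Each tensor holds per-step log-probs, shape (num_steps,). Preferred step
    # continuations (e.g., ones the model itself traces as succinct and
    # self-consistent) are pushed up relative to a frozen reference policy.
    margins = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margins).mean()

# Toy example: 3 reasoning steps, each with a chosen vs. rejected continuation.
loss = stepwise_pref_loss(
    logp_chosen=torch.tensor([-5.0, -4.2, -6.1]),
    logp_rejected=torch.tensor([-5.5, -4.0, -7.3]),
    ref_logp_chosen=torch.tensor([-5.2, -4.3, -6.4]),
    ref_logp_rejected=torch.tensor([-5.4, -4.1, -7.0]),
)
print(loss)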
EvoPatient

LLMs Can Simulate Standardized Patients via Agent Coevolution

Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haochao Ying
ACL 2025
Training medical personnel using standardized patients (SPs) remains a complex challenge, requiring extensive domain expertise and role-specific practice. Previous research on Large Language Model (LLM)-based SPs mostly focuses on improving data retrieval accuracy or adjusting prompts through human feedback. However, this focus has overlooked the critical need for patient agents to learn a standardized presentation pattern that transforms data into human-like patient responses through unsupervised simulations. To address this gap, we propose EvoPatient, a novel simulated-patient framework in which a patient agent and doctor agents simulate the diagnostic process through multi-turn dialogues, simultaneously gathering experience to improve the quality of both questions and answers, ultimately enabling human doctor training. Extensive experiments on various cases demonstrate that, given only overall SP requirements, our framework improves over existing reasoning methods by more than 10% in requirement alignment and achieves better human preference, while reaching an optimal balance of resource consumption after evolving over 200 cases for 10 hours, with excellent generalizability.
@article{du2024evopatient,
  title={LLMs Can Simulate Standardized Patients via Agent Coevolution},
  author={Du, Zhuoyun and Zheng, Lujie and Hu, Renjun and others},
  journal={arXiv preprint arXiv:2412.11716},
  year={2024}
}
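
The coevolution loop can be sketched in a few lines: doctor and patient agents alternate in a multi-turn dialogue, and useful questions are harvested into a shared experience pool that conditions later cases. The chat stub and the naive harvesting below are placeholders; the paper additionally filters experience for quality and evolves the answer side too:

def chat(system: str, history: list) -> str:
    # Placeholder for a real LLM call (e.g., any chat-completion client).
    return f"reply #{len(history)}"

def simulate_case(case_record: str, experience: list, max_turns: int = 4) -> list:
    # One simulated consultation between a doctor agent and a patient agent.
    history = []
    for _ in range(max_turns):
        question = chat(
            "You are a doctor. Ask one focused diagnostic question. "
            "Useful questions from past cases:\n" + "\n".join(experience),
            history,
        )
        answer = chat(
            "You are a standardized patient. Answer only from this record, "
            "in a human-like way, without revealing the diagnosis:\n" + case_record,
            history + [question],
        )
        history += [question, answer]
        experience.append(question)  # naive harvesting, for illustration only
    return history

experience: list = []
for record in ["case 1: chest pain ...", "case 2: fever ..."]:
    simulate_case(record, experience)
print(len(experience), "reusable questions gathered")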
Cross Team

Multi-Agent Collaboration via Cross-Team Orchestration

Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang
ACL 2025 Findings
Large Language Models (LLMs) have significantly impacted various domains, especially through organized LLM-driven autonomous agents. A representative scenario is software development, where agents can collaborate in a team like humans, following predefined phases to complete sub-tasks sequentially. However, for a single agent team, each phase yields only one possible outcome, so only one development chain is completed, forfeiting the opportunity to explore multiple potential decision paths within the solution space and consequently leading to suboptimal results or extensive trial and error. To address this, we introduce Cross-Team Orchestration (Croto), a scalable multi-team framework that enables orchestrated teams to jointly propose various task-oriented solutions and exchange insights in an environment that preserves each team's independence while enabling cross-team collaboration, generating superior solutions. Experiments reveal a notable increase in software quality compared to state-of-the-art baselines. We further tested our framework on story generation tasks, which demonstrated promising generalization ability to other domains.
@article{du2024crossteam,
  title={Multi-Agent Software Development through Cross-Team Collaboration},
  author={Du, Zhuoyun and Qian, Chen and Liu, Wei and others},
  journal={arXiv preprint arXiv:2406.08979},
  year={2024}
}
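
Structurally, the orchestration reduces to a loop in which independent teams draft solutions and then broadcast distilled insights to one another before the next phase. A minimal sketch, with team_solve, summarize, and score as hypothetical stand-ins for the actual team pipelines:

def croto(team_solve, summarize, score, teams, task, phases=3):
    # Each phase: every team drafts a solution independently, then broadcasts
    # a distilled insight that the other teams can absorb in the next phase.
    insights = []
    for _ in range(phases):
        solutions = [team_solve(team, task, insights) for team in teams]
        insights = [summarize(s) for s in solutions]
    return max(solutions, key=score)

# Toy stand-ins: a "team" just reports how many peer insights it saw.
best = croto(
    team_solve=lambda team, task, ins: f"{team} solves '{task}' with {len(ins)} insights",
    summarize=lambda sol: sol[:20],
    score=len,
    teams=["team_a", "team_b", "team_c"],
    task="todo app",
)
print(best)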
Scaling MAS

Scaling Large-Language-Model-based Multi-Agent Collaboration

Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun
ICLR 2025
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning. Inspired by the neural scaling law (increasing neurons enhances performance), this study explores whether the continuous addition of collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law: overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts.
@article{qian2024scaling,
  title={Scaling Large-Language-Model-based Multi-Agent Collaboration},
  author={Qian, Chen and Xie, Zihao and Wang, Yifei and others},
  journal={arXiv preprint arXiv:2406.07155},
  year={2024}
}
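
The topological orchestration itself is simple to sketch: arrange agents as nodes of a DAG and let each node refine the task using its predecessors' artifacts, in topological order. The random graph and the lambda agent below are toy stand-ins, not MacNet's actual topologies or agents:

import networkx as nx

def run_macnet(graph: nx.DiGraph, agent, task: str) -> dict:
    # Each node refines the task using its predecessors' artifacts,
    # visited in topological order so inputs are always ready.
    artifacts = {}
    for node in nx.topological_sort(graph):
        upstream = [artifacts[p] for p in graph.predecessors(node)]
        artifacts[node] = agent(node, task, upstream)
    return artifacts

g = nx.gnp_random_graph(8, 0.3, seed=0, directed=True)
g = nx.DiGraph((u, v) for u, v in g.edges() if u < v)  # keep only forward edges: a DAG
result = run_macnet(
    g,
    lambda node, task, ups: f"node {node}: refined from {len(ups)} inputs",
    "write a CLI tool",
)
print(result)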
Information Asymmetry

Autonomous Agents for Collaborative Task under Information Asymmetry

Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian
NeurIPS 2024
Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. Such systems rely on communication among agents to collaboratively solve tasks, under the premise of shared information. However, when agents' collaborations are leveraged to perform multi-person tasks, a new challenge arises from information asymmetry, since each agent can only access the information of its own human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange the human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication toward effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.
@article{liu2024autonomous,
  title={Autonomous Agents for Collaborative Task under Information Asymmetry},
  author={Liu, Wei and Wang, Chenxi and Wang, Yifei and others},
  journal={arXiv preprint arXiv:2406.14928},
  year={2024}
}
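
Stripped to its core, the setting is two agents that each see only their own user's private information and must proactively exchange what the other lacks. The sketch below reduces InfoNav to "fill the slots you can, share the rest" and is purely illustrative:

def solve(slots, alice_info, bob_info, max_turns=4):
    # Two agents, each seeing only its own user's info, fill a shared plan.
    known = {}
    agents = [("alice", alice_info), ("bob", bob_info)]
    for turn in range(max_turns):
        name, info = agents[turn % 2]
        for slot in slots:
            if slot not in known and slot in info:
                known[slot] = info[slot]  # proactively share what the other lacks
        if len(known) == len(slots):
            break
    return known

plan = solve(
    slots=["time", "place"],
    alice_info={"time": "7pm Friday"},
    bob_info={"place": "Lakeside cafe"},
)
print(plan)  # {'time': '7pm Friday', 'place': 'Lakeside cafe'}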

Experience

Taowise @ Taotian (Alibaba Group)

Algorithm Engineer Intern Jan 2025 – Present

Researching new paradigms for multi-agent latent-space reasoning and communication, as well as supervised fine-tuning (SFT) methodologies for LLMs.

THUNLP @ Tsinghua University

Research Assistant Nov 2023 – Aug 2024

Deeply involved in the development of the ChatDev project and related works, focusing on multi-agent cross-team collaboration.

Projects

Multi-Agent Research Interactive E-book

A comprehensive collection of research papers on LLM-based multi-agent systems presented in an interactive e-book format. Organizes cutting-edge research into task-solving-oriented and social-simulation-oriented systems, covering agent communication, organizational structures, and co-evolution mechanisms.

ChatDev & Croto & MACNET

Actively contributed to the development of ChatDev and its extended branches, serving as the lead for one key branch.

Awards & Honors

Academic Service

Program Committee / Reviewer:
NeurIPS 2025, ICLR 2026

Beyond Research

Sports: Soccer, Fencing, Snowboarding, Billiards, Badminton
Music: Piano, Guitar (Beginner)
Hobbies: Photography, Reading, Physics

Visit my Personal Blog for research insights, photography, and life experiences.
(I will try to update more frequently :D)
