I am currently a second-year Ph.D. student at the Institute for Interdisciplinary Information Sciences, Tsinghua University, supervised by Prof. Jianyu Chen. Previously, I received my Bachelor’s degree in Computer Science from Beijing Institute of Technology.

My research interests lie in embodied foundation models and unified multimodal models, particularly Vision-Language-Action (VLA) models and world-model-based policies. I am dedicated to exploring the limitations and bottlenecks of existing VLA models. By leveraging world models, I hope to bridge the gap between the general capabilities of vision-language models (VLMs) and action modeling. My ultimate goal is to build a general embodied foundation model capable of human-like reasoning, imagination, execution, and correction.

Currently, I am an intern on the Seed Robotics Team, working on Unified Action Models. Feel free to reach out for collaboration or discussion at zhangjianke53@gmail.com.