Publications

Adaptive Mobile Agent for Dynamic Interactions

Published in IEEE International Conference on Multimedia and Expo (ICME), 2025

This work presents a novel LLM-based multimodal agent framework for mobile devices, designed to enhance interaction and adaptive capabilities in dynamic mobile environments through autonomous navigation and human-like behaviors.

Recommended citation: Li, Y., Zhang, C., Yang, W., Fu, B., Cheng, P., Chen, X., Chen, L., & Wei, Y. (2025). Adaptive Mobile Agent for Dynamic Interactions. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2025.

Learning to Be A Doctor: Searching for Effective Medical Agent Architectures

Published in ACM International Conference on Multimedia (ACM MM), 2025

This paper introduces a novel framework for the automated design of medical agent architectures, defining a hierarchical and expressive agent search space that enables dynamic workflow adaptation through structured modifications at multiple levels.

Recommended citation: Zhuang, Y., Jiang, W., Zhang, J., Yang, Z., Zhou, J. T., & Zhang, C. (2025). Learning to Be A Doctor: Searching for Effective Medical Agent Architectures. In Proceedings of the ACM International Conference on Multimedia (ACM MM) 2025.

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Published in arXiv preprint, 2025

This work proposes a novel evolutionary framework for GUI agents that improves operational efficiency while retaining intelligence and flexibility, using a memory mechanism that records task execution history and evolves high-level actions from it.

Recommended citation: Jiang, W., Zhuang, Y., Song, C., Yang, X., Zhou, J. T., & Zhang, C. (2025). AppAgentX: Evolving GUI Agents as Proficient Smartphone Users. arXiv preprint arXiv:2503.02268.

Skeleton-Guided Spatial-Temporal Feature Learning for Video-Based Visible-Infrared Person Re-Identification

Published in arXiv preprint, 2024

Video-based visible-infrared person re-identification (VVI-ReID) is challenging due to significant modality feature discrepancies. This work proposes a novel Skeleton-guided spatial-Temporal feAture leaRning (STAR) method that uses skeleton information to improve spatial-temporal features in videos of both modalities.

Recommended citation: Jiang, W., Zhu, X., Gao, J., & Liao, D. (2024). Skeleton-Guided Spatial-Temporal Feature Learning for Video-Based Visible-Infrared Person Re-Identification. arXiv preprint arXiv:2411.11069.