Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Awards

Publications

Skeleton-Guided Spatial-Temporal Feature Learning for Video-Based Visible-Infrared Person Re-Identification

Published in arXiv preprint, 2024

Video-based visible-infrared person re-identification (VVI-ReID) is challenging due to significant modality feature discrepancies. This work proposes a novel Skeleton-guided spatial-Temporal feAture leaRning (STAR) method that uses skeleton information to improve spatial-temporal features in videos of both modalities.

Recommended citation: Jiang, W., Zhu, X., Gao, J., & Liao, D. (2024). Skeleton-Guided Spatial-Temporal Feature Learning for Video-Based Visible-Infrared Person Re-Identification. arXiv preprint arXiv:2411.11069.

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Published in arXiv preprint, 2025

This work proposes a novel evolutionary framework for GUI agents that enhances operational efficiency while retaining intelligence and flexibility through a memory mechanism that records task execution history and evolves high-level actions.

Recommended citation: Jiang, W., Zhuang, Y., Song, C., Yang, X., Zhou, J. T., & Zhang, C. (2025). AppAgentX: Evolving GUI Agents as Proficient Smartphone Users. arXiv preprint arXiv:2503.02268.

Learning to Be A Doctor: Searching for Effective Medical Agent Architectures

Published in ACM International Conference on Multimedia (ACM MM), 2025

This paper introduces a novel framework for the automated design of medical agent architectures, defining a hierarchical and expressive agent search space that enables dynamic workflow adaptation through structured modifications at multiple levels.

Recommended citation: Zhuang, Y., Jiang, W., Zhang, J., Yang, Z., Zhou, J. T., & Zhang, C. (2025). Learning to Be A Doctor: Searching for Effective Medical Agent Architectures. In Proceedings of the ACM International Conference on Multimedia (ACM MM) 2025.

Adaptive Mobile Agent for Dynamic Interactions

Published in IEEE International Conference on Multimedia and Expo (ICME), 2025

This work presents a novel LLM-based multimodal agent framework for mobile devices, designed to enhance interaction and adaptive capabilities in dynamic mobile environments through autonomous navigation and human-like behaviors.

Recommended citation: Li, Y., Zhang, C., Yang, W., Fu, B., Cheng, P., Chen, X., Chen, L., & Wei, Y. (2025). Adaptive Mobile Agent for Dynamic Interactions. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2025.