“SurgMotion”, the Best-in-Class Surgical Video Foundation Model Launched
Open-Sourcing to Empower, AI to Lead Medicine
On 24 March [2026], the Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science & Innovation (HKISI), Chinese Academy of Sciences, unveiled a surgical video foundation model named “SurgMotion” at the Hong Kong Science and Technology Parks Shenzhen Branch.
The launch marks a substantial advance in surgical AI: a shift from fragmented recognition to generalised understanding, designed to provide robust and reliable support for clinical treatment, surgical procedures, medical education, and post-operative review.

Guests at the launch included: Prof. Hongbin LIU, Director and Professor of CAIR; Prof. Nassir Navab, Member of Academia Europaea and Fellow of IEEE, MICCAI, IAMBE, and AAIA, Professor at the Technical University of Munich (TUM) and Director of the Chair for Computer Aided Medical Procedures (CAMP); Dr. Wai-Sang POON, Honorary Consultant, Neuromedical Centre, HKU-SZH, Clinical Professor of Surgery, HKU, Professor of Surgery (fractional), CUHK-Shenzhen Faculty of Medicine, and Chairman, SZ-HK Specialist Training in Neurosurgery; Prof. Huai LIAO, Deputy Director of Pulmonary and Critical Care Medicine, Director of the Centre for Pulmonary Diagnostics and Interventional Therapy, Chief Physician and Professor, The First Affiliated Hospital of Sun Yat-sen University; Dr. Danny T.M. CHAN, Clinical Associate Professor (Honorary) and Head of Division of Neurosurgery, Department of Surgery, The Chinese University of Hong Kong (CUHK); Mr. Qiang XIE, Vice President, Wuhan United Imaging Intelligence Medical Technology Co., Ltd.; Mr. Yuanmeng WANG, Deputy Director of the Technology and Talent Department, Hetao Development Authority; and Mr. Hongqiang RONG, Associate Director, Business Development, Hong Kong Science and Technology Parks Corporation (HKSTP).
From Pixel Recognition to Motion Understanding
Trained on the SurgMotion-15M dataset, “SurgMotion” is currently the most comprehensive and extensive surgical video foundation model in the industry. The dataset contains approximately 15 million frames, corresponding to more than 3,658 hours of genuine surgical video.
“SurgMotion” introduces a motion-guided latent-space prediction mechanism that moves beyond the constraints of traditional pixel reconstruction. Together with the substantial volume of training data, this substantially improves the model’s capacity to understand critical semantic structures such as interactive actions, anatomical features, and surgical instruments, and establishes the groundwork for universal surgical intelligence across numerous centres, departments, and procedures.
“SurgMotion” supports 13 major human organ categories and six distinct surgical understanding tasks, including workflow recognition, action comprehension, depth estimation, polyp segmentation, triplet recognition, and skill evaluation. It has attained state-of-the-art (SOTA) results on 17 internationally authoritative surgical AI benchmarks. Notably, it outperforms current methods on core tasks including surgical workflow recognition, instrument interaction comprehension, and fine-movement modelling, demonstrating exceptional generalisation capability and precision.
Establishing a New Framework for Smart Healthcare
Prof. Hongbin LIU noted in his opening address that CAIR published the “EchoCare” ultrasound foundation model and the multi-modal medical AI large model CARES 3.0 last year, indicating a steadfast dedication to research and development. This year, CAIR maintains its momentum by introducing the “SurgMotion” surgical video foundation model, solidifying its position as a premier institution in AI-driven healthcare in the Greater Bay Area.
He emphasised that the Centre’s research is consistently guided by clinical application, with the aim of empowering doctors, benefiting patients, and contributing to a healthier, more efficient healthcare ecosystem.
Open-Sourcing the Model to Establish a Foundation for General Surgical AI
During the model presentation, Prof. Dong YI, a researcher at CAIR, formally announced that “SurgMotion,” a foundation model with one billion parameters, is fully open-sourced. He outlined the model’s design philosophy: surgical videos frequently contain redundant segments and interfering noise, on which traditional self-supervised learning methods may squander computational power and model capacity through low-level pixel detail.
To resolve this issue, the team implemented three technical improvements on top of the V-JEPA architecture: feature diversity preservation, motion-guided latent-space prediction, and model stability preservation. These advancements enable the model to concentrate on acquiring mid-to-high-level semantic and motion information from surgical videos, yielding a more efficient self-supervised training approach.
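The exact training objective has not been published. As a rough illustration of the general idea, the following is a minimal NumPy sketch of JEPA-style latent-space prediction with a hypothetical motion weighting, in which high-motion frames contribute more to the loss than static, redundant ones. All shapes, weights, and functions here are illustrative assumptions, not the actual SurgMotion implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frames, W):
    """Toy encoder: linear projection of flattened frames to latent vectors.
    (Stand-in for a video transformer backbone.)"""
    return frames.reshape(frames.shape[0], -1) @ W

def motion_weights(frames):
    """Hypothetical motion guidance: weight each frame by its motion magnitude
    (mean absolute frame difference), so high-motion segments dominate the loss."""
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    w = np.concatenate([[diffs[0]], diffs])  # first frame reuses the first diff
    return w / w.sum()

def latent_prediction_loss(context_latents, target_latents, weights, P):
    """JEPA-style objective: predict target latents from context latents and
    compare in latent space -- no pixel reconstruction involved."""
    pred = context_latents @ P
    per_frame = ((pred - target_latents) ** 2).mean(axis=1)
    return float((weights * per_frame).sum())

# Toy clip: 8 frames of 16x16 "video"
frames = rng.normal(size=(8, 16, 16))
W = rng.normal(size=(256, 32)) * 0.05  # shared encoder weights
P = rng.normal(size=(32, 32)) * 0.05   # predictor weights

ctx = encode(frames, W)                # context branch
tgt = encode(frames, W)                # target branch (in practice an EMA copy)
w = motion_weights(frames)
loss = latent_prediction_loss(ctx, tgt, w, P)
print(loss)
```

In a real system the target branch would be a stop-gradient, exponentially averaged copy of the encoder, and feature-diversity and stability terms would be added to prevent representational collapse; the sketch shows only the motion-weighted latent prediction itself.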
Alongside the technical innovations, the research team built the largest known surgical video pre-training dataset, SurgMotion-15M. It comprises 3,658 hours of surgical video from 50 sources, spanning 13 anatomical regions and a variety of specialties including neurosurgery, ophthalmology, otolaryngology, and laparoscopy, offering an unprecedented level of diversity to support the pre-training paradigm.
Empowering Clinical Practice
The standardised analytical capabilities of SurgMotion can effectively mitigate the risks associated with complex surgeries, substantially improve the standardisation of clinical diagnosis and surgical procedures, and offer comprehensive technical support to medical professionals at all levels.
In the application case demonstration segment, Dr. Wai-Sang POON, Honorary Consultant at HKU-SZH and Clinical Professor of Surgery at The University of Hong Kong, first demonstrated the model’s validation in neurosurgery training. Drawing on 35 years of clinical experience, Dr. Poon observed that, as a neurosurgery specialist training base, HKU-SZH has long sought to address the standardisation challenges of the traditional apprenticeship model in complex surgical education.
In this validation, “SurgMotion” achieved a 90% accuracy rate on multi-centre clinical data. On the JIGSAWS surgical skill assessment dataset, it reached a Spearman correlation of 0.770 with expert ratings and the lowest mean absolute error (MAE) of 2.649, surpassing comparable models by a considerable margin. With its capacity for precise motion analysis and objective evaluation, the system is poised to become a dependable teaching aid, significantly advancing the digitisation and standardisation of specialist training and helping young surgeons conduct standardised surgical reviews.
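For reference, the two reported skill-assessment metrics can be computed as follows. The scores below are made-up illustrative numbers, not results from JIGSAWS or SurgMotion.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks
    (this simple version assumes no tied scores)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def mae(pred, true):
    """Mean absolute error between predicted and expert skill scores."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

# Hypothetical GRS-style skill scores for six trials (illustrative only)
expert = [12, 18, 25, 9, 22, 15]
model  = [14, 17, 23, 11, 24, 13]

print(spearman(model, expert))  # rank agreement, higher is better
print(mae(model, expert))       # score error, lower is better
```

A perfect skill model would approach a Spearman correlation of 1.0 and an MAE of 0; the reported 0.770 correlation and 2.649 MAE are measured against expert panel ratings.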
Prof. Huai LIAO, Deputy Director of the Department of Respiratory and Critical Care Medicine at The First Affiliated Hospital, Sun Yat-sen University, then demonstrated the model’s applicability to interventional pulmonology procedures. He emphasised that interventional pulmonology is progressing toward more precise, deeper interventions, and that powerful AI vision models are urgently needed to provide the technological foundation for this advancement.
“SurgMotion” exhibited exceptional performance, achieving overall superiority on the critical tasks of image segmentation and depth estimation, with remarkable lesion contouring precision and minimal depth error. Tested on real clinical video data from his hospital, the model identified respiratory interventional procedures with an accuracy of approximately 85%. A perceptual capability that genuinely comprehends the surgical process stands to profoundly empower bronchoscopic robotics and substantially improve clinical precision and safety.
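Segmentation overlap and depth error, the two tasks highlighted above, are conventionally scored with a Dice coefficient and a mean absolute depth error. A minimal sketch with toy masks and depth maps follows; the values are illustrative only, not clinical data or SurgMotion results.

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient for binary masks: 2|A intersect B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

def depth_abs_err(pred_depth, gt_depth):
    """Mean absolute depth error, in the same units as the depth maps."""
    return float(np.mean(np.abs(pred_depth - gt_depth)))

# Toy 4x4 lesion masks: prediction overshoots the lesion by one column
gt_mask = np.zeros((4, 4), dtype=int);   gt_mask[1:3, 1:3] = 1
pred_mask = np.zeros((4, 4), dtype=int); pred_mask[1:3, 1:4] = 1

# Toy depth maps with a constant 0.5-unit prediction offset
gt_depth = np.full((4, 4), 10.0)
pred_depth = gt_depth + 0.5

print(dice(pred_mask, gt_mask))             # 2*4 / (6+4) = 0.8
print(depth_abs_err(pred_depth, gt_depth))  # 0.5
```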
Exploring Pathways
During the media Q&A session, Prof. Hongbin LIU, Dr. Wai-Sang POON, Prof. Huai LIAO, Dr. Danny T.M. CHAN, and Prof. Dong YI jointly held an in-depth discussion of the technical specifics, clinical application prospects, and industrialisation pathways of “SurgMotion”. CAIR’s open-sourcing of “SurgMotion” is expected to accelerate the large-scale deployment of AI in surgery and continuously infuse momentum into medical technology innovation in the Guangdong-Hong Kong-Macao Greater Bay Area.
The Centre for Artificial Intelligence and Robotics (CAIR), established in 2019, is one of two centres under the Hong Kong Institute of Science & Innovation, the Chinese Academy of Sciences’ sole directly affiliated research institute in Hong Kong. CAIR is committed to the integration and innovation of life sciences and artificial intelligence, conducting research in three primary areas: Embodied Intelligent Robots, Multimodal AI Large Models, and Intelligent Sensing Technologies. It is one of the few institutions worldwide pursuing systematic research and development of AI systems for medical and healthcare applications, as well as their technological transformation.
Picture Source: CAIR (Centre for Artificial Intelligence and Robotics)