Abstract:
Objective: Training intelligent segmentation models for inner ear substructures on computed tomography (CT) scans, and precisely measuring quantitative indicators, both require manually annotated data. Such data must be highly consistent, yet the lack of a standardized annotation protocol leads to large annotation bias and data fragmentation. We therefore propose a systematic pixel-level annotation specification for inner ear substructures. The specification aims to resolve ambiguous boundaries between substructures and inconsistent manual annotation, thereby facilitating multicenter collaboration and the clinical translation of related technologies. Methods: Ultra-high-resolution computed tomography (U-HRCT) images were used, with the lateral semicircular canal serving as the calibration reference for aligning the scan slices to the plane of maximum visualization. The bony labyrinth of the inner ear was subdivided into six independent substructures, each with an explicit boundary definition: the cochlea, vestibule, superior semicircular canal (excluding the common crus), lateral semicircular canal, posterior semicircular canal (excluding the common crus), and common crus. A two-level quality control process was combined with axial-coronal-sagittal three-view verification, and all file formats were standardized. Only U-HRCT images of inner ears without prior surgery were included. Nineteen annotators with comparable annotation experience were enrolled and divided into a trained group and an untrained group. Scoring reliability was analyzed using correlation coefficient statistics, and annotation quality was compared between the two groups using the Mann-Whitney U test.
Results: The anatomical conformity and annotation quality scores of all substructures were significantly higher in the trained group than in the untrained group, and inter-expert agreement on the evaluation results was favorable. Conclusion: The proposed specification treats the common crus as an independent substructure and defines pixel-level annotation rules for the inner ear substructures. It improves the accuracy of manual annotation, provides high-quality data for training intelligent segmentation models and measuring quantitative parameters, and facilitates multicenter data sharing and the clinical translation of inner ear-related technologies, giving it both clinical and research value.
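
For illustration only, the six-substructure labeling scheme and the group comparison described in the Methods can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the integer label IDs, the function name `mann_whitney_u`, and the example scores are all hypothetical, and the p-value uses the normal approximation with tie correction (an exact test may be preferable for groups this small).

```python
import math
from itertools import groupby

# Hypothetical integer label IDs: the abstract names the six substructures
# but does not state which label value each one receives.
LABELS = {
    1: "cochlea",
    2: "vestibule",
    3: "superior semicircular canal (excluding common crus)",
    4: "lateral semicircular canal",
    5: "posterior semicircular canal (excluding common crus)",
    6: "common crus",
}


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test.

    Returns (U_x, U_y, p), where p comes from the normal approximation
    with tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    rank_sum_x = 0.0
    tie_term = 0.0
    pos = 0  # number of pooled values already ranked
    for _, group in groupby(pooled, key=lambda t: t[0]):
        group = list(group)
        t = len(group)
        avg_rank = pos + (t + 1) / 2  # mean of ranks pos+1 .. pos+t
        rank_sum_x += avg_rank * sum(1 for _, g in group if g == 0)
        tie_term += t ** 3 - t
        pos += t
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all pooled values identical
        return u1, u2, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, u2, p


# Hypothetical quality scores for one substructure (not study data).
trained = [5, 5, 4, 5, 4]
untrained = [3, 2, 3, 4, 2]
u1, u2, p = mann_whitney_u(trained, untrained)
print(f"U_trained={u1}, U_untrained={u2}, p={p:.4f}")
```

In practice the same comparison would be repeated once per entry in `LABELS`, so that each substructure gets its own trained-versus-untrained test.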