Archive of the journal «Russian otorhinolaryngology» - Medical scientific journal «Russian otorhinolaryngology»

Medical Scientific Journal «Russian Otorhinolaryngology»
9, Bronnitskaya Str., Saint Petersburg, 190013, Russia
Tel./Fax: (921) 922-36-77, e-mail: text@pfco.ru
 ISSN 2413-4309 (online), ISSN 1810-4800 (print)  
Rossiiskaya otorinolaringologiya
Section: Otology
Potential of multimodal language model for preliminary evaluation of otoscopic images
M. V. Komarov (1), O. I. Goncharov (2), A. A. Fedotova (3)
(1) Saint Petersburg Research Institute of Ear, Throat, Nose and Speech, Saint Petersburg, 190013, Russian Federation
(1), (3) Mechnikov North-Western State Medical University, Saint Petersburg, 195067, Russian Federation
(2) Almazov National Medical Research Centre, Saint Petersburg, 197341, Russian Federation
(1), (2), (3) City Hospital No. 26, Saint Petersburg, 196240, Russian Federation
UDC: 616.284-072.1:519.766.2
DOI: https://doi.org/10.18692/1810-4800-2025-3-53-62
ABSTRACT
A pilot study evaluated the ability of the general-purpose multimodal LLM ChatGPT 03 to interpret otoscopic images. Thirty-eight frames were grouped into nine clinical categories, ranging from normal findings and foreign bodies to postoperative states and middle-ear tumors. The “gold standard” annotation was provided by two otorhinolaryngology experts (Cohen’s κ > 0.85), with disagreements resolved by consensus. Each frame was processed in a new session with the prompt “What do you see in this photo?” ChatGPT 03 achieved 100% accuracy in distinguishing normal from pathological images (95% CI 90.8–100%), with sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) all equal to 100%. Its clinical diagnosis was formulated correctly in 81.6% of cases (31/38). For five key morphological features (perforation, effusion, hyperemia, tympanosclerosis, cholesteatoma), the mean F1-score was 0.92 and Cohen’s κ was 0.87. Expert ratings of the usefulness of its text descriptions on a 5-point scale averaged M = 4.4 ± 0.6 (ICC = 0.82), with no significant differences between groups (p = 0.24). Spearman’s ρ = 0.72 (p < 0.001) indicated a strong positive correlation between the number of correctly identified features and the usefulness rating. The average response time was 30–40 s. These findings point to ChatGPT 03’s high potential for preliminary screening, report standardization, and education. Clinical implementation will require large-scale prospective validation, structured output, and integration of quantitative tools.
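The interval arithmetic behind the headline figures can be checked with a short sketch. The abstract does not state which confidence-interval method the authors used; the Wilson score interval is assumed here only because it is a common choice for proportions at or near 100%, and the function name `wilson_interval` is illustrative rather than taken from the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at p = 0 or 1.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# 38 of 38 frames classified correctly as normal vs. pathology (from the abstract).
low, high = wilson_interval(38, 38)
print(f"normal vs. pathology: 95% CI {low * 100:.1f}-{high * 100:.1f}%")

# 31 of 38 clinical diagnoses formulated correctly (from the abstract).
diagnosis_rate = 31 / 38
print(f"diagnosis correctness: {diagnosis_rate * 100:.1f}%")
```

Under this assumption the lower bound for 38/38 comes out at 90.8%, in line with the reported interval, and 31/38 rounds to the reported 81.6%. With only 38 frames, even a perfect score leaves the lower bound near 91%.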
Publication date:
17.06.2025
Keywords:
otoscopy, multimodal language model, ChatGPT 03, middle ear diagnosis, morphological analysis, screening, telemedicine, explainable AI, classification accuracy, inter-rater agreement
For citation:
Komarov M. V., Goncharov O. I., Fedotova A. A. Potential of multimodal language model for preliminary evaluation of otoscopic images. Russian Otorhinolaryngology. 2025;24(3):53-62. (In Russ.) https://doi.org/10.18692/1810-4800-2025-3-53-62
All rights to this publication are registered. Link to Russian Otorhinolaryngology Journal is required.
Reprinting of either individual articles or the Journal itself without the publisher's permission is prohibited.
The Journal editors and publisher are not liable for the content and accuracy of advertising information.
© St. Petersburg Research Institute of Ear, Throat, Nose, and Speech of the Ministry of Health of Russia
© Scientific Clinical Center of Otorhinolaryngology, FMBA of Russia
© Co. Ltd. Polyforum Group, 2012—