Comparative Performance of Large Language Models: ChatGPT o3-mini and Deepseek R1 on Pediatric Hematology/Oncology Questions
Code: G-1493
Authors: Sahel Sharifpoor Saleh *, Sara Mohammadi Kalashani ℗, Mohammad Zarbi
Schedule: Not scheduled
Tag: Intelligent virtual assistant
Abstract:
Background and aims: This study compares the performance of two large language models, ChatGPT o3-mini and Deepseek R1, in answering pediatric hematology/oncology multiple-choice questions. As artificial intelligence is increasingly integrated into clinical decision-making, it is essential to assess how accurately these models process complex medical queries. The study evaluates their accuracy, response time, and reasoning under standardized conditions.
Method: A cross-sectional analysis was performed using 100 self-assessment multiple-choice questions originally developed by the American Society of Pediatric Hematology/Oncology. Eight questions that included images were excluded, leaving 92 paired items for evaluation. Both models were tested under controlled conditions with memory features disabled so that each question was processed independently. Performance metrics were the accuracy of the selected answer, response time measured from query submission to answer output, and a qualitative assessment of the reasoning process. Statistical analyses were performed in IBM SPSS version 27; McNemar's test and the Mann-Whitney U test were used to identify significant differences between the models.
Results: Deepseek R1 achieved a significantly higher accuracy (approximately 85.87%) than ChatGPT o3-mini (65.5%). Although ChatGPT o3-mini responded faster, its performance on complex clinical scenarios was less consistent. The differences in both accuracy and response time between the models were statistically significant (p < 0.001).
Conclusion: Both models show promise for supporting clinical decision-making in pediatric hematology/oncology, but Deepseek R1 offers superior accuracy and more reliable clinical reasoning despite its slower response time. Further research is warranted to optimize the trade-off between speed and precision and to evaluate the applicability of these models in real-world clinical settings.
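The paired accuracy comparison named in the Method can be sketched as an exact McNemar test, which uses only the items on which the two models disagree. This is a minimal illustration and not the authors' SPSS procedure: the function name `mcnemar_exact` is hypothetical, and the counts in the usage lines are placeholders, not the study's data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    b: items model A answered correctly and model B answered incorrectly
    c: items model A answered incorrectly and model B answered correctly
    Items where both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Placeholder discordant counts (hypothetical, NOT taken from the study).
p_value = mcnemar_exact(1, 9)
print(f"exact McNemar p-value: {p_value:.4f}")
```

The exact binomial form is used here because it needs only the standard library and stays valid when the number of discordant pairs is small; response times, which are continuous, would be compared separately with the Mann-Whitney U test mentioned in the Method.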
Keywords
Artificial Intelligence, Medicine, LLM, Hematology, Oncology