BACKGROUND: Artificial intelligence (AI) has advanced rapidly in healthcare and dental education, significantly impacting diagnostic processes, treatment planning, and academic training. This study aimed to evaluate performance differences among large language models (LLMs) by analyzing their accuracy on multiple-choice oral pathology questions.

METHODS: This study evaluated the performance of eight LLMs (Gemini 1.5, Gemini 2, ChatGPT 4o, ChatGPT 4, ChatGPT o1, Copilot, Claude 3.5, Deepseek) in answering multiple-choice oral pathology questions from the Turkish Dental Specialization Examination (DUS). A total of 100 questions from 2012 to 2021 were analyzed. Questions were classified as "case-based" or "knowledge-based", and responses were scored as "correct" or "incorrect" against the official answer keys. To prevent learning bias, no follow-up questions or feedback were provided after the LLMs' responses.

RESULTS: Significant performance differences were observed among the models (p < 0.001). ChatGPT o1 achieved the highest accuracy (96 correct, 4 incorrect), followed by Claude (84 correct), then Gemini 2 and Deepseek (82 correct each). Copilot had the lowest performance (61 correct). Case-based questions showed notable performance variation (p = 0.034), with ChatGPT o1 and Claude performing best. For knowledge-based questions, ChatGPT o1 and Deepseek demonstrated the highest accuracy (p < 0.001). Post-hoc analysis revealed that ChatGPT o1 performed significantly better than most other models on both case-based and knowledge-based questions (p < 0.0031; an illustrative sketch of this comparison follows the abstract).

CONCLUSION: LLMs demonstrated variable proficiency on oral pathology questions, with ChatGPT o1 showing the highest accuracy. LLMs show promise as a supplementary educational tool, though further validation is required.
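As a rough illustration of the analysis the abstract describes, the sketch below runs an overall chi-square test of accuracy across models, followed by pairwise post-hoc tests with a Bonferroni-corrected alpha. This is a minimal sketch under stated assumptions, not the authors' actual analysis: the abstract does not name the tests used, only five of the eight models' counts are reported (the other three are omitted here), and the exact correction behind the reported p < 0.0031 threshold is not specified, so the corrected alpha below is an assumption.

    # Minimal sketch, assuming a chi-square test across models and
    # Bonferroni-corrected pairwise post-hoc comparisons.
    from itertools import combinations
    from scipy.stats import chi2_contingency

    # (correct, incorrect) counts out of 100 questions per model.
    # Only counts reported in the abstract are included; counts for
    # Gemini 1.5, ChatGPT 4o, and ChatGPT 4 are not given there.
    results = {
        "ChatGPT o1": (96, 4),
        "Claude 3.5": (84, 16),
        "Gemini 2":   (82, 18),
        "Deepseek":   (82, 18),
        "Copilot":    (61, 39),
    }

    # Overall test: do accuracy rates differ among the models?
    table = [list(counts) for counts in results.values()]
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"overall chi-square: chi2={chi2:.2f}, dof={dof}, p={p:.4g}")

    # Post-hoc: pairwise 2x2 tests at a Bonferroni-corrected alpha.
    # The abstract's threshold (p < 0.0031) is consistent with a
    # Bonferroni-style correction; the pair count here is an assumption.
    pairs = list(combinations(results, 2))
    alpha = 0.05 / len(pairs)
    for a, b in pairs:
        _, p_pair, _, _ = chi2_contingency([results[a], results[b]])
        flag = "significant" if p_pair < alpha else "n.s."
        print(f"{a} vs {b}: p={p_pair:.4g} ({flag})")

With the reported counts, a workflow like this reproduces the qualitative pattern in the abstract: ChatGPT o1's accuracy separates clearly from the lower-performing models, while models with similar counts (e.g., Gemini 2 and Deepseek) do not differ at the corrected threshold.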