W125

EVALUATION OF VEDAI ACROSS CLINICAL PSYCHIATRY: A BLINDED PSYCHIATRIST-RATED STUDY

Anant Akash Sakthivel1, Vishnu Preye Sakthivel2, Yasir Haji3, Ritvij Satodiya4; 1Carolinas Psychiatric Institute, 2Madras Medical College, 3Titu Maiorescu University, 4New York University

Background

AI tools are increasingly being explored for psychiatric clinical decision support, but standardized evaluation of output quality, safety, and educational value remains limited. VedAI is a psychiatry-focused platform designed to support clinical reasoning across diverse clinical scenarios. We evaluated VedAI’s outputs using blinded psychiatrist ratings within a structured scoring framework.

Methods

VedAI responses were generated for diverse psychiatry-relevant clinical scenarios. Outputs were independently evaluated using a 10-domain rubric assessing diagnostic accuracy, case analysis clarity, chain-of-thought quality, chain-of-reason process, intervention relevance, missed recommendations, educational value, empathy/sensitivity, structured reporting, and citation accuracy. Individual domain ratings were aggregated into a composite overall performance score (range, 10–50) and interpreted using pre-specified thresholds. A secondary assessment used the FAITA-Mental Health framework to assess credibility, user experience, user agency, equity/inclusivity, transparency, and safety/crisis management. Used together, the internal clinical evaluation rubric and the FAITA framework provided a structured approach for assessing psychiatric AI decision-support tools.
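The aggregation step can be illustrated with a minimal sketch. It assumes each of the 10 domains is rated on a 1 to 5 scale and that the composite is an unweighted sum, which is consistent with the stated 10–50 range but is not confirmed by the abstract. The function name and the example ratings are hypothetical and are not study data.

```python
# Sketch of composite scoring under two assumptions not stated in the abstract:
# (1) each domain is rated 1-5, (2) the composite is an unweighted sum.
DOMAINS = [
    "diagnostic accuracy",
    "case analysis clarity",
    "chain-of-thought quality",
    "chain-of-reason process",
    "intervention relevance",
    "missed recommendations",
    "educational value",
    "empathy/sensitivity",
    "structured reporting",
    "citation accuracy",
]


def composite_score(ratings):
    """Sum the 10 domain ratings into an overall score in the range 10-50."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    for domain in DOMAINS:
        if not 1 <= ratings[domain] <= 5:
            raise ValueError(f"rating out of assumed 1-5 range: {domain}")
    return sum(ratings[d] for d in DOMAINS)


# Hypothetical ratings for one output (illustrative only).
example = {d: 4 for d in DOMAINS}
example["structured reporting"] = 5
print(composite_score(example))  # prints 41
```

The pre-specified interpretive thresholds are not given in the abstract, so the sketch stops at the numeric composite rather than assigning a quality label.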

Results

A total of 159 evaluations were available for the primary analysis. Mean overall score was 44.72/50 (SD 3.46; 95% CI 44.18-45.26), placing average performance in the upper Strong Quality range. Median overall score was 45 (IQR 43.3-47.25). Overall, 80.5% of outputs were rated Strong or Excellent, 89.9% were Good or better, and 39.6% were Excellent. Highest-performing domains were structured reporting, diagnostic accuracy, citation accuracy, chain-of-thought quality, and intervention relevance. Domains with comparatively greater opportunity for optimization included completeness of recommendations, empathy/sensitivity, and chain-of-reason process. Among 135 FAITA evaluations, mean total score was 20.32/24 (SD 1.41). Strongest FAITA areas were goal clarity, evidence-based content, and bias/fairness, while feedback/support, transparency, and safety/crisis management scored relatively lower. Overall score correlated moderately with FAITA total score (r=0.56, p < 0.001).
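The reported confidence interval for the mean overall score can be reproduced from the summary statistics alone. The sketch below assumes a normal-approximation interval (z = 1.96); the abstract does not state which method was used, and the function name is hypothetical.

```python
import math


def ci95_from_summary(mean, sd, n, z=1.96):
    """Normal-approximation 95% CI for a mean from summary statistics."""
    standard_error = sd / math.sqrt(n)
    half_width = z * standard_error
    return mean - half_width, mean + half_width


# Summary statistics reported for the primary analysis.
low, high = ci95_from_summary(mean=44.72, sd=3.46, n=159)
print(f"{low:.2f}-{high:.2f}")  # prints 44.18-45.26
```

With 159 evaluations, a t-based interval gives the same bounds to two decimal places, so the choice of method does not affect the reported values.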

Conclusions

VedAI demonstrated consistently strong performance across diverse psychiatric scenarios, particularly in structure, diagnostic alignment, and educational usefulness. These findings support its potential as a psychiatrist-facing clinical decision support and educational tool. The results also support the feasibility of responsibly deploying specialized AI tools in mental health care while maintaining standards for transparency, safety, and clinician-centered usability. Further work should strengthen safety-related features and prospectively evaluate performance in real-world workflows.