EVALUATION OF VEDAI ACROSS CLINICAL PSYCHIATRY: A BLINDED PSYCHIATRIST-RATED STUDY
Background
AI tools are increasingly being explored for psychiatric clinical decision support, but standardized evaluation of output quality, safety, and educational value remains limited. VedAI is a psychiatry-focused platform designed to support clinical reasoning across diverse clinical scenarios. We evaluated VedAI’s outputs using blinded psychiatrist ratings and a structured scoring framework.
Methods
VedAI responses were generated for diverse psychiatry-relevant clinical scenarios. Outputs were independently evaluated using a 10-domain rubric assessing diagnostic accuracy, case analysis clarity, chain-of-thought quality, chain-of-reason process, intervention relevance, missed recommendations, educational value, empathy/sensitivity, structured reporting, and citation accuracy. Individual domain ratings were aggregated into a composite overall performance score (range, 10–50), which was interpreted using pre-specified thresholds. A secondary assessment used the FAITA-Mental Health framework to assess credibility, user experience, user agency, equity/inclusivity, transparency, and safety/crisis management. Together, the internal clinical rubric and the FAITA framework provided a structured approach to assessing psychiatric AI decision-support tools.
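The composite score can be sketched as a simple sum over the ten rubric domains. This is a minimal illustration, not the study's scoring code: it assumes each domain is rated on a 1–5 scale (inferred from the stated 10–50 range), and the domain keys and the `composite_score` helper are hypothetical names. The pre-specified interpretive thresholds are not reported here, so the sketch does not assign quality bands.

```python
# Hypothetical domain keys; the rubric's actual field names are not given in the source.
DOMAINS = (
    "diagnostic_accuracy",
    "case_analysis_clarity",
    "chain_of_thought_quality",
    "chain_of_reason_process",
    "intervention_relevance",
    "missed_recommendations",
    "educational_value",
    "empathy_sensitivity",
    "structured_reporting",
    "citation_accuracy",
)


def composite_score(ratings: dict[str, int]) -> int:
    """Sum ten domain ratings (each ASSUMED to be 1-5) into a 10-50 composite."""
    missing = set(DOMAINS) - set(ratings)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    total = 0
    for domain in DOMAINS:
        value = ratings[domain]
        # Assumed per-domain scale of 1-5, consistent with a 10-50 total.
        if not 1 <= value <= 5:
            raise ValueError(f"{domain} rating {value} outside 1-5")
        total += value
    return total


# Example: six domains rated 5 and four rated 4 give 6*5 + 4*4 = 46.
example = {d: (5 if i < 6 else 4) for i, d in enumerate(DOMAINS)}
print(composite_score(example))
```

Rejecting missing or out-of-range ratings, rather than silently skipping them, keeps every composite on the same 10–50 scale so scores remain comparable across evaluations.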
Results
A total of 159 evaluations were available for the primary analysis. The mean overall score was 44.72/50 (SD 3.46; 95% CI 44.18-45.26), placing average performance in the upper Strong Quality range. The median overall score was 45 (IQR 43.3-47.25). Overall, 80.5% of outputs were rated Strong or Excellent, 89.9% were rated Good or better, and 39.6% were rated Excellent. The highest-performing domains were structured reporting, diagnostic accuracy, citation accuracy, chain-of-thought quality, and intervention relevance. Domains with the most room for improvement were completeness of recommendations, empathy/sensitivity, and chain-of-reason process. Among 135 FAITA evaluations, the mean total score was 20.32/24 (SD 1.41). The strongest FAITA areas were goal clarity, evidence-based content, and bias/fairness, whereas feedback/support, transparency, and safety/crisis management scored relatively lower. The overall score correlated moderately with the FAITA total score (r = 0.56, p < 0.001).
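As an arithmetic check on the reported interval, assuming it is a normal-approximation 95% CI for the mean (the abstract does not state the method), the reported mean, SD, and sample size reproduce the stated bounds:

```latex
\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}
  = 44.72 \pm 1.96 \times \frac{3.46}{\sqrt{159}}
  \approx 44.72 \pm 1.96 \times 0.274
  \approx 44.72 \pm 0.54
  = (44.18,\ 45.26)
```

A t-based interval with 158 degrees of freedom (critical value about 1.975) rounds to the same bounds, so either method is consistent with the reported CI.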
Conclusions
VedAI demonstrated consistently strong performance across diverse psychiatric scenarios, particularly in structured reporting, diagnostic alignment, and educational usefulness. These findings support its potential as a psychiatrist-facing clinical decision-support and educational tool. The results also support the feasibility of responsibly deploying specialized AI systems in mental health care while maintaining standards for transparency, safety, and clinician-centered usability. Further work should strengthen safety-related features and prospectively evaluate performance in real-world clinical workflows.