Vertebrae T2-T10

The vertebra plots show poor agreement across all six models: Dice scores and consensus volume agreement range from 0% to 95%. Segmentation errors in four models (TotalSegmentator 1.5, TotalSegmentator 2.6, MultiTalent, Auto3DSeg) are the main cause, likely due to issues in the TotalSegmentator training data. After excluding these four models, agreement between the remaining two, MOOSE and CADS, improves, with Dice scores above 90% and consensus volume agreement over 85%. The remaining differences stem from segmentation coverage: MOOSE produces slightly larger vertebral volumes because its masks leave less space between adjacent vertebrae.
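As a rough illustration of how a coverage difference alone can lower Dice, the sketch below computes the standard Dice coefficient and a voxel-count volume for two toy binary masks, one fully contained in the other. The function names, the toy masks, and the 1 mm voxel spacing are assumptions made for illustration; this is not the evaluation code behind the plots, and consensus volume agreement is not reproduced because its definition is not given in this section.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Standard Dice coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def volume_ml(mask, voxel_spacing_mm):
    """Mask volume in millilitres: voxel count times voxel volume (mm^3) / 1000."""
    voxel_mm3 = float(np.prod(voxel_spacing_mm))
    return float(np.asarray(mask, dtype=bool).sum() * voxel_mm3 / 1000.0)


# Toy masks (hypothetical): the larger mask extends one voxel further
# along the last axis on each side, mimicking reduced inter-vertebral spacing.
smaller = np.zeros((10, 10, 10), dtype=bool)
smaller[2:8, 2:8, 3:7] = True  # 6 * 6 * 4 = 144 voxels

larger = np.zeros((10, 10, 10), dtype=bool)
larger[2:8, 2:8, 2:8] = True   # 6 * 6 * 6 = 216 voxels

spacing = (1.0, 1.0, 1.0)  # assumed isotropic 1 mm voxels

print("Dice:", dice(smaller, larger))
print("Volume (smaller), mL:", volume_ml(smaller, spacing))
print("Volume (larger), mL:", volume_ml(larger, spacing))
```

In this toy case the smaller mask lies entirely inside the larger one, yet Dice is 2·144 / (144 + 216) = 0.8, so a systematic difference in how much of the inter-vertebral gap is labeled is enough to keep Dice below 100% even when the two models otherwise agree.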

Figure: Vertebrae T2-T10. Volume and Dice agreement plots, shown for (top) all six models (Auto3DSeg, MOOSE, MultiTalent, CADS, TotalSegmentator 1.5 and TotalSegmentator 2.6) and (bottom) MOOSE and CADS only.