The sternum plots show moderate agreement across all five models, with Dice scores above 85% and consensus volume agreement above 75%. Excluding CADS slightly improves agreement. Pairwise comparisons show very good agreement within two model pairs: TotalSegmentator 2.6 with Auto3DSeg, and MOOSE with MultiTalent.
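For readers unfamiliar with the pairwise comparison, the following is a minimal sketch of how a Dice score between every pair of model outputs could be computed. It is an illustration only, not the pipeline used for the plots: the function names `dice` and `pairwise_dice`, the representation of a mask as a set of voxel indices, the convention for two empty masks, and the toy data are all assumptions. The consensus volume agreement metric is not sketched here, since its definition is not given in this passage.

```python
from itertools import combinations


def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for masks given as sets of voxel indices."""
    if not a and not b:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def pairwise_dice(masks):
    """Return {(name_i, name_j): dice} for every unordered pair of models."""
    return {
        (m, n): dice(masks[m], masks[n])
        for m, n in combinations(sorted(masks), 2)
    }


# Toy masks with hypothetical voxel indices (not data from the study).
toy_masks = {
    "model_a": {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)},
    "model_b": {(0, 0, 0), (0, 0, 1), (0, 1, 0)},
    "model_c": {(0, 1, 1), (1, 1, 1)},
}

scores = pairwise_dice(toy_masks)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

In this toy example, `model_a` and `model_b` share three of their voxels and score 6/7, while `model_b` and `model_c` share none and score 0. A pair of models with consistently high scores across cases would correspond to the "very good agreement" pairs described above.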