Our metric combines Cohen's kappa with the F1-scores. Specifically, the score of each team on each test image is computed as: score = Cohen's kappa + (macro-averaged F1-score + micro-averaged F1-score) / 2.
The overall score is the average of the per-image scores over all test images. Given the large data size and the continuous nature of the score, ties among teams are extremely unlikely.
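A minimal sketch of this computation, assuming flattened per-pixel label arrays for each test image; the function and variable names are illustrative and not the organizers' evaluation code:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score

def image_score(y_true, y_pred):
    """Per-image score: Cohen's kappa + (macro F1 + micro F1) / 2."""
    kappa = cohen_kappa_score(y_true, y_pred)
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    micro_f1 = f1_score(y_true, y_pred, average="micro")
    return kappa + (macro_f1 + micro_f1) / 2

def overall_score(image_pairs):
    """Overall score: average of per-image scores over all test images."""
    return np.mean([image_score(t, p) for t, p in image_pairs])
```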
Each confusion matrix is 4-by-4, with every row normalized to sum to 1. The four classes are benign tissue and Gleason grades 3, 4, and 5. For example, the first row of the confusion matrix of the top-ranked team (YujinHu) indicates that 95.94% of benign pixels are correctly classified, 0.07% are misclassified as grade 3, 0.98% as grade 4, and 3.00% as grade 5.
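Such a row-normalized matrix can be obtained directly from scikit-learn; the sketch below assumes an integer label encoding (0 for benign, 3/4/5 for the Gleason grades), which is not specified in the text:

```python
from sklearn.metrics import confusion_matrix

def normalized_confusion(y_true, y_pred, labels=(0, 3, 4, 5)):
    """Row-normalized 4-by-4 confusion matrix: entry (i, j) is the
    fraction of class-i pixels predicted as class j, so each row sums to 1."""
    # Label values 0/3/4/5 are an assumed encoding for benign and grades 3-5.
    return confusion_matrix(y_true, y_pred, labels=list(labels), normalize="true")
```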
Rank | UserID | Score
1 | YujinHu | 0.845151814
2 | nitinsinghal | 0.792585244
3 | ternaus | 0.789663211
4 | zhangjingmri | 0.778060956
5 | sdsy888 | 0.759776281
6 | cvblab | 0.757838005
7 | XiaHua | 0.716059262
8 | AlirezaFatemi | 0.712536629
9 | jpviguerasguillen | 0.649811671
10 | qq604395564 | 0.643760156
11 | Unipabs | 0.587781206
12 | ed15b055 | 0.530973711
13 | alxndrkalinin | 0.525332123
14 | bb_mlb_56 | 0.503856464
15 | ninashp | 0.503679041
16 | jinchen0227 | 0.501883245
17 | naiyh | 0.454468668

[Figure: row-normalized 4-by-4 confusion matrices (benign, Gleason grades 3, 4, 5) for the top eight teams.]