The evaluation container compares the following outputs for the nodules in both phases (1 and 2) with the ground truth:

  1. Malignancy risk (a value between 0 and 1) against binary malignancy labels (0 or 1).
  2. Nodule type classification (class 0, 1, or 2 for non-solid, part-solid, and solid, respectively) against ground-truth labels that use the same class encoding.

Performance for malignancy risk estimation is evaluated with the area under the receiver operating characteristic curve (AUC). Performance for nodule type classification is evaluated with accuracy. We use scikit-learn's roc_auc_score and accuracy_score for this. 

The leaderboard is ranked by overall_score, a weighted combination of the two metrics, computed as follows:

        import sklearn.metrics

        AUC = sklearn.metrics.roc_auc_score(malignancy_labels, predicted_malignancy_risks)
        accuracy = sklearn.metrics.accuracy_score(nodule_type_labels, predicted_nodule_types)
        overall_score = 0.75 * AUC + 0.25 * accuracy
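
To make the weighting concrete, here is a minimal sketch of the same calculation on made-up toy values. The helper name `compute_overall_score` and all of the data are hypothetical and only serve as an illustration; this is not the evaluation container's actual code.

```python
from sklearn.metrics import accuracy_score, roc_auc_score


def compute_overall_score(malignancy_labels, predicted_malignancy_risks,
                          nodule_type_labels, predicted_nodule_types):
    """Hypothetical helper: combine AUC and accuracy with 0.75 / 0.25 weights."""
    # AUC for malignancy risk: continuous scores against binary labels.
    auc = roc_auc_score(malignancy_labels, predicted_malignancy_risks)
    # Accuracy for nodule type: predicted class against true class (0, 1, 2).
    accuracy = accuracy_score(nodule_type_labels, predicted_nodule_types)
    return 0.75 * auc + 0.25 * accuracy


# Toy data (made up for illustration only).
malignancy_labels = [0, 0, 1, 1]
predicted_malignancy_risks = [0.1, 0.4, 0.35, 0.8]  # AUC = 0.75
nodule_type_labels = [0, 1, 2, 2]
predicted_nodule_types = [0, 1, 2, 1]               # accuracy = 0.75

score = compute_overall_score(
    malignancy_labels, predicted_malignancy_risks,
    nodule_type_labels, predicted_nodule_types,
)
print(score)  # 0.75 * 0.75 + 0.25 * 0.75 = 0.75
```

Because AUC carries three quarters of the weight, an improvement in malignancy risk ranking moves the overall score three times as much as the same absolute improvement in nodule type accuracy.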