Evaluation

We currently support evaluation of nine tasks: image tagging, object detection, instance segmentation, semantic segmentation, panoptic segmentation, pose estimation, boundary detection, multi-object tracking, and multi-object tracking and segmentation. To evaluate your algorithm on any of these tasks, provide your predictions and the corresponding ground truth annotations in Scalabel format.
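
For reference, a prediction file is a JSON list of frames, where each frame names an image and carries the labels predicted for it. Below is a minimal, hand-written sketch of a detection-style prediction file; the image name, category, and values are placeholders, and the field names (name, labels, id, category, score, box2d) reflect our reading of the Scalabel format, so check them against the format documentation.

import json

# One frame per image; each label carries a category, a confidence score,
# and (for detection) a 2D box with pixel corner coordinates.
predictions = [
    {
        "name": "example_image.jpg",  # placeholder image name
        "labels": [
            {
                "id": "0",
                "category": "pedestrian",  # must appear in the eval config
                "score": 0.87,
                "box2d": {"x1": 100.0, "y1": 150.0, "x2": 180.0, "y2": 270.0},
            }
        ],
    }
]

with open("predictions.json", "w") as fp:
    json.dump(predictions, fp)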

Image Tagging

The tagging evaluation uses standard classification metrics. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.tagging \
    --gt scalabel/eval/testcases/tagging/tag_gts.json \
    --result scalabel/eval/testcases/tagging/tag_preds.json \
    --config scalabel/eval/testcases/tagging/tag_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
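
As a reminder of what "standard classification metrics" boil down to, the snippet below computes accuracy and per-class precision/recall from two hypothetical tag lists. It is plain Python arithmetic, independent of the Scalabel evaluation code itself.

# Hypothetical per-image tags; in practice these come from the ground truth
# and prediction files passed via --gt and --result.
gts = ["clear", "rainy", "clear", "overcast"]
preds = ["clear", "clear", "clear", "overcast"]

accuracy = sum(g == p for g, p in zip(gts, preds)) / len(gts)
print("accuracy:", accuracy)

# Per-class precision and recall from simple counts.
for cls in sorted(set(gts) | set(preds)):
    tp = sum(g == p == cls for g, p in zip(gts, preds))
    fp = sum(p == cls and g != cls for g, p in zip(gts, preds))
    fn = sum(g == cls and p != cls for g, p in zip(gts, preds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(cls, precision, recall)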

Detection

The detection evaluation uses the AP metric and follows the protocol defined in the COCO dataset. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.detect \
    --gt scalabel/eval/testcases/box_track/track_sample_anns.json \
    --result scalabel/eval/testcases/det/bbox_predictions.json \
    --config scalabel/eval/testcases/det/det_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
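
If you prefer to drive the evaluation from Python instead of the command line, a rough sketch is shown below. The loader and evaluation function names (load, load_label_config, evaluate_det) are our reading of the package layout and should be verified against the installed scalabel version.

# NOTE: module and function names here are assumptions; check your install.
from scalabel.eval.detect import evaluate_det
from scalabel.label.io import load, load_label_config

# Paths mirror the command-line example above. In some versions load()
# returns the frame list directly instead of a dataset with a .frames field.
gt_frames = load("scalabel/eval/testcases/box_track/track_sample_anns.json").frames
pred_frames = load("scalabel/eval/testcases/det/bbox_predictions.json").frames
config = load_label_config("scalabel/eval/testcases/det/det_configs.toml")

# COCO-style AP evaluation; printing the result shows the summary table.
result = evaluate_det(gt_frames, pred_frames, config, nproc=4)
print(result)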

Instance Segmentation

The instance segmentation evaluation also uses the AP metric and follows the protocol defined in the COCO dataset. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.ins_seg \
    --gt scalabel/eval/testcases/ins_seg/ins_seg_rle_sample.json \
    --result scalabel/eval/testcases/ins_seg/ins_seg_preds.json \
    --config scalabel/eval/testcases/ins_seg/ins_seg_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
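
Instance masks are most conveniently submitted as run-length encodings. The sketch below uses pycocotools to encode a binary mask and attach it to a label dictionary; the rle field name and its counts/size layout follow our understanding of the Scalabel format, so confirm them against the bundled ins_seg_rle_sample.json test case.

import json
import numpy as np
from pycocotools import mask as mask_utils

# A binary mask for one predicted instance (H x W), here a filled rectangle.
binary_mask = np.zeros((720, 1280), dtype=np.uint8)
binary_mask[100:200, 300:400] = 1

# pycocotools expects a Fortran-ordered uint8 array and returns
# {"size": [H, W], "counts": <compressed bytes>}.
rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")  # make it JSON-serializable

# Field names below are our reading of the Scalabel label format.
label = {"id": "0", "category": "pedestrian", "score": 0.9, "rle": rle}
print(json.dumps(label))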

Semantic Segmentation

The semantic segmentation evaluation uses the standard Jaccard index, commonly known as mean IoU (mIoU). You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.sem_seg \
    --gt scalabel/eval/testcases/sem_seg/sem_seg_sample.json \
    --result scalabel/eval/testcases/sem_seg/sem_seg_preds.json \
    --config scalabel/eval/testcases/sem_seg/sem_seg_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
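
As a reminder of the metric itself, mean IoU can be computed from a class confusion matrix: per class, IoU = TP / (TP + FP + FN), averaged over classes. A small, self-contained sketch (not the scalabel implementation):

import numpy as np

def mean_iou(conf_mat: np.ndarray) -> float:
    """Mean IoU from a (num_classes x num_classes) confusion matrix,
    where conf_mat[g, p] counts pixels with ground truth g predicted as p."""
    tp = np.diag(conf_mat).astype(float)
    fp = conf_mat.sum(axis=0) - tp
    fn = conf_mat.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))

# Toy three-class confusion matrix.
cm = np.array([[50, 2, 1], [3, 40, 5], [0, 4, 45]])
print(mean_iou(cm))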

Panoptic Segmentation

The panoptic segmentation evaluation uses the Panoptic Quality (PQ) metric. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.pan_seg \
    --gt scalabel/eval/testcases/pan_seg/pan_seg_sample.json \
    --result scalabel/eval/testcases/pan_seg/pan_seg_preds.json \
    --config scalabel/eval/testcases/pan_seg/pan_seg_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
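
For reference, Panoptic Quality matches predicted and ground-truth segments (a pair counts as a true positive when its IoU exceeds 0.5) and then divides the summed IoU of the matched pairs by TP + 0.5 * FP + 0.5 * FN. A toy computation of the definition:

def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = (sum of IoUs over matched segment pairs) / (TP + 0.5*FP + 0.5*FN);
    a predicted and ground-truth segment match when their IoU exceeds 0.5."""
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0

# Three matched segments, one unmatched prediction, two unmatched ground truths.
print(panoptic_quality([0.9, 0.75, 0.6], num_fp=1, num_fn=2))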

Pose Estimation

The pose estimation evaluation also uses the AP metric and follows the COCO keypoint protocol, which matches predictions to ground truth via object keypoint similarity (OKS) instead of box IoU. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.pose \
    --gt scalabel/eval/testcases/pose/pose_sample.json \
    --result scalabel/eval/testcases/pose/pose_preds.json \
    --config scalabel/eval/testcases/pose/pose_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
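
In the keypoint setting, the role of IoU is played by object keypoint similarity (OKS): a Gaussian falloff of the keypoint distances, scaled by the object area and per-keypoint constants. The sketch below mirrors the usual COCO definition; the coordinates, area, and per-keypoint constants here are placeholders.

import numpy as np

def oks(pred_kpts, gt_kpts, visible, area, kappas):
    """Object keypoint similarity over the visible keypoints:
    exp(-d_i^2 / (2 * area * kappa_i^2)), averaged over visible keypoints."""
    d2 = np.sum((pred_kpts - gt_kpts) ** 2, axis=1)
    e = d2 / (2.0 * area * kappas**2 + np.finfo(float).eps)
    return float(np.exp(-e)[visible].mean())

# Two keypoints with placeholder coordinates and per-keypoint constants.
pred = np.array([[100.0, 50.0], [120.0, 55.0]])
gt = np.array([[102.0, 52.0], [118.0, 60.0]])
print(oks(pred, gt, visible=np.array([True, True]), area=4000.0,
          kappas=np.array([0.1, 0.1])))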

Boundary Detection

The boundary detection evaluation uses the boundary F-measure, computed with morphological operators. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.boundary \
    --gt scalabel/eval/testcases/boundary/boundary_gts.json \
    --result scalabel/eval/testcases/boundary/boundary_preds.json \
    --config scalabel/eval/testcases/boundary/boundary_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for evaluation.
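
The idea behind the boundary F-measure is to dilate each boundary map by a small pixel tolerance and count how many predicted boundary pixels fall on the dilated ground truth (precision) and vice versa (recall). The sketch below illustrates this with scipy; the tolerance and structuring element used by the actual evaluation may differ.

import numpy as np
from scipy.ndimage import binary_dilation

def boundary_f_measure(pred, gt, tol=2):
    """pred and gt are boolean boundary maps; boundary pixels within `tol`
    pixels of the other map count as hits. F = 2PR / (P + R)."""
    struct = np.ones((2 * tol + 1, 2 * tol + 1), dtype=bool)
    precision = (pred & binary_dilation(gt, structure=struct)).sum() / max(pred.sum(), 1)
    recall = (gt & binary_dilation(pred, structure=struct)).sum() / max(gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

# Toy example: a predicted boundary shifted one pixel from the ground truth.
gt = np.zeros((50, 50), dtype=bool)
gt[25, 10:40] = True
pred = np.zeros_like(gt)
pred[26, 10:40] = True
print(boundary_f_measure(pred, gt))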

Multi-object Tracking

The MOT evaluation uses the CLEAR MOT metrics. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.mot \
    --gt scalabel/eval/testcases/box_track/track_sample_anns.json \
    --result scalabel/eval/testcases/box_track/track_predictions.json \
    --config scalabel/eval/testcases/box_track/box_track_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--iou-thr IOU_THRESH
                        IoU threshold for MOT evaluation.
--ignore-iof-thr IGNORE_IOF_THRESH
                        Ignore IoF threshold for MOT evaluation.
--ignore-unknown-cats IGNORE_UNKNOWN_CATS
                        Whether to ignore unknown categories during MOT evaluation.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for MOT evaluation.
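
The headline CLEAR MOT score is MOTA, which folds misses, false positives, and identity switches into a single number. As a reminder of the arithmetic (not the scalabel implementation):

def mota(num_misses, num_false_positives, num_id_switches, num_gt_objects):
    """MOTA = 1 - (FN + FP + ID switches) / total ground-truth objects,
    accumulated over every frame of every video."""
    return 1.0 - (num_misses + num_false_positives + num_id_switches) / num_gt_objects

# e.g. 120 misses, 80 false positives, 15 identity switches over 2000 GT boxes.
print(mota(120, 80, 15, 2000))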

Multi-object Tracking and Segmentation

The MOTS evaluation also uses the CLEAR MOT metrics, but with mask IoU in place of box IoU. You can start the evaluation by running, e.g.:

python3 -m scalabel.eval.mots \
    --gt scalabel/eval/testcases/seg_track/seg_track_sample.json \
    --result scalabel/eval/testcases/seg_track/seg_track_preds.json \
    --config scalabel/eval/testcases/seg_track/seg_track_configs.toml

Available arguments:

--gt GT_PATH, -g GT_PATH
                        Path to ground truth annotations.
--result RESULT_PATH, -r RESULT_PATH
                        Path to results to be evaluated.
--config CFG_PATH, -c CFG_PATH
                        Path to the config file, which contains metadata such as the available categories.
--out-dir OUT_DIR, -o OUT_DIR
                        Output path for evaluation results.
--iou-thr IOU_THRESH
                        IoU threshold for MOTS evaluation.
--ignore-iof-thr IGNORE_IOF_THRESH
                        Ignore IoF threshold for MOTS evaluation.
--ignore-unknown-cats IGNORE_UNKNOWN_CATS
                        Whether to ignore unknown categories during MOTS evaluation.
--nproc NUM_PROCS, -p NUM_PROCS
                        Number of processes for MOTS evaluation.
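
Tracking predictions must additionally group frames into videos and keep label ids stable over time. A hand-written sketch of a single MOTS prediction frame is shown below; the videoName, frameIndex, and rle field names follow our reading of the Scalabel format, and all values are placeholders.

# One frame of a hypothetical MOTS prediction; the field names reflect our
# reading of the Scalabel format and the values are placeholders.
frame = {
    "name": "seq0001-0000001.jpg",
    "videoName": "seq0001",   # groups frames into a sequence
    "frameIndex": 0,          # temporal order within the sequence
    "labels": [
        {
            "id": "3",        # track identity, kept constant across frames
            "category": "car",
            "score": 0.8,
            "rle": {"counts": "<compressed RLE string>", "size": [720, 1280]},
        }
    ],
}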