Scalabel Format
The Scalabel format defines the protocol for importing image lists with optional automatic labels and for exporting manual annotations. It is also the format used by the BDD100K dataset.
The schema of the format is shown below. You can also use our Typescript and Python type definitions. Most of the fields are optional, depending on your purpose. If you only want to upload a list of images when creating a project, you only need url. videoName is used to group frames for each tracking task. If you are annotating bounding boxes, you can ignore poly2d and other label types.
Item List, Categories, and Attributes can be uploaded as separate files, or they can be contained in a single file that follows the exporting format.
Exporting Format
The exporting format has the following fields.
- frames [ ]:
    - item001
    - item002
    ...
- config:
    - image_size: (optional, valid when all images have the same size)
        - width: int
        - height: int
    - attributes [ ]:
        - name: string
        - toolType: string (can be 'switch' or 'list')
        - tagText: string (acronym when showing)
        - values: string[]
        - tagPrefix: string
    - categories: string[]
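As a rough illustration of the layout above, the following Python sketch assembles a minimal export-style document and serializes it as JSON. The helper name make_export, the example category names, and the attribute values are invented for illustration; only the field names come from the schema above.

```python
import json


def make_export(frames, categories, attributes, image_size=None):
    """Assemble a dict that mirrors the exporting format: frames plus config."""
    config = {"attributes": attributes, "categories": categories}
    if image_size is not None:
        # image_size is optional and only valid when all images share one size.
        width, height = image_size
        config["image_size"] = {"width": width, "height": height}
    return {"frames": frames, "config": config}


# Hypothetical example values, not taken from a real project.
export = make_export(
    frames=[{"name": "item001", "url": "images/item001.jpg"}],
    categories=["car", "pedestrian"],
    attributes=[
        {
            "name": "Occluded",
            "toolType": "switch",
            "tagText": "o",
            "values": [],
            "tagPrefix": "",
        }
    ],
    image_size=(1280, 720),
)
text = json.dumps(export, indent=2)
```

Leaving image_size out of the call simply omits the key, which matches its optional status in the schema.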
Each item in the frames field is an image with several fields. attributes and categories are the lists of tags that can be given to the labels in the images. The fields of an item are given below.
- name: string (must be unique over the whole dataset!)
- url: string (relative path or URL to data file)
- videoName: string (optional)
- attributes: a dictionary of frame attributes
- intrinsics
    - focal: [x, y]
    - center: [x, y]
    - nearClip:
- extrinsics
    - location
    - rotation
- timestamp: int64 (epoch time ms)
- frameIndex: int (optional, frame index in this video)
- size:
    - width: int
    - height: int
- labels [ ]:
    - id: string
    - index: int
    - category: string (classification)
    - manualShape: boolean
    - manualAttributes: boolean
    - score: float
    - attributes: a dictionary of label attributes
    - box2d:
        - x1: float
        - y1: float
        - x2: float
        - y2: float
    - box3d:
        - alpha:
        - orientation:
        - location: ()
        - dimension: (3D point, height, width, length)
    - poly2d:
        - vertices: [][]float (list of 2-tuples [x, y])
        - types: string
        - closed: boolean
    - rle:
        - counts: str
        - size: (height, width)
    - graph: (optional)
        - nodes [ ]:
            - location: [x, y] or [x, y, z]
            - category: string
            - visibility: string (optional)
            - type: string (optional)
            - score: float (optional)
            - id: string
        - edges [ ]:
            - source: string
            - target: string
            - type: string (optional)
        - type: string (optional)
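To make the field list above more concrete, here is a small Python sketch that builds one item with a single box2d label. The helper make_box2d_label and all values (names, coordinates, category) are hypothetical, and the label is marked as manually created purely for illustration.

```python
def make_box2d_label(label_id, category, x1, y1, x2, y2, score=None):
    """Build a label dict that carries a 2D bounding box."""
    label = {
        "id": label_id,
        "category": category,
        # Assumption for this example: the box was drawn by hand.
        "manualShape": True,
        "manualAttributes": True,
        "attributes": {},
        "box2d": {
            "x1": float(x1),
            "y1": float(y1),
            "x2": float(x2),
            "y2": float(y2),
        },
    }
    if score is not None:
        label["score"] = float(score)
    return label


# Hypothetical frame; only the field names come from the schema.
frame = {
    "name": "video01-0000001.jpg",
    "videoName": "video01",
    "frameIndex": 0,
    "size": {"width": 1280, "height": 720},
    "labels": [make_box2d_label("0", "car", 100, 200, 300, 400)],
}
```

Other label types such as poly2d or box3d can be left out entirely when only bounding boxes are annotated.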
More details about the fields
- name / videoName / url
    - When there is no url, the data folder structure is assumed to be:
      <data_root>/videoName (if any)/name
      If your data folder structure differs from that, you can store the relative path from <data_root> to the data file in url.
      Note that ‘name’ must be unique over the whole dataset, so that frameGroup can refer to each frame via its name.
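The lookup rule described above can be sketched in Python as follows. The function names are hypothetical, and returning a full URL unchanged is an assumption of this sketch rather than documented behavior.

```python
import posixpath
from collections import Counter


def resolve_data_path(data_root, frame):
    """Return the location of a frame's data file under data_root."""
    url = frame.get("url")
    if url:
        if "://" in url:
            # Assumption: a full URL is used as-is.
            return url
        # Otherwise url is the relative path from data_root to the file.
        return posixpath.join(data_root, url)
    parts = [data_root]
    if frame.get("videoName"):
        parts.append(frame["videoName"])
    parts.append(frame["name"])
    return posixpath.join(*parts)


def duplicate_names(frames):
    """Return frame names that occur more than once in the dataset."""
    counts = Counter(frame["name"] for frame in frames)
    return sorted(name for name, n in counts.items() if n > 1)
```

Running duplicate_names over all frames before upload is one way to catch names that would make references from a frameGroup ambiguous.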
- labels
    - index: index of the label in an image or a video
    - manualShape: whether the shape of the label is created or modified manually
    - manualAttributes: whether the attributes of the label are created or modified manually
    - score: the confidence or some other measure of the quality of the label
    - box2d: the box includes the pixel at (x2, y2), so width = x2 - x1 + 1 and height = y2 - y1 + 1
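Because the box2d corner (x2, y2) is inclusive, the size computation needs the extra +1. A minimal Python sketch, with box2d_size as a hypothetical helper name:

```python
def box2d_size(box):
    """Return (width, height) of a box2d dict with inclusive corners."""
    width = box["x2"] - box["x1"] + 1
    height = box["y2"] - box["y1"] + 1
    return width, height


# A box covering pixels 10..19 horizontally and 20..29 vertically.
size = box2d_size({"x1": 10, "y1": 20, "x2": 19, "y2": 29})
```

With this convention, a box whose two corners coincide still covers one pixel.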
- box3d: follows the convention in the KITTI dataset.
    - alpha: observation angle if there is a 2D view
    - location: 3D center of the box, stored as a 3D point in camera coordinates, meaning the axes (x, y, z) point right, down, and forward
    - orientation: 3D orientation of the bounding box, stored as axis angles in the same coordinate frame as the location
    - dimension: 3D box size, with length in the x direction, height in the y direction, and width in the z direction
- poly2d
    - types: each character corresponds to the type of the vertex with the same index in vertices: ‘L’ for a vertex and ‘C’ for a control point of a Bezier curve
    - closed: true for a polygon, false for a path
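The pairing between types and vertices can be checked with a few lines of Python. This is only a sketch: split_poly2d is a hypothetical helper and the coordinates are made up.

```python
def split_poly2d(poly):
    """Split poly2d vertices into regular vertices ('L') and control points ('C')."""
    vertices, types = poly["vertices"], poly["types"]
    if len(vertices) != len(types):
        raise ValueError("types must have one character per vertex")
    unknown = set(types) - {"L", "C"}
    if unknown:
        raise ValueError(f"unexpected vertex types: {sorted(unknown)}")
    anchors = [v for v, t in zip(vertices, types) if t == "L"]
    controls = [v for v, t in zip(vertices, types) if t == "C"]
    return anchors, controls


# Hypothetical open path with two control points between its end vertices.
poly = {
    "vertices": [[0.0, 0.0], [3.0, 5.0], [7.0, 5.0], [10.0, 0.0]],
    "types": "LCCL",
    "closed": False,
}
anchors, controls = split_poly2d(poly)
```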
- graph
    - nodes
        - location: 2D or 3D coordinates. In 2D: (x, y), with x horizontal, y vertical, and (0, 0) at the top left corner.
        - category: either the joint name or the type of segmentation (see closed in poly2d)
        - visibility: visibility of the joint for pose
        - type: type of the vertex for segmentation (see types in poly2d)
        - score: confidence score during prediction
        - id: unique ID
    - edges
        - source: unique ID of the source node of the edge
        - target: unique ID of the target node of the edge
        - type: type of the edge
    - type: specification of the graph
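Since edges refer to nodes by ID, a simple consistency check is to look for edges whose source or target does not match any node. The sketch below assumes the graph is available as a Python dict; the helper name, node categories, and type strings are hypothetical.

```python
def dangling_edges(graph):
    """Return the edges whose source or target is not the id of any node."""
    node_ids = {node["id"] for node in graph.get("nodes", [])}
    return [
        edge
        for edge in graph.get("edges", [])
        if edge["source"] not in node_ids or edge["target"] not in node_ids
    ]


# Hypothetical two-joint pose graph; the last edge points at a missing node.
graph = {
    "nodes": [
        {"id": "n0", "location": [120.0, 40.0], "category": "head"},
        {"id": "n1", "location": [120.0, 80.0], "category": "neck"},
    ],
    "edges": [
        {"source": "n0", "target": "n1"},
        {"source": "n1", "target": "n2"},
    ],
    "type": "pose",
}
bad = dangling_edges(graph)
```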
If your dataset contains multiple data sources (e.g. multiple cameras or other sensors), you can group frames together using frameGroup. This data structure inherits from frame, so each frameGroup has all of the attributes above, plus a list of frame names that are assigned to the group:
- [inherits all attributes from frame]
- frames: [ ]str (list of frame names in the group)
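Because a frameGroup stores frame names rather than the frames themselves, resolving a group means looking names up among the frames. A minimal Python sketch, with hypothetical helper and file names:

```python
def frames_in_group(group, frames):
    """Return the frame dicts referenced by a frameGroup, in group order."""
    by_name = {frame["name"]: frame for frame in frames}
    missing = [name for name in group["frames"] if name not in by_name]
    if missing:
        raise KeyError(f"frames not found: {missing}")
    return [by_name[name] for name in group["frames"]]


# Hypothetical data: two camera images that belong to the same group.
frames = [
    {"name": "cam0-000001.png"},
    {"name": "cam1-000001.png"},
    {"name": "cam0-000002.png"},
]
group = {"name": "group-000001", "frames": ["cam0-000001.png", "cam1-000001.png"]}
members = frames_in_group(group, frames)
```

This lookup only works when frame names are unique over the whole dataset, which is why that requirement exists.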
KITTI Format
The KITTI Velodyne dataset contains a pointcloud file (.bin) and four corresponding image files (.png). Currently, Scalabel only supports .ply files for pointclouds. Please refer to this script for conversion purposes.
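The referenced script remains the authority for conversion. Purely to illustrate what such a conversion involves, the sketch below assumes the usual KITTI Velodyne layout (little-endian float32 values in groups of four: x, y, z, reflectance) and produces the text of an ASCII PLY file containing only x, y, z. The function name and the choice to drop reflectance are assumptions of this sketch, not a description of the Scalabel script, and the PLY properties Scalabel actually expects should be checked against that script.

```python
import struct


def kitti_bin_to_ply_text(raw):
    """Convert raw KITTI Velodyne bytes to the text of an ASCII PLY file."""
    if len(raw) % 16 != 0:
        raise ValueError("expected a multiple of 16 bytes (4 float32 per point)")
    # Each record is (x, y, z, reflectance); reflectance is dropped here.
    points = [(x, y, z) for x, y, z, _ in struct.iter_unpack("<4f", raw)]
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x} {y} {z}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Two synthetic points packed the same way a .bin file would store them.
raw = struct.pack("<8f", 1.0, 2.0, 3.0, 0.5, 4.0, 5.0, 6.0, 0.25)
ply_text = kitti_bin_to_ply_text(raw)
```

To convert a file, read it in binary mode, pass the bytes to the function, and write the returned text to a .ply file.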
The data structure is similar to that used by frameGroup above, where the frame names consist of the pointcloud and the four corresponding images.
You can have a quick try by submitting the relevant files in examples/kitti and choosing Point Cloud in Item Type and 3D Bounding Box in Label Type.