# The prediction format
Predictions use the OpenLABEL format (https://www.asam.net/standards/detail/openlabel/), which is expressed in JSON. This is the same format as the one used for pre-annotations. General information about the OpenLABEL format can be found in the OpenLABEL format page.

## Supported prediction features

The current API for uploading predictions supports the following geometries:

| Name | OpenLABEL field | Description |
| --- | --- | --- |
| Cuboid | `cuboid` | Cuboid in 3D |
| Bounding box | `bbox` | Bounding box in 2D |
| Bitmaps (segmentation) | `image` | Segmentation bitmap for images |

The rotation of cuboids should be the same as in the OpenLABEL format (see Coordinate systems for more information). 2D geometries should be expressed in pixel coordinates.

For this API, the relevant parts (keys) are `frames`, `objects`, `streams`, `ontologies` and `metadata`. The last one (`metadata`) is the easiest one and should just read `"schema_version": "1.0.0"` (see the examples below for full context). `streams` is also straightforward: it should specify which sensors (cameras, lidars, ...) there are and what their names are, like `sensor_name: {"type": "camera"}` or `sensor_name: {"type": "lidar"}`. Again, see the examples below for full context.

All parts of a prediction that are time-varying throughout a sequence, such as coordinates and dynamic properties, are described in `frames`. Each frame in the sequence is represented by a key-value pair under `frames`. The key is the frame id, and the value should look like:

```
frame_id: {
  "frame_properties": {
    "timestamp": 0,
    "external_id": "",
    "streams": {}
  },
  "objects": {
    ...
  }
}
```

The value of `frame_properties.timestamp` (measured in ms, recommended to be set to 0 for non-sequence data) is used for matching each predicted frame to the relevant annotated frame, and must therefore match the scene that has been annotated. We recommend that the frame id (a string) follows the frame id used to describe the underlying scene, although `frame_properties.timestamp` takes precedence in case of a mismatch. For non-sequence data, a good choice of frame id is `"0"`. The values of `frame_properties.external_id` and `frame_properties.streams` are resolved automatically if left empty as shown.

The key `objects`, in turn, contains key-value pairs, where each such pair is an object in that frame. Note that there is an `objects` key in each frame as well as in the root. They describe the same objects, but information that is potentially time-varying (i.e. frame-specific, such as coordinates) belongs in the frame, whereas static information (such as the object class) belongs in the root. The object keys (strings) are arbitrary, but must be identical wherever they describe the same object. Please refer to the examples below for how to describe the objects in detail.

For cuboids and bounding boxes, an existence confidence can be provided through the frame-specific attribute `confidence`. If provided, it must be a numeric value between 0.0 and 1.0; if left out, it is set to 1.0. The static object data `type` will show up as the class name in the tool.
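To make the split between frame-specific and static data concrete, here is a minimal sketch that assembles this skeleton programmatically for a single-frame, single-camera bounding-box prediction. The helper name `build_prediction`, the stream name `camera_id` and the generated object id are illustrative assumptions, not part of any official library.

```python
import json
import uuid

# Illustrative sketch (not an official client): build an OpenLABEL prediction dict
# for one frame with one 2D bounding box. Frame-specific data (coordinates,
# confidence) goes under "frames", static data (the class) under the root "objects".
def build_prediction(bbox_xywh, confidence, class_name, stream="camera_id"):
    object_id = str(uuid.uuid4())  # arbitrary, but must be reused for the same object everywhere
    return {
        "openlabel": {
            "metadata": {"schema_version": "1.0.0"},
            "streams": {stream: {"type": "camera"}},
            "frames": {
                "0": {
                    "frame_properties": {"timestamp": 0, "external_id": "", "streams": {}},
                    "objects": {
                        object_id: {
                            "object_data": {
                                "bbox": [
                                    {
                                        "name": "prediction",
                                        # [x_center, y_center, width, height] in pixel coordinates
                                        "val": list(bbox_xywh),
                                        "attributes": {
                                            "num": [{"name": "confidence", "val": confidence}],
                                            "text": [{"name": "stream", "val": stream}],
                                        },
                                    }
                                ]
                            }
                        }
                    },
                }
            },
            "objects": {object_id: {"name": "prediction", "type": class_name}},
        }
    }

print(json.dumps(build_prediction((1.0, 1.0, 40.0, 30.0), 0.85, "passengercar"), indent=2))
```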
For segmentation bitmaps, the image itself is a grayscale 8-bit PNG of the same resolution as the annotated images (if the actual prediction only partially covers the annotated image, or is of lower resolution, it has to be padded and/or upscaled). The image is supplied in the OpenLABEL by pasting its base64 encoding as a string into an object in a frame; see the example below. Moreover, an ontology also has to be supplied, which describes which class corresponds to each color level. With an 8-bit grayscale image, it is possible to encode up to 256 classes. The ontology can be left out for non-segmentation predictions.

The `camera_id` in the examples below must match the id of the sensors in the annotated scene, whereas the corresponding id for the lidar sensor should be set to `@lidar`.

## Prediction examples

### 2D bounding box in two frames with a static property color

In OpenLABEL, a bounding box is represented as a list of 4 values `[x, y, width, height]`, where `x` and `y` are the center coordinates of the bounding box, and `width` and `height` are its width and height. The `x` and `y` coordinates are relative to the upper-left corner of the image.

```json
{
  "openlabel": {
    "frames": {
      "0": {
        "frame_properties": {
          "timestamp": 0,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "bbox": [
                {
                  "attributes": {
                    "num": [
                      {
                        "val": 0.85,
                        "name": "confidence"
                      }
                    ],
                    "text": [
                      {
                        "name": "stream",
                        "val": "camera_id"
                      }
                    ]
                  },
                  "name": "any human readable bounding box name",
                  "val": [1.0, 1.0, 40.0, 30.0]
                }
              ]
            }
          }
        }
      },
      "1": {
        "frame_properties": {
          "timestamp": 50,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "bbox": [
                {
                  "attributes": {
                    "num": [
                      {
                        "val": 0.82,
                        "name": "confidence"
                      }
                    ],
                    "text": [
                      {
                        "name": "stream",
                        "val": "camera_id"
                      }
                    ]
                  },
                  "name": "any human readable bounding box name",
                  "val": [2.0, 3.0, 30.0, 20.0]
                }
              ]
            }
          }
        }
      }
    },
    "metadata": {
      "schema_version": "1.0.0"
    },
    "objects": {
      "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
        "name": "any human readable bounding box name",
        "object_data": {
          "text": [
            {
              "name": "color",
              "val": "red"
            }
          ]
        },
        "type": "passengercar"
      }
    },
    "streams": {
      "camera_id": {
        "type": "camera"
      }
    }
  }
}
```

### 3D cuboid in two frames with a static property color

Cuboids are represented as a list of 10 values `[x, y, z, qx, qy, qz, qw, width, length, height]`, where `x`, `y`, and `z` are the center coordinates of the cuboid. `x`, `y`, `z`, `width`, `length`, and `height` are in meters. `qx`, `qy`, `qz`, and `qw` are the quaternion values for the rotation of the cuboid. Read more about coordinate systems and quaternions in the OpenLABEL format page.

```json
{
  "openlabel": {
    "frames": {
      "0": {
        "frame_properties": {
          "timestamp": 0,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "cuboid": [
                {
                  "attributes": {
                    "num": [
                      {
                        "val": 0.85,
                        "name": "confidence"
                      }
                    ],
                    "text": [
                      {
                        "name": "stream",
                        "val": "@lidar"
                      }
                    ]
                  },
                  "name": "any human readable cuboid name",
                  "val": [
                    2.079312801361084,
                    18.919870376586914,
                    0.3359137773513794,
                    0.002808041640852679,
                    0.022641949116037438,
                    0.06772797660868829,
                    0.9974429197838155,
                    1.767102435869269,
                    4.099334155319101,
                    1.3691029802958168
                  ]
                }
              ]
            }
          }
        }
      },
      "1": {
        "frame_properties": {
          "timestamp": 50,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "cuboid": [
                {
                  "attributes": {
                    "num": [
                      {
                        "val": 0.87,
                        "name": "confidence"
                      }
                    ],
                    "text": [
                      {
                        "name": "stream",
                        "val": "@lidar"
                      }
                    ]
                  },
                  "name": "any human readable cuboid name",
                  "val": [
                    3.123312801361927,
                    20.285740376586913,
                    0.0649137773513349,
                    0.002808041640852679,
                    0.022641949116037438,
                    0.06772797660868829,
                    0.9974429197838155,
                    1.767102435869269,
                    4.099334155319101,
                    1.3691029802958168
                  ]
                }
              ]
            }
          }
        }
      }
    },
    "metadata": {
      "schema_version": "1.0.0"
    },
    "objects": {
      "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
        "name": "any human readable cuboid name",
        "object_data": {
          "text": [
            {
              "name": "color",
              "val": "red"
            }
          ]
        },
        "type": "passengercar"
      }
    },
    "streams": {
      "@lidar": {
        "type": "lidar"
      }
    }
  }
}
```
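The cuboid rotation is easiest to produce from a heading (yaw) angle. The snippet below is a small illustrative sketch, assuming rotation about the z-axis only, of how the 10-value `val` list could be assembled; the helper name `cuboid_val` is made up for this example.

```python
import math

# Illustrative sketch: build the 10-value cuboid list
# [x, y, z, qx, qy, qz, qw, width, length, height] from a center position,
# a yaw angle (rotation about the z-axis, in radians) and the cuboid size in meters.
def cuboid_val(x, y, z, yaw, width, length, height):
    qx, qy = 0.0, 0.0          # assuming no roll or pitch
    qz = math.sin(yaw / 2.0)   # quaternion for a pure rotation about z
    qw = math.cos(yaw / 2.0)
    return [x, y, z, qx, qy, qz, qw, width, length, height]

print(cuboid_val(2.08, 18.92, 0.34, yaw=0.135, width=1.77, length=4.10, height=1.37))
```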
\[ { "name" "color", "val" "red" } ] }, "type" "passengercar" } }, "streams" { "@lidar" { "type" "lidar" } } } } a single frame segmentation bitmap transforming, upscaling, padding and base64 encoding a small color image to a larger grayscale image using python pil this code example gives an example of how to go from a multicolor prediction bitmap image of resolution 300 x 200 to a grayscale image of resolution 1000 x 800, by first converting to grayscale, then rescaling the prediction to 600 x 400 and then padding equally on the sides it also includes code for base64 encoding the image as a string, that later can be used in the openlabel this code only makes use of built in numpy functions, but is not optimized for performance import base64 import io import numpy as np from pil import image \# the original mapping used to produce the images original mapping = { (0,0,0) " background", (255,0,0) "class 1", (0,0,255) "class 2", } \# the grayscale mapping (this will also be the ontology in the openlabel) grayscale mapping = { " background" 0, "class 1" 1, "class 2" 2, } prediction = image open("my original prediction file png") # let's say this has resolution 300 x 200 def lookup(pixel color) return grayscale mapping\[original mapping\[tuple(pixel color)]] \# convert to grayscale via numpy array lookup prediciton numpy = np array(prediction) grayscale prediction numpy = np vectorize(lookup, signature="(m) >()")(prediciton numpy) grayscale prediction = image fromarray(grayscale prediction numpy astype(np uint8)) \# upscale to another resolution upscaled grayscale prediction = grayscale prediction resize((600, 400), resample=image resampling nearest) \# padding by first constructing a new background image of target size, and then paste the prediction in the right position padded grayscale prediction = image new("l", (1000, 800), 0) padded grayscale prediction paste(upscaled grayscale prediction, (201, 201)) image bytes = io bytesio() padded grayscale prediction save(image bytes, format="png") prediction str = base64 b64encode(image bytes getvalue()) decode("utf 8") openlabel for a segmentation bitmap the prediction str and grayscale mapping can thereafter be used in the openlabel like { "openlabel" { "frames" { "0" { "objects" { "07d469f9 c9ab 44ec 8d09 0c72bdb44dc2" { "object data" { "image" \[ { "name" "a human readable name", "val" prediction str, "mime type" "image/png", "encoding" "base64", "attributes" { "text" \[ { "val" "camera id", "name" "stream" } ] } } ] } } }, "frame properties" { "streams" {}, "timestamp" 0, "external id" "" }, } }, "objects" { "07d469f9 c9ab 44ec 8d09 0c72bdb44dc2" { "name" "07d469f9 c9ab 44ec 8d09 0c72bdb44dc2", "type" "segmentation bitmap" } }, "streams" { "camera id" { "type" "camera" } }, "metadata" { "schema version" "1 0 0" }, "ontologies" { "0" { "classifications" {str(v) k for k, v in grayscale mapping items()}, "uri" "" } } } } if providing predictions for multiple cameras in the scene, the list of images could be extended using kognic openlabel to validate the format see kognic openlabel https //pypi org/project/kognic openlabel/ for more information