Prediction format

Predictions use ASAM OpenLABEL (https://www.asam.net/standards/detail/openlabel/), represented in JSON. This is the same format as the one used for docid:psyfyk8ranhgwtwmoztav. For general information about the OpenLABEL format, see docid:tfdiv3dlaafndehaklui.

Supported prediction features

The current prediction upload API supports the following geometries:

| Name | OpenLABEL field | Description |
|---|---|---|
| Cuboid | cuboid | 3D cuboid |
| Bounding box | bbox | 2D bounding box |
| Bitmaps (segmentation) | image | Segmentation bitmap for images |

Cuboid rotation must follow the same convention as in docid:tfdiv3dlaafndehaklui (see docid:3jkwdhqcmvnvbuzbouzmr for details). 2D geometries must be expressed in pixel coordinates.

For this API, the relevant parts (keys) of the OpenLABEL are `frames`, `objects`, `streams`, `ontologies`, and `metadata`. The last one, `metadata`, is the simplest: it only needs to contain `"schema_version": "1.0.0"` (see the examples below for full context). `streams` is also simple. It lists which sensors (cameras, lidars, etc.) exist and what they are called, for example `"sensor_name": {"type": "camera"}` or `"sensor_name": {"type": "lidar"}`. Again, see the examples below for full context.

All parts of a prediction that change over time across the sequence (such as coordinates and dynamic properties) are described in `frames`. Each frame of the sequence is represented as a key-value pair in `frames`. The key is the `frame_id`, and the value looks like this:

```json
"frame_id": {
  "frame_properties": {
    "timestamp": 0,
    "external_id": "",
    "streams": {}
  },
  "objects": {}
}
```

The value of `frame_properties.timestamp` is given in milliseconds (0 is recommended for non-sequence data). It is used to match each prediction frame with the corresponding annotated frame, so it must match the annotated scene. We recommend that the `frame_id` (a string) follow the `frame_id` used to describe the underlying scene, but if the two disagree, `frame_properties.timestamp` takes precedence. For non-sequence data, `"0"` is a suitable `frame_id`. `frame_properties.external_id` and `frame_properties.streams` are resolved automatically if left empty as shown above.

The `objects` key contains key-value pairs, each of which represents one object in that frame. Note that the `objects` key exists both inside each frame and at the root. Both describe the same objects, but information that can change over time (frame-specific information such as coordinates) belongs in the frames, while static information (such as the object class) belongs at the root. Object keys (strings) can be chosen freely, but the same object must use the same key in every `objects` entry where it appears. See the examples below for how to describe objects in detail.

For cuboids and bounding boxes, an existence confidence can be provided through the frame-specific attribute `confidence`. The value must be a number between 0.0 and 1.0; if it is omitted, it is set to 1.0. If provided, it must be defined as a JSON number. The static object `type` is shown as the class name in the tool.

For segmentation bitmaps, the image itself is an 8-bit grayscale PNG with the same resolution as the annotated image. If the actual prediction covers only part of the annotated image, or has a lower resolution, it must be padded or upscaled. The image is provided inside the OpenLABEL by pasting it into the frame's object as a base64-encoded string; see the example below. In addition, an ontology (the `ontologies` key) describing which class each gray level corresponds to must be provided. An 8-bit grayscale image can encode at most 256 classes. For predictions other than segmentation, `ontologies` can be omitted.

The `camera_id` in the examples below must match the sensor ID of the camera in the annotated scene, whereas the ID of the lidar sensor should be set to `@lidar`.

Prediction examples

2D bounding box in two frames with the static property `color`

In OpenLABEL, a bounding box is represented as a list of four values, `[x, y, width, height]`. `x` and `y` are the coordinates of the box center, and `width` and `height` are the width and height of the box. The `x` and `y` coordinates are measured from the top-left corner of the image.

```json
{
  "openlabel": {
    "frames": {
      "0": {
        "frame_properties": {
          "timestamp": 0,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "bbox": [
                {
                  "attributes": {
                    "num": [{"val": 0.85, "name": "confidence"}],
                    "text": [{"name": "stream", "val": "camera_id"}]
                  },
                  "name": "any human readable bounding box name",
                  "val": [1.0, 1.0, 40.0, 30.0]
                }
              ]
            }
          }
        }
      },
      "1": {
        "frame_properties": {
          "timestamp": 50,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "bbox": [
                {
                  "attributes": {
                    "num": [{"val": 0.82, "name": "confidence"}],
                    "text": [{"name": "stream", "val": "camera_id"}]
                  },
                  "name": "any human readable bounding box name",
                  "val": [2.0, 3.0, 30.0, 20.0]
                }
              ]
            }
          }
        }
      }
    },
    "metadata": {"schema_version": "1.0.0"},
    "objects": {
      "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
        "name": "any human readable bounding box name",
        "object_data": {
          "text": [{"name": "color", "val": "red"}]
        },
        "type": "passengercar"
      }
    },
    "streams": {
      "camera_id": {"type": "camera"}
    }
  }
}
```

3D cuboid in two frames with the static property `color`

A cuboid is represented as a list of ten values, `[x, y, z, qx, qy, qz, qw, width, length, height]`. `x`, `y`, and `z` are the coordinates of the cuboid center. `x`, `y`, `z`, `width`, `length`, and `height` are in meters. `qx`, `qy`, `qz`, and `qw` are the quaternion values describing the cuboid's rotation. For details on the coordinate system and quaternions, see docid:tfdiv3dlaafndehaklui.

```json
{
  "openlabel": {
    "frames": {
      "0": {
        "frame_properties": {
          "timestamp": 0,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "cuboid": [
                {
                  "attributes": {
                    "num": [{"val": 0.85, "name": "confidence"}],
                    "text": [{"name": "stream", "val": "@lidar"}]
                  },
                  "name": "any human readable cuboid name",
                  "val": [
                    2.079312801361084, 18.919870376586914, 0.3359137773513794,
                    0.002808041640852679, 0.022641949116037438, 0.06772797660868829, 0.9974429197838155,
                    1.767102435869269, 4.099334155319101, 1.3691029802958168
                  ]
                }
              ]
            }
          }
        }
      },
      "1": {
        "frame_properties": {
          "timestamp": 50,
          "external_id": "",
          "streams": {}
        },
        "objects": {
          "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
            "object_data": {
              "cuboid": [
                {
                  "attributes": {
                    "num": [{"val": 0.87, "name": "confidence"}],
                    "text": [{"name": "stream", "val": "@lidar"}]
                  },
                  "name": "any human readable cuboid name",
                  "val": [
                    3.123312801361927, 20.285740376586913, 0.0649137773513349,
                    0.002808041640852679, 0.022641949116037438, 0.06772797660868829, 0.9974429197838155,
                    1.767102435869269, 4.099334155319101, 1.3691029802958168
                  ]
                }
              ]
            }
          }
        }
      }
    },
    "metadata": {"schema_version": "1.0.0"},
    "objects": {
      "1232b4f4-e3ca-446a-91cb-d8d403703df7": {
        "name": "any human readable cuboid name",
        "object_data": {
          "text": [{"name": "color", "val": "red"}]
        },
        "type": "passengercar"
      }
    },
    "streams": {
      "@lidar": {"type": "lidar"}
    }
  }
}
```

Segmentation bitmap for a single frame

Converting a small color image into a larger grayscale image with Python PIL (conversion, upscaling, padding, and base64 encoding)

This code example shows how to convert a multi-color prediction bitmap with a resolution of 300 x 200 into a grayscale image with a resolution of 1000 x 800. It first converts the image to grayscale, then rescales the prediction to 600 x 400, and finally pads it equally on both sides in each dimension (200 pixels on every side). It also converts the image into a base64-encoded string for use in OpenLABEL. The code uses only built-in NumPy functions and is not optimized for performance.

```python
import base64
import io

import numpy as np
from PIL import Image

# The original mapping used to produce the images
original_mapping = {
    (0, 0, 0): "_background",
    (255, 0, 0): "class_1",
    (0, 0, 255): "class_2",
}

# The grayscale mapping (this will also be the ontology in the OpenLABEL)
grayscale_mapping = {
    "_background": 0,
    "class_1": 1,
    "class_2": 2,
}

# Let's say this has resolution 300 x 200. convert("RGB") drops any alpha
# channel so that every pixel is an (R, G, B) tuple matching original_mapping.
prediction = Image.open("my_original_prediction_file.png").convert("RGB")


def lookup(pixel_color):
    return grayscale_mapping[original_mapping[tuple(pixel_color)]]


# Convert to grayscale via a NumPy array lookup, one (R, G, B) vector per pixel
prediction_numpy = np.array(prediction)
grayscale_prediction_numpy = np.vectorize(lookup, signature="(m)->()")(prediction_numpy)
grayscale_prediction = Image.fromarray(grayscale_prediction_numpy.astype(np.uint8))

# Upscale to another resolution; NEAREST keeps the class values intact
upscaled_grayscale_prediction = grayscale_prediction.resize((600, 400), resample=Image.Resampling.NEAREST)

# Pad by first constructing a new background image of the target size,
# then pasting the prediction so that the padding is equal on both sides:
# ((1000 - 600) / 2, (800 - 400) / 2) = (200, 200)
padded_grayscale_prediction = Image.new("L", (1000, 800), 0)
padded_grayscale_prediction.paste(upscaled_grayscale_prediction, (200, 200))

# Encode the PNG as a base64 string for the OpenLABEL
image_bytes = io.BytesIO()
padded_grayscale_prediction.save(image_bytes, format="PNG")
prediction_str = base64.b64encode(image_bytes.getvalue()).decode("utf-8")
```

OpenLABEL for segmentation bitmaps

`prediction_str` and `grayscale_mapping` can then be used in the OpenLABEL as shown below (written as a Python dictionary, since it refers to those variables):

```python
{
    "openlabel": {
        "frames": {
            "0": {
                "objects": {
                    "07d469f9-c9ab-44ec-8d09-0c72bdb44dc2": {
                        "object_data": {
                            "image": [
                                {
                                    "name": "a human readable name",
                                    "val": prediction_str,
                                    "mime_type": "image/png",
                                    "encoding": "base64",
                                    "attributes": {
                                        "text": [{"val": "camera_id", "name": "stream"}]
                                    },
                                }
                            ]
                        }
                    }
                },
                "frame_properties": {"streams": {}, "timestamp": 0, "external_id": ""},
            }
        },
        "objects": {
            "07d469f9-c9ab-44ec-8d09-0c72bdb44dc2": {
                "name": "07d469f9-c9ab-44ec-8d09-0c72bdb44dc2",
                "type": "segmentation_bitmap",
            }
        },
        "streams": {"camera_id": {"type": "camera"}},
        "metadata": {"schema_version": "1.0.0"},
        "ontologies": {
            "0": {
                # Maps each gray level (as a string) to its class name
                "classifications": {str(v): k for k, v in grayscale_mapping.items()},
                "uri": "",
            }
        },
    }
}
```

When providing predictions for multiple cameras in the scene, extend the list of images.

Validating the format with kognic-openlabel

The format can be validated with kognic-openlabel; see https://pypi.org/project/kognic-openlabel/ for details.
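The 2D bounding-box example uses a center-based list, `[x, y, width, height]`, measured in pixels from the top-left corner of the image. Many detectors output corner coordinates instead. The following is a minimal Python sketch of the conversion in both directions. It assumes the detector's corners are also pixel coordinates with the origin at the top-left corner. The function names are hypothetical and are not part of any Kognic API.

```python
from typing import List, Sequence


def corners_to_openlabel_bbox(x_min: float, y_min: float, x_max: float, y_max: float) -> List[float]:
    """Convert a corner-format box into the center-based [x, y, width, height] list."""
    if x_max < x_min or y_max < y_min:
        raise ValueError("expected x_min <= x_max and y_min <= y_max")
    width = x_max - x_min
    height = y_max - y_min
    # x and y are the coordinates of the box center
    return [x_min + width / 2.0, y_min + height / 2.0, width, height]


def openlabel_bbox_to_corners(bbox: Sequence[float]) -> List[float]:
    """Inverse conversion: center-based [x, y, width, height] back to [x_min, y_min, x_max, y_max]."""
    x, y, width, height = bbox
    return [x - width / 2.0, y - height / 2.0, x + width / 2.0, y + height / 2.0]


if __name__ == "__main__":
    print(corners_to_openlabel_bbox(10, 20, 50, 50))  # [30.0, 35.0, 40, 30]
```

Keeping the conversion in one place makes it easy to confirm that the values written to `"val"` really are box centers and not corners. Confusing the two is a common source of offset boxes.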
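The split between frame-level `objects` (time-varying data such as coordinates and confidence) and root-level `objects` (static data such as `type` and `color`) is easier to get right when the dictionary is generated. The following is a hedged Python sketch that rebuilds the structure of the two-frame bounding-box example from per-frame detections. The function `build_bbox_prediction` and its input format are hypothetical. It also assumes frame ids `"0"`, `"1"`, ... in list order. The text above recommends following the scene's own `frame_id`s where they are known.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical input: one (timestamp_ms, [x, y, width, height], confidence) tuple per frame
Detection = Tuple[int, List[float], float]


def build_bbox_prediction(
    object_id: str,
    name: str,
    class_type: str,
    stream: str,
    detections: List[Detection],
    static_text: Dict[str, str],
) -> dict:
    """Assemble an OpenLABEL prediction with one 2D bounding-box object over several frames."""
    frames = {}
    for index, (timestamp, bbox, confidence) in enumerate(detections):
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise TypeError("confidence must be a number")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
        # Assumption: frame ids are "0", "1", ... in the order of `detections`
        frames[str(index)] = {
            "frame_properties": {"timestamp": timestamp, "external_id": "", "streams": {}},
            "objects": {
                # Must be the same key as in the root-level "objects" below
                object_id: {
                    "object_data": {
                        "bbox": [
                            {
                                "attributes": {
                                    "num": [{"val": confidence, "name": "confidence"}],
                                    "text": [{"name": "stream", "val": stream}],
                                },
                                "name": name,
                                "val": list(bbox),
                            }
                        ]
                    }
                }
            },
        }
    return {
        "openlabel": {
            "frames": frames,
            "metadata": {"schema_version": "1.0.0"},
            # Static information lives at the root
            "objects": {
                object_id: {
                    "name": name,
                    "object_data": {"text": [{"name": k, "val": v} for k, v in static_text.items()]},
                    "type": class_type,
                }
            },
            "streams": {stream: {"type": "camera"}},
        }
    }


if __name__ == "__main__":
    prediction = build_bbox_prediction(
        "1232b4f4-e3ca-446a-91cb-d8d403703df7",
        "any human readable bounding box name",
        "passengercar",
        "camera_id",
        [(0, [1.0, 1.0, 40.0, 30.0], 0.85), (50, [2.0, 3.0, 30.0, 20.0], 0.82)],
        {"color": "red"},
    )
    print(json.dumps(prediction, indent=2))
```

Because the same `object_id` is used for every frame and for the root entry, the frame-level and root-level descriptions cannot drift apart.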
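The rules for `confidence` (a number between 0.0 and 1.0, defaulting to 1.0 when omitted) and the ten-value cuboid layout can be checked before upload. The following Python sketch shows client-side sanity checks that mirror those rules. It is not Kognic's server-side validation. The unit-norm tolerance of `1e-6` is an arbitrary assumption, and the rotation convention itself (see docid:tfdiv3dlaafndehaklui) is not checked. Both function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def resolve_confidence(value: Optional[float] = None) -> float:
    """Omitted -> 1.0; otherwise the value must be a number in [0.0, 1.0]."""
    if value is None:
        return 1.0
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("confidence must be a number")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {value}")
    return float(value)


def check_cuboid_val(val: Sequence[float], tol: float = 1e-6) -> None:
    """Sanity-check a cuboid list [x, y, z, qx, qy, qz, qw, width, length, height]."""
    if len(val) != 10:
        raise ValueError(f"expected 10 values, got {len(val)}")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in val):
        raise TypeError("all cuboid values must be numbers")
    qx, qy, qz, qw = val[3:7]
    # A quaternion that represents a rotation must have unit length
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if abs(norm - 1.0) > tol:
        raise ValueError(f"quaternion is not normalized (norm = {norm})")
    width, length, height = val[7:10]
    if min(width, length, height) <= 0:
        raise ValueError("width, length and height (in meters) must be positive")


if __name__ == "__main__":
    # Cuboid from frame "0" of the 3D cuboid example
    check_cuboid_val([
        2.079312801361084, 18.919870376586914, 0.3359137773513794,
        0.002808041640852679, 0.022641949116037438, 0.06772797660868829, 0.9974429197838155,
        1.767102435869269, 4.099334155319101, 1.3691029802958168,
    ])
    print(resolve_confidence(), resolve_confidence(0.85))
```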
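In the segmentation example, the upscaled prediction is pasted so that the padding is equal on both sides. The paste position is half the leftover space in each dimension. The following is a small stdlib-only Python sketch of that arithmetic. The function name is hypothetical, and sizes are `(width, height)` tuples as in PIL.

```python
from typing import Tuple


def centered_paste_offset(target_size: Tuple[int, int], content_size: Tuple[int, int]) -> Tuple[int, int]:
    """Top-left (x, y) at which to paste content so that the padding is equal on both sides."""
    target_w, target_h = target_size
    content_w, content_h = content_size
    if content_w > target_w or content_h > target_h:
        raise ValueError("content is larger than the target image")
    # With an odd leftover, the extra pixel of padding ends up on the right/bottom side
    return (target_w - content_w) // 2, (target_h - content_h) // 2


if __name__ == "__main__":
    # 600 x 400 prediction inside a 1000 x 800 image
    print(centered_paste_offset((1000, 800), (600, 400)))  # (200, 200)
```

The result can be passed directly as the second argument of PIL's `Image.paste`.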
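A segmentation bitmap must be an 8-bit grayscale PNG with the same resolution as the annotated image. This can be checked on the base64 string itself, before it is placed in `"val"`. The following Python sketch uses only the standard library. It reads the PNG `IHDR` chunk, which the PNG specification requires to come first and which stores width, height, bit depth, and color type (0 means grayscale). The function names are hypothetical, and the check covers only the header, not the pixel data.

```python
import base64
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_from_base64(encoded: str) -> Tuple[int, int, int, int]:
    """Return (width, height, bit_depth, color_type) from a base64-encoded PNG."""
    data = base64.b64decode(encoded, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, then 13 data bytes
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: IHDR chunk missing")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, color_type


def check_segmentation_bitmap(encoded: str, expected_size: Tuple[int, int]) -> None:
    """Raise if the image is not an 8-bit grayscale PNG of the expected (width, height)."""
    width, height, bit_depth, color_type = png_header_from_base64(encoded)
    if (width, height) != tuple(expected_size):
        raise ValueError(f"resolution {width}x{height} does not match {expected_size}")
    # Color type 0 is grayscale; PIL saves "L" mode images this way
    if bit_depth != 8 or color_type != 0:
        raise ValueError(f"expected 8-bit grayscale, got bit depth {bit_depth}, color type {color_type}")
```

For the 1000 x 800 example, `check_segmentation_bitmap(prediction_str, (1000, 800))` should pass silently.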