Orthographic Camera Sensor

Hi,

Is there any way of getting ortho cameras in CARLA? To get a geometrically correct map of the environment, I need “bird’s eye view” images which are only possible with orthographic projection.
I guess the alternative would be fully mapping the area with RGB + depth and then re-projecting it “upwards” into an orthographic photo.

Thanks in advance

Hi @hos-b,
I am also looking for options to get an orthographic projection. Were you able to find any solutions?
I found https://github.com/deepsense-ai/carla-birdeye-view, which uses the map topology to get the semantic renderings/projections: it basically rebuilds the map from scratch and then generates the rendering with OpenCV, which is not ideal. Unfortunately, that doesn’t work for me. I am looking for a map with sidewalks, buildings and all other entities that are present, and was curious whether you found a way around it.

Hi @maks

I did find a “workaround” which took almost a month of work. It’s much better than the deepsense repo, but it has its flaws for sure. I unfortunately cannot share the code, but I’ll explain the process.

  1. merge semseg & depth sensors: I made a class that simply initializes both to the same pose and syncs their output into a tuple.
  2. create a lot of those: I put 9 of them in a grid above my car, looking straight down (pitched 90°).
  3. create a unified pointcloud: I pool the data from all 9 cameras, project the semantic labels to the local car coordinates and form a single pointcloud.
  4. reproject the points back to an imaginary camera to get your pixel values: I used nearest neighbor + some majority voting stuff.
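
For anyone who wants to try steps 3 and 4 in Python, here is a minimal NumPy sketch. It is not the original C++ code. It assumes the depth image has already been decoded to metres along the optical axis, that every camera looks straight down with its image axes aligned to the car frame, and it replaces the nearest neighbor + voting scheme with a plain per-cell majority vote. All function and parameter names are made up for illustration.

```python
import numpy as np

def backproject(depth_m, labels, fov_deg):
    """Lift a depth image into camera-frame points (x right, y down the image, z = depth)."""
    h, w = depth_m.shape
    # Pinhole focal length in pixels, derived from the horizontal field of view.
    f = w / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    # Pixel centres.
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    x = (u - w / 2.0) * depth_m / f
    y = (v - h / 2.0) * depth_m / f
    points = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)
    return points, labels.reshape(-1)

def rasterize_ortho(xy, labels, extent_m, size_px, num_classes, unknown=255):
    """Drop points onto a square top-down grid; each cell keeps its most frequent label."""
    scale = size_px / extent_m  # pixels per metre
    cols = np.floor((xy[:, 0] + extent_m / 2.0) * scale).astype(int)
    rows = np.floor((xy[:, 1] + extent_m / 2.0) * scale).astype(int)
    keep = (cols >= 0) & (cols < size_px) & (rows >= 0) & (rows < size_px)
    # One vote counter per cell and class.
    votes = np.zeros((size_px, size_px, num_classes), dtype=np.int32)
    np.add.at(votes, (rows[keep], cols[keep], labels[keep].astype(int)), 1)
    image = votes.argmax(axis=-1).astype(np.uint8)
    image[votes.sum(axis=-1) == 0] = unknown  # cells that no point fell into
    return image
```

To pool several cameras, shift each cloud's x/y by that camera's mounting offset relative to the car, concatenate the clouds and label arrays, and call `rasterize_ortho` once. Cells that stay `unknown` are the holes you would otherwise fill by nearest-neighbor lookup.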

a couple of notes:

  • Open3D is the go-to library for handling point clouds. In C++, however, I couldn’t get it to link with CARLA, so I used PCL instead. The scientific way of generating orthophotos is to create a mesh first, which is much easier with Open3D. They might even have a method that generates an image from a point cloud.

  • I needed the semantic top-down view (hence the majority voting). For RGB, I assume bilinear or bicubic interpolation would generate very good results. Also you’d obviously need to merge RGB with depth, not semantic.

  • This is definitely not the best way to do it but it worked for me. I wrote this in C++ so it might take less time to write in python, but it’ll probably be slower. I used a lot of multi-threading and OpenMP for-loops.

  • This process is very very slow and has to be completely synchronized. I don’t use the simulator in sync mode because I have absolute control over all agents (and I don’t use pedestrians). You have to figure out a synchronization scheme for your application.

  • Finally, watch out where you take your images. E.g. in tunnels you’d get nothing. I had to make a list of ‘forbidden’ Road IDs that get excluded from data collection.

It might just be easier to create a new sensor from scratch. There’s a tutorial on how to make one on the docs and all game engines have orthographic cameras, so you should be fine on the unreal side.

Appreciate your detailed response @hos-b
After a few days of fiddling with it, I figured out that a camera set very far above the map with a very narrow FOV gets me very close to what I need. Of course, as with your setup, it doesn’t work with tunnels, bridges, large trees, etc.
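
To put rough numbers on the narrow-FOV trick, here is a small sketch based on plain pinhole geometry (nothing CARLA-specific). It assumes the camera looks straight down, that `fov` is the horizontal field of view in degrees, and that the ground is flat. Both function names are made up.

```python
import math

def camera_height_m(ground_width_m, fov_deg):
    """Height at which a downward-looking pinhole camera covers ground_width_m."""
    return (ground_width_m / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def edge_displacement_m(object_height_m, ground_width_m, fov_deg):
    """How far the top of an object at the image edge appears shifted from its base.

    A ray from the camera through a point at ground offset r and height h
    reaches the ground at r * H / (H - h), so the shift is r * h / (H - h).
    """
    cam_h = camera_height_m(ground_width_m, fov_deg)
    if object_height_m >= cam_h:
        raise ValueError("object is not below the camera")
    r = ground_width_m / 2.0
    return r * object_height_m / (cam_h - object_height_m)
```

For a 100 m wide view, a 90° camera sits 50 m up and the roof of a 10 m building at the image edge is shifted by 12.5 m. With a 1° FOV the camera has to be about 5.7 km up, and the same shift drops below 10 cm. The error never reaches zero, which is the difference from a true orthographic projection.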
Also, I was just checking the Python API and found that cast_ray is now exposed in “latest”. I’m not sure how expensive it is, but it could be another way to construct the semantic map manually.

That’s one way to do it. Glad it worked for you. In my application I needed some geometric properties, e.g. the generated orthophoto had to represent a 10 x 10 m square around the car and I needed to get rid of all the errors of perspective projection.
cast_ray could help if it’s not that computationally heavy since it’s great for parallelization and it’s closer to the theoretical approach of generating orthophotos. I’m too far in though, so I probably wouldn’t switch at this point.
Good luck with your project
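
In case someone wants to try the cast_ray idea, here is a rough sketch of a vertical ray grid. It has not been run against a simulator. To keep it testable without CARLA, the grid function takes a hypothetical `cast` callable that receives two (x, y, z) tuples and returns a list of (z, label) hits. `carla_caster` is an assumed adapter around `world.cast_ray`, which, as far as I can tell from the docs, returns labelled points with `location` and `label` fields. Because I don't know the order in which hits come back, the sketch picks the highest one.

```python
def carla_caster(world):
    """Assumed adapter: wrap world.cast_ray so it returns (z, label) tuples."""
    import carla  # imported lazily, only needed with a running simulator

    def cast(start, end):
        hits = world.cast_ray(carla.Location(*start), carla.Location(*end))
        return [(p.location.z, p.label) for p in hits]

    return cast

def topdown_labels(cast, center_xy, extent_m, size_px,
                   z_top=100.0, z_bottom=-10.0, empty=None):
    """Shoot one vertical ray per pixel centre and keep the label of the highest hit."""
    cx, cy = center_xy
    step = extent_m / size_px  # metres per pixel
    grid = []
    for row in range(size_px):
        y = cy - extent_m / 2.0 + (row + 0.5) * step
        line = []
        for col in range(size_px):
            x = cx - extent_m / 2.0 + (col + 0.5) * step
            hits = cast((x, y, z_top), (x, y, z_bottom))
            # z points up in CARLA, so the top-most surface has the largest z.
            line.append(max(hits, key=lambda h: h[0])[1] if hits else empty)
        grid.append(line)
    return grid
```

Each ray is independent, so the two loops are easy to split across workers if a single ray turns out to be cheap enough.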

I figured out how to add the orthographic camera (it requires installing UE4 and rebuilding the CARLA binary).

I am attaching the .diff (below) for anyone interested. With those changes, any camera (tested with RGB and semantic) can be made orthographic using the blueprint attributes as follows:
camera_bp.set_attribute('orthographic', 'true')
camera_bp.set_attribute('OrthoWidth', '1800')  # in cm

Note: the orthographic RGB camera has some issues when postprocessing effects are enabled; just set the “enable_postprocess_effects” attribute to “false”.
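
Here is a small helper sketch that wraps those attributes, for the case where the image has to cover a fixed square on the ground (like the 10 x 10 m example earlier in the thread). It only works on a build with the patch below applied. The helper name is made up, and it assumes a square image so that one pixel covers the same distance in both directions.

```python
def make_orthographic(camera_bp, ground_width_m, image_size_px, postprocess=False):
    """Configure a patched camera blueprint; returns the ground resolution in m/pixel."""
    camera_bp.set_attribute('orthographic', 'true')
    # OrthoWidth is given in centimetres (Unreal units).
    camera_bp.set_attribute('OrthoWidth', str(ground_width_m * 100.0))
    camera_bp.set_attribute('image_size_x', str(image_size_px))
    camera_bp.set_attribute('image_size_y', str(image_size_px))
    # Not every camera type exposes this attribute, so check before setting it.
    if camera_bp.has_attribute('enable_postprocess_effects'):
        camera_bp.set_attribute('enable_postprocess_effects', str(postprocess).lower())
    return ground_width_m / image_size_px
```

For example, `make_orthographic(camera_bp, 10.0, 500)` sets OrthoWidth to 1000 cm and returns 0.02, i.e. 2 cm per pixel.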

.diff
diff --git a/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Actor/ActorBlueprintFunctionLibrary.cpp b/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Actor/ActorBlueprintFunctionLibrary.cpp
index 85180b32..2e25817c 100644
--- a/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Actor/ActorBlueprintFunctionLibrary.cpp
+++ b/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Actor/ActorBlueprintFunctionLibrary.cpp
@@ -303,6 +303,19 @@ void UActorBlueprintFunctionLibrary::MakeCameraDefinition(
 AddRecommendedValuesForSensorRoleNames(Definition);
 AddVariationsForSensor(Definition);

+  // ORTHO switch
+  FActorVariation ORTHO;
+  ORTHO.Id = TEXT("orthographic");
+  ORTHO.Type = EActorAttributeType::Bool;
+  ORTHO.RecommendedValues = { TEXT("false") };
+  ORTHO.bRestrictToRecommended = false;
+
+  FActorVariation OrthoWidth;
+  OrthoWidth.Id = TEXT("OrthoWidth");
+  OrthoWidth.Type = EActorAttributeType::Float;
+  OrthoWidth.RecommendedValues = { TEXT("10.0") };
+  OrthoWidth.bRestrictToRecommended = false;
+
 // FOV
 FActorVariation FOV;
 FOV.Id = TEXT("fov");
@@ -361,6 +374,8 @@ void UActorBlueprintFunctionLibrary::MakeCameraDefinition(
 LensYSize.bRestrictToRecommended = false;

 Definition.Variations.Append({
+      ORTHO,
+      OrthoWidth,
     ResX,
     ResY,
     FOV,
@@ -1401,11 +1416,19 @@ void UActorBlueprintFunctionLibrary::SetCamera(
   ASceneCaptureSensor *Camera)
{
 CARLA_ABFL_CHECK_ACTOR(Camera);
+  if (RetrieveActorAttributeToBool("orthographic", Description.Variations, false) == false){
+    Camera->SetFOVAngle(
+        RetrieveActorAttributeToFloat("fov", Description.Variations, 90.0f));
+  }
+  else
+  {
+    Camera->SetupOrtho(
+        RetrieveActorAttributeToFloat("OrthoWidth", Description.Variations, 1000.0f)
+      );
+  }
 Camera->SetImageSize(
-      RetrieveActorAttributeToInt("image_size_x", Description.Variations, 800),
-      RetrieveActorAttributeToInt("image_size_y", Description.Variations, 600));
-  Camera->SetFOVAngle(
-      RetrieveActorAttributeToFloat("fov", Description.Variations, 90.0f));
+        RetrieveActorAttributeToInt("image_size_x", Description.Variations, 800),
+        RetrieveActorAttributeToInt("image_size_y", Description.Variations, 600));
 if (Description.Variations.Contains("enable_postprocess_effects"))
 {
   Camera->EnablePostProcessingEffects(
diff --git a/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.cpp b/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.cpp
index 3d912405..8423949c 100644
--- a/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.cpp
+++ b/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.cpp
@@ -9,6 +9,7 @@

#include "Carla/Game/CarlaStatics.h"

+#include "Math/Matrix.h"
#include "Async/Async.h"
#include "Components/DrawFrustumComponent.h"
#include "Components/SceneCaptureComponent2D.h"
@@ -20,6 +21,8 @@
#include "HighResScreenshot.h"
#include "Misc/CoreDelegates.h"
#include "RHICommandList.h"
+#include "Camera/CameraTypes.h"
+#include "Math/OrthoMatrix.h"

static auto SCENE_CAPTURE_COUNTER = 0u;

@@ -94,6 +97,15 @@ void ASceneCaptureSensor::SetFOVAngle(const float FOVAngle)
 CaptureComponent2D->FOVAngle = FOVAngle;
}

+void ASceneCaptureSensor::SetupOrtho(
+    float OrthoWidth
+  )
+{
+  check(CaptureComponent2D != nullptr);
+  CaptureComponent2D->ProjectionType = ECameraProjectionMode::Orthographic;
+  CaptureComponent2D->OrthoWidth = OrthoWidth;
+}
+
float ASceneCaptureSensor::GetFOVAngle() const
{
 check(CaptureComponent2D != nullptr);
diff --git a/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.h b/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.h
index 94b1566b..d331feb9 100644
--- a/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.h
+++ b/Unreal/CarlaUE4/Plugins/Carla/Source/Carla/Sensor/SceneCaptureSensor.h
@@ -66,6 +66,9 @@ public:
 UFUNCTION(BlueprintCallable)
 void SetFOVAngle(float FOVAngle);

+  UFUNCTION(BlueprintCallable)
+  void SetupOrtho(float OrthoWidth);
+
 UFUNCTION(BlueprintCallable)
 float GetFOVAngle() const;


wow that’s great. thank you @maks. I was gonna use it but I realized I actually need the point cloud because I’m filtering some semantic IDs out before creating my orthophoto. It would be great if the CARLA devs would integrate your changes though. It’s never a bad idea to have more features.