Evaluation variance

Are the values on the leaderboard reported from a single evaluation run, or are they computed as the mean of n evaluation runs? In the NoCrash benchmark, results are reported as the mean and standard deviation of multiple evaluation runs because CARLA 0.8.4 has significant non-determinism. Is this also true for CARLA 0.9.9 and the leaderboard evaluation?

The leaderboard evaluation is run over a total of 20 routes, which have different weather profiles.

Regarding determinism, the leaderboard itself is deterministic. However, CARLA 0.9.9 isn’t fully deterministic. The main problem you will probably face is that the background vehicles will not always make the same choices when they reach an intersection, so runs can differ. This is being worked on, but it isn’t ready yet.

Anyhow, a lot of effort has gone into making CARLA deterministic, and it has improved hugely since CARLA 0.8.4.

Thanks for the update.

So, for each of the 20 routes, the evaluation is run only once, i.e., REPETITIONS=1?

Exactly, repetitions = 1
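
For anyone who wants NoCrash-style variance estimates despite the single repetition, here is a minimal sketch of aggregating scores, assuming you rerun the 20 routes several times locally and record one overall score per run. The function name `summarize_runs` and the score values are hypothetical and are not part of the leaderboard code.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    if len(scores) < 2:
        # A standard deviation cannot be estimated from a single run.
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical overall scores from three local repetitions (made-up numbers).
example_scores = [40.0, 50.0, 60.0]
mean, std = summarize_runs(example_scores)
print(f"{mean:.1f} +/- {std:.1f}")
```

With repetitions = 1 there is only one score per submission, so the function raises an error rather than reporting a spread of zero.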