Main Video





Failure Mode Analysis




Real-world apartment demos

These tours were collected in a real-world apartment. Each tour was subsampled at 0.5 Hz to roughly convert from human walking pace to the pace used in our experiments, and spans 20 sampled frames. Frames showing unexplored areas (frontiers) that the robot could explore next were then identified. Each demo indicates whether FiLM-Nav correctly predicted which frontier would lead to the closest instance of the requested object.
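As a rough illustration of the subsampling step described above, the sketch below picks the source-frame indices that correspond to a 0.5 Hz sampling rate (one frame every 2 seconds) and stops at 20 frames. It is a minimal sketch, not the pipeline used for these demos: the function name `subsample_indices`, its parameters, and the choice to start at the first frame are all assumptions made for this example.

```python
def subsample_indices(source_fps: float, total_frames: int,
                      target_hz: float = 0.5, max_frames: int = 20) -> list[int]:
    """Return indices of source frames sampled at roughly `target_hz`.

    Hypothetical helper: assumes sampling starts at frame 0 and that the
    source video has a constant frame rate.
    """
    # Number of source frames between two consecutive sampled frames.
    step = source_fps / target_hz
    indices = []
    k = 0
    while len(indices) < max_frames:
        idx = round(k * step)
        if idx >= total_frames:
            break  # the video ended before max_frames were collected
        indices.append(idx)
        k += 1
    return indices


# Example: a 30 fps video with 3000 frames (100 seconds of footage).
picked = subsample_indices(source_fps=30.0, total_frames=3000)
print(len(picked), picked[:3], picked[-1])
```

Under these assumptions, 20 frames at 0.5 Hz cover 38 seconds from the first sampled frame to the last.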




RealEstate10K demos

These are uncut first-person (POV) tours randomly sampled from the RealEstate10K dataset. Each tour was subsampled at 0.5 Hz to roughly convert from human walking pace to the pace used in our experiments, and spans 20 sampled frames. Frames showing unexplored areas (frontiers) that the robot could explore next were then identified. Each demo indicates whether FiLM-Nav correctly predicted which frontier would lead to the closest instance of the requested object.




ObjectNav HM3D v0.2 Gallery (5 examples each)
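The gallery groups episodes by SPL (Success weighted by Path Length). For a single episode, the standard definition is the success indicator multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. The sketch below computes that per-episode value and maps it to the ranges listed in this gallery. The function names are hypothetical, and the handling of boundary values (for example, whether exactly 75% falls in the upper or lower range) is an assumption; the page does not specify it.

```python
def episode_spl(success: bool, shortest_path_length: float,
                agent_path_length: float) -> float:
    """Per-episode SPL: S * l / max(p, l), a value in [0, 1]."""
    if not success:
        return 0.0
    denom = max(agent_path_length, shortest_path_length)
    if denom == 0:
        return 1.0  # degenerate case: the agent started at the goal
    return shortest_path_length / denom


def spl_bin(spl: float) -> str:
    """Map an SPL value to a gallery range.

    Assumption: boundary values are placed in the lower range.
    """
    pct = spl * 100
    if pct > 75:
        return "100% - 75%"
    if pct > 50:
        return "75% - 50%"
    if pct > 25:
        return "50% - 25%"
    return "25% - 0%"


# Example: a successful episode where the agent walked twice the shortest path.
spl = episode_spl(success=True, shortest_path_length=5.0, agent_path_length=10.0)
print(spl, spl_bin(spl))
```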

SPL: 100% - 75%

19.6% of all validation episodes

SPL: 75% - 50%

24.0% of all validation episodes

SPL: 50% - 25%

21.0% of all validation episodes

SPL: 25% - 0%

12.4% of all validation episodes