USING SLAM WISELY

Using SLAM wisely – both the term and the implementation

Mobile robots should know where they are. Most autonomous operation relies on constantly estimating the location and orientation of the robot, that is, the pose of the robot. Positioning can be described as the process of determining the pose of a robot relative to a given map of the environment. Mapping refers to the process of creating a consistent world model of the robot's operating environment. Mapping is tightly coupled with positioning, since creating a spatial model requires knowing the robot's pose. A common approach for creating a world model is called Simultaneous Localization and Mapping (SLAM). If you have read anything other than comic books or outdoor magazines, you have probably come across that acronym. As the full name of SLAM states, the robot constructs a map of an unknown environment while localising itself with respect to that map. It is one of those features you should master to secure your machine's smooth mobility, especially in outdoor environments with adverse weather conditions. Heck, it is something we humans do all the time in similar situations. Unfortunately, it is not a solution to every problem, nor is it something you should always use. In fact, many people run SLAM continuously in vain, and in many cases that actually degrades overall performance, for various reasons. We'll tell a bit about SLAM before we justify why we use it only when absolutely necessary.

Humans do SLAM without knowing it

Most of us have wandered in a partly or completely unfamiliar city, without the help of a modern mobile phone (bad battery) or even a map (you forgot it in your room), only to find ourselves unsure of our location a few hundred metres from the hotel. Your belief about which corner you are standing on is spread wide, thanks to those few crossings and turns. The further you go, the less sure you are about your whereabouts. At that point, you have very few options to choose from: you can take a taxi, use public transportation, ask a local, or do what so many of us tend to do: keep on walking and hope that you will come across one of those famous locations. You know, something like the Eiffel Tower, Big Ben, the Space Needle or the best restaurant you have ever been in; your earlier fuzzy memories, or that short study of the tourist map in your hotel room, will put you back on the map. At that moment your belief about your location collapses onto a much smaller area. Not only do you become more certain of your present location, you also get a better estimate of the path you travelled between that moment and the moment you nodded to the doorman on your way out to explore the delights of the town. And at some point, after a long and rewarding, roughly circular tour, when you accidentally end up back at the hotel where you started your walkabout, you obviously know exactly where you are. In SLAM vocabulary, you have completed the all-important loop closure.

Figure 1: Topological mapping is widely used by us humans. It maintains the topological relationships between the points (i.e., the landmarks) but entirely lacks precise distance and direction information. (Photo by: Triff/Shutterstock)

Loop closure is vital

Robots utilise loop closure to mitigate the accumulation of pose estimation errors over time, caused by inherent noise and unaccounted dynamics in their sensors and actuators. A loop closure event occurs when a robot re-observes a set of assumedly stationary features, providing a constraint on the estimated trajectory and thus information about the robot's location. In essence, such an event requires that all motor commands issued between the two observations cumulatively result in a specific transformation of the robot's coordinate system. Successful loop closures improve localization accuracy, map quality, robustness to dynamic environments, and overall system stability. Loop closure is a critical and prevalent technique in various commercial applications, including robotics, terrestrial mapping, augmented/virtual reality solutions, and anything else where episodic, memory-based spatial reconstruction is needed.
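
To make the constraint concrete, here is a minimal sketch (planar poses, illustrative function names of our own) of how a loop closure ties two pose estimates together: the residual between the predicted and the measured relative pose is what an estimator drives toward zero.

import numpy as np

def relative_pose(pose_i, pose_j):
    """Pose of j expressed in the frame of i (SE(2)); poses are (x, y, theta)."""
    xi, yi, ti = pose_i
    xj, yj, tj = pose_j
    c, s = np.cos(ti), np.sin(ti)
    # Rotate the world-frame displacement into frame i.
    dx = c * (xj - xi) + s * (yj - yi)
    dy = -s * (xj - xi) + c * (yj - yi)
    dtheta = np.arctan2(np.sin(tj - ti), np.cos(tj - ti))  # wrap to [-pi, pi]
    return np.array([dx, dy, dtheta])

def loop_closure_residual(pose_i, pose_j, measured_ij):
    """Difference between the predicted and the measured relative pose.
    An optimizer adjusts the trajectory to drive such residuals toward zero."""
    r = relative_pose(pose_i, pose_j) - measured_ij
    r[2] = np.arctan2(np.sin(r[2]), np.cos(r[2]))
    return r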

Generic SLAM in steps

One compact way to describe SLAM solutions in general, proposed by e.g. Solà (2004), is to consider them as built upon three fundamental operations, each repeated at every time step:

  1. Motion: The robot moves, attaining a fresh perspective of the scene. This movement, however, introduces uncertainty in the robot’s localization due to inevitable noise and errors. Automating this step requires a motion model.
  2. Discovery: The robot identifies notable features or landmarks in the environment that need to be added to the map. Due to inaccuracies in the exteroceptive sensors (such as LiDARs, cameras, and radars), the exact location of these landmarks remains uncertain. Furthermore, given the existing uncertainty in the robot’s location, these uncertainties must be accurately combined. An automated solution necessitates the inverse observation model to ascertain the landmarks’ positions in the scene using sensor data.
  3. Observation: The robot recognizes previously mapped landmarks and uses them to refine its self-localization and the spatial localization of all landmarks. Consequently, the uncertainties associated with both localization and landmarks decrease. An automated solution needs the direct observation model to anticipate the measurement values based on the predicted landmark location and the robot’s localization.

These three models, together with an estimator implementation, enable the construction of an automated SLAM solution. The estimator is tasked with propagating the uncertainties accurately each time one of the above situations arises. Multiple methods are available for implementing the estimator component; the most widely used ones are presented later in this piece.
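
As a concrete illustration, here is a minimal sketch of the three models for a planar robot with range-bearing measurements. The function names and the noise-free forms are our simplifying assumptions, not from any specific library.

import numpy as np

def motion_model(pose, control, dt):
    """Motion: predict the next pose from velocity commands (v, w).
    Noise handling (process covariance) is omitted for brevity."""
    x, y, theta = pose
    v, w = control
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + w * dt])

def inverse_observation_model(pose, measurement):
    """Discovery: place a newly observed landmark on the map from a
    range-bearing measurement (r, phi) taken at the current pose."""
    x, y, theta = pose
    r, phi = measurement
    return np.array([x + r * np.cos(theta + phi),
                     y + r * np.sin(theta + phi)])

def direct_observation_model(pose, landmark):
    """Observation: predict the range-bearing measurement we expect for a
    landmark already on the map; comparing it with the actual measurement
    yields the innovation used by the estimator."""
    x, y, theta = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    phi = np.arctan2(dy, dx) - theta
    return np.array([np.hypot(dx, dy),
                     np.arctan2(np.sin(phi), np.cos(phi))])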

Focus on localizing or on mapping?

As the acronym states, SLAM is all about localising the machine and mapping the environment as the machine moves through it. In many use cases, the focus is on localising the machine with respect to the newly created map. That is quite understandable, as in most cases the main objective is to get the machine safely from one location (global or local) to another, usually along a somehow optimised trajectory. However, that is not the full story. In some cases, the controlled movements of the machine are done only to make the mapping part of SLAM meet the requirements set by the use case. In other words, the mobile machine is used merely as a platform for the sensor setup, whatever that might be, and the main objective is to create a map that is detailed and dynamic enough to fulfil the mission specifications. These specifications dictate the type of map needed; in some cases, the map used for localization and navigation is not suitable for the task, and a different format is needed. In many cases, some sort of high-precision mesh format, or similar, is created to fulfil the requirements.

How to handle a dynamic environment?

SLAM is used not only for creating new maps when the machine visits a place for the first time, but also for updating the maps of areas it visits frequently. That feature is especially useful when the environment is highly dynamic, or when the environmental conditions vary a lot, seasonally like here in Finland, or lighting-wise in general when using camera-based solutions. The downsides of running SLAM continuously include the need for extensive processing capacity and, more importantly, the real risk that transient changes and other on-the-fly challenges degrade the updated map needed for optimal navigation. And in any case, a map created on-the-fly is never as good as a completed, validated, and verified one.

For that reason, another option is to use the existing map, detect meaningful deviations based on incoming sensor data, and then decide whether to a) pay no attention to the detected changes, b) activate a module that integrates the detected changes into the existing map, or c) run a full SLAM cycle to get a better-suited, fully verified, and optimised map of the environment. That is our way of working, i.e., we use SLAM to make a map of a new area, verify and validate it, and after that use it every time we operate in that area. Our solution enables us to create these maps piecewise and then stitch them together into one large representation of the larger operation area.
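
The decision logic can be sketched roughly as follows; the deviation metric and the thresholds are purely illustrative assumptions, not our actual tuning.

from enum import Enum, auto

class MapAction(Enum):
    IGNORE = auto()      # a) transient or insignificant change
    INTEGRATE = auto()   # b) patch the existing map in place
    FULL_SLAM = auto()   # c) remap, then verify and validate the area again

def decide_map_action(deviation_ratio, minor=0.02, major=0.20):
    """deviation_ratio: fraction of incoming scan points that disagree with
    the existing map beyond a distance tolerance (an assumed metric)."""
    if deviation_ratio < minor:
        return MapAction.IGNORE
    if deviation_ratio < major:
        return MapAction.INTEGRATE
    return MapAction.FULL_SLAM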

Very short historical review

The term SLAM was originally introduced by J. J. Leonard and H. F. Durrant-Whyte (1991). It was built especially on the earlier work by R. Smith et al. (1987). If the reader wants to learn more about the early days and the generic formulation of the problem, we recommend the first part of the great tutorial paper by H. Durrant-Whyte and T. Bailey (2006). For a more recent view of the whole field, get your hands on the extensive work by C. Cadena et al. (2016). If you want to save your effort, we recommend just buying THE BOOK by S. Thrun et al. (2005). It has pretty much everything in one package. To read more about the underlying theory, get “Bayesian Filtering and Smoothing” by S. Särkkä (2013).

Figure 2a: The BOOK. (S. Thrun et al., 2005)
Figure 2b: If you like equations. (S. Särkkä, 2013)

Numbers from biz reports

Based on multiple business reports from different companies (too many to list here, with partially conflicting numbers), the SLAM market is expected to grow at a CAGR of roughly 30-50 % over the various forecast periods, to several billion USD by 2030. One of the major drivers for the global SLAM market growth is the introduction of autonomous vehicles. Visual SLAM is a prominent trend in the global SLAM market. It is being employed actively in a wide range of applications, from Unmanned Aerial Vehicles (UAVs) and Augmented Reality (AR) to field and service robots, mainly due to the increase in computer processing speed and the availability of low-cost sensors. UAVs have rapidly become popular for real-time mapping, surveillance, building inspection, search and rescue operations, aerial photography, and so on. However, visual SLAM faces some well-known challenges, such as performing consistently well in scenarios affected by changes in illumination, weather, or seasons.

The most accurate and widespread SLAM systems to date use visual or 3D LiDAR information. Visual SLAM methods work by extracting distinguishable cues from images, which are used as landmarks in the world to estimate the motion of the robot between camera frames. LiDAR SLAM methods estimate motion by finding rigid transformations between point clouds provided by the sensor. There are clear advantages and disadvantages in both approaches: visual SLAM can be more robust in dynamic scenes but is extremely sensitive to changes in illumination and appearance, whereas LiDAR SLAM can work more consistently over changes in lighting conditions or seasons by exploiting the geometric structure of the world.
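
To illustrate the LiDAR side, the sketch below shows the core of rigid scan alignment: solving for the rotation and translation between two already-corresponded point sets with the standard SVD-based (Kabsch) solution. Real pipelines, such as ICP variants, alternate this with a correspondence search step; the function name is ours.

import numpy as np

def rigid_transform(source, target):
    """source, target: (N, 3) arrays of corresponding points.
    Returns R (3x3) and t (3,) such that R @ source_i + t ~ target_i."""
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)   # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t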

GIM Robotics favors LiDARs

As expert LiDAR users, we firmly believe that LiDAR-based, and why not radar-based, solutions outperform camera-based solutions, especially outdoors in all-weather conditions. Some of their benefits are listed below:

  • Lighting conditions: cameras do not work well in bad or changing lighting conditions, e.g. darkness, direct sunlight, strong shadows, etc.
  • Weather conditions: active sensors (LiDARs and radars) have better performance in adverse weather conditions than passive sensors.
  • Range accuracy: a camera triangulates the range, which is much less accurate than what LiDARs achieve with Time-of-Flight (ToF) or Frequency-Modulated Continuous Wave (FMCW) measurements, especially at even slightly longer distances (see the sketch after this list).
  • Details: even if images contain a lot of information, LiDARs capture the environment with much higher detail since every returned point is an accurate 3D point.
  • System complexity: More sensors make the overall system more complex. One LiDAR can cover 360-degree Field-of-View (FOV) whereas it would require several cameras to cover the same area.
  • Calibration: cameras are much more sensitive to calibration errors. In addition, camera calibration is a more complex process.
  • Feature detection: complex calculations are required to detect and track features (in addition to loop closure detection) with cameras. With LiDARs, the 3D point cloud already captures the geometry of the environment.
  • Computational cost: image processing takes a lot of computational resources, as camera data processing is typically done by ML algorithms, which are not deterministic. Our LiDAR algorithms are deterministic, which makes it a lot easier to track down possible problems.
  • Robustness: See above.
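
A back-of-the-envelope illustration of the range-accuracy point: stereo triangulation error grows quadratically with distance, while ToF error is roughly range-independent. The sensor parameters below are illustrative assumptions, not from any datasheet.

C = 299_792_458.0  # speed of light, m/s

def stereo_depth_error(z, baseline=0.12, focal_px=1000.0, disparity_err_px=0.25):
    """Depth error of a stereo camera: dz = z^2 / (f * b) * d_disparity."""
    return z ** 2 / (focal_px * baseline) * disparity_err_px

def tof_range_error(timing_err_s=1e-10):
    """Range error of a ToF LiDAR: dr = c * dt / 2, independent of range."""
    return C * timing_err_s / 2.0

for z in (5.0, 20.0, 80.0):
    print(f"{z:5.1f} m: stereo ~{stereo_depth_error(z):.3f} m, "
          f"ToF ~{tof_range_error():.3f} m")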

The same market reports underline that EKF-SLAM, FastSLAM, and Graph-Based SLAM are the most widely used solutions at the moment. These three mainstream solutions are briefly described below.

Figure 3. 3D LiDAR point cloud visualization.

EKF-SLAM

EKF-SLAM performs simultaneous localization and mapping (SLAM) using an extended Kalman filter (EKF). The EKF was the first algorithm to be integrated into SLAM, and it has probably been the most widely used. The EKF is based on a Bayesian framework where the probability distributions of the individual elements in the state vector are estimated recursively from their previous distributions. The main idea of EKF-SLAM is to generalise the EKF state to include not only the robot pose but also the map, i.e., the landmarks. It also assumes that the probability distributions can be modelled as Gaussians, which allows for convenient closed-form update equations when deriving the algorithm. The filter takes in observed landmarks from the environment and compares them with known landmarks to find associations and new landmarks. It uses the associations to correct the state and the state covariance, and new landmarks are augmented into the state vector. The main loop consists of these primary operations (sketched in code after the list):

  • Prediction — Predict the next state based on the control command and the current state. 
  • Landmark Extraction — Get the landmarks in the environment.
  • Correction — Update the state and state covariance using the observed landmarks.
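
A compact, illustrative version of that loop for a planar robot with range-bearing measurements and known data association follows. It is a teaching sketch with our own simplifications, not a complete implementation; landmark initialisation is omitted.

import numpy as np

def wrap(a):
    return np.arctan2(np.sin(a), np.cos(a))

def predict(mu, P, control, dt, Q):
    """EKF prediction: move the robot part of the state, inflate covariance.
    State mu = [x, y, theta, lx1, ly1, lx2, ly2, ...]."""
    v, w = control
    x, y, th = mu[0:3]
    mu = mu.copy()
    mu[0:3] = [x + v * np.cos(th) * dt, y + v * np.sin(th) * dt, wrap(th + w * dt)]
    F = np.eye(len(mu))                 # motion Jacobian; landmarks are static
    F[0, 2] = -v * np.sin(th) * dt
    F[1, 2] = v * np.cos(th) * dt
    P = F @ P @ F.T
    P[0:3, 0:3] += Q                    # process noise affects the robot only
    return mu, P

def correct(mu, P, z, lm_index, R):
    """EKF correction with a range-bearing measurement z = (r, phi) of the
    landmark stored at state indices [3 + 2*lm_index : 5 + 2*lm_index]."""
    i = 3 + 2 * lm_index
    dx, dy = mu[i] - mu[0], mu[i + 1] - mu[1]
    q = dx * dx + dy * dy
    r = np.sqrt(q)
    z_hat = np.array([r, wrap(np.arctan2(dy, dx) - mu[2])])
    H = np.zeros((2, len(mu)))          # measurement Jacobian
    H[0, 0], H[0, 1], H[0, i], H[0, i + 1] = -dx / r, -dy / r, dx / r, dy / r
    H[1, 0], H[1, 1], H[1, 2] = dy / q, -dx / q, -1.0
    H[1, i], H[1, i + 1] = -dy / q, dx / q
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    innov = np.array([z[0] - z_hat[0], wrap(z[1] - z_hat[1])])
    mu = mu + K @ innov
    mu[2] = wrap(mu[2])
    P = (np.eye(len(mu)) - K @ H) @ P
    return mu, P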

In robotics, the application of the EKF to the SLAM problem was introduced through a series of seminal papers by Smith and Cheeseman (1986) and Smith et al. (1990). The first implementations of EKF-SLAM were due to Moutarlier and Chatila (1989) and Leonard and Durrant-Whyte (1991).

FastSLAM

FastSLAM is based on the theory of Bayesian filtering: it maintains a probability distribution over the possible states of the system and updates it recursively as new information becomes available. In the case of FastSLAM, the system state consists of the robot’s pose and the map of the environment. The algorithm maintains a probability distribution over these states, represented by a set of particles, each consisting of a hypothesised robot pose and map. FastSLAM updates these particles as the robot moves and observes its surroundings. The observation data is used to calculate the likelihood of each particle, and the particles are resampled according to their likelihoods. This process results in a set of particles that represents the robot’s current pose and map estimate. FastSLAM takes advantage of an important characteristic of the SLAM problem: landmark estimates are conditionally independent given the robot’s path. The algorithm thus decomposes the SLAM problem into a robot localization problem and a collection of landmark estimation problems conditioned on the robot pose estimate. A key characteristic of FastSLAM is that each particle makes its own local data association, whereas EKF techniques must commit to a single data association hypothesis for the entire filter. Seminal papers on the topic include, for example, Montemerlo et al. (2002), Hähnel et al. (2003), Montemerlo et al. (2003) and Thrun et al. (2004).
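
The skeleton below sketches one FastSLAM step under heavy simplifying assumptions: update_landmark_ekf() is a hypothetical helper that updates a particle's own per-landmark EKF and returns the measurement likelihood, and data association is taken as given.

import copy
import numpy as np

rng = np.random.default_rng(0)

class Particle:
    def __init__(self, pose):
        self.pose = np.asarray(pose, dtype=float)  # (x, y, theta)
        self.landmarks = {}                        # id -> (mean, cov), one small EKF each
        self.weight = 1.0

def fastslam_step(particles, control, dt, measurements, update_landmark_ekf,
                  motion_noise=(0.05, 0.02)):
    """measurements: list of (landmark_id, z). update_landmark_ekf(particle,
    landmark_id, z) is an assumed helper returning p(z | particle)."""
    for p in particles:
        # 1. Sample a pose from the motion model: per-particle noisy control.
        v = control[0] + rng.normal(0.0, motion_noise[0])
        w = control[1] + rng.normal(0.0, motion_noise[1])
        x, y, th = p.pose
        p.pose = np.array([x + v * np.cos(th) * dt,
                           y + v * np.sin(th) * dt,
                           th + w * dt])
        # 2. Update this particle's private map and accumulate its weight.
        #    Each particle may even make a different data association here.
        for lm_id, z in measurements:
            p.weight *= update_landmark_ekf(p, lm_id, z)
    # 3. Resample in proportion to the weights (copies, so that surviving
    #    particles evolve independently afterwards).
    weights = np.array([p.weight for p in particles])
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    new = [copy.deepcopy(particles[i]) for i in idx]
    for p in new:
        p.weight = 1.0
    return new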

GRAPH-SLAM

GraphSLAM is a prevalent algorithm for addressing the offline SLAM problem. It converts the SLAM posterior into a graphical network that represents the log-likelihood of the data. The graph is then simplified using variable elimination techniques, leading to a lower-dimensional problem that is subsequently solved with conventional optimization techniques. The core concept behind GraphSLAM is straightforward: it extracts a set of soft constraints from the data, represented by a sparse graph. It then derives the map and the robot path by resolving these constraints into a globally consistent estimate. The constraints are generally nonlinear, but in the process of resolving them they are linearized. The resulting least squares problem is then solved using standard optimization techniques. GraphSLAM can handle a large number of features and can incorporate occasional GPS information into the mapping process if necessary.

The offline problem is defined by the ability to gather all data during mapping and convert this data into a map once the robot’s operation is complete. GraphSLAM accomplishes this by translating the data into a sparse graph of constraints. These constraints are then converted into an information form representation through linearization via Taylor expansion. The information form is subsequently reduced by applying exact transformations, which eliminate the map variables from the optimization problem. The resulting optimization problem is solved using a standard optimization technique, such as the conjugate gradient method. GraphSLAM recovers the map from the pose estimate through a series of decoupled small-scale optimization problems (one per feature). Repeated application of the linearization and optimization technique results in accurate maps.
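
In code, the essence of the graph formulation can be sketched as stacking the soft constraints into one residual vector and handing the sparse nonlinear least squares problem to a standard solver. This toy 2D version pins the first pose to remove the gauge freedom; the helper names are ours.

import numpy as np
from scipy.optimize import least_squares

def relative_pose(pi, pj):
    """Pose of j in the frame of i (same helper as in the loop-closure sketch)."""
    c, s = np.cos(pi[2]), np.sin(pi[2])
    dx, dy = pj[0] - pi[0], pj[1] - pi[1]
    return np.array([c * dx + s * dy, -s * dx + c * dy,
                     np.arctan2(np.sin(pj[2] - pi[2]), np.cos(pj[2] - pi[2]))])

def residuals(flat_poses, edges):
    """edges: list of (i, j, measured_ij) soft constraints between poses."""
    poses = flat_poses.reshape(-1, 3)
    res = []
    for i, j, measured in edges:
        r = relative_pose(poses[i], poses[j]) - measured
        r[2] = np.arctan2(np.sin(r[2]), np.cos(r[2]))  # wrap the angle term
        res.append(r)
    res.append(poses[0] - 0.0)  # anchor the first pose at the origin
    return np.concatenate(res)

def optimize(initial_poses, edges):
    """Resolve the constraints into a globally consistent trajectory estimate."""
    sol = least_squares(residuals, initial_poses.ravel(), args=(edges,))
    return sol.x.reshape(-1, 3)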

The graph-based formulation of the SLAM problem was proposed by F. Lu and E. Milios (1997). Good reading on Graph-SLAM includes, for example, S. Thrun and M. Montemerlo (2006) and G. Grisetti et al. (2010). Graph-based SLAM has several advantages over methods like EKF-SLAM and FastSLAM. It can handle large-scale environments and long-term operations, as it can incrementally incorporate new measurements and poses into the graph. It handles loop closures and multiple trajectories in a consistent and optimal way by using graph optimization techniques, and it handles uncertainty in the measurements and poses by using probabilistic models and nonlinear least squares. On the other hand, it can be computationally intensive and memory demanding, as it requires solving large and complex nonlinear least squares problems. It can also be sensitive, as it requires good initialization for the graph optimization to converge, and in general it requires some knowledge of graph topology and representation issues.

Figure 4. Doing autonomous mapping in winter conditions.

Our way

Our positioning and mapping module fuses multiple sensor modalities to ensure pose estimate reliability in all situations. Typically, we use point cloud data, satellite positioning (when available), accelerations, angular velocities, and wheel revolutions (when available). Normally laser scanners are used to gather the point cloud data, but any sensor that can produce point clouds can be used with our system. We have used radars for marine and railway use cases, and cameras, for example, in a deep underwater SLAM project.

The positioning and mapping module consists of five main components: motion estimation, point cloud rectification, mapping, map validation, and map-based positioning. The motion estimation component estimates the vehicle’s ego-motion based on all available sensor data, excluding the point cloud data. The ego-motion estimate is used to compensate for the distortion in the point cloud data caused by the sensor’s movement during data acquisition.
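
As an illustration of the rectification idea (a generic, first-order planar sketch, not our production code): each return carries a timestamp within the sweep, and the ego-motion estimate moves every point into the frame at scan end.

import numpy as np

def deskew(points, timestamps, velocity, yaw_rate, t_end):
    """points: (N, 2) planar points in the frame in which they were measured;
    timestamps: (N,) acquisition times. Assumes a constant planar twist
    (vx, vy, yaw_rate), expressed in the scan-end frame, over the short sweep."""
    out = np.empty_like(points, dtype=float)
    vx, vy = velocity
    for k in range(len(points)):
        dt = t_end - timestamps[k]   # time elapsed since this return
        th = -yaw_rate * dt          # sensor orientation back at measurement time
        c, s = np.cos(th), np.sin(th)
        px, py = points[k]
        # Express the point in the scan-end frame: rotate by the rotation
        # accumulated since measurement, then shift by the distance travelled.
        out[k] = (c * px - s * py - vx * dt,
                  s * px + c * py - vy * dt)
    return out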

We use the SLAM approach to create a map of the operating environment from the rectified point cloud data. Our mapping component can run in real-time, but typically we create the map prior to the deployment of the mobile robots to ensure high-quality maps. The resulting probabilistic, compact, and accurate environment representation is globally consistent, robust to outliers, and tolerant to dynamic objects and noisy sensor data. Furthermore, we can use other HD navigation maps and convert their data, if needed, to the format(s) we prefer.

To validate the map quality, we have a semi-automatic map validation tool that automatically identifies potential problem areas that are visualised to the user on a graphical user interface along with the created maps. The user can then validate the map quality after visual inspection or edit the problematic map areas.

Figure 5. PAIKKA map creation process: the maps are created as a preprocessing step using an online map service. The picture represents our most widely used set of sensors.
Figure 6. PAIKKA positioning software architecture: runtime components.

After validating the map quality, we are ready to reliably estimate the mobile robot pose with our map-based positioning component, which uses the rectified point cloud data and the maps created prior to deployment. The probabilistic and efficient map-based positioning approach is based on a particle filter and provides smooth, accurate, and global 6D pose estimates in real-time. The probabilistic approach is inherently robust to outliers and enables operations also in adverse weather conditions.
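
For intuition, a generic Monte Carlo localization step against a fixed, pre-built map looks roughly like the following. The score() helper and the noise model are assumptions for illustration, not the PAIKKA implementation.

import numpy as np

rng = np.random.default_rng(42)

def mcl_step(poses, odom_delta, scan, world_map, score, noise=(0.05, 0.05, 0.01)):
    """poses: (N, 3) particle set (x, y, theta); odom_delta: (dx, dy, dtheta)
    measured in the robot frame. score(pose, scan, world_map) is an assumed
    helper returning how well the scan matches the fixed map from that pose."""
    n = len(poses)
    # 1. Propagate each particle by the odometry, rotated into its own heading,
    #    plus sampled noise (a simple constant-noise model for illustration).
    c, s = np.cos(poses[:, 2]), np.sin(poses[:, 2])
    dx = c * odom_delta[0] - s * odom_delta[1]
    dy = s * odom_delta[0] + c * odom_delta[1]
    moved = poses + np.column_stack([dx, dy, np.full(n, odom_delta[2])])
    moved += rng.normal(0.0, noise, size=(n, 3))
    # 2. Weight each particle by scan-to-map agreement.
    w = np.array([score(p, scan, world_map) for p in moved])
    w /= w.sum()
    # 3. Resample, and report the weighted mean as the pose estimate.
    estimate = w @ moved
    return moved[rng.choice(n, size=n, p=w)], estimate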

Our probabilistic positioning approach enables all-weather operations. Even though it is based on probabilities, it is accurate down to just a few centimetres. Moreover, this accuracy is repeatable, also in adverse weather conditions. With our technology and expertise, any mobile machine can be turned into a mobile robot. With our positioning solution, these robots can operate anywhere.

References

T. Bailey and H. Durrant-Whyte. “Simultaneous localization and mapping (SLAM): part II,” in IEEE Robotics & Automation Magazine, vol. 13, no. 3, pp. 108-117, Sept. 2006.

C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. D. Reid and J. J. Leonard. “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age.” IEEE Transactions on Robotics 32 (2016): 1309-1332.

H. Durrant-Whyte and T. Bailey. “Simultaneous localization and mapping: part I,” in IEEE Robotics & Automation Magazine, vol. 13, no. 2, pp. 99-110, June 2006.

G. Grisetti, R. Kümmerle, C. Stachniss and W. Burgard. “A Tutorial on Graph-Based SLAM,” in IEEE Intelligent Transportation Systems Magazine, vol. 2, no. 4, pp. 31-43, winter 2010.

D. Hähnel, D. Fox, W. Burgard, and S. Thrun. “A highly efficient FastSLAM algorithm for generating cyclic maps of large-scale environments from raw laser range measurements.” In Proceedings of the Conference on Intelligent Robots and Systems (IROS), 2003.

J. J. Leonard and H. F. Durrant-Whyte. “Mobile robot localization by tracking geometric beacons,” in IEEE Transactions on Robotics and Automation, vol. 7, no. 3, pp. 376-382, June 1991.

F. Lu and E. Milios. “Globally consistent range scan alignment for environment mapping”. Autonomous Robots, 4:333–349, 1997.

M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. “FastSLAM: a factored solution to the simultaneous localization and mapping problem”. AAAI/IAAI, 2002.

M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. “FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that probably converges.” In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003.

P. Moutarlier and R. Chatila. “An experimental system for incremental environment modelling by an autonomous mobile robot”. In Experimental Robotics I: The First International Symposium, Montreal, June 19–21, 1989, pp. 327-346. Springer, Berlin, Heidelberg.

R. C. Smith and P. Cheeseman. “On the Representation and Estimation of Spatial Uncertainty”. The International Journal of Robotics Research. 1986;5(4):56-68.

R. Smith, M. Self and P. Cheeseman. “Estimating uncertain spatial relationships in robotics,” Proceedings, 1987 IEEE International Conference on Robotics and Automation, Raleigh, NC, USA, 1987, p. 850.

R. Smith, M. Self, and P. Cheeseman. “Estimating uncertain spatial relationships in robotics”. In I.J. Cox and G.T. Wilfong (eds.), Autonomous Robot Vehicles, pp. 167- 193. Springer-Verlag. 1990.

J. Solà. “Simultaneous localization and mapping with the extended Kalman filter.” Barcelona: UPC. (2004).

S. Särkkä. “Bayesian Filtering and Smoothing”. Cambridge: Cambridge University Press; 2013. (Note: a newer version is available.)

S. Thrun, M. Montemerlo, D. Koller, B. Wegbreit, J. I. Nieto and E. M. Nebot. “FastSLAM: An efficient solution to the simultaneous localization and mapping problem with unknown data association.” Journal of Machine Learning Research 4.3 (2004): 380-407.

S. Thrun, W. Burgard, and D. Fox. 2005. “Probabilistic Robotics” (Intelligent Robotics and Autonomous Agents). The MIT Press.

S. Thrun, M. Montemerlo. “The Graph SLAM Algorithm with Applications to Large-Scale Mapping of Urban Structures”. The International Journal of Robotics Research. 2006;25(5-6):403-429.