Wednesday, July 20, 2016

Sensor fusion and orientation estimation


In this post I'm going to share the results of fusing an accelerometer, gyroscope and magnetometer together in order to obtain a rotating body's attitude. It contains implementations of two non-linear Kalman Filters: the Extended Kalman Filter and the Unscented Kalman Filter. All code can be found at the project repository here under the navigation directory.


The problem with gyroscopes

With inertial sensors, the angular orientation (roll, pitch and yaw) of a rotating body can be determined. However, these angles cannot be directly measured. Even with a gyroscope, which measures angular rate, direct integration will result in drift as time goes on. For example, the image below should show gradual 30 degree increases in yaw, but instead there is continual drift:

Fig 1.
Drift exhibited from direct gyroscope integration

A MEMS gyroscope alone is too noisy to estimate angular orientation over long sampling times. Even if it somehow were, it would only give an estimated rotation relative to its initial orientation.
One can overcome these issues by aiding the gyroscope with other sensors. Desirable sensors would need to be able to provide a stable reference to correct gyroscopic drift.
Two commonly used sensors for this purpose are accelerometers and magnetometers. Gravity provides a stable reference of "down" to correct pitch and roll drift. An accelerometer cannot correct yaw drift though, because gravity is a vertical field. A magnetometer complements the accelerometer, because magnetic north provides a locally stable reference to which yaw drift can be corrected. Together, they allow all drift caused by the gyroscope to be corrected.
These sensors carry their own drawbacks though. The accelerometer measures centrifugal and linear acceleration, which can obscure a gravity measurement. Magnetometers measure fields that change over time and distance. They're also subject to soft iron (magnetic) and hard iron (ferrous) distortions, meaning constant re-calibration may be required. Each of these problems needs to be addressed in an Attitude and Heading Reference System (AHRS).
The subject of this post aims to do just that. In it I will introduce and showcase 4 AHRS algorithms: two that I designed, and two selected from publications. I won't go into much detail about the methods themselves; there's simply too much information to cover. Instead I'll present them as tools and discuss how I've used them, not how I made them.


Global features

Each AHRS possesses a state which contains the object's orientation. There are a few ways to describe angular orientation: Euler angles, Rotation Matrices and Quaternions. Like all things in life, each has pros and cons.
Euler angles: 
(+) Easy to visualize 
(+) Closed under addition and subtraction making operations very efficient 
(+) Describe orientations using 3 parameters
(-) Subject to gimbal lock which loses a degree of freedom at certain orientations. 

Rotation Matrix:
(+) Immune to gimbal lock
(-) Computationally expensive as they require 9 parameters to describe orientation.
(-) Difficult to visualize

(Unit) Quaternions:
(+) Describes rotations using 4 parameters
(+) Immune to gimbal lock
(-) Not closed under addition/subtraction
(-) Again harder to visualize.

In order to avoid gimbal lock and allow for better efficiency (although this will be contested with the Unscented Kalman Filter) I chose quaternions. I use the following convention for quaternions:

$q = \begin{bmatrix} q_w & q_x & q_y & q_z \end{bmatrix}^T$

Where $q$ is a quaternion with real part $q_w$ and vector part $q_x,q_y,q_z$.

The final thing the systems all have in common is how they use the gyroscope to predict orientation. They all use the standard continuous time model:

$\dot{q}_k = \frac{1}{2}\omega_k \otimes q_k$    (1.0)

Where $\otimes$ represents quaternion multiplication (the Hamilton product, whose vector part involves a cross product).
$\omega_k = \begin{bmatrix} 0 & \omega_x & \omega_y & \omega_z \end{bmatrix}^T$ is a gyroscope measurement (in rad/s) at time $k$, augmented for quaternion multiplication. $\dot{q}_k$ is the quaternion's rate of change.
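To make the convention and (1.0) concrete, here is a minimal sketch in Python with NumPy (my own illustrative code, not the code from the repository) of the Hamilton product and one first-order integration step of (1.0); the function names and the simple Euler step are placeholder choices:

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product of two quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def integrate_gyro(q, gyro, dt):
    """One first-order step of equation (1.0), followed by re-normalization.

    q    : current attitude quaternion [w, x, y, z]
    gyro : gyroscope reading [wx, wy, wz] in rad/s
    dt   : sample period in seconds
    """
    omega = np.array([0.0, gyro[0], gyro[1], gyro[2]])  # augmented for quaternion multiplication
    q_dot = 0.5 * quat_mult(omega, q)                   # equation (1.0)
    q_new = q + q_dot * dt                              # simple Euler integration
    return q_new / np.linalg.norm(q_new)                # enforce the unit-norm constraint

# Example: one 10 ms step at a constant 90 deg/s roll rate
q = integrate_gyro(np.array([1.0, 0.0, 0.0, 0.0]), [np.radians(90.0), 0.0, 0.0], 0.01)
```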

The graphs shown in this post are from two different manual motions that I performed on the device. The first was two very fast 180 degree flips about the device's x axis. The second was a series of slow 30 degree rotations about the device's z axis.

Kalman Filtering

Kalman Filters (KF) are an industry standard for combining sensors in the way described in the opening section. They are a rich and complex subject that is far beyond the scope of this post. I would highly recommend anyone interested in getting started with KFs to consider this project to learn more.
A very brief run down is that they use a prediction and an update phase to fuse data together. To aid in fusing, user-adjusted "belief" values weight how the data is factored into the final output. The class of filters I used is nonlinear KFs, since (1.0) is nonlinear. The two most prolific nonlinear Kalman Filters are the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). The EKF addresses nonlinearity by linearizing the system around the current estimate. After linearization the classic linear Kalman Filter is used. Linearization requires taking partial derivatives of the prediction and measurement models.
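To make the two phases concrete, below is a generic, minimal linear predict/update sketch (placeholder names and shapes, not the filters from the repository). In the EKF, F and H become the Jacobians of the nonlinear prediction and measurement models evaluated at the current estimate:

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction phase: propagate the state and its covariance through the model F."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q          # Q expresses how much we "believe" the process model
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update phase: correct the prediction with a measurement z."""
    y = z - H @ x_pred                        # innovation
    S = H @ P_pred @ H.T + R                  # R expresses how much we "believe" the sensor
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x_pred + K @ y
    P = (np.eye(len(x)) - K @ H) @ P_pred
    return x, P
```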
In practice the EKF can be a serious problem if the system is highly nonlinear, both in accuracy and implementation, which gives credence to the UKF. Rather than linearizing the system, the UKF deterministically chooses a set of sigma points to pass through its nonlinear functions. Weighted means, process and measurement covariances are computed from the mapped points. These new estimates are used as state/measurement predictions and covariance matrices in a traditional KF scheme. Sigma point sampling (in theory) captures the nonlinearity better than the EKF can. It also allows for ease of implementation, because the state transition and measurement models can be treated as black boxes from which points are sampled, no derivatives necessary. I'll begin with the EKF.

Gradient Descent Extended Kalman Filter

This KF is inspired by "An Extended Kalman Filter for Quaternion-Based Orientation Estimation Using MARG Sensors." In it, the authors propose using a gradient descent approach to convert the aiding sensors, accelerometer and magnetometer, into quaternions. Gradient descent finds the locally minimal quaternion needed to align accelerometer and magnetometer measurements to Earth's gravitational and magnetic fields. This allows direct comparison between states and observations, simplifying the EKF update phase considerably. Direct comparison comes at the cost of computational efficiency due to the minimization subproblem created by gradient descent. Worse yet, the minimization takes multiple iterations to converge.
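As a rough illustration of the idea (not the paper's or the repository's exact algorithm), the sketch below runs gradient descent on the accelerometer-only part of the alignment problem, using the gravity objective and Jacobian popularized by Madgwick; the magnetometer adds an analogous term, and the step size and iteration count here are arbitrary placeholders:

```python
import numpy as np

def gravity_objective(q, acc):
    """Error between the gravity direction predicted by attitude q ([w, x, y, z])
    and the normalized accelerometer reading, plus its Jacobian."""
    qw, qx, qy, qz = q
    acc = np.asarray(acc, dtype=float)
    ax, ay, az = acc / np.linalg.norm(acc)
    f = np.array([
        2.0*(qx*qz - qw*qy)       - ax,
        2.0*(qw*qx + qy*qz)       - ay,
        2.0*(0.5 - qx*qx - qy*qy) - az,
    ])
    J = np.array([
        [-2.0*qy,  2.0*qz, -2.0*qw, 2.0*qx],
        [ 2.0*qx,  2.0*qw,  2.0*qz, 2.0*qy],
        [ 0.0,    -4.0*qx, -4.0*qy, 0.0   ],
    ])
    return f, J

def measurement_quaternion(q0, acc, step=0.1, iters=50, tol=1e-6):
    """Iterate gradient descent from an initial guess q0 until the quaternion
    (locally) best aligns predicted gravity with the accelerometer."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        f, J = gravity_objective(q, acc)
        grad = J.T @ f                    # gradient of 0.5*||f||^2
        if np.linalg.norm(grad) < tol:
            break
        q = q - step * grad               # descend, then project back to unit norm
        q /= np.linalg.norm(q)
    return q
```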
To keep with a design philosophy of speed and efficiency I chose not to model the gyroscope bias in the state estimate. This reduces the matrix operations to 4x4 matrices rather than 7x7. I also took full advantage of the simplified update phase to implement a method which does not require a matrix inverse.
To mitigate accelerometer errors I also implemented an adaptive response to movement. The measurement covariance matrix is scaled according to the accelerometer magnitude. Likewise, the process covariance is scaled via the gyroscope magnitude. I found that scaling the factor exponentially gave better results, as quicker movements needed to be dampened faster.
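The sketch below shows one plausible reading of that adaptive scaling; the exponential form matches the description above, but the gains are illustrative placeholders rather than the tuned values from the repository:

```python
import numpy as np

GRAVITY = 9.81  # m/s^2

def adaptive_covariances(R0, Q0, acc, gyro, k_acc=5.0, k_gyro=2.0):
    """Scale the nominal measurement covariance R0 and process covariance Q0
    with the amount of motion the sensors report. k_acc and k_gyro are
    illustrative tuning knobs."""
    acc_dev  = abs(np.linalg.norm(acc) - GRAVITY) / GRAVITY   # deviation from 1 g
    gyro_mag = np.linalg.norm(gyro)                           # angular speed, rad/s

    # Exponential scaling: the faster the motion, the less the accelerometer is
    # trusted (R grows) and the more uncertainty is assigned to the discretized
    # process model (Q grows).
    R = R0 * np.exp(k_acc * acc_dev)
    Q = Q0 * np.exp(k_gyro * gyro_mag)
    return R, Q
```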

Fig 2.
EKF run on data in which the body is rapidly flipped into a 180 degree roll by hand


The adaptive response also had the advantage of creating a very smooth signal when the device experienced little movement, while still allowing for fast tracking:


Fig 3.
Comparison between Adaptive EKF and low gain EKF on data for Figure 2
Fig 4.
Adaptive EKF and low gain EKF comparison under slow manual movement
Fig. 5
Comparison between Adaptive EKF and high gain EKF

Fig. 6
Adaptive EKF and high gain EKF comparison under slow manual movement
Figures 3-6 illustrate the attractive features of the adaptive Kalman Filter. It can track as rapidly as a high gain filter, but still retains the low noise of a low gain filter. Finally, the adaptive filter handled large accelerometer impulses better:

Fig 6.
y-axis of Figure 2 with adaptive and high gain EKF
The red line in Figure 6 shows a large erroneous spike past the 1500 mark. It's caused by a spike registered in the accelerometer due to the large centrifugal motion felt when the device was rotated. The assumption that the accelerometer only measures gravity fails when it registers large linear or centrifugal forces. What follows is a large difference between the predicted and measured gravity vectors, causing the Kalman gain to overcompensate and feed a large correction back into the attitude estimate.
This is actually a considerable problem if the application is to undergo rapid linear and centrifugal motions. To simulate large linear accelerations I hard coded the x and y axis accelerometer data to be 2g from samples 500 to 550 of the data from figure 6:
Fig 7.
EKF comparisons given an artificial high-g accelerometer signal
The adaptive scheme handles the sudden accelerometer jump as well as the low gain KF. Over time, however, the drift will overtake the dampening from the Kalman gain and cause large errors. One could potentially further combat this issue by skipping the update phase given a high accelerometer magnitude. However, without the gyroscope bias being estimated, this could quickly cause drift.
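A gate of that kind could look something like the following (the 20% tolerance is an arbitrary placeholder, and kf_predict/kf_update refer to the generic skeleton sketched earlier):

```python
import numpy as np

GRAVITY = 9.81        # m/s^2
ACC_TOLERANCE = 0.2   # fraction of 1 g; illustrative, not tuned

def should_run_update(acc):
    """Skip the correction step whenever the accelerometer clearly measures
    more than gravity alone."""
    return abs(np.linalg.norm(acc) - GRAVITY) / GRAVITY < ACC_TOLERANCE

# Schematic filter loop:
#   x, P = kf_predict(...)
#   if should_run_update(acc):
#       x, P = kf_update(...)
#   # otherwise rely on the gyroscope prediction alone; without a bias
#   # estimate in the state, long stretches of skipped updates will drift.
```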
Magnetometer disturbances will also cascade through the roll and pitch estimates. The magnetometer should have no effect on them, since it is only needed to correct yaw errors.

Quaternion Based Square Root Unscented Kalman Filter

My main source for this KF was "Unscented Filtering in a Unit Quaternion Space for Spacecraft Attitude Estimation." The most noticeable thing about this UKF implementation is how it handles the quaternion states. Normally a set of sigma points is chosen to pass through the non-linear function. The results are then summed into a weighted mean and covariance estimate. However, this assumes such operations are closed in the space in which the data exists. Unfortunately, unit quaternions do not form such a space. For example:

$\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}^T + \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}^T = \begin{bmatrix} 2 & 0 & 0 & 0 \end{bmatrix}^T$

Despite both summands being unit quaternions, their sum is not. So normal averaging of quaternions is not consistent with the space in which they exist. This affects both averaging and sigma point selection. Re-normalization has to occur immediately after the mean is found in the prediction phase. Even then, re-normalization only approximates the mean quaternion (see "On Averaging Rotations").
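For reference, here is a minimal sketch of that approximation (my own illustrative code, not the repository's): a weighted arithmetic mean of the sigma-point quaternions followed by re-normalization, with signs aligned so that q and -q, which represent the same rotation, do not cancel each other out:

```python
import numpy as np

def approximate_mean_quaternion(quats, weights):
    """Weighted arithmetic mean of unit quaternions, re-normalized. Only an
    approximation of the true mean rotation, but cheap, and reasonable when
    the quaternions are clustered closely together."""
    quats   = np.asarray(quats, dtype=float)    # shape (n, 4), rows are [w, x, y, z]
    weights = np.asarray(weights, dtype=float)
    # q and -q encode the same rotation; align signs with the first quaternion
    # so the arithmetic mean is not corrupted by mixed hemispheres.
    signs = np.where(quats @ quats[0] < 0.0, -1.0, 1.0)
    q_bar = (weights[:, None] * signs[:, None] * quats).sum(axis=0)
    return q_bar / np.linalg.norm(q_bar)
```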
Because the sigma point selection I chose is expensive, a measurement model that converted the state (quaternion) space into an easier to work with vector space was needed. Madgwick gave a way to convert quaternions into measurement estimates in his AHRS. The ability to treat the observation model as a black box function allowed this to be done easily.

I was also interested in the Square Root formulation of the UKF, so I chose that method over the conventional one. The SR-UKF boasts more numerical stability for matrices that require positive semi-definiteness. Its performance is comparable to the EKF.
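The square-root trick relies on a standard linear algebra fact: if A holds the weighted sigma-point deviations, the QR factorization of A^T yields an upper-triangular square root of the covariance A A^T directly, without ever forming or re-factorizing the full covariance. A small self-contained check, with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 9))            # e.g. 4 states, 2n + 1 = 9 sigma-point deviations

# A.T = Q R  =>  A @ A.T = R.T @ (Q.T @ Q) @ R = R.T @ R,
# so R is a valid (transposed) Cholesky factor of the covariance.
_, R = np.linalg.qr(A.T, mode="reduced")
assert np.allclose(R.T @ R, A @ A.T)

# The full SR-UKF also needs a Cholesky rank-1 update/downdate for the zeroth
# sigma point, whose weight can be negative; that step is omitted here.
```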

Fig 8.
UKF & EKF comparison on Fig. 2 data set

Fig 9.
UKF & EKF comparison on y-axis of Figure 2
The UKF does not use any adaptive methods, so it's noisier when the data is static. However, it's also very vulnerable to errors caused by the accelerometer:

Fig 10.
UKF & EKF comparison with large artificial noise step
Figure 10 shows that the step and the spike after the 1500 mark are not handled well by the UKF in its current form. Something would have to be done to react to large centrifugal and linear accelerations. An adaptive scheme like the EKF's, or skipping the update phase entirely in the presence of large accelerometer magnitudes, could address this. Its main weakness is how complex the filter is. It's large and requires a number of peripheral libraries to perform exotic matrix operations like QR factorization or Cholesky decomposition. This AHRS is the least suited for embedded systems.

PID methods

In this section non-KF based methods are discussed. This includes two implementations: the popular AHRS given by Sebastian Madgwick, and a similar technique given by "Keeping a Good Attitude: A Quaternion-Based Orientation Filter for IMUs and MARGs" (referred to as the complementary filter). The biggest advantage of these two AHRS methods is their simplicity, both in design and theory. They're inexpensive and give results comparable to the above KFs:
Fig. 11
Madgwick AHRS and complementary filter on data set in Fig. 2
Fig. 12
Madgwick AHRS and complementary filter on y-axis in Fig. 2
Madgwick's filter works by integrating the gyroscope using (1.0). When observation measurements are ready it parametrizes an error using predicted gravity and magnetometer readings. However, as implemented there is a static gain factor in it, which means that given large accelerations the sensor will cause errors. Using this AHRS still requires a trade off between tracking large motions and outputting sudden error spikes. The following are the results of Madgwick's AHRS with the data used in figure 7:
Fig 13.
Madgwick's AHRS with artificial accelerometer signal

It reacts as negatively to the artificial accelerations as the UKF did. The spikes around the 1200 and 1600 marks are larger as well. Lowering the gain reduces the errors caused by high g-force movement; however, it makes the filter sluggish while tracking fast movement:

Fig 14.
Low & High gain Madgwick AHRS comparison
Fig. 15
Low & High gain Madgwick AHRS comparison y-axis

The final AHRS is very similar to Madgwick's AHRS. It uses a different method for predicting observations, coupling the accelerometer and magnetometer predictions differently. The advantage is that magnetic distortions are guaranteed to affect only the yaw estimate.
Most importantly, the authors recognize the problem with using the accelerometer as a reference. They give an adaptive scheme that changes the accelerometer correction gain based on its magnitude. Of all the AHRS methods demonstrated here, this one was the most resilient to large accelerometer disturbances:
Fig. 16
Complementary & low gain Madgwick AHRS comparison with artificial step
Fig. 17
high gain Madgwick AHRS & Complementary filter comparison with Fig. 2 data

Fig. 16 and Fig. 17 demonstrate that the complementary filter can be highly tolerant to erroneous accelerometer readings while still tracking large movements.
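The flavor of that adaptive gain can be sketched as follows (the thresholds and the linear ramp are illustrative placeholders, not the exact scheme from the paper):

```python
import numpy as np

GRAVITY = 9.81  # m/s^2

def adaptive_gain(acc, base_gain, low=0.1, high=0.2):
    """Scale the accelerometer correction gain by how far the measured
    magnitude is from 1 g: full gain below `low`, no correction above `high`,
    and a linear ramp in between."""
    err = abs(np.linalg.norm(acc) - GRAVITY) / GRAVITY
    if err < low:
        factor = 1.0
    elif err < high:
        factor = (high - err) / (high - low)
    else:
        factor = 0.0
    return base_gain * factor
```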

Concluding remarks

Four methods were considered for the subject of AHRS design: two non-linear Kalman Filters and two specialized PIDs. I've found little to distinguish my UKF and EKF. With the adaptive settings though, the EKF is much more resilient to high acceleration movements than the UKF:

Fig. 18
All filters overlaid with artificial accelerometer signal
Fig. 19
All filters overlaid on fig 2. data set
Fig. 20
Comparison between all AHRS methods and non-fused data on Fig. 1 data set

The general performance of all four filters is very close. At the time of writing I cannot entirely justify using the KF designs over the demonstrated PID ones. That doesn't discount any AHRS outright though. This post showcased the results of small tests on each AHRS. I also had no way of knowing how close any AHRS was to the true orientation, since the data was generated by hand movements. More would have to be done to definitively declare one method superior for general use. Ultimately there is no automatic best AHRS for a given application. The designer has to consider their performance needs and expectations.
For my application, maintaining stability under large accelerations is a big concern. I also want it to run well on an embedded system with limited computational power. So as of now I'm going to use the complementary filter to aid in UAV state estimation and control.
I hope, at least, that the two KFs I presented can help those wanting to learn more about Kalman Filtering for sensor fusion, and give some comparison and accessibility to nonlinear Kalman Filters.

Thanks for reading,
- Joseph
