Autonomous Systems Lab
Prof. Roland Siegwart
Master-Thesis
Visual Homing for Micro
Aerial Vehicle
Autumn Term 2011
Supervised by: Author:
Ming Liu Simon Steinmann
Stephan Weiss
Contents
Abstract
Symbols
1 Introduction
2 Related work
3 Theory
3.1 Camera and Image undistortion
3.2 Fast Feature Detection
3.3 Calonder Descriptor
3.4 Scaling
3.5 Brisk descriptor
3.6 Homing Vector
3.6.1 Yaw detection
3.6.2 Height change
3.6.3 Planar translation
3.7 Roll-Pitch compensation
3.8 RANSAC
3.9 Quadrotor control
4 Results
4.1 Loop Times
4.2 Comparison with PTAM
4.3 Recorded Vicon data
4.4 Flight test
4.5 Preliminary results with Brisk descriptor
5 Conclusion
A Application Design
A.1 Framework (GUI)
Bibliography
List of Figures
3.1 Original and undistorted image of a checkerboard
3.2 FAST corners
3.3 Detected FAST corners with CVD (upper) and OpenCV (lower)
3.4 Evaluation of the Calonder descriptor
3.5 Scale information
3.6 Comparison of FAST/Calonder with AGAST/Brisk
3.7 Generalized alignment of the reference (blue) and current image (red)
3.8 Rotation detection based on two reference matches
3.9 Current image and derotated reference image
3.10 Remaining matching vectors for the aligned reference
3.11 Image mapping
3.12 Focal length
3.13 Tilting correction distance
3.14 Roll-pitch correction vector
3.15 Roll-pitch compensation verification
3.16 Helicopter Control
4.1 Loop times
4.2 Loop times analysis
4.3 Homing vector for a circular path (x-y)
4.4 Homing vector for a circular path (x-z)
4.5 Error distributions with PTAM as reference
4.6 Position estimate error of the homing algorithm
4.7 Position estimate error (z, ψ) of the homing algorithm
4.8 Homing vector errors
4.9 Homing vector errors for vectors with length larger than 0.15 m
4.10 Homing controlled flight
4.11 Brisk Histogram
4.12 Homing vector computed with Brisk
A.1 Dynamic Reconfigure Parameters
A.2 Application plot: Matches
A.3 Application plot: Homing
Abstract
In this thesis we developed a basic framework for homing a Micro Aerial Vehicle (MAV). The visual navigation relies on monocular images taken by a down-looking fisheye camera mounted on the MAV. Based on the image features and their scales, the algorithm estimates a velocity vector pointing towards a previously taken keyframe (e.g. the home position). By integrating the velocity vector from frame to frame (i.e. visual odometry), the method also provides a position estimate. The helicopter control requires these two inputs. We implemented the homing method with two different feature descriptors. The first implementation relies on a FAST feature detector with a Calonder feature descriptor, which we enhanced by adding scale information for the detected features. In the second implementation the feature descriptor was replaced by the Brisk descriptor, which improves the performance thanks to rotation-invariant matching and native scale information. We first tested our algorithm with a hand-held camera and compared the results with existing approaches such as Parallel Tracking and Mapping (PTAM). The algorithm was finally tested and verified with Vicon position data.
Symbols
Symbols
φ, θ, ψ   roll, pitch and yaw angle
SF        scaling factor (factor between the images)
S         scale (scale ID of the image)
Q         score of the matching (distance in descriptor space)
f         focal length [px]
x, y, z   axes corresponding to a north-east-down system
u         u axis (image coordinates, corresponds to x)
v         v axis (image coordinates, corresponds to y)
µ         mean value
σ         standard deviation
Indices
c   current image
r   reference image
Acronyms and Abbreviations
ETH Eidgenoessische Technische Hochschule
MAV Micro Aerial Vehicle
UAV Unmanned Aerial Vehicle
IMU Inertial Measurement Unit
CVD Computer Vision Library of the University of Cambridge
OpenCV Open Source Computer Vision Library
FAST FAST Feature Detector
CFD Calonder Feature Descriptor
Brisk Brisk Feature Descriptor
ROS Robot Operating System
Chapter 1
Introduction
The ability to return to a previously visited position is one of the most fundamental tasks in robot navigation; it is usually called the homing problem. Homing can be done even without prior knowledge of the environment and therefore without knowing the accurate position. A topological map of the environment is built by storing several keyframes which represent the path the robot has travelled. The position at the beginning of this path is the final goal of the homing. In a real application this first position may be a charging station to which the helicopter returns after each flight to recharge for the next one. It is therefore important to have a reliable homing method which allows the robot to navigate back to this previous position. The helicopter control used in this project was already available and is not considered further here.

Because the homing is based on a path stored in a topological map, the algorithm is also suited for path-following applications. In observation tasks, for example, the MAV usually follows the same path again and again. Another possible application of the same algorithm is pose stabilization, where the helicopter should hover at a constant position. Hovering is obtained because the current image is compared with a keyframe in order to calculate the homing vector; by keeping the keyframe constant the helicopter will approach the reference position.

The robot does not need to know its absolute position for this navigation. It is sufficient to estimate a velocity direction between the two images which indicates the correct translation and rotation. Based on the images alone it is not possible to determine the absolute scale factor. The amplitude of the velocity vector can be improved during flight, but the homing does not depend on having an accurate scale. In the tests a hexacopter was used, but the developed homing method does not rely on it; it is only assumed that the MAV has four degrees of freedom (three translational and the rotation about the z-axis). As for the sensor, a monocular camera with a fisheye lens is mounted on the MAV. A camera is a lightweight sensor, which makes it especially well suited for flying robots with limited payload. A monocular camera is not able to measure the distance to the features, which is a shortcoming especially in cases where the absolute height of the camera is required. This can be avoided by estimating the height change between the two images based on the scale change of the feature points instead of estimating an absolute height. The fisheye lens enlarges the field of view to 100° or more, which allows the camera to observe a larger part of the environment.

The next chapter gives a short overview of related work. Then the theory behind the algorithm and the tools it relies on is described in detail. Finally, results of several tests made with a hand-held camera and of the first flight test are presented. In the appendix we describe the structure of our application.
Chapter 2
Related work
Homing is a basic navigation task for robots. Franz et al. [2] describe that successful navigation does not necessarily require knowledge about the current position or the environment, but only the direction which leads to the goal.

There are several approaches to this task using laser range sensors and other extrinsic or intrinsic sensors. A purely vision-based homing method for mobile robots was developed by Liu et al. [1]. Their approach uses an omni-directional camera mounted on the robot. For the homing, the current image is compared with the reference taken at the home position. The features detected in the unwrapped image are classified by a simple 1D-RANSAC. The feature positions are used to estimate the homing direction, while the feature scales provided by the scale-invariant feature transform (SIFT) detector are used to infer the distance to the home position. By averaging over all matched features a good estimate of the velocity vector pointing to the home position is obtained. The method was tested in an indoor environment and compared with similar approaches ([3],[4]) based on a dataset of scanned images. The total average angular error (TAAE) was used as comparison criterion; it reflects the average angular error of each image compared with every other image of the dataset. The low TAAE (of 11.6°) allows a more reliable homing. The additional distance estimate offers enhanced possibilities for controlling the robot compared to previous methods. The planar method runs at a low update rate (a few Hertz), which is sufficient for a mobile robot, but a flying robot would need faster updates to guarantee a stable flight.
A vision-based navigation for flying robots can also be achieved with the Parallel Tracking and Mapping (PTAM) framework [14] of Klein et al. The algorithm has already shown its ability on a helicopter setup [15]. This approach computes a full 3D map of the environment based on the images. The accuracy of this setup is on the order of a few centimeters. A drawback of PTAM is its memory requirement, since it has to store a large map of the whole environment. On the other hand, the map makes this algorithm suited for navigation tasks beyond homing, and it can provide a more accurate pose description. The map also includes information about obstacles such as walls. To reduce the memory requirements it is possible to keep the map small by discarding certain keyframes. In that case the map loses information about the starting position, which conflicts with the ability to return to it (i.e. homing).

A standard way to navigate based on camera images is visual servoing [17]. Those approaches need much more computational power due to a matrix inversion and other demanding processing steps. They can provide good results if a powerful CPU is available and the demanded frame rate is low. For high numbers of features those approaches are no longer able to run in real time. Especially for flying robots the required frame rates are high, which makes visual servoing approaches unsuitable.
There are also methods for local navigation based on optical flow [16]. Based on the optical flow, the distances to the features are estimated. The proposed application showed that the helicopter is able to avoid collisions with a wall in a corridor environment. The flying robot was actually able to keep the distance to both walls equal, which leads to the best possible collision avoidance in such environments. This approach, however, is designed for collision avoidance rather than navigation.

All image-based approaches face the same problems, such as changing illumination or a lack of features. None of the above mentioned approaches is able to navigate without an adequate number of features being present. Therefore in most tests an environment with a high number of detectable features is chosen. Another problem is varying light conditions, which make it hard to match the features; a good matching of the features is the main issue. For approaches like optical flow, varying light conditions are less problematic since the features are only compared over a short time period in which the illumination can be assumed to be similar. When building a map of the whole environment this may not be the case. To test the algorithms the environment is mostly chosen to minimize the influence of the drawbacks of cameras. This means an indoor environment with good light conditions and a high number of detectable features is the favored place for testing.

An alternative to camera-based navigation is GPS, but this is only available for outdoor applications. Indoor localization can be provided by a laser range sensor, which however adds a much higher payload to the helicopter.

Most existing visual homing approaches are only designed for planar navigation (e.g. for mobile robots). A simple monocular visual homing which additionally provides an altitude velocity estimate, as developed in this thesis, makes visual homing applicable to flying robots as well.
Chapter 3
Theory
3.1 Camera and Image undistortion
The visual homing is we developed is using on a monocular camera as sensor. An
important criteria for the camera is a large field of view (FOV). It provides more
overlap between consecutive images and this results in a more reliable matching of
the two images. Therefore we choose a camera with a fisheye lens for this project.
The tests are mainly done with a 150 degree fisheye lens. The camera is a PointGrey
Firefly WVGA[19] with a resolution of 752x480 pixels.
A fisheye lens does obviously not have the same mapping function as a normal cam-
era. The mapping function for normal (i.e. gnomonical) cameras can be described
as follows:
r = f · tan(θ) (3.1)
where r represents the distance in pixels from the image center, f is the focal length and θ is the angle between the line to the object and the optical axis through the center of the image and the lens. This equation is only valid for a FOV smaller than 180°.
For fisheye cameras the mapping function is slightly different:
r = 2f · sin(θ/2)   (3.2)
With the latter equation each pixel corresponds to an equal area on the unit sphere.
Compared to a gnomonical image, the fisheye lens distorts the image: straight lines in the world, for example, appear as curved lines.

Several solutions exist for the undistortion of such an image. One way is to describe the image coordinates as a three-dimensional vector on the unit sphere instead of planar coordinates. This description corresponds to the way the objects are mapped to the image. An advantage of this approach is its unlimited FOV, which makes it applicable to omni-directional cameras as well as 180° fisheye lenses.

Another solution is to dewarp the distorted image. This is possible if the field of view is below 180 degrees. The dewarped image corresponds to an image taken according to the standard mapping (Eq. 3.1). Remapping each pixel generates an image as if it had been taken with a gnomonical camera: straight lines in the world are mapped to straight lines in the image.
Both proposed solutions require a camera calibration. This calibration is done with a Matlab toolbox provided by D. Scaramuzza ([11],[12],[13]). The calibration needs no prior knowledge about the intrinsic parameters of the camera; taking several images of a checkerboard at varying angles is sufficient to calculate the calibration parameters.

For the feature detection and matching, an undistorted 2D image has some advantages compared to the three-dimensional vector representation. Because of the distortion, a corner in the image center may not be detected as a corner when it appears close to the border of the image. If the image is undistorted first, the corner should appear similar at both locations. Even if both corners are detected correctly, they may still not match because the descriptor generates different signatures for them. Therefore we use the second approach (i.e. undistortion of the image) in this thesis. The undistortion of the image is done with the OpenCV [18] remap function. The required maps are generated based on the calibration parameters of the camera. An example of an undistorted image is shown in Figure 3.1.
Figure 3.1: Original and undistorted image of a checkerboard
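A minimal sketch of such a remapping is given below (Python/OpenCV rather than the thesis C++ code). In the thesis the maps are generated from the Scaramuzza calibration; here, purely for illustration, the ideal fisheye model of Eq. 3.2 is inverted against the gnomonical model of Eq. 3.1, and the image size, focal lengths and file name are assumed values.

    import cv2
    import numpy as np

    def build_dewarp_maps(width, height, f_fish, f_gnom):
        cx, cy = width / 2.0, height / 2.0
        u, v = np.meshgrid(np.arange(width, dtype=np.float32),
                           np.arange(height, dtype=np.float32))
        r_gnom = np.sqrt((u - cx) ** 2 + (v - cy) ** 2)     # radius in the dewarped image
        theta = np.arctan(r_gnom / f_gnom)                  # invert Eq. 3.1
        r_fish = 2.0 * f_fish * np.sin(theta / 2.0)         # Eq. 3.2: radius in the fisheye image
        scale = np.where(r_gnom > 0, r_fish / r_gnom, 1.0)
        return ((cx + (u - cx) * scale).astype(np.float32),
                (cy + (v - cy) * scale).astype(np.float32))

    fisheye_image = cv2.imread("fisheye.png")               # placeholder file name
    map_x, map_y = build_dewarp_maps(752, 480, f_fish=300.0, f_gnom=300.0)
    undistorted = cv2.remap(fisheye_image, map_x, map_y, cv2.INTER_LINEAR)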
In some tests we used another lens with a smaller field of view of 100 degrees. For the 100° fisheye the image distortion is smaller and the algorithm showed that it can work without the undistortion; in that case the undistortion is not mandatory. Even with the 150° lens the whole algorithm works without undistortion, but the matching quality (i.e. the number of inliers) is lower and the homing therefore becomes less reliable.
3.2 Fast Feature Detection
A basic framework for the FAST detector which we used was already developed in
a semester thesis [20]. It provides a FAST detector and a Calonder descriptor with
a matching method.
The FAST algorithm by Rosten et al. ([5],[6]) is a fast method to detect image corners. It is fast enough to handle real-time applications even on a UAV. Other feature detectors such as SIFT [7] may provide additional information about the features, but they require more hardware resources and are not able to run in real-time applications. FAST uses a previously generated mask (Fig. 3.2) to detect the features. The size of this pattern can vary but is constant during execution. The source code of the detector is generated automatically by a machine learning algorithm. For most applications it is not necessary to generate a specific source code and a provided one can be used.

The existing framework used the CVD FAST implementation to extract the features. This implementation of FAST showed some strange behavior when scanning downscaled images. For the original (i.e. unscaled) image the feature positions are reasonable, as they are when half-sampling the image. For scale factors other than 2.0 (e.g. 1.5) the feature positions do not match the expected ones (Fig. 3.3, upper images). No systematic pattern was recognizable, and at certain scales the feature positions seemed reasonable again. We traced the source of this
Figure 3.2: FAST corners
behavior in the CVD function. An alternative was found in the OpenCV library, whose implementation does not show such unreasonable feature positions, as shown in the lower images of Figure 3.3.
Figure 3.3: Detected FAST corners with CVD (upper) and OpenCV (lower)
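As an illustration, the OpenCV detector can be used as in the following sketch (Python bindings instead of the thesis C++ code; the threshold value and file name are assumptions):

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)     # placeholder image
    fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)                       # list of cv2.KeyPoint with pixel positions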
The disadvantage of FAST is the lack of scale information about the detected features, which is essential to estimate a height change. Therefore the existing framework had to be adapted to the new task as described below (Sec. 3.4).
3.3 Calonder Descriptor
A fast and accurate feature detection is the basis of the homing. The next step is an accurate matching of the detected features. Therefore each corner is first described by a feature descriptor. We use the Calonder feature descriptor (CFD) by Calonder et al. ([9],[10]) for this task. The CFD is chosen due to its short execution time. The CFD has to be trained first on several images similar to the ones observed later on. A minimum of 176 features is required in the training image(s) because of the dimension of the descriptor. In the training process a space of this dimension is generated. This space is optimized to separate the features of the training images
as much as possible. For each feature the descriptor analyzes a square patch of 32 pixels. The CFD calculates for each patch a vector within the trained space. For the matching of two descriptors the L1-distance between them is checked; it has to be below a certain threshold for a positive match. Instead of the L1-distance we also considered the angle between the two vectors as matching score, but the latter showed similar or worse matching rates in our tests and was therefore not used further.
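As an illustrative sketch of this matching step (Python/NumPy; the descriptors are assumed to be the rows of two arrays and the threshold value is an assumption, not the thesis setting), the best reference candidate per current feature is kept if its L1-distance is below the threshold:

    import numpy as np

    def match_descriptors(desc_cur, desc_ref, max_dist=1500.0):
        # L1 distance between every pair of current and reference descriptors
        d = np.abs(desc_cur[:, None, :] - desc_ref[None, :, :]).sum(axis=2)
        best = d.argmin(axis=1)                      # best reference candidate per current feature
        keep = d[np.arange(len(desc_cur)), best] < max_dist
        return [(i, int(best[i])) for i in np.flatnonzero(keep)]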
To improve the matching results, the training of the descriptor is important. The CFD is not rotation invariant, which is a major disadvantage: an MAV is able to rotate about the camera axis, and therefore the matching has to reliably match rotated features. The training program which was already available used a single image which was simply rotated by a random angle. This does not really represent rotated features since it only uses a single angle. To improve the rotation invariance of the descriptor we created a new trainer which is able to handle several images and computes the corners and the corresponding patches at different rotations. These additions lead to a much larger number of features being used to train the descriptor. We compared the inlier rates of the two differently trained descriptors for the matching of a single image with a rotated version of itself. The training with several rotations and more features gives a higher inlier rate for large rotations (Fig. 3.4). The lower rate at small rotations is only a small handicap because the number of inliers is still sufficient to calculate the homing vector. The bottleneck in the design of the RANSAC which is used to filter out the outliers (Sec. 3.8) is the lowest expected inlier rate; therefore the lower rate at small rotations does not change anything.
[Plot: inlier rate in percent versus rotation angle (0–90 degrees) for the descriptor trained with 1 image and the descriptor trained with 7 images]
Figure 3.4: Evaluation of the Calonder descriptor
3.4 Scaling
As mentioned above, FAST does not provide any scale information about the detected features, but for the homing it is important to have information about the size of the features. Therefore we downscale each image several times and detect new features in the downscaled images. We chose 1.5 as the downscaling factor in most applications. Half-sampling (i.e. a factor of 2.0) would speed up the downsampling since the pixel values do not have to be calculated by interpolation, and a higher scaling factor also leads to smaller downscaled images which are scanned faster. With a factor of 1.5, however, we get a good trade-off between speed and precise scale information.
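A minimal sketch of this pyramid detection (Python/OpenCV; the function name, level count and interpolation are illustrative assumptions), including the back-projection of the coordinates to the original image that is used later in Section 3.6:

    import cv2

    def detect_with_scales(img, detector, sf=1.5, levels=4):
        # returns (u, v, scale_level) tuples; u, v are mapped back to the
        # original image by multiplying with sf**level
        features = []
        for level in range(levels):
            factor = sf ** level
            scaled = cv2.resize(img, None, fx=1.0 / factor, fy=1.0 / factor,
                                interpolation=cv2.INTER_LINEAR)
            for kp in detector.detect(scaled, None):
                features.append((kp.pt[0] * factor, kp.pt[1] * factor, level))
        return features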
A feature detected in a downscaled image corresponds to a discrete feature size. This discrete scale information is used to provide an estimate of the altitude change of the camera. To improve the distinctiveness of the scale information, the detected features are projected to artificial features in the other scaled images. Then we extract the descriptors for the artificial features and match them with the descriptor of the reference feature (i.e. the feature of the reference image matched with the original feature). Since the matching score represents a distance, a lower score indicates a better match. An example of the scores for several descriptors is plotted in Figure 3.5. In this particular case the best match (i.e. the original match) is found at scale 1. We calculate the scores for the scales -1 and -2 by upscaling the image; for this task we only upscale the pattern and not the whole image. This upscaling gives us more scale information especially for the features found at scale 0 (i.e. the unscaled image). The additional descriptors and the resulting scores let us compute a non-discrete scale value.

We do not use the scores of all scale levels because computing the descriptor is a time-demanding step and checking each scale level would slow down the algorithm. Therefore only the two points of the adjacent scales are checked. Using more descriptors also did not lead to a higher accuracy, because there is no functional description of how the descriptor depends on scale changes. A spline fitted through all points does not really improve the result compared to a parabola fitted through only the three points we use. The minimum of the parabola gives us the more accurate floating point scale. The floating point scale information is not assumed to be very precise, but it refines the discrete scale information.

The calculation of the floating scale difference SD_float (i.e. the minimum of the parabola) is based on the discrete scale difference SD_raw along with the matching scores of the adjacent scales. Q_min is the distance of the original match, i.e. of the features which are used to calculate the discrete scale difference. Q_min+1 and Q_min−1 are the distances from the reference feature to the feature projected to the lower respectively upper scale image.
SD_raw = S_F2 − S_F1   (3.3)

SD_float = SD_raw + (1/Q_min+1 − 1/Q_min−1) / (1/Q_min+1 + 1/Q_min + 1/Q_min−1)   (3.4)
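A direct transcription of these two equations (Python; argument names are illustrative):

    def floating_scale_difference(scale_cur, scale_ref, q_min, q_plus, q_minus):
        # q_min: score of the original match; q_plus / q_minus: scores of the
        # descriptors projected to the adjacent lower / upper scale image
        sd_raw = scale_cur - scale_ref                                    # Eq. 3.3
        w_plus, w_mid, w_minus = 1.0 / q_plus, 1.0 / q_min, 1.0 / q_minus
        return sd_raw + (w_plus - w_minus) / (w_plus + w_mid + w_minus)   # Eq. 3.4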
The downscaling of the images is not only used for the scale information. It also leads to more detected features, which improves the stability of the homing. The original FAST is not able to detect large corners; after downscaling the images, those corners are detected because they fit into the pattern mentioned in Section 3.2. Therefore our extended FAST detector detects a higher number of features than the original FAST, which does not scale the images. A higher number of features usually leads to more matches, which again makes the homing more reliable.
[Plot: matching score (distance) versus scale for one feature, with the fitted parabola and the resulting floating point scale at its minimum]
Figure 3.5: Scale information
3.5 Brisk descriptor
At the end of the project we created an improved version of the homing. The main improvement is the use of the Brisk descriptor instead of the Calonder descriptor. Additionally we replaced the FAST feature detector by the AGAST detector [8], which is a slightly improved version of FAST. AGAST is able to adapt the pattern size at runtime, which makes it somewhat better suited for changing environments.

Brisk is a feature descriptor recently created at the ASL by S. Leutenegger. Compared to the Calonder descriptor, the Brisk descriptor has two main advantages: it is a rotation-invariant descriptor, and it provides scale information about the features.

The rotation invariance is an important improvement especially for helicopter applications, because the rotation about the camera axis (yaw) is a basic movement of the MAV. The rotation detection of the homing relies on an accurate feature matching, which is only granted by a rotation-invariant feature descriptor. The descriptor achieves its rotation invariance by first computing a main direction in the pattern around the feature; the descriptor is then calculated with respect to that main direction.
The second improvement of the Brisk descriptor is its scale information. The matched features already carry a size estimate, which makes the floating scale calculation redundant; moreover, the size information of Brisk is even more accurate than the floating point scale we calculated in the previous application. The Brisk descriptor is more time consuming than the Calonder descriptor for a similar number of detected features (Fig. 3.6). The execution time is strongly related to the number of features. A bigger downscaling factor in the first approach leads to smaller downscaled images and thus to a lower total number of detected features; therefore the FAST/Calonder approach with a scaling factor (SF) of 2.0 is faster than the one with 1.5. Due to the additional information obtained from the descriptor, Brisk requires almost twice as much time as Calonder. Overall, however, the advantages outweigh the disadvantages. The time comparison was done for a similar number of features, but the improved matching with the new descriptor needs fewer features to come up with a similar number of inliers, and the execution time shrinks quadratically with the number of features. Even the RANSAC, which is the most time consuming part of the algorithm (Sec. 4.1), can be executed much faster because the inlier ratio is much higher; the resulting lower number of RANSAC iterations needed to filter out the outliers improves the speed of the whole algorithm again. Overall, the higher inlier ratio almost compensates the additional time consumption, with the additional benefit of a more accurate matching.
[Bar chart: execution time in ms of the steps undistort, downscale, getFeatures, getDescriptors, getMatches and floatingScale for FAST/Calonder with SF 1.5 (F:311, M:172), FAST/Calonder with SF 2.0 (F:227, M:122) and AGAST/BRISK (F:285, M:162)]
Figure 3.6: Comparison of FAST/Calonder with AGAST/Brisk
3.6 Homing Vector
The goal of this project is to obtain a homing vector without any reconstruction of the environment. The only map which has to be stored is a chain of keyframes, which allows the algorithm to handle large distances. Those keyframes are generated automatically on the forward path whenever the distance to the previously stored keyframe surpasses a threshold.

To be able to compare all features detected in any scaled image, the feature coordinates are expressed with respect to the original image. Pixel coordinates of the downscaled images are hence multiplied by the scaling factor to the power of the scale number to get the position of the corresponding feature in the original image. These coordinates are needed for the rotation estimation as well as for the planar translation. The height change does not rely on the feature positions but on their scales (the level at which they are detected).
A possible alignment of the reference and current image is shown in Figure 3.7.
3.6.1 Yaw detection
The first task in order to find the homing vector is to find the yaw rotation between the two images. The rotation is estimated based on two matches. A single reference match would be sufficient if there were no additional translational movement between the two images; this would allow considering the bearing from the image center to
Figure 3.7: Generalized alignment of the reference (blue) and current image (red)
the feature position as the rotation angle. Outside this ideal case the rotation cannot be assumed to be around the image center; therefore a second reference match is required.
From the two picked features of the current image we create a vector from the first to the second feature, and a similar vector is created within the reference image (Eq. 3.5, 3.6). The rotation ψ between the two images is defined as the angle between these two vectors (Fig. 3.8). The cosine of the angle is given by the scalar product of the two normalized vectors. This only yields the absolute value of the rotation but not its direction, because the cosine is symmetric with respect to the y axis. Therefore the direction of the rotation is determined by the sign of the z-component of the cross product between the vectors.
cV = cF1 − cF2   (3.5)
rV = rF1 − rF2   (3.6)
cos ψ = (cV · rV) / (|cV| |rV|)   (3.7)
sign_ψ = sign([cV × rV]_z)   (3.8)
ψ = sign_ψ · acos(cos ψ)   (3.9)
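A direct transcription of Equations 3.5-3.9 (Python/NumPy; the function name is illustrative and the feature positions are assumed to be given relative to the image center):

    import numpy as np

    def yaw_from_two_matches(cF1, cF2, rF1, rF2):
        cV, rV = cF1 - cF2, rF1 - rF2                                          # Eq. 3.5, 3.6
        cos_psi = np.dot(cV, rV) / (np.linalg.norm(cV) * np.linalg.norm(rV))   # Eq. 3.7
        sign_psi = np.sign(cV[0] * rV[1] - cV[1] * rV[0])                      # z of the cross product, Eq. 3.8
        return sign_psi * np.arccos(np.clip(cos_psi, -1.0, 1.0))               # Eq. 3.9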
For the following estimation of the translation of the feature points we use the
derotated image coordinates. Therefore each feature coordinate of the reference
image is rotated about the image center according to the detected yaw rotation.
After this rotation the two image orientations are aligned as in Figure 3.9.
3.6.2 Height change
The next part of the homing to be estimated is the altitude. A first estimate of the height change is provided by the scales of the features. The two reference
Figure 3.8: Rotation detection based on two reference matches
Figure 3.9: Current image and derotated reference image
matches which were selected for the rotation estimation are reused for this task. According to this estimate we scale the image coordinates with respect to the image center.

Together with the rotational alignment this leads to an equal scale and orientation of the current image and the reference image (Fig. 3.10).

For the height difference between two images we use the size information of the features described previously (Sec. 3.4). A scale difference (SD) of 1.0 means the feature is the scale factor (SF) times bigger in the current image than in the reference image. Generally, the size of the feature in the current image is SF^−SD times the size of the corresponding feature in the reference image. The distance to the feature (i.e. the altitude of the camera) is directly related to the feature size: the inverse of the factor describing the change in feature size describes the ratio of the altitudes. A feature that is twice as big indicates that the image was taken at half the height, and vice versa.
We estimate the height change by computing the average scale difference over all inliers. The floating point scales lead to a slightly better approximation of the height change here. The equation for the vertical homing direction is given below (Eq. 3.11). This direction estimate is equivalent to setting the initial height to 1 meter. As long as no absolute altitude is available, this qualitative result is good enough to allow a successful homing. If the previous (e.g. initial) height is known, the current height can be computed recursively (Eq. 3.12). A properly scaled, metric homing vector can improve the helicopter control.
SD = (1/N) Σ_k (cS_k − rS_k)   (3.10)
Δz ∼ SF^−SD − 1   (3.11)
z_i = SF^−SD · z_{i−1}   (3.12)
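A direct transcription of Equations 3.10-3.12 (Python; names and the default values are illustrative):

    def height_change(current_scales, reference_scales, sf=1.5, z_prev=1.0):
        # average scale difference over all inlier matches (Eq. 3.10)
        sd = sum(c - r for c, r in zip(current_scales, reference_scales)) / len(current_scales)
        dz = sf ** (-sd) - 1.0         # qualitative vertical homing direction (Eq. 3.11)
        z_cur = sf ** (-sd) * z_prev   # recursive height estimate (Eq. 3.12)
        return dz, z_cur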
Figure 3.10: Remaining matching vectors for the aligned reference
3.6.3 Planar translation
After aligning the two images in rotation and scale, we detect the horizontal movement of the camera. The image plane is assumed to be parallel to the ground plane on which the features lie. This assumption is reasonable because the camera mounted on the helicopter looks straight down and the helicopter mostly operates in a horizontal attitude. By undistorting the image first we ensure a gnomonical image. The gnomonical image together with the parallelism between image plane and ground plane results in a simple relationship between a translation of the camera in the world frame and a translation in image coordinates. This allows calculating the horizontal homing vector based on the feature coordinates in pixels (u, v). We use the average of the vectors between the matched features shown in Figure 3.10; this is similar to optical flow approaches. The resulting average vector is again a qualitative result for the homing vector: it indicates the correct direction but not the absolute distance. We can improve the homing accuracy if the velocity vector is scaled appropriately. To estimate the distance the camera moved in meters, we scale the homing vector. The scaling factor is determined by the mapping equation of a pinhole camera (Eq. 3.13, 3.14, Fig. 3.11). The pinhole camera model is a valid approximation because we use an undistorted image. The model does not even have to fit perfectly, because it is only used to improve the length of the homing vector. The direction is not affected by the scaling, and therefore the crucial part of the homing vector is not affected by a possibly imprecise scaling. As long as the homing vector points in the right direction, the MAV will be able to reach the home position.
Figure 3.11: Image mapping
x ≈ (z/f) · u   (3.13)
y ≈ (z/f) · v   (3.14)
To get the metric homing vector, the pixel-based vector is simply scaled by z/f. The height z is already known from the height estimate described in the previous section. If the absolute altitude is not available, it is approximated by setting the initial height to 1 meter; in this case the scaled homing vector does not represent metric values. Since all translational elements of the homing vector (x, y, z) are mis-scaled by the same factor, this does not affect the control, which remains balanced.
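A minimal sketch of this step (Python/NumPy; the sign convention of the matching vectors and the default focal length are assumptions):

    import numpy as np

    def planar_homing(cur_xy, ref_aligned_xy, z, f=300.0):
        # average of the remaining matching vectors in pixels (Fig. 3.10),
        # scaled to meters by z/f according to Eq. 3.13 and 3.14
        mean_px = np.mean(cur_xy - ref_aligned_xy, axis=0)
        return mean_px * (z / f)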
The focal length f is determined experimentally, by comparing ground truth data (e.g. Vicon) with the observed pixel distances. Another possible approach is to measure the size in pixels of a known object at known distances. With the latter approach we determined the focal length of the 150° lens to be about 300 px for the undistorted image (Fig. 3.12). As long as the camera lens remains unchanged this value is constant. For the smaller lens (100°) we approximated the focal length at about 720 px.
[Plot: conversion factor from pixels to meters, f(z) = z/300, as a function of the height z in meters]
Figure 3.12: Focal length
3.7 Roll-Pitch compensation
The helicopter operates most of the time in an (almost) horizontal attitude, which allows us to assume that the image is taken with the camera pointing vertically towards the ground. However, the quadrocopter tilts when accelerating or decelerating horizontally, and then the camera is not looking straight down. This difference in alignment causes an overshoot in the image-based position estimate which has to be filtered out.

Based on the assumption that the features lie on a plane at a certain distance (altitude) from the camera, it is possible to implement a simple correction vector. The correction vector shifts each feature position to the artificial position it would have if the camera were looking straight down from the same camera position. We calculate the correction vector based on the IMU data of the UAV according to Equations 3.15-3.19. First we generate a matrix representation of the orientation based on the IMU quaternion. The z-axis of the helicopter, which is defined by the IMU data, coincides with the optical axis. Based on the rotation matrix it is possible to calculate a tilting angle β and a yaw angle α. β is defined as the arc-cosine of the scalar product between the z component of the rotation matrix and the vertical axis in the world coordinate system. Since the rotation matrix is orthogonal and the vertical axis is represented by the unit vector e_z, β is simply the arc-cosine of the R_3,3 component of the matrix. The tilting angle allows us to define a correction distance r in the image plane (Fig. 3.13). The distance r is measured in pixels and is therefore directly applicable to the image coordinates.
β = acos([R e_z] · e_z) = acos(R_3,3)   (3.15)
α = atan2([R e_z] · e_y, [R e_z] · e_x) = atan2(R_2,3, R_1,3)   (3.16)
r = tan(β) · f = sqrt(cos^−2(β) − 1) · f   (3.17)
u_corr = r · cos(α)   (3.18)
v_corr = r · sin(α)   (3.19)
Figure 3.13: Tilting correction distance
The correction distance r is split into its u and v components using the yaw angle α computed from the IMU rotation matrix (Fig. 3.14). The u and v components form a correction vector which is applied to each feature position in order to shift it artificially.
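A direct transcription of Equations 3.15-3.19 (Python/NumPy; R is assumed to be the 3×3 rotation matrix obtained from the IMU quaternion and the default focal length is illustrative):

    import numpy as np

    def roll_pitch_correction(R, f=300.0):
        beta = np.arccos(R[2, 2])                    # tilting angle (Eq. 3.15)
        alpha = np.arctan2(R[1, 2], R[0, 2])         # direction of the tilt (Eq. 3.16)
        r = np.tan(beta) * f                         # correction distance in pixels (Eq. 3.17)
        return np.array([r * np.cos(alpha), r * np.sin(alpha)])   # (u_corr, v_corr), Eq. 3.18/3.19

    # Each feature position (u, v) is shifted by this vector before the homing estimation.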
Figure 3.14: Roll-pitch correction vector
We tested the correction vector with a series of images taken by a flying helicopter. The position estimate obtained by integrating the homing vector is compared with the Vicon position data used as ground truth. As shown in Figure 3.15, the position estimate without the correction vector has a large overshoot compared to the ground truth. This overshoot appears, as expected, when the helicopter accelerates or decelerates its horizontal movement. By applying our correction vector we are able to filter out the overshoot. The remaining error between the ground truth and the position estimate with roll-pitch correction is a drift caused by the fact that the position estimate is obtained by integrating the velocity vectors.
[Plot: y position in meters over time (0–35 s) for the Vicon ground truth, the estimate without correction and the estimate with correction]
Figure 3.15: Roll-pitch compensation verification
3.8 RANSAC
The feature matching includes many false positive matches; especially the matching based on the Calonder descriptor is not accurate enough. A match is accepted if the matching score is below a threshold, and only the best match for each feature is considered. A high threshold leads to many false positive matches, while a lower threshold may result in no matches at all, which is even worse. The threshold is set high enough to obtain enough matches for rotated images and large translations. Therefore a filter has to be applied first to get rid of the outliers. We chose a simple RANSAC filter for this task.

The RANSAC algorithm is well suited for this problem since it needs neither a model description nor a minimum number of inliers. The algorithm simply chooses the parameter set which leads to the highest number of inliers. In this particular problem the number of inliers is not important; two inliers would theoretically be enough to determine a homing vector. The crucial part is to reliably filter out the outliers.
Each RANSAC sample consists of two matches which are taken as reference matches. From those matches the rotation and scaling between the two compared images are calculated. The inliers are found by comparing the bearing and length of the remaining vectors (as in Fig. 3.10). The reference bearing and length are defined by the average of the two picked matches; using the average of two vectors makes the filter less susceptible to noise. If both selected references are inliers, the reference image should be aligned and scaled appropriately, and all vectors point in the same direction with equal length. Errors in the alignment of the two images, a tilted camera or variations in the feature detection lead to deviations. To handle these small deviations, the bearing error and the length error are independently weighted. If the total error value is below a certain threshold, the match is considered an inlier. The number of inliers is counted for each RANSAC iteration. At the end, the reference pick with the most inliers is used for the further calculations, and the homing vector is calculated based on all inliers.
The number of iterations N of the RANSAC is set according to empirical data. It has to be high enough to ensure that two inliers are picked. Especially for rotated images the inlier rate is quite low (around 20%), because the Calonder descriptor is not rotation invariant. We set the number of iterations to 100, which should allow finding a correct set of inliers while not demanding too much time.
The structure of the RANSAC is described as follows (a code sketch is given after the outline):
• do N times:
  select two random matches
  calculate the yaw rotation
  rotate the reference features to match the current ones
  calculate the scale difference
  scale the reference features to match the current ones
  calculate the bearing & length of the remaining vectors for the selected matches
  for each match:
    mark it as an inlier if the bearing and length errors are small
• choose the largest set of inliers for the remaining calculations
• average all remaining vectors for the x, y components of the homing vector
• average the scale differences to determine the height change
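A condensed sketch of this loop (Python/NumPy, not the thesis C++ implementation; the minimum vector length, the error weights, the threshold and the early-abort ratio are illustrative assumptions):

    import numpy as np

    def homing_ransac(cur, ref, iters=100, thresh=10.0, early_ratio=0.8):
        # cur, ref: Nx2 arrays of matched feature positions relative to the image
        # center (reference features already roll/pitch corrected, in original-image pixels)
        n = len(cur)
        best_psi, best_vec, best_inl = 0.0, np.zeros(2), np.array([], dtype=int)
        for _ in range(iters):
            i, j = np.random.choice(n, 2, replace=False)
            cv, rv = cur[i] - cur[j], ref[i] - ref[j]
            if min(np.linalg.norm(cv), np.linalg.norm(rv)) < 5.0:
                continue                                   # tiny vectors: angle too noisy, reject pick
            cosp = np.dot(cv, rv) / (np.linalg.norm(cv) * np.linalg.norm(rv))
            sign = np.sign(cv[0] * rv[1] - cv[1] * rv[0])
            psi = sign * np.arccos(np.clip(cosp, -1.0, 1.0))          # yaw (Eq. 3.5-3.9)
            c, s = np.cos(psi), np.sin(psi)
            ref_rot = ref @ np.array([[c, s], [-s, c]])               # derotate reference features
            scale = np.linalg.norm(cv) / np.linalg.norm(rv)           # scale from the two picks
            vecs = cur - ref_rot * scale                              # remaining matching vectors (Fig. 3.10)
            ref_vec = 0.5 * (vecs[i] + vecs[j])                       # reference bearing and length
            len_err = np.abs(np.linalg.norm(vecs, axis=1) - np.linalg.norm(ref_vec))
            ang_err = np.abs(np.arctan2(vecs[:, 0] * ref_vec[1] - vecs[:, 1] * ref_vec[0],
                                        vecs @ ref_vec))
            inliers = np.flatnonzero(1.0 * len_err + 20.0 * ang_err < thresh)
            if len(inliers) > len(best_inl):
                best_psi, best_vec, best_inl = psi, vecs[inliers].mean(axis=0), inliers
            if len(inliers) > early_ratio * n:                        # early abort on a very good pick
                break
        return best_psi, best_vec, best_inl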
We introduced some modifications to improve the output and speed of the RANSAC. The reference pick is rejected if the length of either of the two vectors is too small, because in that case the angle calculation is susceptible to noise. The reference pick is also rejected if the difference between the detected rotation and the rotation of the previous homing vector is larger than a threshold. This allows the algorithm to run faster, since wrong picks are detected early in the RANSAC, before every match is checked for being an inlier, and a lot of useless calculations are avoided. We also introduced an abort condition when the inlier rate is higher than a threshold: if the RANSAC detects more than, for example, 80% inliers in one iteration, it is not necessary to run all 100 iterations. An inlier rate of 80% or above is only reached with true positive matches; false positive matches are almost randomly distributed and therefore cannot produce such a high rate. But if only 20% of the matches are true positives, the RANSAC may choose a wrong set of inliers. To lower the impact of such a false selection we use a small moving average filter and additionally weight the output according to the number of detected inliers. The window of the filter is kept small because the additional delay must not be too large.
3.9 Quadrotor control
In order to use the homing vector we need a helicopter controller. This controller is already available and requires two inputs. To control the helicopter, a homing velocity alone is not sufficient: there is no rotor speed which guarantees a stationary position, as exists for ground vehicles. Therefore a position estimate is required to know whether the helicopter is moving. We provide both outputs by running the homing algorithm twice: first to get the homing vector as intended, and second to calculate a position estimate by integrating the homing vector between the current and the previous image. The integration of the velocity vector implies a drift of the position estimate. A small drift does not interfere with the homing ability because the main navigation is based on the homing vector; the position estimate is only used to describe the current movement of the helicopter.

The homing vector can either be a velocity command or a target position. It consists of the four values (x, y, z, ψ). The remaining two degrees of freedom (roll and pitch) cannot be commanded independently because they must be zero for a stationary flight.
Figure 3.16: Helicopter Control
Chapter 4
Results
We made the first tests of our homing algorithm with a hand-held camera. To provide a reference position we compared the outputs with those of a simultaneously executed Parallel Tracking and Mapping (PTAM). Later in the project we used ROS bag-files which were recorded during a remotely controlled flight with the helicopter. Those bag-files contain all necessary information; the stored data allows us to compare different parameter settings while always having the same input to the homing. At the end of the project we controlled the helicopter in real time with the output of the homing. During all tests the homing was executed on a 2.5 GHz dual-core laptop; the images taken by the helicopter were therefore streamed over a WLAN connection.
[Histogram: loop times in milliseconds over 3900 loops]
Figure 4.1: Loop times
4.1 Loop Times
To ensure a smooth flight the whole algorithm has to run several times per second. The analysis of 3900 loops showed an average loop time of about 50 milliseconds (Fig. 4.1). The resulting 20 Hz update rate is fast enough to control the helicopter.

Figure 4.2 shows how those 50 ms are split over the different parts of the algorithm. The most demanding part is the RANSAC. The evaluation of the longest run shows the large variation in the RANSAC times, which are the main reason for the delay; the other parts always take about the same time to execute. The getMatches, floatingScale and RANSAC steps appear twice because the homing algorithm is run twice (to get a position and a velocity estimate). The execution time depends on the number of features found in each image, and the RANSAC duration itself is linked to the number of detected matches. Therefore it is important to set the feature detection and matching thresholds appropriately.
Figure 4.2: Loop times analysis
4.2 Comparison with PTAM
If no ground truth data (e.g. Vicon) is available, a new method can be compared with existing alternatives. An alternative which has proven to be an accurate localization method is PTAM [14]. Its position estimate can be assumed to be good if the map quality parameter is good, which was always the case during our tests.

The homing method developed in this thesis has some advantages compared to PTAM. The memory requirements are smaller, since only some keyframes are stored and no complete map is built; this also makes it less susceptible to getting lost. Additionally, the homing does not need an initialization before it can provide a velocity command. A known initial height can improve the performance of both methods but is not necessary; the initial height can be used in either method to scale the output appropriately, which can improve the performance of the helicopter control.
[Plot: homing vectors in the x-y plane along the PTAM path with the reference point marked]
Figure 4.3: Homing vector for a circular path (x-y)
To be able to compare the homing vectors with the PTAM positions, they are scaled by a constant factor. We determine this factor by comparing the integrated position estimate with the position estimate of PTAM.

We compared both algorithms by tracking a circular path around a single keypoint with a hand-held camera. This allows us to compare the homing vector with the position estimate of PTAM. The resulting homing vector for each position is shown in Figure 4.3. Most of the vectors point towards the reference point, as expected. Small errors in the feature positions or in the matching affect short vectors more than long ones, which causes the slight deviations for positions close to the reference position. The length of the vector even contains a good estimate of the distance.

To verify the height change estimate, the z component of the homing vector is plotted against the x component (Fig. 4.4); this is just a side view of the previous image. In this view there is not a single outlier for the large height differences, and the distance estimate is again reasonable. For positions close to the keypoint the homing vector may point in a wrong direction, but in this particular case the good (and small) distance estimate reduces the negative effect on the homing. The combination of direction and distance estimate allows a good estimation of the homing vector for any position.

A statistical result of the test is shown in the histograms of Figure 4.5, for which the homing vector is split into three parts. The horizontal direction error has some outliers; they mainly occur for short vectors, which are much more sensitive to noise and other inaccuracies. For homing vectors from positions further away from the reference, the direction error is almost zero. The mean value of the errors is zero, but the variance is quite large for this data.

The horizontal and vertical distance errors show almost no outliers. This leads to
[Plot: homing vectors in the x-z plane along the PTAM path with the reference point marked]
Figure 4.4: Homing vector for a circular path (x-z)
a variance of the errors of practically zero. This confirms the first impression that the distance estimate over the whole dataset is good, whereas the direction for small distances may be wrong.
[Histograms over 597 datapoints: horizontal direction error [rad] (µ: 0.01, σ²: 1.02), horizontal distance error [m] (µ: −0.04, σ²: 0.01), vertical distance error [m] (µ: 0.01, σ²: 0.00)]
Figure 4.5: Error distributions with PTAM as reference
4.3 Recorded Vicon data
The first flight tests were not controlled by the homing algorithm. To get real data, including Vicon positions and IMU measurements, the helicopter was remote controlled and all necessary information was stored in a bag file. The homing algorithm was then executed based on the saved images and the IMU data.

First the position estimate is compared with the ground truth (Fig. 4.6). The drift of the error for the horizontal position (x, y) over the whole dataset is even smaller than expected, which shows that the position estimate can be used to control the helicopter. We created this plot by using the Vicon altitude: since the scaling of the position estimate depends on the altitude of the camera, it is crucial to have a good height estimate. In this case we used the Vicon altitude to prove the usability of the horizontal position estimate.

The altitude estimate does not show results as good as the x and y estimates (Fig. 4.7). Especially after 130 seconds the height estimate is really bad. This is likely caused by the faster movements, which also include rotations about the yaw axis of the helicopter; the overlap between subsequent images is therefore lower, which results in a worse matching. The altitude estimate is also influenced by the floating point scale information, which is not accurate enough. The latter issue should be resolved by using the Brisk descriptor, which offers better scale information for the detected features; the rotation about the yaw axis is also detected better with Brisk. Additionally, all helicopter tests were done with the smaller 100° fisheye lens, which had not been calibrated before. Undistorting the images can improve the stability of the algorithm: features close to the border of one image may not match the corresponding features in the center of the other image because of the distortion. This problem mainly appears if two subsequent images differ strongly, which is the case for highly dynamic movements of the helicopter.
[Plots: error of the integrated position (x and y, in meters) versus time compared with the Vicon ground truth]
Figure 4.6: Position estimate error of the homing algorithm
Based on the saved dataset the homing vectors are compared with the ground truth
[Plots: error of the integrated height (dz, in meters) and of the integrated yaw rotation (in degrees) versus time compared with the Vicon ground truth]
Figure 4.7: Position estimate error (z, ψ) of the homing algorithm
data. Our homing creates more than 200 keyframes while evaluating the dataset.
The resulting homing vectors are spread over many different lengths.
The evaluation is again plotted in three histograms (Fig. 4.8). The mean values
of all errors are again at zero like they should be. All three histograms appear a
bit wider than in the PTAM experiment. The variances of the distance estimates
are slightly bigger than before. This result reflects a more dynamic test. It has
much more keyframes and the camera is not constantly looking straight down. The
deviations in the camera orientation from the vertical axis are compensated by the
roll-pitch compensation. This compensation can never be as good as if the camera
is always pointing straight downwards.
To show that most of the wrong directions are at points close to the respective
keypoint the same histograms are plotted only for homing vectors larger than 0.15m
(Fig 4.9). The variance of the horizontal direction error shrinks to almost a third
of the previous value (40.1◦) to 14.9 degrees. As stated above the latter error is of
more interest because the good distance estimate for short distances already limits
the impact of the bad direction.
Additionally, the navigation was designed to keep the goal keyframe far away. This
is done with a higher distance threshold for marking a keyframe as reached during
homing than for storing a new keyframe while tracking (i.e. learning). We set the
threshold for creating a new keyframe to 40 pixels, which means a new keyframe is
stored whenever the estimated homing vector is longer than 40 pixels in the image.
During homing, the threshold for reaching a keyframe is about twice as high. The
keyframe chosen as the current goal is therefore not the closest keyframe to the
current position but rather the second closest. This guarantees a long homing vector,
which leads to a good estimate of the homing direction.
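A minimal sketch of this two-threshold logic is given below. The names are illustrative, and the reached-threshold is simply set to twice the creation threshold, whereas in our implementation it is only "about twice as high".

    #include <vector>

    struct Keyframe { /* descriptors, feature scales, ... */ };

    const double kNewKeyframePx = 40.0;                 // create a keyframe while tracking [pixel]
    const double kReachedPx     = 2.0 * kNewKeyframePx; // mark a keyframe as reached while homing [pixel]

    // Tracking (learning): store a new keyframe once the homing vector to the
    // last keyframe becomes longer than the creation threshold.
    void trackingStep(double homingVectorPx, std::vector<Keyframe>& keyframes) {
        if (homingVectorPx > kNewKeyframePx)
            keyframes.push_back(Keyframe());
    }

    // Homing: once the current goal keyframe is reached, step backwards to the
    // previous keyframe, so the goal stays well ahead of the current position.
    int homingStep(double homingVectorPx, int goalIndex) {
        if (homingVectorPx < kReachedPx && goalIndex > 0)
            --goalIndex;
        return goalIndex;
    }

    int main() {
        std::vector<Keyframe> keyframes;
        trackingStep(45.0, keyframes);   // 45 px > 40 px: a keyframe is stored
        int goal = homingStep(60.0, 5);  // 60 px < 80 px: goal advances to keyframe 4
        return (goal == 4 && keyframes.size() == 1) ? 0 : 1;
    }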
Figure 4.8: Homing vector errors compared with ground truth (4352 data points). Horizontal direction [rad]: µ = −0.02, σ² = 0.70; horizontal distance [m]: µ = −0.01, σ² = 0.02; vertical distance [m]: µ = 0.01, σ² = 0.05.
Figure 4.9: Homing vector errors for vectors longer than 0.15 m (2281 data points). Horizontal direction [rad]: µ = 0.03, σ² = 0.26; horizontal distance [m]: µ = −0.08, σ² = 0.01; vertical distance [m]: µ = 0.02, σ² = 0.05.
4.4 Flight test
The first tests in which the helicopter was actually controlled by our homing
approach were done very late in the project, because no Vicon system was available
at the ASL earlier. The Vicon is used to guarantee a safe flight and to compare the
homing with ground-truth data.
We could not test every part of the algorithm at once. For the first test we used
the Vicon for the position estimate and tested only the x and y components of the
homing vector. During the test we faced a hardware problem: the camera often
stopped transmitting images, which made the homing fail, since without a camera
image the homing obviously cannot execute.
The very first test was to hover at a constant position. For this, a keyframe is saved
and the homing should always navigate the helicopter back to the position where
the keyframe was taken. The helicopter was able to hover within a limited range,
but it was not able to stay constantly at the same position. This reflects the fact
that the direction estimate may be wrong for short distances, so small disturbances
may be misinterpreted. Due to lack of time we were not able to optimize the
parameters for this particular task.
Figure 4.10: Homing controlled flight (Vicon path, reference points, homing vectors and new goal positions in the x-y plane)
In a second test we tried to check the homing ability of the helicopter. For this test
the helicopter should fly back along a previously trained path. The keyframes were
generated by moving the helicopter and the camera by hand at a constant height.
Over the L-shaped path a total of 29 keyframes were stored (Fig. 4.10); their
absolute positions are shown as green crosses in the figure. At the position to the
lower right we started the rotors of the helicopter and switched to the homing mode.
In this mode the homing does not create new keyframes but rather selects one of
the previously taken keyframes as the goal. The helicopter was able to navigate back
through several keyframes. It would likely have been able to navigate back the whole
path, but we often faced the image transmission problem mentioned above. The
homing vectors at various positions are also shown in the image.
The homing vector should ideally always point to the current goal position. The goal
position changes when a keyframe is reached; the goal is always about two keyframes
away (i.e. to the left in this image) from the current position. The positions at
which a keyframe is reached and a new goal is selected are indicated by the black
lines which connect the current position with the new goal position. The blue homing
vectors do not always point in an appropriate direction. These bad direction
estimates may be caused by false-positive feature matches. We did not investigate
this further because we focused on the Brisk implementation, which promises to be
more accurate.
4.5 Preliminary results with Brisk descriptor
Based on the data stored during the flight test we were able to test the homing
with the Brisk descriptor. We did not perform a flight test in which the helicopter
is actually controlled by the Brisk homing, but we fed the saved data to the new
application, which allows us to compare it with the Calonder implementation.
The Brisk descriptor shows much more stable performance. The average inlier rate
of the Calonder version was only about 25%; Brisk improves this ratio to about 80%.
The output is also much smoother because the feature matching is more reliable,
whereas the Calonder implementation appeared more random. Obviously, the higher
inlier ratio benefits the whole algorithm: RANSAC can be sped up since it needs far
fewer samples to find the correct set of inliers.
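To illustrate this speed-up, the standard estimate for the required number of RANSAC iterations, N = log(1 − p) / log(1 − w^s), can be evaluated for both inlier ratios. The confidence p = 0.99 and the minimal sample size s = 2 used below are assumptions for illustration only, not necessarily the values of our implementation.

    #include <cmath>
    #include <cstdio>

    // Number of iterations needed to draw at least one all-inlier sample with
    // probability 'confidence', given inlier ratio w and sample size s.
    int ransacIterations(double w, int s, double confidence) {
        double pAllInliers = std::pow(w, s);
        return static_cast<int>(std::ceil(std::log(1.0 - confidence) /
                                          std::log(1.0 - pAllInliers)));
    }

    int main() {
        std::printf("Calonder, w = 0.25: %d iterations\n", ransacIterations(0.25, 2, 0.99)); // ~72
        std::printf("Brisk,    w = 0.80: %d iterations\n", ransacIterations(0.80, 2, 0.99)); // ~5
        return 0;
    }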
The histograms (Fig. 4.11) confirm this first impression. The variance of all errors
is much smaller because there are almost no outliers.
Figure 4.11: Brisk homing vector errors compared with ground truth (375 data points). Horizontal direction [rad]: µ = −0.06, σ² = 0.42; horizontal distance [m]: µ = −0.04, σ² = 0.00; vertical distance [m]: µ = −0.02, σ² = 0.00.
We have not yet done a flight test with the Brisk-based homing, but it is possible to
use the recorded data from the first flight test and simulate a new homing output
(Fig. 4.12). The traveled path is obviously identical to the one before; the important
difference lies in the homing vectors. They look much more promising since no
vector points in a wrong direction.
Descriptor                  Calonder    Brisk
Avg. inlier rate            27.7%       81.5%
Angular error               10.23°      3.77°
Horizontal distance error   0.021 m     0.041 m
Altitude error              0.011 m     0.016 m
Table 4.1: Comparison between the Calonder and Brisk descriptors
Figure 4.12: Homing vectors computed with Brisk (Vicon path, reference points, homing vectors and new goal positions in the x-y plane)
Chapter 5
Conclusion
We successfully showed that our new homing method is able to fulfill its tasks.
Especially the improved version with the better Brisk descriptor provides a good
estimate of the homing vector, and the very small direction error of the velocity
estimate allows a reliable homing. During the experiments with the flying helicopter
we faced some hardware problems which did not allow us to test everything. The
homing in the vertical direction has not been tested on the real system so far, but
the experiments with the hand-held camera look promising. These results have to
be verified in an experiment with the flying helicopter. The same applies to our
approach for obtaining a position estimate, which is required for the helicopter
control.
The implementation with the Brisk descriptor has only been tested in simulations
so far. It should lead to even better results in flight due to the improved accuracy
of the feature matching.
In some of our experiments we worked with the distorted camera image. Our results
with the raw images of the 100° lens mounted on the helicopter showed that our
algorithm is robust enough to handle those distortions. Undistorting a camera image
is not a big task nowadays, but if the algorithm does not have to dewarp the image
in every loop, the execution time is reduced.
The memory requirements of our approach are lower than those of alternatives which
store a detailed map of the environment. In our current setting the memory and
CPU requirements are not a crucial factor since the homing is executed on a laptop,
but for a later setup where everything has to run on board this may change. Our
approach navigates with a topological map based on keyframes, which reduces the
probability of getting lost. Another improvement compared to PTAM is that our
method does not need an initialization phase. Both approaches can deliver more
accurate distance estimates if they are appropriately scaled, but an accurate absolute
distance estimate is not crucial.
In order to improve the algorithm in the future, it is recommended to do many more
tests on the real system. This will show the possibilities and limits of our approach.
There are still some parameters which can be tuned by further testing, such as the
feature detection and matching thresholds; they depend strongly on the environment
and the lighting conditions. The threshold for creating a new keyframe for the
homing has not been definitively determined either.
Future work on this topic could include an improved filtering (e.g. based on a
covariance matrix) of the homing vector estimates. Many applications based on
monocular cameras have to deal with the lack of depth information. It might be
possible to find a way, similar to our altitude estimate based on the feature scales,
to provide some sort of depth information.
Appendix A
Application Design
A.1 Framework (GUI)
Our application is implemented as a ROS package (brisk homing). The image and
IMU data are read from ROS topics, and the outputs (i.e. the homing vector) are
published as topics as well. The parameters can be changed in a Dynamic Reconfigure
window (Fig. A.1); they are described in the table below.
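A minimal sketch of this subscribe/publish interface is shown below. The topic names and the output message type are assumptions for illustration; the actual topics are configured via the topics parameter.

    #include <ros/ros.h>
    #include <sensor_msgs/Image.h>
    #include <sensor_msgs/Imu.h>
    #include <geometry_msgs/Vector3Stamped.h>

    ros::Publisher homing_pub;

    void imageCallback(const sensor_msgs::ImageConstPtr& img) {
        // ... feature detection, matching and homing-vector computation ...
        geometry_msgs::Vector3Stamped homing;
        homing.header.stamp = img->header.stamp;
        // homing.vector.x/y/z = computed homing vector components [m]
        homing_pub.publish(homing);
    }

    void imuCallback(const sensor_msgs::ImuConstPtr& imu) {
        // store the latest roll and pitch for the roll-pitch compensation
    }

    int main(int argc, char** argv) {
        ros::init(argc, argv, "brisk_homing");
        ros::NodeHandle nh;
        ros::Subscriber img_sub = nh.subscribe("camera/image_raw", 1, imageCallback);
        ros::Subscriber imu_sub = nh.subscribe("imu/data", 10, imuCallback);
        homing_pub = nh.advertise<geometry_msgs::Vector3Stamped>("homing_vector", 10);
        ros::spin();
        return 0;
    }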
Figure A.1: Dynamic Reconfigure Parameters
topics            Image topic to listen to
resetIntegrator   Set the position estimate to the current Vicon position
newRef            Take a new keyframe (done automatically in homing track mode)
plot              Plot the matches (as in Figure A.2)
undistort         Choose whether the undistorted image is used
Feature thres     Threshold for detecting a corner as a feature
Feature scales    Number of scales (octaves) to check for features
Matching thres    Distance threshold for matching two feature descriptors
RANSAC iter       Maximum number of RANSAC iterations
RANSAC thres      Threshold for flagging a match as an inlier
W distance        Length error up to which a match still counts as an inlier (weight) [pixel]
W bearing         Bearing error up to which a match still counts as an inlier (weight) [deg]
Filter            Filter type to use for the homing vector (None, Moving Average, Weighted Mean)
FilterMemory      Total weight for the Weighted Mean filter; the current estimate is weighted by its number of inliers
focallength       Focal length of the camera [pixel]
Trust             Set how far to trust the position estimate (Trust all, use vicon-z, use vicon-xyz)
alternativeZ      Use the alternative z-position estimate (based on homing with respect to the last keyframe instead of the last frame)
adjust h          Set the initial height manually to 'height'
height            Manually set altitude
useIMU            Use IMU data for the roll and pitch compensation
useYawEstimate    Use the previous yaw to filter bad selections in the RANSAC
homing track      Automatically store new keyframes for the homing
homing            Enable homing mode: navigate backwards through the keyframes
homing plot       Plot information about the homing (Fig. A.3)
homing new        Distance threshold for automatically saving a new keyframe [pixel]
homing back       Distance threshold for marking a keyframe as reached (homing mode) [pixel]
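On the C++ side, these parameters could be received through a dynamic_reconfigure server as sketched below. The HomingConfig type and the field names are hypothetical; the actual names are defined by the package's .cfg file.

    #include <ros/ros.h>
    #include <dynamic_reconfigure/server.h>
    #include <boost/bind.hpp>
    #include <brisk_homing/HomingConfig.h>   // generated header; name is an assumption

    void reconfigureCallback(brisk_homing::HomingConfig& config, uint32_t level) {
        // React to parameter changes coming from the GUI; field names are assumptions.
        ROS_INFO("Feature threshold: %d, RANSAC iterations: %d",
                 config.Feature_thres, config.RANSAC_iter);
    }

    int main(int argc, char** argv) {
        ros::init(argc, argv, "brisk_homing");
        dynamic_reconfigure::Server<brisk_homing::HomingConfig> server;
        server.setCallback(boost::bind(&reconfigureCallback, _1, _2));
        ros::spin();
        return 0;
    }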
If the plot check-box is enabled, the application displays a window with all matches.
This window should not be enabled all the time since it increases the execution time
significantly. A typical output is shown in the figure below (A.2). The current image
is shown on the left, the keyframe on the right. The colored matches are those
tagged as inliers, while the black matches indicate the outliers.
Figure A.2: Application plot: Matches
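A sketch of how such a match plot can be produced with OpenCV, drawing inlier matches in color and outliers in black, is given below. The function and variable names are illustrative and do not correspond to the actual plotting code.

    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    // Draw the current image (left) and the keyframe (right) side by side and
    // connect matched feature positions: green for inliers, black for outliers.
    cv::Mat plotMatches(const cv::Mat& current, const cv::Mat& keyframe,
                        const std::vector<cv::Point2f>& ptsCurrent,
                        const std::vector<cv::Point2f>& ptsKeyframe,
                        const std::vector<bool>& isInlier) {
        cv::Mat canvas(std::max(current.rows, keyframe.rows),
                       current.cols + keyframe.cols, CV_8UC3, cv::Scalar::all(0));

        cv::Mat currentBgr, keyframeBgr;
        cv::cvtColor(current, currentBgr, cv::COLOR_GRAY2BGR);
        cv::cvtColor(keyframe, keyframeBgr, cv::COLOR_GRAY2BGR);
        currentBgr.copyTo(canvas(cv::Rect(0, 0, current.cols, current.rows)));
        keyframeBgr.copyTo(canvas(cv::Rect(current.cols, 0, keyframe.cols, keyframe.rows)));

        for (size_t i = 0; i < ptsCurrent.size(); ++i) {
            cv::Point2f right = ptsKeyframe[i] + cv::Point2f(current.cols, 0.0f);
            cv::Scalar color = isInlier[i] ? cv::Scalar(0, 255, 0) : cv::Scalar(0, 0, 0);
            cv::line(canvas, ptsCurrent[i], right, color, 1);
        }
        return canvas;
    }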
For testing and debugging we created another output window which contains the
main information about the homing (Fig. A.3). On the upper left the number of
the current keyframe (6) is displayed; this shows how many keyframes have been
stored so far. The big number on the bottom (35.2621) is the distance to the last
keyframe. If this distance exceeds the value set in homing new, a new keyframe is
generated; in the homing mode this distance is used to check whether a keyframe
has been reached (homing back). The small numbers show the values of each
component of the homing vector in meters and degrees, respectively. These values
are displayed graphically as well: a blue line is used for the horizontal part of the
homing vector, the altitude change is drawn as a vertical green line, and a detected
yaw rotation is shown as a red circular segment. The graphical display of the homing
vector allows checking in real time whether the homing vector is reasonable.
Figure A.3: Application plot: Homing