
Structure from Motion and 3D reconstruction on the easy in OpenCV 2.3+ [w/ code]

This time I’ll discuss a basic implementation of a Structure from Motion method, following the steps Hartley and Zisserman show in “The Bible” book: “Multiple View Geometry”. I will show how simple it is to implement their linear method in OpenCV.
I treat this as a kind of tutorial, or a toy example, of how to perform Structure from Motion in OpenCV.
See related posts on using Qt instead of FLTK, triangulation and decomposing the essential matrix.
Update 2017: For a more in-depth tutorial see the new Mastering OpenCV book, chapter 3. Also see a recent post on upgrading to OpenCV3.
Let’s get down to business…

Getting a motion map

The basic thing when doing reconstruction from pairs of images is that you know the motion: how much “a pixel has moved” from one image to the other. This gives you the ability to reconstruct its distance from the camera(s). So our first goal is to try and recover that from a pair of images.
In calibrated horizontal stereo rigs this is called Disparity, and it refers to the horizontal motion of a pixel. And OpenCV actually has some very good tools to recover horizontal disparity, that can be seen in this sample.
But in our case we don’t have a calibrated rig as we are doing monocular (one camera) depth reconstruction, or in other words: Structure from motion.
You can go about getting a motion map in many different ways, but two canonical ways are: optical flow and feature matching.
Also, I will stick to what OpenCV has to offer, but obviously there is a whole lot of other work out there.

Input pair of images, rotation and translation is unknown

Optical Flow

In optical flow you basically try to “track the pixels” from image 1 to 2, usually assuming a pixel can move only within a certain window in which you will search. OpenCV offers some ways to do optical flow, but I will focus on the newer and nicer one: Farneback’s method for dense optical flow.
The word dense means we look for the motion for every pixel in the image. This is usually costly, but Farneback’s method is linear which is easy to solve, and they have a rocking implementation of it in OpenCV so it basically flies.
Running the function on two images will provide a motion map, however my experiments show that this map is wrong a fair bit of the time… To cope with that, I run it iteratively, also leveraging the fact that this OF method can take an initial guess.
An example of using Farneback method exists in the samples directory of OpenCV’s repo: here.

Dense O-F using Farneback

Feature Matching

The other way of getting motion is matching features between the two images.
In each image we extract salient features and invariant descriptors, and then match the two sets of features.
It’s very easily done in OpenCV and widely covered by examples and tutorials.
This method, however, will not provide a dense motion map – it will provide a very sparse one at best, so depth reconstruction will also be sparse. We may talk about how to overcome that by hacking some segmentation methods, like superpixels and graph-cuts, in a different post.

SURF features matching, with Fundamental matrix pruning via RANSAC

A hybrid method

Another way that I am working on to get motion is a hybrid between Feature Matching and Optical Flow.
Basically the idea is to perform feature matching at first, and then O-F. When the motion is big, and features move quite a lot in the image, O-F sometimes fails (because pixel movement is usually confined to a search window).
After we get feature pairs, we can try to recover a global movement in the image, and use that movement as an initial guess for O-F.

The rigid transform flow recovered from sparse feature matching

Estimating Motion

Once we have a motion map between the two images, it should pose no problem to recover the motion of the camera. The motion is described by the 3×4 matrix P, which is composed of two elements: P = [R|t] – the rotation R and the translation t.
H&Z give us a bunch of ways of recovering the P matrices for both cameras in Chapter 9 of their book, the central method being the Fundamental Matrix. This special 3×3 matrix encodes the epipolar constraint between the images; to put it simply: for each point x in image 1 and corresponding point x’ in image 2, the equation x’Fx = 0 holds.
How does that help us? Well H&Z also prove that if you have F, you can infer the two P matrices. And, if you have (sufficient) point matches between images, which we have, you can find F! Hurray!
This is easy to see in the linear sense. F has 9 entries (but only 8 degrees of freedom, due to scale), so given enough point pairs we can solve for F in a least-squares sense. But F is better estimated in a more robust way, and OpenCV takes care of all of this for us in the function findFundamentalMat. There are several methods for recovering F there, linear and non-linear.
However, H&Z also point to a problem with using F right away – projective ambiguity. This means that the recovered camera matrices may not be the “real” ones, but instead have gone through some 3D projective transformation. To cope with this, we will use the Essential Matrix instead, which is sort of the same thing (it holds the epipolar constraint over points) but for calibrated cameras. Using the Essential Matrix removes the projective ambiguity and provides a Metric Reconstruction, which means the 3D points are true up to scale alone, and not up to a projective transformation.

cv::FileStorage fs("camera_calibration.yml", cv::FileStorage::READ);
Mat K; fs["camera_matrix"] >> K; //3x3 intrinsics, key name as written by OpenCV's calibration sample
Mat F = findFundamentalMat(imgpts1, imgpts2, FM_RANSAC, 0.1, 0.99, status);
Mat E = K.t() * F * K; //according to HZ (9.12)

Now let’s assume one camera is P = [I|0], meaning it hasn’t moved or rotated. Getting the second camera matrix, P’ = [R|t], is done as follows:

SVD svd(E);
Matx33d W(0,-1,0,	//HZ 9.13
	  1, 0,0,
	  0, 0,1);
Matx33d Winv(0,1,0,
	  -1,0,0,
	   0,0,1);
Mat_<double> R = svd.u * Mat(W) * svd.vt; //HZ 9.19
Mat_<double> t = svd.u.col(2); //u3
P1 = Matx34d(R(0,0),	R(0,1),	R(0,2),	t(0),
		 R(1,0),	R(1,1),	R(1,2),	t(1),
		 R(2,0),	R(2,1),	R(2,2), t(2));

Looks good, now let’s move on to reconstruction.

Reconstruction via Triangulation

Once we have two camera matrices, P and P’, we can recover the 3D structure of the scene. This can be seen simply if we think of it as ray intersection. We have the two camera centers as points in space (one at 0,0,0 and one at t), and we have the location in space of a point on the image plane of image 1 and on the image plane of image 2. If we simply shoot a ray from one camera center through the respective image-plane point, and another ray from the other camera, the intersection of the two rays must be the real location of the object in space.
In real life, none of that works. The rays usually will not intersect (so H&Z refer to the mid-point algorithm, which they dismiss as a bad choice), and ray intersection in general is inferior to other triangulation methods.
H&Z go on to describe their “optimal” triangulation method, which optimizes the solution based on the error from reprojection of the points back to the image plane.
I have implemented the linear triangulation methods they present, and wrote a post about it not long ago: Here.
I also added the Iterative Least Squares method that Hartley presented in his article “Triangulation”, which is said to perform very well and very fast.

"Depth Map"

3D reconstruction

A word of warning: many, many times the reconstruction will fail because the Fundamental matrix came out wrong. The results will just look awful, and nothing like a true reconstruction. To cope with this, you may want to insert a check that makes sure the two P matrices are not completely bogus (you could check for a reasonable rotation, for example). If the P matrices derived from the F matrix are strange, you can discard this F matrix and compute a new one.
Example of when things go bad...

Toolbox and Framework

I created a small toolbox of the various methods I spoke about in this post, and created a very simple UI. It basically allows you to load two images and then try the different methods on them and get the results.
It’s using FLTK3 for the GUI, and PCL (VTK backend) for visualization of the result 3D point cloud.
It also includes a few classes with a simple API that lets you get the feature matches, motion map, camera matrices from the motion, and finally the 3D point cloud.


Code & Where to go next

The code, as usual, is up for grabs at github:

Now that you have a firm grasp of SfM 🙂 you can go on to visit the following projects, which implement a much more robust solution:
And Wikipedia points to some interesting libraries and code as well:

111 replies on “Structure from Motion and 3D reconstruction on the easy in OpenCV 2.3+ [w/ code]”

hi! 🙂
it’s very good work! your tutorial gave me a solution to my problem of SVD decomposition, extracting the R and T matrices from E.
but i’m now stuck at triangulation: i don’t know how to deal with it because i’m not using only 2 images, and i also don’t know how to implement the triangulation.
can you please help?

hi, thank you for your previous answer.
this is time i have an other problem:
i m doing the 3D reconstruction from multiple views,
first, i take the two images that have the highest number of matching key points. second, i do the same job as you: I find the fundamental matrix by matching, then i find the essential matrix from the fundamental, and i compute the R and t of the second image. (let Pa and Pb be the two images)
Pa=[I|0], Pb=[Rb|tb]. I use triangulation to get the 3d positions of the key points in images a and b.
now, i want to get the relative pose of the new image c up to an unknown scale.
my question is: if i do the same steps using images b and c to determine Pc, will the result be correct? (Pb is the same, i don’t change its values, they are still the same values from the first step)
thank you for your help! 🙂

sorry i made a mistake in notation:
Pa an Pb are the extrinsic parameter of the camera(camera matrix if you want)
a and b are the name of images(the same for c)

@sampie, My code assumes that P0 is [I|0] and that’s how it works. You shouldn’t simply use Pb for P0 with the parameters you got from prior motion estimation.
First get Pb, then get the relative Pc’ when you use Pb’=[I|0], and then Pc = [Pc’R * PbR | Pc’t + Pbt] (basically concatenate the rotation and translation from the last frame).

thank you for your reply.
if i understand: first i have to get Pb, and then get the relative Pc’ using the same method used to determine Pb, but instead of Pa (as in the first step) i use Pb’=[I|0], without forgetting about the first Pb.
my problem was with Pb=[I|0]. i wasn’t sure about working with Pb at its normal value or at [I|0].
thank you very much.

@Roy: First of all thanks a lot for answering me. I am grateful.
I wanted to ask you that in my program I have all the matrices you mentioned in above post like Essential matrix, R, t, p0, p1. Now I want to just calculate depth map of all the points I do not care about actual positions just want depths of all the points. So can you guide me about how to do it? It is urgent please help me.

To get depth for a point you need its position in both images.
If you have a match between every point in image A to a point in image B, you can get their (approximate) depth.
Put all the point-pairs in vectors, then simply use the TriangulatePoints function in my code to get a 3D position for all of them.

Hey @Roy once again thanks for the advice. I have applied cvTriangulatePoints() function. Now, how do I test whether the data I have got is right or not?
And from 3D positions (last argument of cvTriangulatePoints) of all the corners in the image how do I calculate the distance of points from camera (In other words how do I compute Depth map) ?

Re testing, well if it’s real world data – you cannot really test unless you stage a scene with objects at known distance from the camera. Even then you cannot really test because the triangulation is only up to scale… There is really no way around it, only if you know what is a true distance between two 3D points, or your cameras, and then apply this scaling to the rest of the scene.
To see if your data makes sense, simply look at the 3D point cloud and see if it is reasonably close to reality.
Re depth map, simply give each point its distance from the camera center point. You have 2 points in 3D space, just calculate the distance.

Hi, it looks good for start so I would like to try this – but I’m not able to compile this. It’s because of FLTK 3 you use – version 3 is in development stage so no binaries are available (or I’m not able to find them) and compilation of this lib fails, because some files are missing. So I would like to ask which version exactly did you used. Thanks :).

Ok, I deeply apologize, just after I wrote previous comment, I found solution – they renamed “SysMenuBar.cxx” to “” but project files didn’t reflect that. (This happens to me all the time – I was trying find solution for many, many hours.)
I found little problem you maybe should know about; at DistanceUI.cpp, line 29 (imread). When using release version of OpenCV, this causes segfault. Debug version of OpenCV is ok. (I use Windows XP, SP3, Visual Studio 2008, OpenCV 2.3.1.)

Hey @Roy what is W & W_inv matrix used by you in your program? How can it be calculated?

@roy: When I am compiling your code using cmake I am getting error “FLTK not found”
I already have installed FLTK library.

The W matrix is defined in the code, you don’t need to calculate it.
According to the CMakeLists.txt I’ve included it’s looking for FLTK near the code
So you can put it for example like this:


Then it would find it…

@Roy: When I am running your program it compiles successfully in Release mode in VS-2008 but when I load 2 images and click match or other things it gives runtime exception. And also when I try to run your code in debug mode it simply says entry point not found in MSVCP90D.dll
Can you help to run the code successfully?

What is the runtime exception that you get?
In Debug it sets all kinds of preprocessor commands to enable some parts of the GUI.
See that it doesn’t block the ‘int main()’ function by mistake.

Hey the last 4 lines of the CMakeLists.txt are as follows:
But while compiling through Cmake it tells that it can not find DrawKeypoints.cpp
Is this a bug in code??

your previous problem (the runtime exception) is probably connected with: cv::FileStorage fs(“../../Calibration/out_camera_data.yml”,cv::FileStorage::READ); (c’tor Distance in distance.h) from where the camera intrinsics are taken.

Ok, I have one question. On which OS and with which compilers you compiled this example? Thanks.

@Roy: Thank you.
Well, theoretically yes, but my experiences are little different, so I wanted to have some pictures about probability of a) I’m doing something wrong (IMHO most of cases) and b) code contains something unexpected for WindowsXP/VS (which is little more probable when you use other OS and compiler). (By the way, problems can flow from using FLTK 3 which is alpha version.) (By the way, I’m only little experienced user of programming tools, so I’m not claiming that my conclusions are really true.)

If FLTK doesn’t work, you can always simply use OpenCV’s GUI and just use the SfM-Toy-Library parts without FLTK…

Hey @Roy I want to build a car that detects and avoids obstacles using single camera. So, can you tell me what approach should I apply?

@Roy: Well, I was able to compile FLTK 3 (but had to alter FLTK code, otherwise it wouldn’t compile), my point is that it’s IMHO really unsafe to use alpha.
About GUI: yeah, I was planning to replace it (with Qt), but while I don’t understand problematic of Structure from motion well enough, I afraid I could change sfm code during changing GUI part, so I wanted to compile and run unchanged code first and after then alter it and learn from it with possibility of comparing results from original program and my alters (so I could better identify when I have wrong data and when I did an error).
Anyway, I finally compiled and prepared all libraries for cmake and now I found one example of what I was talking about earlier: you code won’t compile in any environment. In DrawKeypoints.cpp you include libgen.h which is not available at Visual Studio. (Source: )
By the way, I apologize for all errors in my English, it’s not my native language.

@laethnes: comment out the libgen.h line from code and also lines containing basename(last line with imwrite). It is only used to test drawkeypoints.cpp . Instead what you can do is to save certain image in the project folder say “img.jpg”
then in the begining of drawkeypoints.cpp write
Mat img_1 = imread(“img.jpg”);

@kewal: Thanks for your help. Anyway, it still is not compiling. It’s many errors like “MultiCameraDistance.h(58) : error C2065: ‘left_im’ : undeclared identifier”. Also, when including <tuple>, it couldn’t find the header file. I Googled that I need to install Visual Studio SP1 so I did, but now it generates errors like “tuple(498) : error C2065: ‘_Is_swap_move’ : undeclared identifier”. So I guess I just have the wrong environment (compiler), so no need to worry about me, I’ll try libmv :3.

Hey when I run your program it detects flow only in some portion of the image also the 3d-Visualization is of only that small portion. So is it that it requires images with some specific texture or so?

@Roy I want to build a car that detects and avoids obstacles using single camera. So, can you tell me what approach should I apply?

That is a very big undertaking. Not only are you concerned with sensing from a single camera, but also with pathfinding algorithms and artificial intelligence.
I would say the best way is to start with some robotics tutorials online, build your robot simple, and then go on to higher-level vision from a single camera.

@roy I have completed course on building a robotic car.
What I am asking is: is there any technique with which I can find whether there is an obstacle in my robot’s path & also build a map of the environment, so that I could apply a path planning algorithm?
I want to do the above mentioned things using a single camera as a sensor and using your sfm-toy library.

thank you so much for your great tutorial, it’s really rare to find some good on the net.
after matching the points and do test to filter a good points matching for calculating the Fundamental matrix, i m stock because i don’t know how to use this matrix for finding correspondence point for every single pt in the image, in other terms scanning all points in img1(not only the corners detected by surf) and use fundamental to determinate their correspondence point from img2 with openCv.because l= F.x
F: fundamental matrix
x: point coordinate
thanks 🙂

You are missing some piece of information to make it work.
l = Fx
is very good, but it results in a -line-, an epipolar line but a line still, that is – a collection of points, not a single point.
If you wish to find a corresponding point from image 1 to one in image 2, you can either use feature matching, optical flow, infer from other points you have knowledge of, or if you already know all or part of the 3D structure of the scene… there are plenty of ways.

I am trying to track rigid bodies with stereo tracking of retroreflective blobs detected from a homemade camera rig. So I have the 4 blobs from two pairs of rectified cameras and I compute epipolars for stereo matching; I then correctMatches with opencv and perform triangulatePoints. I calculate distances and compare to a precompiled object list to group them into the same constellation, so I can track more than one rigid body. But now I want to know the pose of this constellation from the 3d points, or the distances between them.
I am really confused about a robust method. Please let me know if you have any idea.

The easiest way to filter for good matchings should be to check after SURF if x’Fx=0, or rather x’Fx < T where T is a threshold to deal with the noise.
I want to do this too, but my question is how to get F if you already know the motion. I found something like F = [e]P'P+ where P+ is the "pseudo inverse" of P and [e]=P'C. But what is C and how do I deal with a pseudo inverse?
is there an easy way?
Really great work, i love this s**t.

Finally got time to play with it :3.
I have one problem with it, can you, please, help me? When I run it with various images, it always ends on Triangulation.cpp:138, code is “u =<Point3d>(0);”. The function throws an exception; the code throwing it is at the beginning of the “at” function and it is this one:
CV_DbgAssert( dims <= 2 && data && (size.p[0] == 1 || size.p[1] == 1) &&
(unsigned)i0 < (unsigned)(size.p[0] + size.p[1] – 1) &&
elemSize() == CV_ELEM_SIZE(DataType<_Tp>::type) );
I checked it with the debugger and the variable “um” has the following values: dims = 2; data != NULL; size.p[0] = 3; size.p[1] = 1; elemSize() = 8. CV_ELEM_SIZE(DataType<_Tp>::type) returns for _Tp=Point3d the value 24. So obviously, the last test of the assert fails.
So I checked how “um” was created. It’s “Kinv * Mat_(u);” where u has a hard-coded size (3) and Kinv has size 3×3. First, I thought maybe I had done wrong calibration, but then I found that the code loading from the calibration file is commented out and instead, cam_matrix (from which Kinv is calculated) is hard-coded too.
I downloaded OpenCV 2.3.0 code yesterday and compiled it. For the debug version of the program, I use the debug version of OpenCV (so I can track where exactly errors come from).
So, what did I do wrong?

I really feel stupid for being unable to do just simple thing as compiling and running the program. (And so being unable to focus on main problem – SfM.)
I found something else I have problem with:
MultiCameraPnP.h:164, 169 and 174 – do I understand correctly, that because of index 3 imgpt_for_img, I have to use at least 4 images? Why? Especially when adding point clouds is uncommented only for index 3? (I altered it so it checks size of the array and adds points in all ifs. About my results later.)
FindCameraMatrices.cpp:238 – this always causes exception (but the branch of the program is not always executed). With debugging, I found out that problem it’s because depth (of matrix converted from input parameter) is 7 – user defined – and for that, mean function (cv::mean(…)) does not have defined special function (and the assertion fails). I altered it from “X = mean(pcloud);” to “X = mean(CloudPointsToPoints(pcloud));” but still, I don’t know what I did wrong.
In order to experiment, I also made temporal alter so I could avoid problem I wrote a few minutes ago. I changed “u =<Point3d>(0);” to “u = Point3d(<double>(0),<double>(1),<double>(2));”
And with these alters, I can finally test your program. (Well, there is also one more alter: I couldn’t compile code for visualization of results – see the error in one of my posts here – so I implemented mine with SDL and OpenGL.) I don’t know if alters I mentioned in this post are good or not, but it finally returns point cloud which was quite good in one of my test images (others images generates totally wrong results, but I guess it’s bad quality of them) (these images – with good results – are only two).

first of all congrats for this beautiful code, it is brilliant the way it is coded. But I have problem with compiling it. My question is:
I have linux mint 13 mate distribution, along with OpenCV 2.4 latest nightly and obtained PCL from their rep did not compile on my own..
I have a lot of compiling error first (since descriptor /matcher has been changed slightly to nonfree I had to modify the code slightly to accommodate that. I came to the point PCL library giving me error:

Visualization.cpp:250:27: error: no matching function for call to ‘pcl::MovingLeastSquares::setSearchMethod(pcl::KdTree::Ptr&)’
Visualization.cpp:250:27: note: candidate is:
/usr/include/pcl-1.5/pcl/surface/mls.h:102:7: note: void pcl::MovingLeastSquares::setSearchMethod(const KdTreePtr&) [with PointInT = pcl::PointXYZRGB, NormalOutT = pcl::Normal, pcl::MovingLeastSquares::KdTreePtr = boost::shared_ptr<pcl::search::Search >]
/usr/include/pcl-1.5/pcl/surface/mls.h:102:7: note: no known conversion for argument 1 from ‘pcl::KdTree::Ptr {aka boost::shared_ptr<pcl::KdTree >}’ to ‘const KdTreePtr’
make[2]: *** [CMakeFiles/SfMToyUI.dir/Visualization.cpp.o] Error 1
make[1]: *** [CMakeFiles/SfMToyUI.dir/all] Error 2
make: *** [all] Error 2
that is over my head to debug currently 🙂
What is the build requirement? How should I modify the code to eliminate the compile error?
thanks for your help

I would like to ask one more thing – I finished analyzing your code and I would like to ask if I can use some parts from it. It’s for school diploma project and of course, I’m not going to claim authorship of these parts. Of course it may end that I don’t use anything from that at the end (and so use will be for my education only in that case), but still :3.

If you found it useful – you can use it!
The only thing I ask anyone in regards to my code is that they put a reference in their blogs, and share their own knowledge with the world.

I’ve made some changes to the code that will solve this problem. Please check out from git and compile again.

thank you for addressing my question, but I checked out the revised code from git and tried to compile the code again; I still get a compiler error on the “GetAlignedPointsFromMatch” function – the function header and its call from the code mismatch:
Projects/test/royshil-SfM-Toy-Library-3a6c30e/Distance.h: In member function ‘virtual const std::vector<cv::Vec >& Distance::getPointCloudRGB()’:
Projects/test/royshil-SfM-Toy-Library-3a6c30e/Distance.h:50:83: warning: returning reference to temporary [enabled by default]
Projects/test/royshil-SfM-Toy-Library-3a6c30e/Distance.h:109:68: error: too few arguments to function ‘void GetAlignedPointsFromMatch(const std::vector&, const std::vector&, const std::vector&, std::vector&, std::vector&, std::vector&)’
Projects/test/royshil-SfM-Toy-Library-3a6c30e/Common.h:25:6: note: declared here

@Roy Ok, thank you :). A reference is a matter of course for me, and about sharing information – yeah, of course, but I hope it’s not a problem that it won’t be in English, because my English skills are not good enough for writing so much text. But if I use it, it will be in the diploma thesis, which is always freely accessible by the public when finished.

Hello. I have a monocular camera and its intrinsic parameters. I try to find the movement of the camera from images in an unknown world, with unknown points. I don’t want reconstruction; I only want the rotation and the translation of the camera at different times, to describe its movement in 3D. Is your chapter “Estimating Motion” useful for me? Thanks

Yes you can benefit from Structure from Motion techniques even without reconstructing.
You might as well do the reconstruction as it will make your pose estimation more robust (you will have more 2d-3d point correspondences for a better PnP solution). This way you will create more of a SLAM method, that makes a 3d map and tracks it.
But if you only want to know P=[R|t] you can just stick to working with 2 frames at a time, matching features, getting the Fundamental Matrix then decompose it to R and t.
Keep in mind the translation will be at arbitrary scale every time you do this, every 2 frames will give you a result on a different scale! so you can really only count on the rotation element. If you want the translation to be consistent, you must have some known variable that can lock the scale factor (like distance to the “marker”, size of marker, etc.)

Hi first of all – nice Tut!!! Thank you for that and for sharing!
The Question: Why are you using:
Mat_<double> R = svd.u * Mat(W) * svd.vt; //HZ 9.19
Mat_<double> t = svd.u.col(2); //u3
According to Hartley and Zisserman there are 4 options:
R could be yours or UW^TV^T, and t could be plus or minus.
I heard that you can check which one is correct like this:
calculate the 3D point for each option and use the one where the 3D point is in front of both cameras. //HZ Fig. 12 (the four possible solutions for calibrated reconstruction from E)
Have you done this in your code somewhere else? Is this constant for your system?
I try to do sfm only with intrinsic parameters and images. And I got stuck at this point. My translation should be more or less constant – but actually it’s jumping between plus and minus. Do I have to check this for every frame?
Thanks for your help

Thank you so far for all your help!
I am trying to understand it with the math behind it. I hope I can ask you again if I need help – and if you’ve got some papers for this question, that would help a lot.
May I just add something?
I, and I guess a lot more people, would appreciate your work with a little more explanation. Please do not understand this wrong – I do not criticize any of your work – it’s so damn good! But I was searching for good tutorials that bring us into this material (sfm, bundle adjustment and so on…). All I got were some pieces here and there of how to do it. On the one hand the explanation was missing, and on the other hand there was only the math and explanation but no way to transfer this into code. But it is important to get both! I know this is some kind of tinkering (playing with lego blocks) but I really wish for a more detailed tutorial from people who have done it so far. With more code in the explanation.
It seems like there are a lot of different ways to do this. I have one camera with a known camera matrix, and I have more than 100 images and want to get the translation and rotation of the movement of this camera. Maybe later I want the 3D model, but this is in the far distance. So long way to go, but it seems I got, because of you, one step further. But there are so many things I have to try, like: which method for getting feature points? which method to select them to be good feature points? use a stereo method or bundle adjustment? How to solve my rotation and translation (as you mentioned, there is an alternative to SVD)? Can I get better results when I filter my images? and and and… It would be nice to get one good tutorial that explains in detail how to get to a good result, and then you can try different ways. Okay, now I have to go working.
Thanks a lot Roy. Thank you so much!

I’m working on another post for the SfM work that I’m doing right now.
It should deal with much of what you talk about: matching, filtering, validating, making it more robust, fast and easy.
I will post it soon, but you will have to wait for a little while.

hi Roy,
i’m doing a 3D reconstruction with the S.F.M method, and i got a bad result for my scene, so i conclude the values of the fundamental matrix are wrong, or my function for getting the motion between the 2 planes (images) is wrong.
i’m doing a scene reconstruction, so all the points (or visible points in the 2 views) are matched by adding the difference in coords: i subtract the matched feature points (sirf) -> x – x’ / y – y’ and store the result; when it finishes storing i analyze the vector to determine which one is the most repeated.
the problem is difference isn’t the same, you find some having the same diff X but not in Y(+1 to 10)ot the opposite.
so why i can’t get 2 points having same differences in both X/Y?
how can i reduce faults matches?(filtering…but how)
how can i know if the F Mtx have strange value?(i never calculated manually values of F mMtx)
this are values i get from matching two images(strange or not?):
6.4849e-06 -0.000114901 0.0128916 0.000121995 3.28844e-05 -0.0296755 -0.0136524 0.0150315 1
sorry i have a lot of questions and i hope you will answer all of them, but i m really confused, stopped.
thanks a lot 🙂
For information, these are the steps I did:
1) calibrate the camera: get camera matrix K
2) detect/match feature points with SURF
3) get the fundamental matrix
4) get the essential matrix
5) get the projection matrix (rotation/translation)
6) triangulate the points
Note: I'm trying to add an undistort step, but failed due to a segmentation fault.

@ sampie,
Hi Sampie, I can't really help you with your questions, but I have trouble with my F too, I guess. My triangulation gives 3D points that are far away from a chessboard (I was trying to reconstruct the 3D points of the chessboard, hoping it would yield a chessboard after all; it didn't).
I selected the keypoints with the robust matcher as described in the OpenCV 2 Computer Vision Application Programming Cookbook. The matches look good and stable, but I do not know if my fundamental matrix is right. How can we verify it? It's one of the important steps, because E and R|t are "made" out of F.
If you want to share experiences, please ask Roy for my email address and let's write about our progress. I am working on my problem every day, so I need to get further, but I'm stuck here.
Hope to write you soon.

thank you for your reply,
If you don't mind, let's talk on Skype, it's better:
js_qt_chedy skype name

Thanks for your code about triangulation, I appreciate your work.
The above link is not working. Can you please add the real link?
It's important for me, because I get strange results in the triangulation with feature matching and the F matrix.
I have nineteen matching points (I chose three sides of a cube, with points between the edges).
Then I estimate the F matrix robustly with RANSAC.
My F matrix fulfills the epipolar constraint for all points, so it must be right.
After using linear or iterative triangulation and solving with SVD, my z coordinate is always about a hundred times smaller than x and y.
I have no idea what's wrong. You talked about 'projective ambiguity'. Is it possible that this is the reason? But that would be too big a projective distortion, wouldn't it?
Hope you can help me. Thanks.
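As a side note for readers hitting the same question: whether F "fulfills the epipolar constraint" can be checked numerically, since x2^T * F * x1 should be close to zero for every correspondence. A minimal sketch in plain Python (the example F is a contrived pure-translation case of my own, not data from this thread):

```python
# Sanity-check a fundamental matrix: for each correspondence (x1, x2)
# in homogeneous pixel coordinates, x2^T * F * x1 should be near 0.
def epipolar_residual(F, x1, x2):
    """Return x2^T F x1 for a 3x3 F and homogeneous 3-vectors x1, x2."""
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Fx1[i] for i in range(3))

# A trivially consistent example: for a pure horizontal translation
# (with identity calibration), F = [t]_x, and points on the same
# scanline satisfy the constraint exactly.
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]          # skew-symmetric matrix of t = (1, 0, 0)
x1 = [120.0, 80.0, 1.0]  # point in image 1
x2 = [100.0, 80.0, 1.0]  # same scanline in image 2
print(epipolar_residual(F, x1, x2))  # -> 0.0
```

In practice you would compute this residual (or better, the symmetric epipolar distance) for all matches and look at its distribution; a "correct" F with residuals near zero only proves consistency with the matches, not that the matches themselves are good.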

The file moved, as I’m changing the structure of the library: separating core lib from UI and executables.
Have a look at:
I think 19 points is very low for a RANSAC process. You must add many more points for a robust estimation.
But if you are absolutely sure about your points, you don't need RANSAC, and you are even better off without it!
Just solve for the fundamental matrix with the 7-point or 8-point algorithm (or for the essential matrix with the 5-point algorithm); they exist in OpenCV.
To get a metric triangulation (instead of a projective one) you should use the camera's calibration matrix. If you don't have it, just use a mock-up one, like I do in my code, and later refine it (auto-calibration) once you get enough robust 3D points with 2D correspondences (use OpenCV's calibrateCamera() function).
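A mock-up calibration matrix of the kind described above can be sketched like this (a minimal illustration; tying the focal-length guess to the image size is an assumption to be refined later via auto-calibration):

```python
# Build a rough pinhole calibration matrix K when no calibration exists:
# focal length guessed as the larger image dimension (in pixels),
# principal point assumed at the image center, zero skew.
def mock_K(width, height):
    f = float(max(width, height))  # crude focal-length guess, in pixels
    return [[f,   0.0, width / 2.0],
            [0.0, f,   height / 2.0],
            [0.0, 0.0, 1.0]]

K = mock_K(640, 480)
# With K in hand, E = K^T * F * K turns the fundamental matrix into an
# essential matrix, which gives a metric (up-to-scale) reconstruction.
```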

Hey Roy,
thanks for the fast support.
Sorry, but I still have like a thousand questions…
You suggest using the 5-, 7- or 8-point algorithm if I have only 19 points of which I am absolutely sure?
Why? And which of the 19 points would you choose? Random ones?
If I have enough time, I think there is no reason to avoid RANSAC.
I am quite sure about my points, but if I manually clicked some points a few pixels away from the right position, the robust algorithm would eliminate them. Right?
You say that if I want a metric reconstruction I only have to use the calibration matrix. So you mean the E matrix (E = K^T * F * K) and extracting R and t from it (with the four solutions you mentioned). For auto-calibration and metric reconstruction I use the absolute dual quadric (ADQ). So my question is: is there any difference in the 3D points between 1) using the E matrix, extracting R and t from it and then triangulating, and 2) using the F matrix, extracting R and t (only up to a projective ambiguity), and applying H^-1, which I get from the ADQ (page 463 in HZ), to the points?
Can you please say a few words about the problem with my z coordinates, which are wrong in my reconstruction with the F matrix (not metric)?
Do you have an idea?
I got two other questions:
In the method "CheckCoherentRotation" you check determinant(R) ≈ ±1 to see if R is a rotation matrix. I understand that, because it is one of the properties of a rotation matrix.
But in HZ, 2nd edition, page 256, it says R is [e']_x * F, which has rank 2.
If the matrix has rank 2, then determinant(R) = 0, right? So it is always 0, and thus never a real rotation matrix (for example, it's not invertible). Am I misunderstanding something?
To get the translation vector t = e', I have to solve F^T * e' = 0, which is a linear equation system and can be solved with the SVD. Where I see a problem is the fact that:
1) the sign of the SVD solution is not unique
(Matlab and OpenCV give different signs, for example),
so I don't know the real direction. Okay, I know it myself, because I move the camera, so it's possible to change the sign if it's wrong. But it's not possible to get the right sign automatically, right?
(Or should I use both and check in which of the two the points are in front of the camera?)
2) The SVD normalizes the solution vector to length 1, so I lose the exact scale. Is that a typical problem when using feature-matching algorithms?
I hope there is no limit on the number of questions I can ask in this forum…
Greetings and thanks.
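Regarding the question in parentheses above: yes, testing the candidate decompositions and keeping the one that puts the triangulated points in front of both cameras is the usual way to resolve the ambiguity. A minimal sketch in plain Python (the helper names are illustrative, not from the library's code):

```python
# Cheirality test: among the four (R, t) decompositions of E, keep the
# one for which triangulated points have positive depth in both views.
def in_front_of_both(R, t, X):
    """X is a 3D point in camera-1 coordinates; camera 2 sees R*X + t."""
    z1 = X[2]                                          # depth in camera 1
    z2 = sum(R[2][j] * X[j] for j in range(3)) + t[2]  # depth in camera 2
    return z1 > 0 and z2 > 0

def pick_solution(candidates, points):
    """candidates: list of (R, t) pairs; points: triangulated 3D points.
    Return the candidate with the most points in front of both cameras."""
    return max(candidates,
               key=lambda Rt: sum(in_front_of_both(Rt[0], Rt[1], X)
                                  for X in points))
```

So the sign does not need to be guessed manually: triangulate a few points under each candidate and let the vote decide. (This fixes only the sign/rotation ambiguity; the overall scale of t stays unknown in monocular SfM.)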

First of all, thanks a lot for your great tutorial!! It will help me in further steps.
My problem is not being able to match all the visible points in the two images. I tried to define the difference between the extracted feature points, but it didn't work, because I can't get the right difference; they always come up different -> false matches.
Please, can you tell me how you do it?
If it's the hybrid method, can you please explain it more? How exactly do I proceed?

Hey Roy,
I have an idea / answer regarding my post above and the z-coordinate problem.
What I haven't mentioned is that I generated the cube which I want to reconstruct in OpenGL.
I think the problem could be that the camera centre is too far away from the image plane, or in other words, that the focal length is too big. If the focal length is too big, the rays (from C to the image point x) will be too close to each other, so there will not be much depth (z coordinate) in the triangulation. Is that right? (I can upload a picture to illustrate it, if you want.)
It's also a problem if the 3D object is right next to the image plane, right?
Anyway, I cannot really set the focal length in OpenGL. I can only manipulate the scale factor, so I scaled my cube, but the triangulation isn't any different. I don't understand what I am doing wrong.
Do you have an idea? Please help me, it's urgent 🙁

You could add a simple check for consistent planar luminosity gradients within selected coplanar data points; if I recall, there is an old paper describing the method in greater detail.
Personally, I find that such methods are only good for reducing errors in pose-recovery processes. Essentially, even good feature detectors suffer instability due to various aberrations, and the SURF class's Hessian-threshold parameter in recent OpenCV is still immutable.

I’d be happy to integrate any code of yours that does coplanar points color-consistency filtering!
And, I moved to using FAST+ORB features as SURF is now in the non-free section in OpenCV…

Unfortunately, my old scaffold code is too tightly coupled to a rather messy detector class for the old 2.3 (another one of my attempts at accurate scene-context classification, the classic "car" vs. "toy car" disambiguation). However, I still try to report bugs to the OpenCV group when I have time.
I’m sure a well commented simple standalone SfM example for students would be welcomed by the OpenCV group.

Dear Roy,
thanks so much for the guide and for sharing the code. I found something I’m not quite clear about:
In triangulation.cpp, where you implement Hartley and Sturm's iterative method, inside the loop of IterativeLinearLSTriangulation you first assign X_ = LinearLSTriangulation(u,P,u1,P1). You then compute the error weights, re-weight the matrices, perform SVD again as in the linear method, and solve for X_ again. However, at the next iteration the loop reaches "X_ = LinearLSTriangulation(u,P,u1,P1)" again, discarding the weighted X_.
It seems like the weighted X_ should be reused instead, or am I missing something in the algorithm?
Thanks in advance.

hi dear Roy,
I have a question: I want to find the relative R, t automatically for a block of images, but I don't know how to automatically determine which of the four (R, t) solutions is the true one.
Please help me; my thesis is in danger.
Thank you so much.

It looks like the code was developed on a Windows machine, and I'm just curious: has anyone managed to compile this code under Linux? (Or is it just me, a poor little newbie, who cannot do it properly? :))

The code works fine under Linux. Clone it from GitHub, first compile SSBA like the README says, and then compile using CMake from the source root directory.

Hi Roy,
After generating projects by CMake, I am compiling your codes via Visual Studio 2010. The SfMToyLibrary was compiled successfully but other two projects (SfMToyUI & TestStuff) failed because there was a link error: “cannot open file v3d.lib”. Could you please tell me where can I get v3d.lib?

hi roy
I want to get the distance between the camera and an object in the environment; can this code help me? I want to do this project with a single camera.

Hi, I am a master of science student from Mexico, currently dealing with the SfM problem, and I am trying out this code. So far I have been able to build it on Windows 7 using Visual Studio 2010; I was actually able to run the project, and I get to the part where the point cloud is created. The problem is, I just can't install FLTK3 (I tried to compile it with MinGW and got errors; then I compiled v1.3, but when using CMake to configure the SfMToy project it just won't find it). So I get results on the command line, but I can't see anything visually; I just get a black screen with an FPS count that changes slightly during execution. Any idea how to get visual feedback?

Dear Roy,
I managed to get your code running, and at first I tried using it without bundle adjustment, which worked for me. After uncommenting the bundle adjustment (OpenCV version) I get very strange results. It always looks like a cone-like structure which just doesn't look like the actual scene. Is this normal?
Best regards,

Indeed, with BA, or even without it, the results can end up very bad and not even slightly related to reality. The problem is the stability of the projection-matrix recovery. This is why Snavely et al. put so much work into filtering the point matches so that the recovered projection matrix will be correct (especially for the first 2-view triangulation, where they spend a lot of resources finding the right two views to use). I believe Snavely et al. also redo the reconstruction a number of times, starting from different views, until they come up with a good baseline, something which I don't do in my code.

Hi Roy,
Thanks for the code; even though I don't understand all parts of it, I'm trying to modify a few things. Is there an easy way to implement the algorithm incrementally, so that at the beginning there is a set of images and I start to reconstruct the scene, and later I get another bunch of images and want to expand the point cloud? I managed to adapt the matchMatrix, but something goes horribly wrong with the alignment, so the result is totally wrong.
Another question regarding the CloudPoint struct: is there an easy way to attach a descriptor (from the rich-feature matcher) to each CloudPoint?
Maybe if the descriptor were known for every CloudPoint, 3D-2D matching could be sped up (if an extrinsic guess is known).
Thanks in advance!

The algorithm is already iterative: it adds more and more images to the reconstruction, starting from a baseline.
What you need (if I understand you correctly) is persistence, meaning to go back to the reconstruction after it has finished and keep adding more images.
Certainly, if you add descriptors (or simply 2D image patches) for each point in the cloud, and then save this information to disk after you're done, you can pick up where you finished with more images to add. (The descriptors/patches will supply the 2D-3D correspondences.)
This is how Snavely et al. work, by the way. In their paper they call these "tracks". Look it up.

Thanks for sharing your code. I’ve been trying to get it running on OSX and continue to have a problem I can’t track down.
I installed most of the required programs with MacPorts, with the exception of VTK and PCL, which I compiled from source. After some work I can get SfMToyUI to compile, but I get the following errors on execution. Not sure if it is a dynamic-library problem or not, but I thought I'd see if you could give any hints. I'm stumped.
objc[71137]: Class vtkCocoaTimer is implemented in both /usr/local/lib/libpcl_visualization.1.6.dylib and /Users/xxx/OpenSourceApps/SfM-Toy-Library-master/build2/./SfMToyUI. One of the two will be used. Which one is undefined.
objc[71137]: Class vtkCocoaServer is implemented in both /usr/local/lib/libpcl_visualization.1.6.dylib and /Users/xxx/OpenSourceApps/SfM-Toy-Library-master/build2/./SfMToyUI. One of the two will be used. Which one is undefined.
objc[71137]: Class vtkCocoaFullScreenWindow is implemented in both /usr/local/lib/libpcl_visualization.1.6.dylib and /Users/xxx/OpenSourceApps/SfM-Toy-Library-master/build2/./SfMToyUI. One of the two will be used. Which one is undefined.
objc[71137]: Class vtkCocoaGLView is implemented in both /usr/local/lib/libpcl_visualization.1.6.dylib and /Users/xxx/OpenSourceApps/SfM-Toy-Library-master/build2/./SfMToyUI. One of the two will be used. Which one is undefined.
USAGE: ./SfMToyUI [use rich features (RICH/OF) = RICH] [use GPU (GPU/CPU) = GPU] [downscale factor = 1.0]

Looks like it's working and waiting for arguments.
Try running:
./SfMToyUI RICH CPU 1.0 directory/of/images/

I have had difficulty getting the visualization element of the code working properly; I'm sure this results from my implementation. I have tested the code only by turning off the PCL visualization elements, writing a PCD file to disk, and viewing the results.
I've used current versions of OpenCV 2.4.5, PCL 1.6 and VTK 5.10.1. Would that cause a problem?
Here is an example of an exception that occurred prior to compiling with visualization turned off.
=========================== Load Images ===========================
objc[18869]: Object 0x7fd433c5c810 of class __NSDictionaryM autoreleased with no pool in place – just leaking – break on objc_autoreleaseNoPool() to debug
———— Match P1000966.JPG,P1000967.JPG ————
———— Match P1000968.JPG,P1000969.JPG ————
———— Match P1000965.JPG,P1000965.JPG_keypoints.png ————
———— Match P1000970.JPG,P1000971.JPG ————
OpenCV Error: Assertion failed (prevPyr[level * lvlStep1].size() == nextPyr[level * lvlStep2].size()) in calcOpticalFlowPyrLK, file /Users/xxx/OpenSourceApps/opencv-2.4.5/modules/video/src/lkpyramid.cpp, line 727
libc++abi.dylib: terminate called throwing an exception

I want to know whether FLTK 1.x will work or not. I tried hard to download FLTK3 but failed; it was asking for SVN access, which I don't have. Do you have a link for downloading FLTK3 for free?

Hi Roy, thanks for your code! I've compiled SfMLibrary and SfMToyUI (command-line version). Then I'm using the "SfMToyUI.exe img/jpg" command to start the whole process.
=========================== Load Images========================
——————– extract feature points for all images ——————-
———— Match P1000965.JPG,P1000966.JPG ————
imgpts1 has ———— Match P1000968.JPG,P1000969.JPG ————
23843 points (descriptors 23843)
imgpts1 has 20647 points (descriptors 20647)
imgpts2 has 22522 points (descriptors 22522)
imgpts2 has 22572 points (descriptors 22572)
After that the SfMToy Library Viewer window appears and I see nothing, just a black window (at 1000 FPS). Could you give any suggestions?
Thanks in advance !

Hi roy, thanks for such a great tutorial!
I've followed your lead and finally got the point cloud, but what I need is a dense depth map of the scene. How can I generate a depth map from these points?
I've googled around and found a paper talking about generating the depth map from "3D triangles"; is that the right way to do it?

Hi Roy;
Thanks for your excellent share. I would like to ask about the definition of the 3D points. If I understand it well, the 3D points are defined in the first camera's coordinates? But then why are you using two different containers, pcloud and pcloud1?
Thanks a lot..

Hi Roy!
Thanks for sharing your work! I have a question concerning the triangulation step. Why do you use normalised image coordinates here? In my SfM project, I somehow get better reconstruction results when just using pixel coordinates (in homogeneous representation). Also, in the work of Hartley and Sturm (CVIU, 1997) this coordinate transformation is not explicitly stated.
Best regards,

Hi Roy,
Thank you for this discussion. I really needed it.
I tried to use your code in VC2010. unfortunately I got an error during the project build. Here is the message:
—— Rebuild All started: Project: SfMToyLibrary, Configuration: Debug Win32 —–
Build started 1/30/2014 10:46:15 AM.
Deleting file “SfMToyLibrary.dir\Debug\SfMToyLibrary.lastbuildstate”.
Touching “SfMToyLibrary.dir\Debug\SfMToyLibrary.unsuccessfulbuild”.
Building Custom Rule I:\SfM-Toy-Library-master/SfMToyLib/CMakeLists.txt
CMake does not need to re-run because I:\SfM-Toy-Library-master\build\SfMToyLib\CMakeFiles\generate.stamp is up-to-date.
cl : Command line error D8021: invalid numeric argument ‘/Wno-sign-compare’
Do you have any idea about the problem or it’s solution? Any help would be appreciated.

Regarding my previous problem: I solved it by removing the related option from the VS project file.
Also, to compile the code I did another change: replacing “glEnable(GL_RESCALE_NORMAL)” with “glEnable(GL_NORMALIZE)” in “sfmviewer.cpp” file.
However, when I run the application with default parameters, after loading the images and starting the process, it stops with these messages:
QPixmap: It is not safe to use pixmaps outside the GUI thread
QColor::setRgbF: RGB parameters out of range
I have no idea how to deal with it. Any help is appreciated in advance.

I have an implementation of 3D reconstruction from coplanar points (features of text that I want to scan in), recorded by the same camera, which is simply moved over the text.
For some reason the second camera is rotated by 180° (Rodrigues angle), which can also be seen in the matrix, e.g. R00 == (-0.98…).
Is there a classic mistake I might have made?

Excuse me, there is no real "second" camera; it is supposed to be stereoscopic 3D reconstruction from two consecutive images along the whole line of the scan process.

hi Roy,
thank you for this post. I really needed it.
I compiled your code successfully on Ubuntu 12.04. However, when I run the application I get this error:
X Error: GLXBadFBConfig 178
Extension: 153 (Uknown extension)
Minor opcode: 34 (Unknown request)
Resource id: 0x5c0000b
SfMToyUI: /home/hamid/Apps/eigen-eigen-6b38706d90a9/Eigen/src/Core/DenseStorage.h:78: Eigen::internal::plain_array::plain_array() [with T = double, int Size = 16, int MatrixOrArrayOptions = 0]: Assertion `(reinterpret_cast(array) & 0xf) == 0 && “this assertion is explained here: ” “” ” **** READ THIS WEB PAGE !!! ****”‘ failed.
Aborted (core dumped)
do you have any suggestion to get rid of it?

Hi, I downloaded the code from and compiled it. I have installed QGLViewer in a VS 2010 environment. I configured the code, but I get the error:
2>—— Build started: Project: SfMToyLibrary, Configuration: Release Win32 ——
2>Build started 3/8/2014 7:50:20 PM.
2> Touching “SfMToyLibrary.dir\Release\SfMToyLibrary.unsuccessfulbuild”.
2> All outputs are up-to-date.
2>cl : Command line error D8021: invalid numeric argument ‘/Wno-sign-compare’
2>Build FAILED.
2>Time Elapsed 00:00:00.11
3>—— Build started: Project: SfMToyUI, Configuration: Release Win32 ——
3>Build started 3/8/2014 7:50:20 PM.
3> Touching “SfMToyUI.dir\Release\SfMToyUI.unsuccessfulbuild”.
3> All outputs are up-to-date.
3> ViewerInterface.cpp
3> sfmviewer.cpp
3> moc_ViewerInterface.cxx
3> moc_sfmviewer.cxx
3> main.cpp
3>C:\Users\Reeba\Desktop\SfM-Toy-Library-master\SfMToyLib\Distance.h(70): warning C4172: returning address of local variable or temporary
3> Generating Code…
3>LINK : fatal error LNK1181: cannot open input file ‘SfMToyLib\Release\SfMToyLibrary.lib’
3>Build FAILED.
How do I solve this problem? I really need help; I have been stuck with this code for some time now. Do tell me if there is a solution to the problem.

Hello Roy,
Incredible piece of code, given that it is provided in the open.
Here I'd like to post hints for those having issues compiling it (Linux, Qt 4.8.4 and ), but also ask about an error I get after successfully loading the dataset and running the SfM, upon display.
First the hint:
Upon running cmake, it couldn’t find libQGLViewer-2.5.1 complaining:
QGLVIEWER not found
So for this, specify in CMakeLists.txt where libQGLViewer is located, changing
set(QGLVIEWER_DIR_HINT “” CACHE PATH “libQGLViewer directory”)
to
set(QGLVIEWER_DIR_HINT “/your/path/to/libQGLViewer-2.5.1” CACHE PATH “libQGLViewer directory”)
Also, there's no release directory in /QGLViewer, so change:
2) My problem: I can execute the program, load the dataset and run the SfM, but after the computation is done I get:
QGLContext::makeCurrent(): Failed.
QPixmap: It is not safe to use pixmaps outside the GUI thread
QGLContext::makeCurrent(): Failed.
And this one I couldn’t find how to solve it…

Hello, first of all, this is awesome.
I was wondering how to get a point cloud from your software and save it to a file.
I was using the getPointCloudBeforeBA() function in your MultiCameraPnP class
and wrote the points into a file in the format:
x0, y0, z0,
x1, y1, z1,
I plotted the points I got in the file, but they seemed very unrealistic.
Am I doing it right?

Does the setup assume that the camera calibration is the same for all images? In the case of two different cameras with different calibration should a separate K and Kinv be used through the triangulation steps? Would that require modifying the code?
Thank you for the great work. It has been a wonderful learning experience to go through the details of SfM!

Hi, thank you for your work.
I've downloaded your code, but I can't compile it with CMake.
When I use the command "cmake .", Q_WS is not found:
— Looking for Q_WS_WIN – not found
— Looking for Q_WS_QWS
— Looking for Q_WS_QWS – not found
— Looking for Q_WS_MAC
— Looking for Q_WS_MAC – not found
can someone help me please??
thank you

Hello Sir Roy,
I am very thankful for the code, but I don't know how to use it. I have tried compiling the code using CMake, but it gave me errors. Can you provide any solution, please? I need it.

Hello there,
I am very new to 3D reconstruction, and currently I'm trying to do SfM. I have the camera calibration matrix, then an image pair, then F, then the essential matrix, and R and t from the E decomposition. I am tangled up in the triangulation and visualization; I don't know why my 3D looks like a line. No real reconstruction. Could anyone help?

I'm trying to play with SfM myself and tried to use your sample as both a tutorial and a kind of ground truth.
Currently I'm unable to get it working on my data sets; it ends up with 0/XXXX inliers, though the start is very promising.
Can you please provide a kind of "golden" dataset to try it on?

hi roy,
How do I build this code in Visual Studio 2012? I'm getting errors.

Thank you very much for providing this helpful information.
The links to the code seem to be broken; none of them opens the website!

Hello! Thanks for the article. I compute R and t for the second camera with the OpenCV function recoverPose, but it computes t as a unit vector. Does your code also compute the translation as a unit vector?

Hi Fred, you have to run CMake and build SSBA-3.0, then add the path to the Debug/Release folder of the SSBA build to the SfMToyUI linker properties.

Hi Sir Roy,
How did you get the depth map you presented? I guess you applied the hybrid method. Can you say more about that (how to get that depth map)?
