Wednesday, January 29, 2003
      ( 11:00 AM ) Matt  

Image processing and the edit distance

A few days ago I talked about gaze detection, and a principle component of my method was to extract depth information from a pair of stereo images. The tricky part about getting depth information from stereo images is determining which parts of one image correspond to which parts of the other images. If you are familiar with Unix, you will know of the diff utility which takes two files and determines which parts are the same and which parts are unique to each file. This is effectively what we need to do with the stereo images. So, let's start by looking at how diff works.

The diff utility computes something called the edit distance between the two files. The edit distance is the minimum number of edits required to change one file into another. An edit can be the insertion of a character, deletion of a character, or modification of a character. In the process of computing the edit distance, it finds a minimal set of edits required to turn one file into another. All the insertions, deletions, and changes correspond to places where the files differ.

Here is what the edit distance equation looks like:

D(a.A,b.B) = D(A,B) if a = b
min(D(A,B) + 1, // modification
D(a.A,B) + 1, // insertion
D(A,b.B) + 1) // deletion

If the first characters are equal, then the edit distance of the complete files is the same as the edit distance of the remainder of the files without the first characters. If the first characters are different, then we need to make some sort of change. The best one, is whichever makes the remainder of the files have the smallest edit distance. We add the cost of each type of change and find the minimum.

We could do a similar process on a stereo pair of lines from our stereo images. The places where no change is necessary means that those areas probably correspond to each other.

There's a problem though. How often are two images going to have regions which are exactly the same? There is invariably going to be noise in the images, and even without noise one image may be offset by a distance less than a pixel. So, how do we account for this?

What if we make the cost of a modification dependent on how much has to be modified? That way, if two pixels are very similar to each other the cost of modification will be small. For two nearly identical pixels, the modification cost would be almost the same as it would be if they were really equal. For pixels which are significantly different, there would be an appropriately larger cost. For pixels which are very different, it might even be cheaper to perform an insertion and deletion instead of a modification. This way, if we set up our costs right, then modifications in our result also indicate correspondence.

Here is a modified edit distance equation:

d(a,b) = |a-b| // Cost of a change

D(a.A,b.B) = min(D(A,B) + d(a,b), // modification
D(a.A,B) + i, // insertion
D(A,b.B) + i) // deletion

The cost of a change is given by the new function d(a,b) which computes the difference between two characters, or pixels. The cost of insertion and deletion is another constant which may be tuned. 2i should be less than the maximum value returned by d(a,b) so that insertions and deletions can be preferred over modifications.

# -

Comments: Post a Comment

Dreams I have...

Powered by Blogger
Feel free to e-mail me.

free hit counter