A study of Spatial Resolution (Hyperacuity) based on Bayesian Ideal Observer method

 

Psych221 Final Project

Sam Kavusi

 

 

 

Table of contents:

Introduction

Explanation of the study

Check for the Spatial Frequency

Check for the spatial resolution (Hyperacuity)

References

Appendices:

Appendix A) Why is there the name Bayesian?

Appendix B) Effect of sampling on the correlation result

 

 

Introduction:

 

One interesting approach to modeling the eye uses the concept of an ideal observer. We assume that we have a good model of the optics of the eye and a good model of the cone responses. We then ask what the best decision is that the brain can make based on the distorted data it receives from the eye. Interestingly, this is very similar to the fundamental problem in statistical signal processing.

 

In this project, I wanted to measure the ability of a Bayesian ideal observer to distinguish spatial offsets between a pair of lines. I also wanted to compare this result with the maximum spatial frequency that the same ideal observer can detect.

 

It has been mentioned in class, as well as in Professor Wandell’s “Foundations of Vision” book (pp 239-243), that we can localize targets at a finer resolution than the spacing of the cone mosaic. To the best of my knowledge, Westheimer and McKee [1] were the first to address this point directly. Many researchers have since studied this phenomenon, which is referred to as hyperacuity.

 

The interesting fact is that the resolution of our localization is much better than our spatial frequency resolution. This seems counterintuitive at first, because we normally think that the minimum period we can see is also the minimum distance we can resolve. The two are obviously related, but they are not the same.

 

Different researchers have approached this problem in different ways. After the chevron-line experiment of Westheimer and McKee [1], many other researchers carried out more complicated experiments. Burbeck has an interesting chapter in [2] showing a version of Weber’s law for spatial resolution: basically, if the lengths of the lines are larger, the discrimination threshold becomes larger as well. A good collection of approaches to the problem can be found in [3].

                                                                                                                                                                      

The most interesting approach is the ideal Bayesian observer developed by Wilson Geisler. A nice overview of the method, along with some simulation results, is given in [4][5]. Basically, one can assume an ideal discriminator at every stage of the human visual system and obtain an upper bound on performance. Geisler carried out this study for the general case, considering all possible psychological mechanisms, and then compared the performance of this sequential ideal observer with actual data. This is a very interesting approach that gives us hints about the behavior of our perceptual system. However, it was not clear to me exactly how the simulations were conducted and whether, for example, they considered the effect of sampling. Moreover, the model takes the ML detector to be the best detector, which is not necessarily true. The best detector is the MAP detector, and it equals the ML detector only if all of the possibilities are equally probable. An example that serves as a first introduction to these rules can be found in Appendix A.

 

 

Explanation of the study:

 

What I have done is study the ideal observer for one specific case and for two different stimuli. The model is very simplified, but it gives good intuition. I will carefully consider the effect of cone sampling in my analysis. The block diagram of the analysis is shown in Fig 1. The stimulus spans only 1200 seconds of arc, with a resolution of 1 second.

 

 

 

Figure 1

 

 

The Westheimer line spread function, which represents the optics of the eye, is drawn in Fig 2.

Figure 2
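The report does not list the formula behind Fig 2. As a minimal sketch, assuming the commonly quoted two-term form of Westheimer's line spread function (a Gaussian plus an exponential, with the argument in minutes of arc), the curve can be sampled on the 1-second grid of this study as follows. The constants and the names `westheimer_lsf` and `lsf_samples` are assumptions made for illustration, not values taken from the original .m files.

```python
import math

def westheimer_lsf(x_arcsec):
    """Assumed line spread value at x arc seconds from the line (peak = 1)."""
    x = abs(x_arcsec) / 60.0  # the assumed formula takes minutes of arc
    return 0.47 * math.exp(-3.3 * x * x) + 0.53 * math.exp(-0.93 * x)

# Sample the function on a 1200-sample grid, one sample per arc second,
# with the line placed at the center of the field.
center = 600
lsf_samples = [westheimer_lsf(i - center) for i in range(1200)]

print(lsf_samples[center], lsf_samples[center + 60])
```

The curve is symmetric and falls off to a long exponential tail, which is why light from a thin line reaches several neighboring 100-second cones.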

 

I will study only the simple case of a one-dimensional mosaic of cones, and I assume that each cone fills 100 seconds. Fig 3 shows the two different ways in which I place the cones. The cones can be placed so that the center of the stimulus falls on the center of a cone, giving 11 possible cones (Mosaic 2), or so that the cones start at the beginning of the stimulus area, giving 12 possible cones (Mosaic 1). The response of a cone is assumed to be the integral of the light that it receives.

 

 

                            

Figure 3
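The two mosaics can be sketched as two ways of cutting the 1200-sample field into 100-second apertures. This is a minimal illustration, not the original .m code: the function name `cone_responses`, the parameter `first_edge`, and the uniform test image are hypothetical.

```python
def cone_responses(image, first_edge, cone_width=100):
    """Sum the image over consecutive cone apertures starting at first_edge.

    Cones that would extend past the end of the field are dropped.
    """
    return [sum(image[start:start + cone_width])
            for start in range(first_edge, len(image) - cone_width + 1, cone_width)]

# Hypothetical uniform field, one sample per arc second.
image = [1.0] * 1200

mosaic1 = cone_responses(image, first_edge=0)   # cones start at the edge of the field
mosaic2 = cone_responses(image, first_edge=50)  # one cone is centered on sample 600

print(len(mosaic1), len(mosaic2))
```

Shifting the first cone edge by half a cone width is all that separates the two mosaics; the half cones left over at both ends of Mosaic 2 are discarded, which is why it has one cone fewer.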

 

One might wonder whether the position of the cones changes the ideal observer’s ability. The answer is definitely YES, and the blunt explanation is that sampling is NOT SHIFT INVARIANT. This might not matter much if the sampling rate is high, but in cases like this study it is very important to take it into account; in fact, it plays a central role in my study. Depending on the starting point of the cones, other mosaics are of course possible. Here I try just two extremes. I have checked other shifts of the cones as well, but these two appear to be the best choices for the different cases.

 

It should be mentioned that our system is still linear: the line spread function is linear, and integration, windowing, and sampling are linear as well.
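The linearity claim can be checked numerically with a superposition test. The sketch below is an illustration under stated assumptions: the line spread formula and its constants are the commonly quoted Westheimer form (not given in this report), the kernel is truncated at 200 seconds for speed, and all function names are hypothetical.

```python
import math

def lsf(x_arcsec):
    """Assumed Westheimer-style line spread, x in arc seconds."""
    x = abs(x_arcsec) / 60.0
    return 0.47 * math.exp(-3.3 * x * x) + 0.53 * math.exp(-0.93 * x)

HALF = 200  # kernel truncation in arc seconds, an assumption for speed
KERNEL = [lsf(k) for k in range(-HALF, HALF + 1)]

def blur(stimulus):
    """Convolve the stimulus with the truncated line spread function."""
    out = [0.0] * len(stimulus)
    for i, value in enumerate(stimulus):
        if value == 0.0:
            continue  # dark samples contribute nothing
        for k, h in enumerate(KERNEL):
            j = i + k - HALF
            if 0 <= j < len(out):
                out[j] += value * h
    return out

def cone_responses(image, first_edge=0, cone_width=100):
    """Integrate the image over consecutive cone apertures."""
    return [sum(image[s:s + cone_width])
            for s in range(first_edge, len(image) - cone_width + 1, cone_width)]

def system(stimulus):
    """Optics followed by cone integration and sampling."""
    return cone_responses(blur(stimulus))

def line(position, n=1200):
    return [1.0 if i == position else 0.0 for i in range(n)]

# Superposition: the response to 2x + 3y should equal 2*response(x) + 3*response(y).
x, y = line(600), line(630)
combo = [2.0 * a + 3.0 * b for a, b in zip(x, y)]
lhs = system(combo)
rhs = [2.0 * a + 3.0 * b for a, b in zip(system(x), system(y))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)
```

The error is at the level of floating-point rounding. Linearity does not imply shift invariance, though: moving the input by one second does not simply move the cone outputs.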

 

Then there is our ideal detector. I assume an ML detector that chooses among a limited set of possibilities. The name Bayesian for our observer comes from the methodology of the detector, which is the best possible to the best of our knowledge. If you are not familiar with the ML and MAP rules, you can refer to the example in Appendix A.

 

 

Check for the Spatial Frequency:

 

In this .m file, stimuli with spatial periods from 10 to 200 seconds were used. The signal-to-noise ratio was assumed to be 100 and the noise to be Gaussian. An ML detector (just a correlator) was used to decide whether there is any variation or not. The assumption was that the observer checks the stimulus against all possible frequencies. On every iteration, a counter counts up if the detection was correct and down if it was not. The simulation was run for the two different mosaics, and the result is shown in Fig 4. In this figure, the red graph is the prediction reliability for Mosaic 2 and the blue graph is for Mosaic 1 (both with the ideal observer). We can require a count above 40 (which means 45 correct and 5 wrong guesses, or 90% correct predictions), and then we can make some interesting observations:

 

1)      It is important to choose the correct mosaic. Here Mosaic 2 is the better choice, although it has one cone fewer. A possible conclusion is that, at least for this ideal observer, moving the eye by as little as 50 seconds can change the observer’s ability to detect spatial frequencies.

2)      When the spatial period is 100 seconds, no variation can be detected: the period of the sampling and the period of the stimulus are the same, so every cone receives the same amount of light. However, we can clearly see that there are periods smaller than the period of the cones that are detectable. This can be explained by analogy with down-sampling. One can look at the cone responses without any noise in this .m file, and then the effect of the line spread and of down-sampling becomes obvious.

Figure 4
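The second observation can be reproduced with noise-free gratings. This is a simplified sketch, not the original .m file: it leaves out the line spread function and the noisy detector, and the names `grating`, `cone_responses`, and `modulation` are hypothetical.

```python
import math

def cone_responses(image, first_edge, cone_width=100):
    """Integrate the image over consecutive cone apertures."""
    return [sum(image[s:s + cone_width])
            for s in range(first_edge, len(image) - cone_width + 1, cone_width)]

def grating(period, n=1200):
    """Noise-free sinusoidal stimulus with the given period in arc seconds."""
    return [1.0 + math.cos(2.0 * math.pi * i / period) for i in range(n)]

def modulation(responses):
    """Peak-to-peak variation of the responses across the cones."""
    return max(responses) - min(responses)

# Compare a period above, equal to, and below the 100-second cone width.
for period in (150, 100, 70):
    for name, first_edge in (("Mosaic 1", 0), ("Mosaic 2", 50)):
        r = cone_responses(grating(period), first_edge)
        print(period, name, round(modulation(r), 3))
```

At a period of 100 seconds every cone integrates exactly one cycle, so the responses are flat for both mosaics. At 70 seconds the cones still see a variation, but it appears at a lower apparent frequency, which is the down-sampling analogy.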

 

 

Check for the spatial resolution (Hyperacuity):

 

The setup of this simulation was to check whether an ideal observer can differentiate between a line at the center and a line shifted by one second to the right or left.

The answer is YES if we put the cones in the correct position. In this study I did not take noise into account, mostly because finding the best observer in the presence of noise requires modeling the noise response and its distribution at the output of the cones. This is possible but requires extensive simulation, so I decided to leave it out and to stick to the ML detector in a heuristic way. Therefore, the result is that of a very good observer, not of the best observer.

 

In my .m files, we create different stimuli. Samples are shown in Fig 5, where the black one is the reference and we ask whether the observer can tell if the stimulus is to the right (green) or to the left (red).

 

 

Figure 5

We could build the detector directly, but I decided to make the estimator first and build the detector from it. Estimating the distance between two stimuli is clearly the harder problem; once we have the estimate, making the detector is very easy: choose right when the distance is positive and left when it is negative. We will see that this gives very good intuition for why it really matters to choose the right mosaic.

 

This study is not exactly the same as the spatial resolution (hyperacuity) test, because I assume that the observer already knows the reference, which is impossible in the actual experiment. The justification is that I assume the actual observer has the reference stimulus on one row of his/her cones and the other stimulus on another row. In the simulation I use the same row twice instead of two different rows.

 

In Fig 6 and Fig 7 the cone responses for three stimuli (the distance is just 2 seconds) are shown, for mosaic 1 and mosaic 2 respectively.

 

 

 

Figure 6                                                                                                                               Figure 7

 

It is easy to see that the difference is larger if we use mosaic 1; the reason is again the shift variance of sampling, and for these stimuli it happens to turn out this way. Note that the system is linear, which gives us the right to use many of our known methods. Note also that here mosaic 1 is better, while for the frequency study mosaic 2 was better.
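The size of this difference can be estimated with a short sketch. It assumes the commonly quoted form of Westheimer's line spread function (the constants are not given in this report) and a thin line centered on sample 600; the names `line_image` and `response_change` are hypothetical.

```python
import math

def lsf(x_arcsec):
    """Assumed Westheimer-style line spread, x in arc seconds."""
    x = abs(x_arcsec) / 60.0
    return 0.47 * math.exp(-3.3 * x * x) + 0.53 * math.exp(-0.93 * x)

def cone_responses(image, first_edge, cone_width=100):
    """Integrate the image over consecutive cone apertures."""
    return [sum(image[s:s + cone_width])
            for s in range(first_edge, len(image) - cone_width + 1, cone_width)]

def line_image(shift, n=1200, center=600):
    """Retinal image of a thin line displaced by `shift` arc seconds."""
    return [lsf(i - center - shift) for i in range(n)]

def response_change(first_edge, shift):
    """Euclidean distance between cone responses to the reference and shifted line."""
    ref = cone_responses(line_image(0), first_edge)
    moved = cone_responses(line_image(shift), first_edge)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, moved)))

change_m1 = response_change(first_edge=0, shift=2)   # Mosaic 1
change_m2 = response_change(first_edge=50, shift=2)  # Mosaic 2
print(change_m1, change_m2)
```

In Mosaic 1 the line sits next to a boundary between two cones, so a small shift moves the peak of the light distribution from one cone to its neighbor. In Mosaic 2 the line sits in the middle of a cone, and the same shift barely changes that cone's integral.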

 

The next stage is checking the correlation of the response to each stimulus with the responses to the others and to itself. Interestingly, in the case of mosaic 1 the correlation of the center stimulus with itself is smaller than the correlation of the center stimulus with the one shifted 5 seconds to the right. The correlations are drawn in Fig 8 and Fig 9 for mosaic 1 and mosaic 2 respectively. As the figures show, I only check shifted stimuli up to 20 seconds to the left and right.

 

 

Figure 8                                                                                                                Figure 9

 

Now, let’s check where the maximum correlation of each stimulus is:

For Mosaic 1, in Fig 10 and Fig 11 the positions are shown. Note that Fig 11 is just the zoomed in version.

Figure 10

 

Figure 11

Look at Fig 11. Isn’t it interesting? If the stimulus has a shift of 1 second, the estimator guesses that it has a shift of 10! No wonder we have a much higher spatial resolution than our cone size. If we look at Fig 10, it will become clear how it works; we have a gain in our estimation, which is very high when there is a small deviation and low when there is a higher deviation.

 

Actually, the previous paragraph is not quite correct, although the explanation is quite intuitive. The fact is that when there is no shift we estimate a shift of 5, and when there is a shift of 1 (-1) we estimate a shift of 15 (-5). Obviously, this is not a good estimator, since even without noise the estimate is wrong. The reason is that our cone responses are not shift invariant; therefore, we cannot simply look for the maximum correlation. The perfect estimator would be analogous to the MAP rule, while we are using the ML rule. The heuristic is to use the ML rule with a correction offset of 5: the observer says the stimulus is to the right if the estimate is greater than 5 and to the left if it is smaller than 5.
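The correlation estimator and the offset correction can be sketched as follows. This is an illustration under assumptions, not the original .m code: the line spread constants are the commonly quoted Westheimer form, the line is centered on sample 600, and the names `estimate_shift`, `detect_side`, and `offset` are hypothetical. The exact estimates depend on these choices and need not match the values read from Fig 10 and Fig 11.

```python
import math

def lsf(x_arcsec):
    """Assumed Westheimer-style line spread, x in arc seconds."""
    x = abs(x_arcsec) / 60.0
    return 0.47 * math.exp(-3.3 * x * x) + 0.53 * math.exp(-0.93 * x)

def cone_responses(image, first_edge, cone_width=100):
    """Integrate the image over consecutive cone apertures."""
    return [sum(image[s:s + cone_width])
            for s in range(first_edge, len(image) - cone_width + 1, cone_width)]

def line_response(shift, first_edge, n=1200, center=600):
    """Noise-free cone responses to a thin line displaced by `shift` seconds."""
    image = [lsf(i - center - shift) for i in range(n)]
    return cone_responses(image, first_edge)

SHIFTS = list(range(-20, 21))  # candidate shifts, as in the text

def estimate_shift(observed, templates):
    """Return the candidate shift whose template correlates best with `observed`."""
    def correlation(s):
        return sum(a * b for a, b in zip(observed, templates[s]))
    return max(SHIFTS, key=correlation)

templates = {s: line_response(s, first_edge=0) for s in SHIFTS}  # Mosaic 1

# The estimate for the unshifted reference serves as the correction offset.
offset = estimate_shift(templates[0], templates)

def detect_side(observed):
    """Turn the biased estimate into a left/right decision."""
    estimate = estimate_shift(observed, templates)
    if estimate > offset:
        return "right"
    if estimate < offset:
        return "left"
    return "center"

for true_shift in (-1, 0, 1):
    obs = templates[true_shift]
    print(true_shift, estimate_shift(obs, templates), detect_side(obs))
```

Because the plain correlation is not normalized by the energy of each template, the best-matching template is not the true one even without noise. The estimate is still monotonic in the true shift around zero, which is all the left/right detector needs.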

 

In summary, for my setup with mosaic 1, no noise, and our ML detector, we can easily differentiate a spatial distance of 1 second between two lines. Adding noise will decrease this resolution; the analysis is possible, but it is complicated. The same ML-based analysis remains possible with noise, but it is not the best one.

 

Now let’s look at the estimation results with mosaic 2 in Fig 12 and Fig 13, the counterparts of Fig 10 and Fig 11.

 

Figure 12

 

 

Figure 13

 

The results show that with mosaic 2, even without noise, it is impossible to have a spatial resolution better than 4 seconds.

 

 

 

Discussion:

 

This study shows that a very small movement of the eye can drastically change the ability of the observer. It also shows that there is no single best mosaic; the best choice depends on the kind of discrimination that is required.

 

 

References:

 

[1] Westheimer, G. and McKee, S. P. 1977. Spatial configurations for visual hyperacuity. Vision Research, 17: 941-947

 

[2] Pattern Recognition by Man and Machine, 1991, edited by Roger J. Watt, Vision and Visual Dysfunction, Vol 14, CRC Press

 

[3] Spatial Vision, edited by David Regan, 1991, Vision and Visual Dysfunction, Vol 10, CRC Press

 

[4] Geisler, W., 1987. Ideal Observer analysis of Visual Discrimination. Frontiers of Visual Science: Proceedings of the 1985 Symposium

 

[5] Geisler, W., 1989. Sequential Ideal-Observer Analysis of Visual Discriminations. Psychological Review, April 1989, Vol. 96

 

 

Appendix A)
Why is there the name Bayesian?

 

If you already know the MAP and ML rules, you can safely skip this part.

 

Let’s understand the problem first:

Assume that you are communicating with your friend in a very primitive language made of only two letters, ‘A’ and ‘B’. However, your communication channel is not very reliable, and sometimes you hear ‘B’ when your friend said ‘A’. Let’s assume that you can model your communication channel and know the probabilities of receiving ‘A’ when ‘A’ was sent as well as of receiving ‘A’ when ‘B’ was sent. Some examples are shown in Fig A1.

 

Figure A1

 

In Case 1, 90% of the time you hear things correctly. The best thing you can do seems to be to assume ‘A’ when you hear ‘A’ and vice versa. We can write the equations for that case as:

$$P(R = A \mid S = A) = P(R = B \mid S = B) = 0.9$$

In $P(R = B \mid S = B)$, ‘R = B’ means that you have received ‘B’, given that ‘S = B’, or in other words that your friend has sent ‘B’.

 

Let’s move on to Case 2. In this case you will hear the wrong thing 90% of the time. The best decision rule seems to be choosing ‘A’ when you hear ‘B’ and choosing ‘B’ when you hear ‘A’.

 

Now, what about Case 3, when you receive ‘A’? This one is interesting because your decision is highly dependent on the frequency of ‘A’ and ‘B’ in your friend’s sentences. Think of the cases when the frequency of ‘A’ is much higher than ‘B’ and vice versa.

 

 

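As a matter of fact, you need to compare $P(S = A \mid R = A)$ and $P(S = B \mid R = A)$. This is where you need Bayes’ rule to evaluate the two probabilities:

$$P(S = A \mid R = A) = \frac{P(R = A \mid S = A)\,P(S = A)}{P(R = A)}, \qquad P(S = B \mid R = A) = \frac{P(R = A \mid S = B)\,P(S = B)}{P(R = A)}$$
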

This decision rule is called the MAP (Maximum A Posteriori probability) rule, and our intuitive decision rule for Cases 1 and 2 is the ML (Maximum Likelihood) rule. Fortunately, when the frequencies of ‘A’ and ‘B’ are the same, the MAP and ML rules give the same results.
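The two rules can be compared on a small example. The channel probabilities and priors below are hypothetical values chosen for illustration; they are not the numbers of Case 3 in Fig A1.

```python
def ml_decision(received, likelihood):
    """ML rule: pick the sent symbol that makes the received symbol most likely."""
    return max("AB", key=lambda s: likelihood[s][received])

def map_decision(received, likelihood, prior):
    """MAP rule: pick the sent symbol with the largest posterior probability.

    P(R = received) is the same for both candidates, so it can be dropped.
    """
    return max("AB", key=lambda s: likelihood[s][received] * prior[s])

# Hypothetical channel: channel[s][r] = P(R = r | S = s)
channel = {"A": {"A": 0.6, "B": 0.4},
           "B": {"A": 0.3, "B": 0.7}}

equal_prior = {"A": 0.5, "B": 0.5}   # both letters equally frequent
skewed_prior = {"A": 0.1, "B": 0.9}  # 'B' is far more frequent than 'A'

print(ml_decision("A", channel))
print(map_decision("A", channel, equal_prior))
print(map_decision("A", channel, skewed_prior))
```

With equal priors the two rules agree. With the skewed prior, hearing ‘A’ is more likely to be a corrupted ‘B’ than a genuine ‘A’, so the MAP rule overrides the ML choice.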

 

This was a high-level example; if you are interested in knowing more about this, you can refer to the EE278 class.

 

In short, we can have this general approach: If you see something, you try to think how similar it is to the things that you already know, and you also take into account how probable each scene is.

 

In our example, we use the correlation of the observed stimulus with the ideal responses to the other stimuli to check similarity. Actually, this is an approximation that makes the calculations easy. Generally, we should use the MAP rule even if we assume that all stimuli are equally probable, because our system is not shift invariant. This would require modeling the noise, and as long as we do not have a very exact model of the noise, using correlation and ML should not cause a considerable difference.

 

 

Appendix B)

Effect of sampling on the correlation result:

 

SAMPLING IS NOT SHIFT-INVARIANT.

 

This is a well-known fact and easy to see mostly when the sampling rate is low. A good example would be Fig; it is clear that the sampling shown in green is very different from sampling shown in red. While both have the same sampling rate, the starting position is different.
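A toy example makes the point concrete. The signal and the names `green` and `red` (borrowed from the colors mentioned above) are hypothetical.

```python
def sample(signal, start, step):
    """Keep every `step`-th value of the signal, beginning at index `start`."""
    return signal[start::step]

# A narrow bump, sampled at the same rate from two starting positions.
signal = [0, 0, 0, 1, 4, 1, 0, 0, 0, 0, 0, 0]
green = sample(signal, start=0, step=4)
red = sample(signal, start=2, step=4)
print(green, red)

# Delaying the input by one sample does not simply delay the sampled output.
delayed = [0] + signal[:-1]
delayed_samples = sample(delayed, start=0, step=4)
print(delayed_samples)
```

One sampling grid catches the peak of the bump and the other misses the bump entirely. After a one-sample delay the first grid sees only the flank of the bump, so the sampled output is a different sequence, not a shifted copy of the original samples.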

 

 

Another important fact to remember is the following:

 

Fig shows the correlation of the responses of a linear shift-invariant system to two non-periodic signals (vectors), X(n) and its shifted version X(n-k). The two systems are the same. Clearly, the correlation is maximum when there is no shift (k=0). Note that if X(n) were periodic, the correlation would still be maximum at k=0, but the maximum would not be unique.
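This property is easy to verify numerically. The sketch below uses a small FIR filter as a stand-in for a linear shift-invariant system; the kernel, the test signal, and the function names are hypothetical.

```python
def shift(x, k):
    """Delay x by k samples (k may be negative), padding with zeros."""
    n = len(x)
    return [x[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]

def lsi_system(x, kernel=(0.25, 0.5, 0.25)):
    """Causal FIR filter, a simple linear shift-invariant system."""
    return [sum(h * x[i - j] for j, h in enumerate(kernel) if i - j >= 0)
            for i in range(len(x))]

def correlation(a, b):
    return sum(p * q for p, q in zip(a, b))

# Non-periodic signal X(n) with enough zero padding for shifts of up to 3 samples.
x = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
reference = lsi_system(x)

# Correlate the response to X(n) with the response to X(n-k) for several k.
corr = {k: correlation(reference, lsi_system(shift(x, k))) for k in range(-3, 4)}
best_k = max(corr, key=corr.get)
print(best_k, corr)
```

For a shift-invariant system the response to X(n-k) is just the shifted response to X(n), so the correlation is an autocorrelation and peaks uniquely at k=0. The cone mosaic breaks this property because it samples the blurred image.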

 

 

 

 

 

However, if the systems were not shift invariant we could not make this statement. As a matter of fact, we will see this in the simulations.