image

A few days ago, I was having lunch in a coffee shop when I noticed someone in workout clothes walk by. Actually, I didn’t really notice the first such person I saw. It was only when a third person wearing workout clothes walked by that I realized I had seen similarly garbed individuals walk by in the same direction over the span of 30 minutes.

Three gym-going (or leaving) people walking by in a half-hour in the same direction. It must mean a gym was nearby. But how close exactly?

I wasn’t familiar with this part of the city, and I didn’t know of any gyms nearby, but I knew the rate at which I was seeing these people must be related to my distance from a gym. This intuition was fuzzy but there seemed to be a concrete question underlying it:

From a café window, you see gym-goers pass by every 10 minutes in one direction. Estimate how far you are from a gym.

This question is a good starting point, connecting rates to distance in a way that suggests a mathematical model is around the corner. Could we use what I observed and some best guesses about city street infrastructure to build such a model and answer the question?

The Question Refined

Let’s take the relevant city blocks to compose a square grid. The lines of the grids are the streets of a city, and the corners of the squares define intersections.

We assume people traverse the city by moving along the lines of the grid (and not into the squares).

Take the observer’s location (i.e., my location in the coffee shop) to be in the middle of a line on this grid, and the gym’s location to be in the middle of some other (unknown) line on the grid. For simplicity, we can define distance in terms of city blocks. The number of intersections you need to cross to get from one point to another is the number of blocks between these two points.

Gym in a random location

Coffe shop in a city block with a gym at an uknown location

Now let’s talk about the walking people.

People in a city have different places they’re trying to get to. Each of their destinations might be fixed in their minds, but taking a large enough number of them together, we can assume the that the set of all of their trajectories correspond to a set of random walks around the city blocks. For such random walks, we assume these people walk with equal probability in any direction when they have a choice of where to go.

Random walk around city grid

People executing a random walk around city blocks

We will assume (again for simplicity) that the people I saw were leaving the gym. By symmetry, we could also assume they were going to the gym, and the analysis would be the same. But I won’t assume both since I only saw them going in one direction.

Say that people leave the gym at a rate \(\lambda\). We will ignore their walking speed since it doesn’t affect the observed rate at which we see people [1]. Since the gym is in the middle of a city street (i.e., a line), the people have generally two directions they can go: They can either go to the left or to the right. In reality, people can also cross the street and enter another building, but we will assume people can only move along the lines which represent city streets.

When someone leaves the gym there is a probability of 1/2 that they move in either the left or right direction.

Two path choices

When someone encounters an intersection, there is a probability of 1/3 that they move in any of the cardinal directions that is not in the direction they came from.

Three path choices

By randomly walking along the city blocks, people leaving the gym can eventually pass by the coffee shop where I was having lunch.

Assume that the shortest such path from the gym to the coffee shop is \(k\) blocks, that is, someone would have to go through \(k\) 4-line intersections to get from the gym to the coffee shop

The initial homework assignment can then be rephrased as follows:

Say that you observe people passing in front of you at a rate \(\lambda'\). These people leave the gym at a rate \(\lambda\) and execute a random walk around the grid points. How can we estimate the number of blocks \(k\) we are from the gym?

Initial Answer

To answer this question we need a formula relating the quantities \(\lambda\), \(\lambda'\), and \(k\).

The above discussion of steps and probabilities gives us a seemingly easy way to obtain this formula. When someone leaves the gym, there is a \(1/2\) chance they will move either left or right. Let’s assume the coeffee shop is only in one of these directions and thus only sees the \(1/2\) of people who move that way. Therefore the initial rate of \(\lambda\) is cut in half to \(\lambda/2\). Next, every time a person reaches an intersection, there is a probability \(1/3\) they move in any particular direction. If the coffee shop is \(k\) intersections away from the gym, then \((1/3)^k\) people must have made the specific trajectory choice to wind up in my view of the coffee shop. Thus, the rate \(\lambda'\) at which I am seeing people must be

$$ \lambda' \overset{?}{=} \frac{\lambda}{2} \left( \frac{1}{3}\right)^{k} \qquad (1) $$

This is a clean enough answer and we could use estimates of \(\lambda\) and my observed rate \(\lambda'\) to determine \(k\). But reflecting on the explanation, we can see that it isn’t quite right. First, it doesn’t consider the fact that people can take longer trajectories than the shortest \(k\)-block path and still end up in front of the coffee shop. Moreover, even if we focused on the shortest trajectory, there are often multiple such trajectories connecting two points.

Thus there are some corrections we need to make to this answer, and they all extend from a discussion of paths.

Shortest Paths on a Grid

In a square lattice, there are many different paths that connect two points, so there are similarly many different paths that connect the gym to the coffee shop. There are shorter paths and longer paths. However, because each step of the path leads to an additional factor of \(1/3\) in the path probability calculation, shorter paths contribute more than longer ones [2]. So we will focus only on the contribution from the shortest path under the assumption that the random walk nature leads the contributions from other paths to be suppressed by factors less than \(1\).

Shortest vs longest path

But even for the shortest path there is multiplicity. Within a grid, two points are connected by multiple shortest paths. So to get the net contribution from these multiple paths, we would need to multiply Eq.(1) by the sum of all possible shortest paths between the coffee shop and the gym. There is a related formula for this sum: If two points in a grid are separated by \(N\) steps of which \(m\) are horizontal steps and \(N-m\) are vertical steps, then the number of paths between them is

$$ \hspace{2cm} \binom{N}{m} \hspace{2cm}\qquad (2) $$

The explanation of this formula is straightforward. For the \(N\) steps, we need to choose \(m\) of them to be the horizontal ones and there are “\(N\) choose \(m\)” ways to do this. In our case, we have assumed that the coffee shop and gym are separated by \(k\) blocks, which translates to \(N = k-1\) steps in the grid between them. (Note that the variable \(N\) defining grid steps does not take into account origin point, but in defining \(k\) we do take the origin into account since the coffee shop and gym are “in the middle” of a block. So \(k = N+1\)).

However, there is an additional ambiguity. Because we don’t know where the gym is, we also don’t know how many horizontal or vertical steps define this separation. We could have all horizontal steps, all vertical steps or anything in between.

To incorporate our ignorance into the model, we will assume all of these possibilities are equally likely, and we will compute the average number of paths contributed from all possibilities together. Take the total number of steps \(N\) to be constant. Then, there could be anywhere from \(0\) to \(N\) horizontal in our paths. So the average number of paths (weighted by uniform probability of \(N\) steps) for an unknown number of horizontal steps is

$$ \hspace{2cm}\frac{1}{N} \sum_{m=0}^{N} \binom{N}{m} = \frac{2^{N}}{N}. \hspace{2cm}\qquad (3) $$

With \(k= N+1\), we find the average number of paths from a gym \(k\) blocks away to the café is

$$ \hspace{2cm}\frac{2^{k-1}}{k-1}. \hspace{2cm} (4) $$

Now, multiplying Eq.(1) by the weight factor Eq.(4), we find the observed rate of \(\lambda'\)

$$ \hspace{2cm} \lambda' = \frac{\lambda}{4}\left(\frac{2}{3}\right)^{k}\frac{1}{k-1}. \hspace{2cm} (5) $$

Note that this result only applies for \(k>1\). If \(k=0\), then we would have \(\lambda' = \lambda/2\) and if \(k=1\), then we would have \(\lambda' = (\lambda/2 )(1/3)\). The combinatorial formula for “number of shortest paths” Eq.(4) doesn’t apply in the cases of \(k=0\) or \(k=1\) because Eq.(2) doesn’t apply in these cases. When the gym and coffee shop are one intersection point away (i.e. \(k=1\)) or are on the same block (i.e., \(k=0\)), then we haven’t left the origin that defines Eq.(2) and there is only a single shortest path from the starting point to the ending point.

Solving Eq.(5) for \(k\) requires us to introduce a Lambert W function, but is doable. Rewriting the equation as

$$ \hspace{2cm} \frac{4 \lambda'}{\lambda} (k-1) = \exp\left[-k \log(3/2)\right], \hspace{2cm} \qquad (6) $$

and then solving for \(k\), we obtain

$$ \hspace{2cm} k = 1 + \frac{1}{\log(3/2)}W_0\left(\frac{\lambda \log(3/2)}{6\lambda'}\right). \hspace{2cm} \qquad (7) $$

Everyday Estimates

OK so now we have a formula for the number of blocks we are from the gym in terms of the rate \(\lambda'\) at which we observe the gym-folk and the rate \(\lambda\) that people leave the gym. Let's use the formula to get a real value for \(k\).

I observed three gym-goers in 30 minutes, so the observed rate is \(\lambda' = 0.1 \text{ min}^{-1}\). The source rate is harder to determine. How often do people tend to leave the gym at 3pm on a random weekday? I don’t quite know, but I can at least bound it by reasonable estimates.

At one end, I don’t imagine people will be leaving the gym more often than once every minute so we could set the maximum value to \(\lambda_{\text{max}} = 1\text{ min}^{-1}\). This maximum source rate will give us a minimum number of blocks we are from the gym.

At the other end, people can’t be leaving the gym less often than the twice the observed rate \(\lambda'\). So we can set the minimum of \(\lambda\) to \(\lambda_{\text{min}} = 0.2\text{ min}^{-1}\). This minimum source rate will give us a maximum number of blocks we are from the gym.

Inserting these values into Eq.(7), we find that I was between \(1.3 \) and \(2.1\) blocks, so about \(1\) to \(2\) blocks away.

Validation

Of course the easiest way to check how far you are from the gym is to look at a map. After working through this analysis, I checked google maps for gyms close to my coffee shop. It turns out that there were four (!) different gyms within two blocks of the coffee shop. The multiplicity of gyms was something I didn’t consider in the analysis, but it could be accounted for with a different parameter.

Still, the actual result of “a gym is less than two blocks away” conforms to the estimate above.

However, thought it is reassuring that the analysis produced a result that reflects, this seeming confirmation does not strongly validate the analysis. When doing theoretical work one must always be wary of fooling oneself through coincidental affirmations of tortured models.

The truth is that this problem gets easier to solve the closer the coffee shop (or observer) is to the gym (or source). For example, using the non-path corrected result Eq.(1) yields basically the same answer.

The validity of Eq.(5) would be more convincingly established if I were about 10 blocks away and observed a rate consistent with its prediction, but such a controlled experiment is slightly beyond my interest to conduct. Instead, it would be simpler to set up a simulation of people conducting a random walk within a grid from a random location and to then see if the Eq.(7) can consistently predict the distance the observer is from the source.

Such a computational perspective on this work is interesting, but will have to wait for another time.

Gym in a random location

Footnotes

[1] Why isn't the speed of the walkers relevant here? Intuitively, the walkers could leave at a specific rate but if they move very slowly then shouldn't they be observed at a rate that takes into account that speed?

Consider the one dimensional case, where people leave a source at a rate \(\lambda\) and then start walking with velocity \(v\) in a particular direction. What is the rate at which the people are seen at a distant point from the source?

The time between people leaving the source is \(1/\lambda\) so the distance between them is \(v/\lambda\). This distance between people remains the same even as they travel away from the source. When we see these people at at large distance away from the source, the time between we see each person is the distance between them divided by their speed, that is, \(v/\lambda\) divided by \(v\) which is \(1/\lambda\). Therefore, even when we're far from the source, the observed rate is just \(\lambda\) which is independent of velocity. So as long as the people have non-zero velocity, the velocity is not important in defining the rate at which they're observed. This argument generalizes to higher dimensions except now we need to include probabilities of going in particular directions.

[2] Since there are so many more longer paths than shorter ones, it is possible that the net contributions of the longer paths rivals that of shorter paths. We will assume this isn't the case, but it is worth validating through a calculation

[3] To be extra precise, we should multiply this result by \(4\) to account for the fact that the gym could be in any of the four "quadrants" relative to the coffee shop (assumed to be at the origin), and the shortest path formula only considers a single quadrant. However, we would also just divide the final result by \(4\) to account for the equal probabilities of occurrence of these situations, so the result would be the same.