Skip to main content

Detecting starting point of loop in a linked list

Detecting the starting point of a loop in a linked list is a very popular problem. In fact most you reading this might already be knowing the algorithm to solve this. However understanding why that algorithm works is a separate challenge altogether.

I was in the same state and decided to find an explanation of why it works. I could not find convincing explanations by simple google searches and hence decided to right this blog.
Hopefully, this will satisfy the curiosity of people trying to understand the reasoning behind this algorithm.

I believe coming up with this algorithm in the first place is an even bigger challenge. You will have to stay with this problem a lot longer if you want to do that. For now let's assume we have the algorithm and let's try to prove why it should work.

The Algorithm

The key insight of the algorithm is that we need to maintain two pointers: Slow and Fast,

Slow pointer: Moves one node at a time.
Fast pointer: Moves two nodes at a time.

The first thing to realise is that these pointers are definitely going to meet at some position in the loop. To convince yourself, consider this:
* Both the pointers are going to enter the loop at some point.
* The fast pointer will catch up with the slow pointer one step at a time (since it moves at double the speed)




Above diagram shows an example of a loop.

Now we know that the pointers are going to meet at some point in the loop.

The second step of the algorithm says that bring one of the pointers to the start of the list. 
Now move both the pointers (one at the start and one at the meeting point in the loop) one step at a time. The next time they meet will be the starting point of the list.

Let's try to prove this algorithm.

Proof

We need to consider certain variables to mathematically formulate the problem.

m - number of links a pointer has to travel to reach the starting point of the loop
l - the number of links a pointer has to travel to complete one cycle of the loop
k - the number of links a pointer has to travel from the starting point of the loop to reach the meeting point of the nodes in the first step of the algorithm.

Now that we have defined these variables, let us consider the following:

Distance travelled by the slow pointer to reach the meeting point = m + p*l + k

m is the distance travelled to reach the starting of the loop. Let's say it makes p rounds of the loop (notice that p can be zero) and finally it travels k distance from the loop start to reach the meeting point.


Distance travelled by the slow pointer to reach the meeting point = m + q*l + k

This is similar to the slow pointer except that the fast pointer makes q rounds of the loop. Since it travels at twice the speed hence q > p.

Now we know that, distance travelled by fast pointer = 2 * distance travelled by slow pointer

So, m + q*l + k = 2*(m + p*l + k) => (m + k) = (q - 2*p)*l

What this means is that (m + k) is an integral multiple of the length of the loop or in other words, (m+k) % l = 0. So if you travel a distance of (m+k) in the loop, you will reach the start of the loop.

Now we can consider the second part of the algorithm. We bring one pointer to the start of the list. 
Now when this pointer takes m steps, it reaches the start of the loop.
How much distance would the second pointer have covered by this time? It would be m+k. 
Why? Since it was already at an offset k after the first step of the algorithm and then it travels m more steps.

Now we know that (m+k) is a multiple of l. That means the second pointer has reached the start of the loop and so has the first pointer. This implies that both the pointer will meet at the beginning of the loop after the second step of the algorithm. Hence the algorithm works.

Comments

Popular posts from this blog

Streaming Algorithms - Lossy Counting

We had a discussion recently in our office on one of the famous  streaming algorithms  known as lossy count. I decided to write down my understanding of it focusing on the space complexity discussion and proving why it works in logarithmic space. This  is the original paper introducing this algorithm and most of the information discussed in this blog is (hopefully) a simplified version of this paper relating to lossy count. Problem Statement There are use cases where you may be receiving a continuous stream of data and you want to know which items in that data stream exceed a certain threshold. The data stream could be really big so you can't store it in memory. Moreover you want to answer queries in real time. So here are the challenges this problem imposes: 1. Need to do it in single pass (time constraint) 2. Limited memory 3. Large volume of data in real-time The above issues warrant the need of a smart counting algorithm. Data stream mining to identify events & pa...

Chinese remainder theorem

In this post I would like to talk about the Chinese Remainder Theorem. You might have heard this problem as a kid: There are x number of things. When taken in groups of 5, they leave a remainder 1. When taken in groups of 7, they leave a remainder 2. When taken in groups of 9, they leave a remainder 3. And when taken in groups of 11, they leave a remainder 4. Find the value of x. We will see how such problems can be solved using the Chinese Remainder Theorem(CRT). LINEAR CONGRUENCES Lets talk about linear congruences. You might have already encountered them informally in my previous post on the RSA encryption system. Here I would like to introduce it a bit more formally. ax ≡ b (mod n) Abusing the use of notation, we can express this equation into a form which might look more familiar, i.e ax%n = b. In other words, the integer ax when divided by n leaves a remainder equal to b. Lets say that we find a solution to the above equation, say x = x 0 . => ax 0 ≡ b (...

Maximum Flow

PROBLEM DESCRIPTION The problem of finding the maximum flow in a network, has a history which dates back to the second world war. Practically the maximum flow problem can be stated as:  “A list of pipes is given, with different flow-capacities. These pipes are connected at their endpoints. What is the maximum amount of water that you can route from a given starting point to a given ending point?” or equivalently “A company owns a factory located in city X where products are manufactured that need to be transported to the distribution center in city Y. You are given the one-way roads that connect pairs of cities in the country, and the maximum number of trucks that can drive along each road. What is the maximum number of trucks that the company can send to the distribution center?” Formally stated the problem is: We are given a network - a directed graph, in which every edge has a certain capacity c associated with it, a starting vertex (called the source), and an ending...