
Detecting the starting point of a loop in a linked list

Detecting the starting point of a loop in a linked list is a very popular problem. In fact, most of you reading this may already know the algorithm to solve it. However, understanding why that algorithm works is a separate challenge altogether.

I was in the same state and decided to find an explanation of why it works. Simple Google searches did not turn up convincing explanations, and hence I decided to write this blog.
Hopefully, it will satisfy the curiosity of people trying to understand the reasoning behind this algorithm.

I believe coming up with this algorithm in the first place is an even bigger challenge; you would have to sit with the problem a lot longer to do that. For now, let's assume we have the algorithm and try to prove why it works.

The Algorithm

The key insight of the algorithm is that we maintain two pointers, slow and fast:

* Slow pointer: moves one node at a time.
* Fast pointer: moves two nodes at a time.

The first thing to realise is that these two pointers are guaranteed to meet at some position inside the loop. To convince yourself, consider this:
* Both pointers eventually enter the loop.
* Once inside, the fast pointer closes the gap to the slow pointer by one node at each step (since it moves at double the speed), so it must eventually land on it.
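The claim above can be sketched in code. This assumes a minimal `Node` class with a `next` field; the post itself shows no code, so the names here are illustrative:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def meet_in_loop(head):
    """First step: move slow by one node and fast by two until they meet.

    Returns the meeting node, or None if fast runs off the end (no loop).
    """
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return slow
    return None
```

If the list has no loop, the fast pointer reaches the end and the function returns None; otherwise both pointers are trapped in the loop and must collide.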




[Diagram: an example of a linked list containing a loop.]

Now we know that the pointers are going to meet at some point in the loop.

The second step of the algorithm is to bring one of the pointers back to the start of the list.
Now move both pointers (one at the start of the list, one at the meeting point inside the loop) one step at a time. The point where they next meet is the starting point of the loop.
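Putting the two steps together, a complete sketch might look like this (again assuming a simple `Node` class with a `next` field; this is one possible implementation, not code from the post):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def find_loop_start(head):
    """Return the node where the loop begins, or None if there is no loop."""
    # Step 1: find a meeting point inside the loop.
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            break
    else:
        return None  # fast reached the end of the list: no loop

    # Step 2: move one pointer back to the head, then advance both
    # one node at a time; they meet at the start of the loop.
    slow = head
    while slow is not fast:
        slow = slow.next
        fast = fast.next
    return slow
```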

Let's try to prove this algorithm.

Proof

We need to consider certain variables to mathematically formulate the problem.

m - the number of links a pointer travels from the head of the list to reach the starting point of the loop
l - the number of links a pointer travels to complete one full cycle of the loop
k - the number of links a pointer travels from the starting point of the loop to reach the meeting point found in the first step of the algorithm

Now that we have defined these variables, let us consider the following:

Distance travelled by the slow pointer to reach the meeting point = m + p*l + k

m is the distance travelled to reach the start of the loop. Let's say the slow pointer makes p rounds of the loop (notice that p can be zero) and finally travels k more links from the loop's start to reach the meeting point.


Distance travelled by the fast pointer to reach the meeting point = m + q*l + k

This is similar to the slow pointer, except that the fast pointer makes q rounds of the loop. Since it travels at twice the speed, q > p.

Now we know that the distance travelled by the fast pointer = 2 * distance travelled by the slow pointer.

So, m + q*l + k = 2*(m + p*l + k) => (m + k) = (q - 2*p)*l

What this means is that (m + k) is an integral multiple of the length of the loop, or in other words, (m+k) % l = 0. So if you start at the loop's starting point and travel a distance of (m+k) around the loop, you end up back at the start of the loop.
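We can sanity-check this relation numerically by simulating the two pointers on integer positions instead of nodes. The function below is purely illustrative; `m` and `l` are the quantities defined above, and positions past the tail wrap around inside the loop:

```python
def meeting_offset(m, l):
    """Simulate slow/fast pointers on a list with m links before a loop
    of length l. Returns k, the meeting point's offset from the loop start."""
    def step(pos, n):
        pos += n
        if pos >= m:                 # map positions past the tail into the loop
            pos = m + (pos - m) % l
        return pos

    slow = fast = 0
    while True:
        slow = step(slow, 1)
        fast = step(fast, 2)
        if slow == fast:
            return slow - m

# For any tail length m and loop length l, (m + k) is a multiple of l.
for m, l in [(3, 5), (7, 4), (1, 9)]:
    assert (m + meeting_offset(m, l)) % l == 0
```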

Now we can consider the second part of the algorithm. We bring one pointer to the start of the list.
When this pointer takes m steps, it reaches the start of the loop.
How much distance has the second pointer covered by this time? It is m + k.
Why? It was already at offset k from the loop's start after the first step of the algorithm, and it has since travelled m more steps.

Now we know that (m+k) is a multiple of l. That means the second pointer, having travelled m + k links from the loop's start, is back at the start of the loop, and so is the first pointer. This implies that both pointers meet at the beginning of the loop after the second step of the algorithm. Hence the algorithm works.
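To watch the whole argument play out on a concrete list, we can build a loop, run both phases by hand, and check where the pointers end up. The sizes m = 2 and l = 4 below are arbitrary choices for illustration, and the `Node` class is an assumption, not code from the post:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Build 0 -> 1 -> [2 -> 3 -> 4 -> 5] -> back to 2   (m = 2, l = 4)
nodes = [Node(i) for i in range(6)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[5].next = nodes[2]          # close the loop at node 2
head, loop_start = nodes[0], nodes[2]

# First step: find the meeting point inside the loop.
slow = fast = head
while True:
    slow, fast = slow.next, fast.next.next
    if slow is fast:
        break

# Second step: restart one pointer from the head; advance both by one.
slow = head
while slow is not fast:
    slow, fast = slow.next, fast.next

assert slow is loop_start         # both pointers meet at the loop's start
```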
