Skip to main content

Posts

The Internals of SHA 256 Algorithm

 I came across this excellent video, explaining how the SHA-256 algorithm works internally. I had always been excited to understand how a real-world hashing algorithm works internally. SHA-256 is one of the most popular hashing algorithms, so after getting my hands on that video, I knew I had to try and implement this on my own. This feels like once in a lifetime opportunity. So here I am after having implemented the algorithm on my own and feeling confident understanding how SHA-256 works. I want to pass on that same confidence to the readers of this blog (probably including me in a few months :D) along with code snippets. So, lets dive right into it... Basic Operations There are some basic operations that the SHA-256 algorithm builds on top of. Here is a list of those: 1. Shift Right  - Shifts the bits of the original number pos bits to the right, dropping of things that slide of the end. 2. Rotate Right  - Rotates pos bits in the original number. Very similar to shift but instead o
Recent posts

Consistent Hashing

 I have been meaning to write this blog for some time now but somehow kept on postponing (read procrastinating...). I read about this technique of Consistent Hashing a while back and was mesmerized by its elegance. Recently read it again in this brilliant blog and thought of expressing it in my own words for posterity. So let us begin. Hashing I won't talk too much about hashing since it is a very basic computer science concept. In a word, it means mapping an object to another object. Or more generally, mapping a key to a value where their types don't matter. Mostly the mapping is from a string to int.  There could be multiple different hash functions that can exist which randomize the how the keys are hashed to the values. We are going to consider a simple use case where let's say you have N different servers and there are multiple keys that you want to distribute among those servers. How do you do it? Well, a simple strategy is that you have a function which applies some

Book Review: Atomic Habits

 I recently completed this wonderful book by James Clear,  Atomic Habits . More than a book review, this is going to be a compilation of the learnings that I have from this book. The book is such a vast treasure chest of information that it is very easy to forget all of it. Even writing it down is probably not enough. The actual value of the book will come from the application of the concepts it teaches in our everyday life. But, to start things off, here goes my learnings from this great book. Probably the strongest point which the book tries to drive home is that very small changes in your habit (atomic habits) can bring remarkable improvements in your life. This is also the tag line of the book: Tiny Changes, Remarkable Results. How true this is will only be confirmed after we apply the laws stated in this book. The author suggests that instead of trying to set some goals and achieve it, we should try to make very small changes in our mindset/system. We should not focus on what we w

Streaming Algorithms - Lossy Counting

We had a discussion recently in our office on one of the famous  streaming algorithms  known as lossy count. I decided to write down my understanding of it focusing on the space complexity discussion and proving why it works in logarithmic space. This  is the original paper introducing this algorithm and most of the information discussed in this blog is (hopefully) a simplified version of this paper relating to lossy count. Problem Statement There are use cases where you may be receiving a continuous stream of data and you want to know which items in that data stream exceed a certain threshold. The data stream could be really big so you can't store it in memory. Moreover you want to answer queries in real time. So here are the challenges this problem imposes: 1. Need to do it in single pass (time constraint) 2. Limited memory 3. Large volume of data in real-time The above issues warrant the need of a smart counting algorithm. Data stream mining to identify events & patterns can

Zookeeper Internals

I had been wanting to understand the internals of zookeeper for quite some time now. In fact, I had already read the basics a couple of times and even worked with the zkcli in my previous organization. However, as it is said that anything written down is more firmly impressed on the mind, hence I am writing this post. (This also has a reference to one of my favorite magician - Michael Vincent ) I recently presented this topic in my current organization and I thought now would be a good time to write this blog. I can accompany it with the slides I had used there which would hopefully make things more clear. Although, I would like to start off with a disclaimer that I am just a beginner in the study of zookeeper and there are multiple things I don't know/understand about it yet. Still I hope at least some of you would find this blog interesting and maybe learn something new from it. Enough with the chit-chat, lets begin! What is Zookeeper? Zookeeper as the name suggests is literally

Some Mysteries of Multi-Threading

 If you have been programming professionally for some time, you probably would have used threads. When used correctly, threads can provide significant speed up in the program. I had also written multithreaded programs many times at Sumo Logic. However, recently I felt that since threading is such an important topic, I should probably dive a little deeper into how things work there. I started off by reading Java Concurrency in Practice  and although I have gone through only a couple of chapters right now, but still I have discovered some interesting things which I was not aware of before. The thing which makes multi-threading hard is that you may write a multi-threaded program which works correctly almost all the time but it may still be wrong. It may have some race condition which happens very rarely. The problem is that it might happen in production and cause an outage and you will be clueless about the issue since you have never seen it before. The only way to write correct multi-thr

Euclids GCD Algorithm

 This post is going to focus on a very simple concept, finding the GCD of two numbers. If you think this is too easy and we don't need a post on it, well you are probably right :D. You might even be knowing of efficient ways of finding the GCD (mostly used in competitive programming). But many of you may not be aware of the mathematics and proof behind how that method works. If that is the case, you are not alone. I was in the same boat for many years. I have used that 3 line GCD method many times in solving programming questions never thinking deeply about why it works. There is no issue in this. After all there is a good reason why abstraction is so popular. But I do believe that when you get a chance there is no harm in diving deep. You may get to learn a few things and as always be fascinated by what math has to offer even in the simplest of concepts. So lets begin... Context Now we all know one way at least of finding the GCD of two numbers. Find all the factors of both the n