Strong components in directed graphs are components in which all vertices are strongly connected to each other. 2 vertices A and B are strongly connected if there is a directed path from A->B and B->A.
Need for computing strongly connected components
Computing strong components for graphs has been a need in computation since many years now. Biologists use the strong components to find out sub-ecosystems in a food chain. During the early advent of the internet, this became an absolute necessity as dependency graphs of various software modules needed to be processed and their strong components extracted, to know which modules could be packed together. Especially today, with traffic routing problems and due to the presence of one-ways, directed graphs and operations on them have been studied by engineers working in many top tier companies.
Many years since the advent of computation people had been wondering how to find the connected components in a directed graph(efficiently). So the problem that they faced in simple words was : Given a graph G with V vertices and E directed edges, obtain the strong components of a graph. At first sight this seems very hard and many seem to think of the brute force approach, i.e. take all pairs of vertices and check for strong connections. And indeed this was what was being done prior to 1972 when the breakthrough occurred and the algorithm was named for its inventor, Robert Tarjan(he’s still alive). And a huge surprise was that the solution was linear!
How the algorithm works
The algorithm assigns what is called ‘low-link values’ to all nodes(vertices) in a graph. When the algorithm is done all vertices of a strong component will belong to a strong component. Also the assignment of low-link values to a node if done during the DFS to get that optimised O(V+E) linear complexity.
These are the main steps in Tarjan’s Algorithm:
- A DFS is run from any node.
- A stack of valid nodes from which to update low-link values from, is maintained during the DFS. This stack basically has all nodes that are present in the current branch of the DFS.
- Nodes are added to the stack as they are explored for the first time.
- Nodes are removed from the stack if any path in the DFS ends in an already visited node say V, in this case elements are popped from the stack till the element that was visited again(V). And all these popped nodes along with V, are put in the same strong component, i.e. , their low link values are made the same as the revisited node.
- If the DFS hits a visited node but which is not in the stack then nothing is done.
A pictorial representation
In the above diagram we start the DFS from the node coloured orange and give it a low-link value of 0.
We then proceed with the DFS, until and assign a low-link value of 1 to the neighbour of the root.
We continue giving unique low-link ids to new nodes till node 2.
We now hit an already visited node(node 0). So we make the low-link value of all these nodes as 0. We then pop this connected component out of the stack. These are a connected component.
Next we pick another node(part of the DFS).
We do these steps recursively till we obtain more strong components of the graph.
The graph finally looks like this.
The time complexity of this algorithm is O(V+E) as all it does is basically run a DFS. However this asymptotic analysis assumes that looking up a vertex in a stack can be done in constant time. In reality this may not be the case and thus the time complexity would be greater. The space complexity is linear as it makes use of additional space for the stack.
The pseudocode is provided below.