**Question**: Given an array containing only 0's and 1's, find the maximum contiguous subarray containing equal no of 0's and 1's.

**Complexity**: O(n).

**Source**: Asked to me by Rajesh Kamineni

**My Solution**:

Call the original array 'A'. Let it's size be 'n'.

Create 2 arrays 'B' and 'C'of size '2n + 1' each. Initialize each to -1 everywhere.

Then, do the following [2]:

Here's what the above procedure does :

Let us denote, (No. of 1s in A before index i) - (No. of 0s in A before index i) by #(i).

Note that when we enter the i-th iteration, diff = (No. of 1s in A before index i) - (No. of 0s in A before index i) = #i

Also note that the value of diff can range from '-n' to 'n'[1].

As a result, we have

B[j] = the smallest index 'i' in the array A such that, #(i) = j - n,

C[j] = the largest index 'i' in the array A such that, #(i) = j - n,

i.e. #(B[j]) = #(C[j])

We need the largest sub-array of A with equal number of 1s and 0s.

Suppose the required sub-array is from index 'a' to index 'b' of A (with elements at index 'a' and index 'b' included).

Then, clearly,

#(b+1) = #(a) (Do you see why?)

And our goal is to maximize (b+1) - (a).

Therefore, we simply have to find the index 'j' such that C[j] - B[j] is maximum. This can done in O(n), in a single pass.

Once the 'j' is found, the required sub-array is from index 'B[j]' to index 'C[j] - 1' of A.

**Notes**:

[1] - One may further note that diff can take at most n+1 different values, and we don't explicitly require the actual value of diff. So the arrays B and C can be of size 'n + 1' each and the B[diff] can be replaced by B[diff%n].

(Do you see why?)

[2] - Optimization suggested by Vinod :

Note that 'i' increases from '0' to 'n' as the loop is executed. As a result, the value of 'B[diff + n]' needs to be set only once, for the lowest 'i'. Similarly, subsequent values of 'i' are always greater than 'C[diff + n]', so C[diff + n] needs to be set every time.

The check 'i < n' can also be avoided by increasing the size of A to 'n + 1' elements. It does matter what the value of A[n] is, because the final value of diff is never used.

Here is the optimized code :

Note that 'i' increases from '0' to 'n' as the loop is executed. As a result, the value of 'B[diff + n]' needs to be set only once, for the lowest 'i'. Similarly, subsequent values of 'i' are always greater than 'C[diff + n]', so C[diff + n] needs to be set every time.

The check 'i < n' can also be avoided by increasing the size of A to 'n + 1' elements. It does matter what the value of A[n] is, because the final value of diff is never used.

Here is the optimized code :

Feel free to post alternate solutions, extensions, comments, clarifications, optimizations, and mistakes (if any) as comments.

I suggest a small optimization though it doesn't matter much.The two if else blocks can be merged into one if-else block given below.

ReplyDeleteif(B[diff+n] == -1)B[diff+n] = i;

else C[diff+n] = i;

the reason is i is increasing over 0 to n so in else part of first block i is always greater than B[diff+n] and similar reasoning for else of second block.

You are right. Thanks!

ReplyDelete