March 18, 2017

* Featured Image – “removed.jpg” by Don Crowley, used under BY SA

I’ve experimented a few different ways to remove duplicate items in an array.

An array could be either ordered or not.

There are many solutions to this problem and I will show two different ways for “unordered” array, and one for the ordered one.

Let’s take care of the easy one first.

Supposed that you are given an array, *{1, 2, 2, 3, 4, 4, 4, 5}*.

Since an array is ordered, you can simply check if a next item is same as the current item being checked against. Here is the function, which implements the algorithm.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
private static int[] RemoveDuplicatesByCheckingPreviousElement(int[] a) { List<int> result = new List<int>(); for (int i = 0; i < a.Length - 1; i++) { if (a[i] != a[i + 1]) { result.Add(a[i]); } } result.Add(a[a.Length - 1]); return result.ToArray(); } |

You only add the current item, if the next one is not the same as the current one. E.g.)

E.g.) While on the 3rd item, “2”, then the next item is “3”, so we know that we have passed through all possible duplicates since the array is sorted.

We need to add the last item to the result because the last one doesn’t have the next element to check for a duplicate (line #12).

You can find more details on this YouTube video, How to remove duplicates from a sorted array in C/C++. The source is in C++ but I translated it to C#.

By definition, a set doesn’t allow duplicate value. So in .NET, instantiating a HashSet object by passing an array would have removed duplicates automatically.

But the point of this exercise is to do so manually. I am using HashSet because it has an O(1) lookup time while an array has an O(N) time complexity.

The algorithm is simple. While going through each item in the array, if the current item has not been seen, we add it to the set as demonstrated in following code snippet.

1 2 3 4 5 6 7 8 9 10 |
private static int[] RemoveDuplicatesUsingHashTable(int[] a) { HashSet<int> alreadySeen = new HashSet<int>(); foreach (int item in a) { if (!alreadySeen.Contains(item)) alreadySeen.Add(item); } return alreadySeen.ToArray(); } |

You can find details on this YouTube video, [Interview Question] Duplicate Integers in Array.

The algorithm is to sort the current array and apply case 1 algorithm.

1 2 3 4 5 |
private static int[] RemoveDuplicatesBySortingFirst(int[] a) { Array.Sort(a); return RemoveDuplicatesByCheckingPreviousElement(a); } |

Note that `RemoveDuplicatesByCheckingPreviousElement`

is the function name from case 1.

Sorting the array takes O(N log N) time and case 1, O(N), which results in O(N) total time complexity.

When you were asked to solve this problem in an interview, ask the interviewer if the array is sorted or not. Use any algorithms above depending on the context;

If the array is ordered, then use case 1 algorithm, else either 2.1 or 2.2. But be aware that case 2.2 returns an array in different order from the original one.

E.g.) Given an array `a = {8, 1, 1, 3, 2, 2}`

,

1 |
RemoveDuplicatesBySortingFirst(a) |

Above function returns {1, 2, 3, 8}, not {8, 1, 3, 2}.

Full working source code can be found on GitHub.