2017-03-18

Removing Duplicates from an Array (Using C#)

csharp, tutorial

banner

Featured Image - "removed.jpg" by Don Crowley, used under BY SA

I've experimented a few different ways to remove duplicate items in an array.

An array could be either ordered or not.

There are many solutions to this problem and I will show two different ways for "unordered" array, and one for the ordered one.

Let's take care of the easy one first.

Case 1 - Removing duplicates from an ordered array.

Supposed that you are given an array, 5.

Since an array is ordered, you can simply check if a next item is same as the current item being checked against. Here is the function, which implements the algorithm.

1private static int RemoveDuplicatesByCheckingPreviousElement(int a)2{3	List<int> result = new List<int>();4	for (int i = 0; i < a.Length - 1; i++)5	{6		if (a[i] != a[i + 1])7		{8			result.Add(a[i]);9		}10	}11
12	result.Add(a[a.Length - 1]);13	return result.ToArray();14}

You only add the current item, if the next one is not the same as the current one. E.g.)

E.g.) While on the 3rd item, "2", then the next item is "3", so we know that we have passed through all possible duplicates since the array is sorted. We need to add the last item to the result because the last one doesn't have the next element to check for a duplicate (line #12).

You can find more details on this YouTube video, How to remove duplicates from a sorted array in C/C++. The source is in C++ but I translated it to C#.

Case 2.1 - Removing duplicates from an unordered array using a set.

By definition, a set doesn't allow duplicate value. So in .NET, instantiating a HashSet object by passing an array would have removed duplicates automatically. But the point of this exercise is to do so manually. I am using HashSet because it has an O(1) lookup time while an array has an O(N) time complexity.

The algorithm is simple. While going through each item in the array, if the current item has not been seen, we add it to the set as demonstrated in following code snippet.

1private static int RemoveDuplicatesUsingHashTable(int a)2{3HashSet<int> alreadySeen = new HashSet<int>();4foreach (int item in a)5{6if (!alreadySeen.Contains(item))7alreadySeen.Add(item);8}9return alreadySeen.ToArray();10}

You can find details on this YouTube video, [Interview Question] Duplicate Integers in Array.

Case 2.2 - Removing duplicates from an unordered array by sorting it and check for next element for duplicates.

The algorithm is to sort the current array and apply case 1 algorithm.

1private static int RemoveDuplicatesBySortingFirst(int a)2{3	Array.Sort(a);4	return RemoveDuplicatesByCheckingPreviousElement(a);5}

Note that RemoveDuplicatesByCheckingPreviousElement is the function name from case 1.

Sorting the array takes O(N log N) time and case 1, O(N), which results in O(N) total time complexity.

Conclusion

When you were asked to solve this problem in an interview, ask the interviewer if the array is sorted or not. Use any algorithms above depending on the context;

If the array is ordered, then use case 1 algorithm, else either 2.1 or 2.2. But be aware that case 2.2 returns an array in different order from the original one.

E.g.)  Given an array a = {8, 1, 1, 3, 2, 2},

RemoveDuplicatesBySortingFirst(a)

Above function returns 8, not 2.

Full working source code can be found on GitHub.