Using LINQ to list all duplicates
Warning: This post is over 12 years old, so its content may no longer be relevant.
There are plenty of examples of how to find duplicates using LINQ’s GroupBy method, but they usually use a projection to return a new object, like this:
var duplicateEmails = from s in _filteredSubmissions
                      group s by s.Email
                      into g
                      where g.Count() > 1
                      select new { Email = g.Key, DuplicateCount = g.Count() };
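This returns only the email addresses that are duplicated, along with their counts. But what if you want to list all the original items that are duplicates? If you just ‘select g’ in the example above, you’ll end up with an IEnumerable<IGrouping<string, Submission>>, which is effectively an IEnumerable<IEnumerable<Submission>>. SelectMany() flattens the groups back into a single sequence of Submissions: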
_filteredSubmissions = (from s in _filteredSubmissions
                        group s by s.Email
                        into g
                        where g.Count() > 1
                        select g).SelectMany(g => g);
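Here’s a minimal, self-contained sketch of both email queries, assuming C# 9 or later for top-level statements. A named tuple stands in for the Submission class so everything fits in one file; the Id field and the sample data are hypothetical. It prints the projected summary first, then every original item that shares an email with another:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: a named tuple stands in for Submission.
// Id is an assumed field, added only so each row is distinct.
var submissions = new List<(int Id, string FirstName, string LastName, string Email)>
{
    (1, "Ann", "Lee", "ann@example.com"),
    (2, "Bob", "Ray", "bob@example.com"),
    (3, "Ann", "Lee", "ann@example.com"),
    (4, "Cat", "Fox", "cat@example.com"),
    (5, "Dan", "Ray", "bob@example.com"),
};

// Summary only: each duplicated email and how many times it occurs.
var summary = (from s in submissions
               group s by s.Email into g
               where g.Count() > 1
               select new { Email = g.Key, DuplicateCount = g.Count() }).ToList();

// Every original item whose email is shared with at least one other item.
var duplicates = (from s in submissions
                  group s by s.Email into g
                  where g.Count() > 1
                  select g).SelectMany(g => g).ToList();

foreach (var d in summary)
    Console.WriteLine($"{d.Email}: {d.DuplicateCount}");
foreach (var sub in duplicates)
    Console.WriteLine($"{sub.Id} {sub.FirstName} {sub.LastName} <{sub.Email}>");
```

GroupBy() yields groups in the order their keys first appear and keeps source order within each group, so submissions 1 and 3 (ann@example.com) are listed before 2 and 5 (bob@example.com).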
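Next, I want to find Submissions that have the same first and last name, so I group by an anonymous type. Anonymous types implement Equals() and GetHashCode() in terms of their property values, so two Submissions with the same FirstName and LastName produce equal keys and land in the same group: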
_filteredSubmissions = (from s in _filteredSubmissions
                        group s by new { s.FirstName, s.LastName }
                        into g
                        where g.Count() > 1
                        select g).SelectMany(g => g);
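A minimal sketch of the name query, assuming C# 9 or later for top-level statements. A named tuple stands in for the Submission class, and the Id field and sample data are hypothetical. Only the two “Ann Lee” rows share both names, so only they are returned:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: a named tuple stands in for Submission (Id is assumed).
var submissions = new List<(int Id, string FirstName, string LastName, string Email)>
{
    (1, "Ann", "Lee", "ann@example.com"),
    (2, "Ann", "Lee", "ann.lee@example.com"),
    (3, "Ann", "Ray", "ann@example.com"),
    (4, "Bob", "Ray", "bob@example.com"),
};

// The anonymous-type key compares by value, so rows 1 and 2 share a group
// even though each row creates its own key object.
var nameDuplicates = (from s in submissions
                      group s by new { s.FirstName, s.LastName } into g
                      where g.Count() > 1
                      select g).SelectMany(g => g).ToList();

foreach (var sub in nameDuplicates)
    Console.WriteLine($"{sub.Id} {sub.FirstName} {sub.LastName}");
```

Submission 3 shares an email with submission 1 but not a full name, so this query doesn’t list it.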
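And finally, I want to find all Submissions that have the same first and last name OR the same email. Simply use Union(): it combines the two sequences and removes any duplicates (that’s duplicate instances of the Submission class, not to be confused with the duplicates we’re trying to find). Union() compares elements with the default equality comparer; assuming Submission doesn’t override Equals(), that’s reference equality, so a Submission that matches on both name and email appears only once in the result.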
_filteredSubmissions = (from s in _filteredSubmissions
                        group s by new { s.FirstName, s.LastName }
                        into g
                        where g.Count() > 1
                        select g).SelectMany(g => g)
    .Union(
        (from s in _filteredSubmissions
         where !string.IsNullOrWhiteSpace(s.Email)
         group s by s.Email
         into g
         where g.Count() > 1
         select g).SelectMany(g => g)
    );
Note that I also added a check for blank emails: I don’t consider two Submissions the same just because neither has an email address. I didn’t add a similar check for FirstName or LastName because both fields are mandatory in my system.
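Here’s a minimal, self-contained sketch of the combined query, assuming C# 9 or later for top-level statements. A named tuple stands in for the Submission class, and the Id field and sample data are hypothetical. Tuples compare by value rather than by reference, but because every Id is unique, Union() still treats each row as a distinct submission, just as reference equality would for a class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: a named tuple stands in for Submission, and Id is
// an assumed field that keeps every row distinct.
var submissions = new List<(int Id, string FirstName, string LastName, string Email)>
{
    (1, "Ann", "Lee", "ann@example.com"),
    (2, "Ann", "Lee", "ann@example.com"),    // same name AND same email as 1
    (3, "Bob", "Ray", "shared@example.com"),
    (4, "Cat", "Fox", "shared@example.com"), // same email as 3, different name
    (5, "Dan", "Kim", ""),                   // blank emails must not match each other
    (6, "Eve", "Ng", ""),
};

// Submissions that share a first and last name.
var byName = (from s in submissions
              group s by new { s.FirstName, s.LastName } into g
              where g.Count() > 1
              select g).SelectMany(g => g);

// Submissions that share a non-blank email.
var byEmail = (from s in submissions
               where !string.IsNullOrWhiteSpace(s.Email)
               group s by s.Email into g
               where g.Count() > 1
               select g).SelectMany(g => g);

// Union() keeps each submission once, even though 1 and 2 appear in both sequences.
var allDuplicates = byName.Union(byEmail).ToList();

// For comparison, Concat() keeps the repeats.
var concatCount = byName.Concat(byEmail).Count();

foreach (var sub in allDuplicates)
    Console.WriteLine($"{sub.Id} {sub.FirstName} {sub.LastName} <{sub.Email}>");
Console.WriteLine($"Union: {allDuplicates.Count}, Concat: {concatCount}");
```

Union() returns submissions 1 to 4, while Concat() would list 1 and 2 twice (six items instead of four). Removing the IsNullOrWhiteSpace filter would also report submissions 5 and 6, whose emails are both empty.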