---
id: sorting-old
title: "Sorting with Custom Comparators Pt 1"
author: Darren Yao, Siyong Huang, Michael Cao, Benjamin Qi
prerequisites:
 - Bronze - Pairs & Tuples in C++
 - Bronze - Introduction to Data Structures
description: Both Java and C++ have built-in functions for sorting. However, if we use custom objects, or if we want to sort elements in a different order, then we'll need to use a custom comparator.
---

import { Problem } from "../models";

export const metadata = {
  problems: {
    sample: [
      new Problem("Silver", "Wormhole Sort", "992", "Normal", false, [], ""),
    ],
    general: [
      new Problem("CSES", "Restaurant Customers", "1619", "Easy", false, [], "sort endpoints of intervals"),
      new Problem("Silver", "Lifeguards", "786", "Easy", false, [], "sort endpoints of intervals"),
      new Problem("Silver", "Rental Service", "787", "Easy", false, [], ""),
      new Problem("Silver", "Mountains", "896", "Easy", false, [], ""),
      new Problem("Silver", "Mooyo Mooyo", "860", "Easy", false, [], "Not a sorting problem, but you can use sorting to simulate gravity. - Write a custom comparator which puts zeroes at the front and use `stable_sort` to keep the relative order of other elements the same."),
      new Problem("Silver", "Meetings", "967", "Very Hard", false, [], ""),
    ],
  }
};

## Example: Wormhole Sort

There are multiple ways to solve this problem. We won't discuss the full solution here, but all of them start by sorting the edges in nondecreasing order of weight.

With C++, the easiest method is to use nested pairs.

```cpp
#include <bits/stdc++.h>
using namespace std;

#define f first
#define s second

int main() {
	int M = 4;
	vector<pair<int,pair<int,int>>> v;  // {weight, {a, b}}
	for (int i = 0; i < M; ++i) {
		int a,b,w; cin >> a >> b >> w;
		v.push_back({w,{a,b}});
	}
	sort(begin(v),end(v));  // pairs compare by first element, then by second
	for (auto e: v) cout << e.s.f << " " << e.s.s << " " << e.f << "\n";
}
```

But what if the built-in comparison function for pairs didn't exist?

- If we only stored the edge weights and sorted them, we would have a sorted list of edge weights, but it would be impossible to tell which weights corresponded to which edges.
- However, if we create a **class** (or struct) representing the edges and define a **custom comparator** to sort them by weight, we can sort the edges in ascending order while also keeping track of their endpoints.

## Classes

First, we need to define a **class** that represents what we want to sort. In C++ we can also use a [`struct`](http://www.cplusplus.com/doc/tutorial/structures/), which is the same as a `class` except that all members are public by default. In our example we will define a class `Person` that contains a person's height and weight, and sort in ascending order by height.

```cpp
struct Person {
	int height, weight;
	// default arguments also allow `Person p;` with no arguments
	Person (int h = 0, int w = 0) {
		height = h;
		weight = w;
	}
};

int main() {
	Person p;
	p.height = 60;  // assigns 60 to the height of p
	p.weight = 100; // assigns 100 to the weight of p
}
```

```java
static class Person {
	int height, weight;
	public Person (int h, int w) {
		height = h;
		weight = w;
	}
}
```

## Comparators

Normally, a sorting function works by moving objects with a lower value in front of objects with a higher value when sorting in ascending order, and vice versa when sorting in descending order. It decides this by comparing two objects at a time.

### C++

A comparator compares two objects as follows, based on our comparison criteria:

- If object $x$ is less than object $y$, return `true`
- If object $x$ is greater than or equal to object $y$, return `false`

Essentially, the comparator determines whether object $x$ belongs to the left of object $y$ in a sorted ordering. A comparator **must** return `false` for two identical objects (not doing so results in undefined behavior and potentially a runtime error).

In addition to returning the correct answer, comparators should also satisfy the following conditions:

- The function must be consistent with respect to reversing the order of the arguments: if $x \neq y$ and `compare(x, y)` is `true`, then `compare(y, x)` should be `false` and vice versa.
- The function must be transitive. If `compare(x, y)` is `true` and `compare(y, z)` is `true`, then `compare(x, z)` should also be `true`. If the first two calls both return `false`, the third must also return `false`.

### Java

A `Comparator` compares two objects as follows, based on our comparison criteria:

- If object $x$ is less than object $y$, return a negative number.
- If object $x$ is greater than object $y$, return a positive number.
- If object $x$ is equal to object $y$, return 0.

In addition to returning the correct number, comparators should also satisfy the following conditions:

- The function must be consistent with respect to reversing the order of the arguments: if `compare(x, y)` is positive, then `compare(y, x)` should be negative and vice versa.
- The function must be transitive. If `compare(x, y) > 0` and `compare(y, z) > 0`, then `compare(x, z) > 0`. The same applies if the compare functions return negative numbers.
- Equality must be consistent. If `compare(x, y) = 0`, then `compare(x, z)` and `compare(y, z)` must both be positive, both negative, or both zero. Note that they don't have to be equal; they just need to have the same sign.

Java has default functions for comparing the `int`, `long`, and `double` types. The `Integer.compare()`, `Long.compare()`, and `Double.compare()` functions take two arguments $x$ and $y$ and compare them as described above.

## C++

### Method 1: Operator <

[StackOverflow: Why const T&?](https://stackoverflow.com/questions/11805322/why-should-i-use-const-t-instead-of-const-t-or-t)

- Pro:
  - This is the easiest to implement
  - Easy to work with STL
- Con:
  - Only works for objects (not primitives)
  - Only supports two types of comparisons (less than (`<`) and greater than (`>`))

```cpp
#include <bits/stdc++.h>
using namespace std;

int randint(int low, int high) {return low+rand()%(high-low);}

struct Foo {
	int Bar;
	Foo(int _Bar=-1):Bar(_Bar){}
	bool operator < (const Foo& foo2) const {return Bar < foo2.Bar;}
};

const int N = 8;

int main() {
	srand(69);
	Foo a[N];
	// fill with random values; the range [0, 20) is an arbitrary choice
	for(int i=0;i<N;++i) a[i] = Foo(randint(0, 20));
	sort(a, a+N); // uses Foo::operator<
	for(int i=0;i<N;++i) printf("%d ", a[i].Bar);
	printf("\n");
}
```

The same technique applied to the edges from Wormhole Sort:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Edge {
	int a,b,w;
	friend bool operator<(const Edge& x, const Edge& y) { return x.w < y.w; }
}; // a different way to write less than

int main() {
	int M = 4;
	vector<Edge> v;
	for (int i = 0; i < M; ++i) {
		int a,b,w; cin >> a >> b >> w;
		v.push_back({a,b,w});
	}
	sort(begin(v),end(v));
	for (Edge e: v) cout << e.a << " " << e.b << " " << e.w << "\n";
}

/* Input:
1 2 9
1 3 7
2 3 10
2 4 3
*/

/* Output:
2 4 3
1 3 7
1 2 9
2 3 10
*/
```

### Method 2: Function Outside Class

Let's say we have an array `Person arr[N]`. To sort the array, we need to write a custom comparator as a function and then pass that function as a parameter to the built-in sort function:

```cpp
bool cmp(const Person& a, const Person& b) {
	return a.height < b.height;
}

int main() {
	// assumes Person arr[N] has been declared and filled
	sort(arr, arr+N, cmp); // sorts the array in ascending order by height
}
```

Sorting in descending order is just as simple: instead of `return a.height < b.height;`, the `cmp` function should use `return a.height > b.height;`.

- Pro:
  - Works for both objects and primitives
  - Supports many different comparators for the same object
- Con:
  - More difficult to implement
  - Extra care needs to be taken to support STL

We can also use [lambda expressions](https://www.geeksforgeeks.org/lambda-expression-in-c/) in C++11 or above.

```cpp
#include <bits/stdc++.h>
using namespace std;

int randint(int low, int high) {return low+rand()%(high-low);}

struct Foo {
	int Bar;
	Foo(int _Bar=-1):Bar(_Bar){}
};

const int N = 8;
Foo a[N];

bool cmp1(Foo foo1, Foo foo2) {return foo1.Bar < foo2.Bar;}
function<bool(Foo,Foo)> cmp2 = [](Foo foo1, Foo foo2) {return foo1.Bar < foo2.Bar;}; // lambda expression
// bool(Foo,Foo) means that the function takes in two parameters of type Foo and returns bool
// "function<bool(Foo,Foo)>" can be replaced with "auto"

int main() {
	srand(69);

	printf("--- Method 1 ---\n");
	// fill with random values; the range [0, 20) is an arbitrary choice
	for(int i=0;i<N;++i) a[i] = Foo(randint(0, 20));
	sort(a, a+N, cmp1); // plain function as the comparator
	for(int i=0;i<N;++i) printf("%d ", a[i].Bar);
	printf("\n");

	printf("--- Method 2 ---\n");
	for(int i=0;i<N;++i) a[i] = Foo(randint(0, 20));
	sort(a, a+N, cmp2); // lambda as the comparator
	for(int i=0;i<N;++i) printf("%d ", a[i].Bar);
	printf("\n");
}
```

The same technique applied to the edges from Wormhole Sort:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Edge { int a,b,w; };

int main() {
	int M = 4;
	vector<Edge> v;
	for (int i = 0; i < M; ++i) {
		int a,b,w; cin >> a >> b >> w;
		v.push_back({a,b,w});
	}
	sort(begin(v),end(v),[](const Edge& x, const Edge& y) { return x.w < y.w; });
	for (Edge e: v) cout << e.a << " " << e.b << " " << e.w << "\n";
}
```

## Java

There are two ways of implementing this in Java: `Comparable` and `Comparator`. They essentially serve the same purpose, but `Comparable` is generally easier and shorter to code. `Comparable` is an interface implemented by the class of the custom object itself, while a `Comparator` is a separate class.

For our example, we'll use a `Person` class that contains a person's height and weight, and sort in ascending order by height.

### Comparable

If we use `Comparable`, we'll need to put `implements Comparable<Person>` into the heading of the class. Furthermore, we'll need to implement the `compareTo` method. Essentially, `compareTo(x)` is the `compare` function that we described above, with the object itself as the first argument, or `compare(this, x)`.

```java
static class Person implements Comparable<Person>{
	int height, weight;
	public Person(int h, int w){
		height = h; weight = w;
	}
	public int compareTo(Person p){
		return Integer.compare(height, p.height);
	}
}
```

When using `Comparable`, we can just call `Arrays.sort(arr)` or `Collections.sort(list)` on the array or list as usual.

### Comparator

If instead we choose to use `Comparator`, we'll need to declare a second class that implements `Comparator<Person>`:

```java
static class Person{
	int height, weight;
	public Person(int h, int w){
		height = h; weight = w;
	}
}

static class Comp implements Comparator<Person>{
	public int compare(Person a, Person b){
		return Integer.compare(a.height, b.height);
	}
}
```

When using `Comparator`, the built-in sorting function takes the comparator as a second argument: `Arrays.sort(arr, new Comp())` or `Collections.sort(list, new Comp())`.

Sorting in descending order is just as simple: instead of returning `Integer.compare(x, y)` of the arguments, the comparing function should return `-Integer.compare(x, y)`.

## Python

### Defining Operator

```py
import random

class Foo:
	def __init__(self, _Bar): self.Bar = _Bar
	def __str__(self): return "Foo({})".format(self.Bar)
	def __lt__(self, o): # lt means less than
		return self.Bar < o.Bar

a = []
for i in range(8):
	a.append(Foo(random.randint(0, 9)))
print(*a)
print(*sorted(a))
```

Output:

```
Foo(0) Foo(1) Foo(2) Foo(1) Foo(9) Foo(5) Foo(5) Foo(8)
Foo(0) Foo(1) Foo(1) Foo(2) Foo(5) Foo(5) Foo(8) Foo(9)
```

### Remapping Key

- This method maps each object to another comparable datatype, which is then used for sorting. In this case, `Foo` is sorted by the sum of its members `Bar` and `Baz`.

```py
import random

class Foo:
	def __init__(self, _Bar, _Baz): self.Bar,self.Baz = _Bar,_Baz
	def __str__(self): return "Foo({},{})".format(self.Bar, self.Baz)

a = []
for i in range(8):
	a.append(Foo(random.randint(1, 9)*10, random.randint(1, 9)))
print(*a)

# key given as a lambda
print(*sorted(a, key=lambda foo: foo.Bar+foo.Baz))

# key given as a named function
def key(foo):
	return foo.Bar + foo.Baz
print(*sorted(a, key=key))
```

Output:

```
Foo(10,2) Foo(30,2) Foo(60,6) Foo(90,7) Foo(80,7) Foo(80,9) Foo(60,9) Foo(90,8)
Foo(10,2) Foo(30,2) Foo(60,6) Foo(60,9) Foo(80,7) Foo(80,9) Foo(90,7) Foo(90,8)
Foo(10,2) Foo(30,2) Foo(60,6) Foo(60,9) Foo(80,7) Foo(80,9) Foo(90,7) Foo(90,8)
```

### Function / Lambda

- This method defines how to compare two elements, with the result represented by an integer:
  - Positive: First term is greater than the second term
  - Zero: First term and second term are equal
  - Negative: First term is less than the second term

Note that the comparator must be converted to a `key` with `cmp_to_key` before it can be passed to `sorted`.

```py
import random
from functools import cmp_to_key

class Foo:
	def __init__(self, _Bar): self.Bar = _Bar
	def __str__(self): return "Foo({})".format(self.Bar)

a = []
for i in range(8):
	a.append(Foo(random.randint(0, 9)))
print(*a)

# comparator given as a lambda
print(*sorted(a, key=cmp_to_key(lambda foo1, foo2: foo1.Bar - foo2.Bar)))

# comparator given as a named function
def cmp(foo1, foo2):
	return foo1.Bar - foo2.Bar
print(*sorted(a, key=cmp_to_key(cmp)))
```

Output:

```
Foo(0) Foo(1) Foo(2) Foo(1) Foo(9) Foo(5) Foo(5) Foo(8)
Foo(0) Foo(1) Foo(1) Foo(2) Foo(5) Foo(5) Foo(8) Foo(9)
Foo(0) Foo(1) Foo(1) Foo(2) Foo(5) Foo(5) Foo(8) Foo(9)
```
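
Both techniques above extend to sorting by more than one criterion. The following is only a sketch under assumptions: the `Edge` class, its fields `a`, `b`, `w`, and the sample data are hypothetical names and values chosen to mirror the Wormhole Sort edges, not code from the problem. It sorts by descending weight and breaks ties by ascending `a`, once with a tuple key and once with a comparator passed through `cmp_to_key`.

```py
from functools import cmp_to_key


class Edge:
	# Hypothetical edge type mirroring the Wormhole Sort example.
	def __init__(self, a, b, w):
		self.a, self.b, self.w = a, b, w

	def __repr__(self):
		return "Edge({},{},{})".format(self.a, self.b, self.w)


edges = [Edge(1, 2, 9), Edge(1, 3, 7), Edge(2, 3, 10), Edge(2, 4, 3), Edge(3, 4, 7)]

# Key remapping: tuples compare element by element, so negating the
# weight gives descending weight, and ties fall through to ascending a.
by_key = sorted(edges, key=lambda e: (-e.w, e.a))


def cmp(x, y):
	# Negative if x belongs before y, positive if after, zero if tied.
	if x.w != y.w:
		return y.w - x.w  # larger weight first
	return x.a - y.a      # then smaller endpoint a first


by_cmp = sorted(edges, key=cmp_to_key(cmp))

print(*by_key)
print(*by_cmp)
```

The tuple key is usually shorter when every criterion can be written as a value to compare. A comparator may be the more natural choice when the ordering rule is easier to state as a relation between two elements.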