---
id: intro-ds
title: Introduction to Data Structures
author: Darren Yao, Benjamin Qi
description: "Basic data structures, such as dynamic arrays, in multiple languages."
frequency: 4
---
import { Problem } from "../models";
export const problems = {
bubble: [
new Problem("HR", "Bubble Sort", "https://www.hackerrank.com/challenges/ctci-bubble-sort/problem", "Easy", false, [], "O(N^2)"),
],
easy: [
new Problem("CSES", "Distinct Numbers", "1621", "Easy"),
new Problem("CSES", "Stick Lengths", "1074", "Normal", false, [], "Spoiler: Optimal length is median"),
new Problem("Silver", "Teleportation", "812", "Very Hard", false, [], ""),
],
};
<Resources>
<Resource source="IUSACO" title="4.1 - Dynamic Arrays">module is based off this</Resource>
<Resource source="CPH" title="4.1 - Dynamic Arrays" starred>vectors, strings</Resource>
<Resource source="PAPS" title="3.1 - Vectors, 6.1 - Dynamic Arrays">how dynamic arrays work</Resource>
<Resource source="CPC" title="2 - Data Structures" url="02_data_structures">assumes prior experience</Resource>
</Resources>
<br />
A **data structure** determines how data is stored (is it sorted? indexed? what operations does it support?). Each data structure supports some operations efficiently, while other operations are either inefficient or not supported at all.
## Introduction
<LanguageSection>
<CPPSection>
The C++ [standard library data structures](http://www.cplusplus.com/reference/stl/) are designed to store any type of data. We put the desired data type within the `<>` brackets when declaring the data structure, as follows:
```cpp
vector<string> v;
```
This creates a `vector` structure that only stores objects of type `string`.
For our examples below, we will primarily use the `int` data type, but note that you can use any data type including `string` and user-defined structures.
Essentially every standard library data structure supports the `size()` method, which returns the number of elements in the data structure, and the `empty()` method, which returns `true` if the data structure is empty, and `false` otherwise.
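As a small illustration of `size()` and `empty()`, here is a minimal sketch. The helper name `countNonEmpty` is made up for this example and is not part of the standard library.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: counts the non-empty strings stored in a vector.
int countNonEmpty(const vector<string>& words) {
	int count = 0;
	for (int i = 0; i < (int)words.size(); i++) { // size() = number of elements
		if (!words[i].empty()) count++;           // string supports empty() too
	}
	return count;
}
```

For instance, `countNonEmpty({"moo", "", "oink"})` evaluates to `2`.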
</CPPSection>
<JavaSection>
Java's default [`Collections`](https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html) data structures are designed to store any type of object. However, we usually don't want this; instead, we want each data structure to store only one type of data, such as integers or strings. We do this by putting the desired data type within the `<>` brackets when declaring the data structure, as follows:
```java
ArrayList<String> list = new ArrayList<String>();
```
This creates an `ArrayList` structure that only stores objects of type `String`.
For our examples below, we will primarily use the `Integer` data type, but note that you can have Collections of any object type, including `String`s, other Collections, or user-defined objects.
`Collections` data types always contain an `add` method for adding an element to the collection, and a `remove` method for removing a certain element from the collection. They also support the `size()` method, which returns the number of elements in the data structure, and the `isEmpty()` method, which returns `true` if the data structure is empty, and `false` otherwise.
</JavaSection>
</LanguageSection>
## Dynamic Arrays
You're probably already familiar with regular (static) arrays. Now, there are also dynamic arrays ([`vector`](http://www.cplusplus.com/reference/vector/vector/) in C++, [`ArrayList`](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html) in Java) that support all the functions that a normal array does, and can repeatedly reallocate storage to accommodate more elements as they are added.
In a dynamic array, we can also add and delete elements at the end in $O(1)$ time (amortized in the case of adding, since an occasional reallocation copies every element). For example, the following code creates a dynamic array and adds the numbers $1$ through $10$ to it:
<LanguageSection>
<CPPSection>
```cpp
vector<int> v;
for(int i = 1; i <= 10; i++){
	v.push_back(i);
}
```
</CPPSection>
<JavaSection>
```java
ArrayList<Integer> list = new ArrayList<Integer>();
for(int i = 1; i <= 10; i++){
	list.add(i);
}
```
</JavaSection>
</LanguageSection>
In C++, we can give a dynamic array an initial size. The code below creates a dynamic array with $30$ zeroes.
<LanguageSection>
<CPPSection>
```cpp
vector<int> v(30); // one way
vector<int> v2; v2.resize(30); // another way
```
</CPPSection>
<JavaSection>
Java's `ArrayList` has no constructor that does this directly; just use a for loop.
```java
ArrayList<Integer> list = new ArrayList<Integer>();
for (int i = 1; i <= 30; i++) list.add(0);
```
</JavaSection>
</LanguageSection>
We can also ensure that a dynamic array's **capacity** is at least $x$, meaning that it will not reallocate storage until it holds more than $x$ elements. If the dynamic array's capacity is already at least $x$, then nothing happens. Note that capacity is **not** the same thing as size. For example, if you declare a dynamic array with initial capacity $30$ and try to edit the element at index $5$, this is an error because the size of the dynamic array is still zero: Java throws an `IndexOutOfBoundsException`, while in C++ `v[5]` is undefined behavior (and `v.at(5)` throws `std::out_of_range`).
The following code initializes a dynamic array with initial capacity $30$:
<LanguageSection>
<CPPSection>
```cpp
vector<int> v; v.reserve(30);
```
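To make the difference between capacity and size concrete, here is a minimal sketch. The helper name `reserveThenFill` is hypothetical and exists only for this example.

```cpp
#include <vector>
using namespace std;

// Hypothetical helper: reserves room for `cap` ints, then appends 0..count-1.
vector<int> reserveThenFill(int cap, int count) {
	vector<int> v;
	v.reserve(cap);      // capacity() is now at least cap, but size() is still 0
	// v[5] = 1;         // would be undefined behavior here: no element exists yet
	for (int i = 0; i < count; i++) {
		v.push_back(i);  // each push_back increases size() by one
	}
	return v;
}
```

After `reserveThenFill(30, 5)`, the returned vector has size $5$; only indices $0$ through $4$ are valid, no matter how much capacity was reserved.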
</CPPSection>
<JavaSection>
```java
ArrayList<Integer> list = new ArrayList<Integer>(30); // one way
ArrayList<Integer> list2 = new ArrayList<Integer>(); list2.ensureCapacity(30); // another way
```
</JavaSection>
</LanguageSection>
We can also add and remove elements at any index of a dynamic array, but this takes $O(n)$ time because every later element must be shifted.
<LanguageSection>
<CPPSection>
```cpp
vector<int> v;
v.push_back(2); // [2]
v.push_back(3); // [2, 3]
v.push_back(7); // [2, 3, 7]
v.push_back(5); // [2, 3, 7, 5]
v[1] = 4; // sets element at index 1 to 4 -> [2, 4, 7, 5]
v.erase(v.begin() + 1); // removes element at index 1 -> [2, 7, 5]
// this remove method is O(n); to be avoided
v.push_back(8); // [2, 7, 5, 8]
v.erase(v.end()-1); // [2, 7, 5]
// here, we remove the element from the end of the list; this is O(1).
v.push_back(4); // [2, 7, 5, 4]
v.push_back(4); // [2, 7, 5, 4, 4]
v.push_back(9); // [2, 7, 5, 4, 4, 9]
cout << v[2]; // 5
v.erase(v.begin(), v.begin()+3); // [4, 4, 9]
// this erases the first three elements; O(n)
```
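Adding at an arbitrary index uses `insert`, and `pop_back` is the usual shorthand for removing the last element. A minimal sketch, where the helper names `insertAt` and `removeLast` are made up for illustration:

```cpp
#include <vector>
using namespace std;

// Hypothetical helper: inserts `value` so that it ends up at index `idx`.
// Requires 0 <= idx <= v.size(). O(n), since later elements shift right.
void insertAt(vector<int>& v, int idx, int value) {
	v.insert(v.begin() + idx, value);
}

// Hypothetical helper: removes the last element in O(1).
// Requires a non-empty vector.
void removeLast(vector<int>& v) {
	v.pop_back(); // same effect as v.erase(v.end() - 1)
}
```

Starting from `[2, 7, 5]`, calling `insertAt(v, 1, 3)` gives `[2, 3, 7, 5]`, and a following `removeLast(v)` gives `[2, 3, 7]`.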
</CPPSection>
<JavaSection>
```java
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(2); // [2]
list.add(3); // [2, 3]
list.add(7); // [2, 3, 7]
list.add(5); // [2, 3, 7, 5]
list.set(1, 4); // sets element at index 1 to 4 -> [2, 4, 7, 5]
list.remove(1); // removes element at index 1 -> [2, 7, 5]
// this remove method is O(n); to be avoided
list.add(8); // [2, 7, 5, 8]
list.remove(list.size()-1); // [2, 7, 5]
// here, we remove the element from the end of the list; this is O(1).
System.out.println(list.get(2)); // 5
```
</JavaSection>
</LanguageSection>
To iterate through a static or dynamic array, we can use either the regular for loop or the for-each loop.
<LanguageSection>
<CPPSection>
```cpp
vector<int> v;
v.push_back(1); v.push_back(7); v.push_back(4); v.push_back(5); v.push_back(2);
int arr[] = {1, 7, 4, 5, 2};
for(int i = 0; i < v.size(); i++){ // regular
	cout << v[i] << " ";
}
cout << endl;
for(int element : arr){ // for-each
	cout << element << " ";
}
cout << endl;
```
</CPPSection>
<JavaSection>
```java
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(1); list.add(7); list.add(4); list.add(5); list.add(2);
int[] arr = {1, 7, 4, 5, 2};
for(int i = 0; i < list.size(); i++){ // regular
	System.out.println(list.get(i));
}
for(int element : arr){ // for-each
	System.out.println(element);
}
```
</JavaSection>
</LanguageSection>
In array-based contest problems, we'll use one-, two-, and three-dimensional static arrays most of the time. However, we can also have static arrays of dynamic arrays, dynamic arrays of static arrays, and so on. Usually, the choice between a static array and a dynamic array is just personal preference.
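<LanguageSection>
<CPPSection>
As a rough sketch of such combinations, the code below uses a static array of dynamic arrays and a dynamic array of static arrays. The helper names `bucketSizes` and `indexValuePairs` are hypothetical, and the first one assumes non-negative input values.

```cpp
#include <array>
#include <vector>
using namespace std;

// Hypothetical helper: groups values by remainder mod 3 and returns the
// three group sizes. Assumes every value is non-negative.
vector<int> bucketSizes(const vector<int>& values) {
	vector<int> buckets[3]; // static array of 3 dynamic arrays
	for (int x : values) buckets[x % 3].push_back(x);
	vector<int> sizes;
	for (int r = 0; r < 3; r++) sizes.push_back((int)buckets[r].size());
	return sizes;
}

// Hypothetical helper: returns {index, value} for every element.
vector<array<int, 2>> indexValuePairs(const vector<int>& values) {
	vector<array<int, 2>> pairs; // dynamic array of static arrays
	for (int i = 0; i < (int)values.size(); i++) {
		array<int, 2> p = {i, values[i]};
		pairs.push_back(p);
	}
	return pairs;
}
```

For the values `{1, 7, 4, 5, 2}`, `bucketSizes` returns `{0, 3, 2}`: no value is divisible by $3$, three values leave remainder $1$, and two leave remainder $2$.
</CPPSection>
</LanguageSection>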
## Iterators
An **iterator** allows you to traverse a container by pointing to an object within the container.
<LanguageSection>
<CPPSection>
However, they are **not** the same thing as pointers. For example, `myvector.begin()` returns an iterator pointing to the first element of the vector `myvector`. Apart from the standard way of traversing a vector (by treating it as an array), you can also use iterators:
```cpp
for (vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it) {
	cout << *it;
}
```
C++11 and later versions can automatically infer the type of an object if you use the keyword `auto`, so the following are okay:
```cpp
for (auto it = myvector.begin(); it != myvector.end(); ++it) {
	cout << *it;
}
for (auto element : myvector) {
	cout << element;
}
```
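Iterators can also be used to modify elements, since `*it` refers to the element itself. Note that `for (auto element : myvector)` copies each element, so a reference (`auto&`) is needed to change the container through a for-each loop. A minimal sketch, with the hypothetical helper names `doubleAll` and `sumAll`:

```cpp
#include <vector>
using namespace std;

// Hypothetical helper: doubles every element in place through an iterator.
void doubleAll(vector<int>& v) {
	for (auto it = v.begin(); it != v.end(); ++it) {
		*it *= 2; // writes to the element the iterator points to
	}
}

// Hypothetical helper: sums the elements with a for-each loop.
// `const auto&` reads each element without copying it.
int sumAll(const vector<int>& v) {
	int total = 0;
	for (const auto& element : v) total += element;
	return total;
}
```

Applying `doubleAll` to `[1, 7, 4]` gives `[2, 14, 8]`, and `sumAll` on the result returns `24`.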
</CPPSection>
</LanguageSection>
## Sorting
<Problems problems={problems.bubble} />
**Sorting** refers to arranging items in some particular order. You do not need to know how to sort an array in $O(N\log N)$ time for Bronze, but you should be aware of how to use built-in methods to sort a (possibly dynamic) array.
<LanguageSection>
<CPPSection>
In order to sort a dynamic array, use `sort(v.begin(), v.end())` (or `sort(begin(v),end(v))`), whereas static arrays require `sort(arr, arr + N)` where $N$ is the number of elements to be sorted. The default sort function sorts the array in ascending order.
- [std::sort](https://en.cppreference.com/w/cpp/algorithm/sort)
- [std::stable\_sort](http://www.cplusplus.com/reference/algorithm/stable_sort/)
- [Golovanov399 - C++ Tricks](https://codeforces.com/blog/entry/74684) (the first two tricks are related to sorting)
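The calls above can be sketched as follows. The helper names are hypothetical, and the descending variant uses the comparator `greater<int>()` from `<functional>`, which is an addition not covered in the text above.

```cpp
#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

// Hypothetical helper: returns a copy of v sorted in ascending order.
vector<int> sortedAscending(vector<int> v) {
	sort(v.begin(), v.end());
	return v;
}

// Hypothetical helper: returns a copy of v sorted in descending order
// by passing a comparator as the third argument.
vector<int> sortedDescending(vector<int> v) {
	sort(v.begin(), v.end(), greater<int>());
	return v;
}

// Hypothetical helper: sorts only the first n elements of a static array.
void sortFirstN(int arr[], int n) {
	sort(arr, arr + n);
}
```

For the array `{3, 1, 2, 0}`, `sortFirstN(arr, 3)` leaves `{1, 2, 3, 0}`: the last element is outside the sorted range and stays where it was.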
</CPPSection>
<JavaSection>
In order to sort a static or dynamic array, use `Arrays.sort(arr)` or `Collections.sort(list)` respectively. The default sort function sorts the array in ascending order.
- [Arrays.sort](https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort(java.lang.Object[]))
- [Collections.sort](https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List))
The `Arrays.sort()` function uses quicksort on primitive data types such as `long`s. This is fine for USACO, but in other contests such as Codeforces, it may time out on test cases specifically engineered to trigger worst-case $O(N^2)$ behavior in quicksort. See [here](https://codeforces.com/contest/1324/hacks/625031/) for an example of a solution that was hacked on Codeforces.
Two ways to avoid this:
- Declare the underlying array as an array of objects, for example `Long` instead of `long`. This forces the `Arrays.sort()` function to use a merge sort variant, which is always $O(N \log N)$.
- [Shuffle](https://pastebin.com/k6gCRJDv) the array beforehand.
</JavaSection>
<PySection>
- [Sorting Basics](https://docs.python.org/3/howto/sorting.html)
</PySection>
</LanguageSection>
## Problems
<Problems problems={problems.easy} />