usaco-guide/content/3_Silver/Prefix_Sums.mdx
2020-07-18 21:36:48 -04:00


---
id: prefix-sums
title: "Prefix Sums"
author: Darren Yao, Eric Wei
description: "Computing range sum queries in constant time over a fixed array."
prerequisites:
- intro-ds
frequency: 3
---
import { Problem } from "../models";
export const metadata = {
problems: {
sample: [
new Problem("YS", "Static Range Sum", "static_range_sum", "Easy", false, [], "equivalent to [CSES Range Sum Queries I](https://cses.fi/problemset/task/1646)"),
],
cum: [
new Problem("LC", "Find Pivot Index", "find-pivot-index", "Easy", false, [], "Calculate all prefix and suffix sums"),
new Problem("Silver", "Breed Counting", "572", "Easy", false, [], "Find separate prefix sums for each breed"),
new Problem("Silver", "Subsequences Summing to Sevens", "595", "Easy", false, [], "Find prefix sums modulo 7, and find a pair of equal (modulo 7) prefix sums as far away from each other as possible"),
new Problem("Silver", "Hoof Paper Scissors", "691", "Easy", false, [], "Find prefix sums for the number of hoofs, papers, and scissors separately, then enumerate where the change takes place and consider all the possibilities of what gesture to use before and after the change"),
new Problem("Silver", "Max Cross", "715", "Easy", false, [], "Find prefix sum of number of broken signals, then enumerate all possible locations of the interval"),
new Problem("Old Bronze", "Haybale Stacking", "104", "Easy", false, [], "**Task:** Given an array of size $N$, do the following operation $Q$ times: add $X$ to the values between $i$ and $j$. Afterwards, print the final array. **Solution:** Consider the array formed by $a_i-a_{i-1}$. When processing a range addition, only two values in this difference array change! At the end, we can recover the original array using prefix sums. (The math is left as an exercise to the reader.)"),
new Problem("CSES", "Subarray Sums II", "1661", "Easy", false, [], "Find prefix sums, then enumerate the right end of the interval, while maintaining a multiset of all prefix sum values that currently exist to the left"),
new Problem("CSES", "Subarray Divisibility", "1662", "Easy", false, [], "Find prefix sums modulo n, then find number of pairs of equal prefix sums"),
new Problem("Gold", "Help Yourself", "1018", "Very Hard", false, [], ""),
],
maxsum: [
new Problem("CSES", "Max Subarray Sum", "1643", "Easy", false, ["Prefix Sums"], "Enumerate the right endpoint while maintaining the lowest prefix sum that currently exists to the left"),
],
maxsumext: [
new Problem("CSES", "Max Subarray Sum II", "1644", "Normal", false, ["Prefix Sums"], "Enumerate the right endpoint while maintaining multiset of prefix sums in the interval where the left endpoint might lie in"),
],
sample2: [
new Problem("CSES", "Forest Queries", "1652", "Easy", false, ["Prefix Sums"], ""),
],
cum2: [
new Problem("Silver", "Painting the Barn", "919", "Easy", false, ["Prefix Sums"], "When adding a rectangle, we need to add one to some corners and subtract one from other corners (think about which) in a temporary 2D array. At the end, take its prefix sum."),
new Problem("Gold", "Painting the Barn", "923", "Hard", false, ["Prefix Sums", "Max Subarray Sum"], "Treat areas that currently have K-1 layers of paint as having a value of 1, those having K layers of paint as having a value of -1, and the rest as having a value of 0. For each pair of x-coordinates, find the maximum subrectangle whose left side is located at the first x-coordinate and the right side is located at the second x-coordinate. This can be done with maximum subarray sum and prefix sums of each row. Do something similar with each pair of y-coordinates (fixing the top and bottom of the rectangle). Also notice that whenever two disjoint rectangles are drawn, there always exists a horizontal or vertical line separating them. Enumerate the location of the line, and find maximum subrectangle of each side using the information that we precalculated."),
],
related: [
new Problem("Silver", "My Cow Ate My Homework", "762", "Easy", false, ["Prefix Sums"], "Enumerate, from right to left, the last question eaten. The problem is reduced to suffix minimums."),
new Problem("CSES", "Range XOR Queries", "1650", "Easy", false, ["Prefix Sums"], "XOR prefix[r] with prefix[l-1]"),
],
complex: [
new Problem("Google KickStart", "Candies (Test Set 1)", "https://codingcompetitions.withgoogle.com/kickstart/round/000000000019ff43/0000000000337b4d", "Easy", false, ["Prefix Sums"], ""),
new Problem("AC", "Multiple of 2019", "https://atcoder.jp/contests/abc164/tasks/abc164_d", "Hard", false, ["Prefix Sums"], "Make use of the fact that adding 0's to the end of a number does not affect whether it is a multiple of 2019 (because 10 and 2019 are coprime)."),
],
}
};
<Resources title="Additional">
<Resource source="IUSACO" title="11 - Prefix Sums" starred>module is based off this</Resource>
<Resource source="CPH" title="9.1 - Sum Queries">rather brief</Resource>
<Resource source="PAPS" title="11.2.1 - Prefix Precomputation">also rather brief</Resource>
</Resources>
## Introduction
<Problems problems={metadata.problems.sample} />
Let's say we have a one-indexed integer array $\texttt{arr}$ of size $N$ and we want to compute the value of
$$
\texttt{arr}[a]+\texttt{arr}[a+1]+\cdots+\texttt{arr}[b]
$$
for $Q$ different pairs $(a,b)$ satisfying $1\le a\le b\le N$. We'll use the following example with $N = 6$:
<center>
| Index $i$ | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| $\texttt{arr}[i]$ | 1 | 6 | 4 | 2 | 5 | 3 |
</center>
Naively, for every query, we can iterate through all entries from index $a$ to index $b$ to add them up. Since we have $Q$ queries and each query requires a maximum of $O(N)$ operations to calculate the sum, our total time complexity is $O(NQ)$. For most problems of this nature, the constraints will be $N, Q \leq 10^5$, so $NQ$ is on the order of $10^{10}$. This is **not** acceptable; it will almost certainly exceed the time limit.
Instead, we can use prefix sums to process these array sum queries. We designate a prefix sum array $\texttt{prefix}$. First, because we're 1-indexing the array, set $\texttt{prefix}[0]=0$, then for indices $k$ such that $1 \leq k \leq N$, define the prefix sum array as follows:
$$
\texttt{prefix}[k]=\sum_{i=1}^{k} \texttt{arr}[i]
$$
In other words, the element at index $k$ of the prefix sum array stores the sum of all the elements in the original array from index $1$ up to $k$. This can be calculated easily in $O(N)$ by the following formula for each $1\le k\le N$:
$$
\texttt{prefix}[k]=\texttt{prefix}[k-1]+\texttt{arr}[k]
$$
For the example case, our prefix sum array looks like this:
<center>
| Index $i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $\texttt{prefix}[i]$ | 0 | 1 | 7 | 11 | 13 | 18 | 21 |
</center>
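
The recurrence above translates almost directly into code. Below is a minimal Python sketch; the function name `build_prefix` is our own choice for illustration, and since Python lists are 0-indexed, the 1-indexed $\texttt{arr}[k]$ from the text is read as `arr[k - 1]`.

```python
def build_prefix(arr):
    """Return a list where prefix[k] is the sum of the first k elements of arr."""
    prefix = [0] * (len(arr) + 1)  # prefix[0] = 0 by definition
    for k in range(1, len(arr) + 1):
        # The 1-indexed arr[k] from the text is arr[k - 1] in a Python list.
        prefix[k] = prefix[k - 1] + arr[k - 1]
    return prefix


arr = [1, 6, 4, 2, 5, 3]
print(build_prefix(arr))  # [0, 1, 7, 11, 13, 18, 21]
```

The printed list matches the table above.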
Now, when we want to query for the sum of the elements of $\texttt{arr}$ between (1-indexed) indices $a$ and $b$ inclusive, we can use the following formula:
$$
\sum_{i=a}^{b} \texttt{arr}[i] = \sum_{i=1}^{b} \texttt{arr}[i] - \sum_{i=1}^{a-1} \texttt{arr}[i]
$$
Using our definition of the elements in the prefix sum array, we have
$$
\sum_{i=a}^{b} \texttt{arr}[i]= \texttt{prefix}[b]-\texttt{prefix}[a-1]
$$
$$
Since we are only querying two elements in the prefix sum array, we can calculate subarray sums in $O(1)$ per query, which is much better than the $O(N)$ per query that we had before. Now, after an $O(N)$ preprocessing to calculate the prefix sum array, each of the $Q$ queries takes $O(1)$ time. Thus, our total time complexity is $O(N+Q)$, which should now pass the time limit.
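
As a rough sketch of the full $O(N+Q)$ procedure, the snippet below precomputes the prefix array once and then answers each query with a single subtraction. The helper names `build_prefix` and `range_sum` and the sample queries are made up for illustration; `itertools.accumulate` is used here as one convenient way to get running sums in Python.

```python
from itertools import accumulate


def build_prefix(arr):
    # Prepend 0 so that prefix[k] is the sum of the first k elements.
    return [0] + list(accumulate(arr))


def range_sum(prefix, a, b):
    """Sum of the 1-indexed elements a..b (inclusive) in O(1)."""
    return prefix[b] - prefix[a - 1]


arr = [1, 6, 4, 2, 5, 3]
prefix = build_prefix(arr)            # O(N) preprocessing
queries = [(2, 5), (1, 6), (3, 3)]    # hypothetical (a, b) pairs
print([range_sum(prefix, a, b) for a, b in queries])  # [17, 21, 4]
```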
Let's do an example query and find the subarray sum between indices $a = 2$ and $b = 5$, inclusive, in the 1-indexed $\texttt{arr}$. From looking at the original array, we see that this is
$$
\sum_{i=2}^{5} \texttt{arr}[i] = 6 + 4 + 2 + 5 = 17.
$$
<center>
<table>
<thead>
<tr>
<th>
<!-- extremely dumb formatting should be fixed by better mdx parser in mdx 2.0 -->
Index $i$
</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
</tr>
</thead>
<tbody>
<tr>
<td>
$\texttt{arr}[i]$
</td>
<td>1</td>
<td className="bg-yellow-100">6</td>
<td className="bg-yellow-100">4</td>
<td className="bg-yellow-100">2</td>
<td className="bg-yellow-100">5</td>
<td>3</td>
</tr>
</tbody>
</table>
</center>
Using prefix sums:
$$
\texttt{prefix}[5] - \texttt{prefix}[1] = 18 - 1 = 17.
$$
<center>
<table>
<thead>
<tr>
<th>
Index $i$
</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
</tr>
</thead>
<tbody>
<tr>
<td>
$\texttt{prefix}[i]$
</td>
<td className="bg-red-100">0</td>
<td className="bg-red-100">1</td>
<td className="bg-green-100">7</td>
<td className="bg-green-100">11</td>
<td className="bg-green-100">13</td>
<td className="bg-green-100">18</td>
<td>21</td>
</tr>
</tbody>
</table>
</center>
These are also known as [partial sums](https://mathworld.wolfram.com/PartialSum.html).
### Problems
<Problems problems={metadata.problems.cum} />
## Max Subarray Sum
Now we'll look at some extensions.
<Problems problems={metadata.problems.maxsum} />
This problem has a solution known as [Kadane's Algorithm](https://en.wikipedia.org/wiki/Maximum_subarray_problem#Kadane's_algorithm). Please don't use that solution; try to solve it with prefix sums.
<Spoiler title="Why are the two methods equivalent?">
Consider the desired maximum subarray. As you go along from left to right, the prefix sum solution will mark the start of that subarray as the "current minimum prefix sum". Kadane's Algorithm, on the other hand, will set the current value to 0 at that point. As both solutions iterate through the array, they eventually find the right side of the maximum sum, and they find the answer to the problem at that location. In essence, Kadane's Algorithm stores the maximum sum of a subarray that ends at the current location (which is what the prefix sum solution calculates on each iteration), but it calculates this value greedily instead.
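
To make the comparison concrete, here is one possible sketch of both approaches in Python, assuming a non-empty input array. The function names are our own, and this is only an illustration of the idea described above, not a reference solution.

```python
def max_subarray_prefix(arr):
    """Maximum sum of a non-empty subarray, using prefix sums."""
    best = arr[0]
    prefix = 0      # prefix sum up to the current right endpoint
    min_prefix = 0  # smallest prefix sum strictly to the left of it
    for x in arr:
        prefix += x
        best = max(best, prefix - min_prefix)
        min_prefix = min(min_prefix, prefix)
    return best


def max_subarray_kadane(arr):
    """Maximum sum of a non-empty subarray, using Kadane's Algorithm."""
    best = arr[0]
    current = 0  # max sum of a subarray ending at the previous position
    for x in arr:
        # Restart from 0 whenever the running sum has dropped below it.
        current = max(current, 0) + x
        best = max(best, current)
    return best


sample = [-1, 3, -2, 5, 3, -5, 2, 2]
print(max_subarray_prefix(sample), max_subarray_kadane(sample))  # 9 9
```

In both functions, the quantity compared against `best` on each iteration is the maximum sum of a subarray ending at the current position.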
</Spoiler>
## Prefix Minimum, XOR, etc.
Similar to prefix sums, you can also take prefix minimums or maximums, but *you cannot* answer min queries over an arbitrary range with prefix minimums. (This is because minimum doesn't have an inverse operation, the way subtraction is the inverse of addition.) On the other hand, XOR is its own inverse operation, meaning that the XOR of any number with itself is zero, so range XOR queries can be answered in the same way as range sum queries.
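
The following Python sketch illustrates the XOR case; the names `build_prefix_xor` and `range_xor` and the sample array are chosen for illustration only. It also computes prefix minimums to show that they are easy to build, even though they cannot be "subtracted" to answer arbitrary range queries.

```python
from itertools import accumulate


def build_prefix_xor(arr):
    """prefix[k] is the XOR of the first k elements of arr."""
    prefix = [0] * (len(arr) + 1)
    for k in range(1, len(arr) + 1):
        prefix[k] = prefix[k - 1] ^ arr[k - 1]
    return prefix


def range_xor(prefix, l, r):
    """XOR of the 1-indexed elements l..r; the shared prefix cancels out."""
    return prefix[r] ^ prefix[l - 1]


arr = [3, 2, 4, 5, 1, 1, 5, 3]
prefix = build_prefix_xor(arr)
print(range_xor(prefix, 2, 4))  # 2 ^ 4 ^ 5 = 3

# Prefix minimums: prefix_min[k] is the minimum of arr[0..k] (0-indexed here).
prefix_min = list(accumulate(arr, min))
print(prefix_min)  # [3, 2, 2, 2, 1, 1, 1, 1]
```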
<Problems problems={metadata.problems.related} />
## 2D Prefix Sums
<Problems problems={metadata.problems.sample2} />
Now, what if we wanted to process $Q$ queries for the sum over a subrectangle of an $N$-row by $M$-column matrix in two dimensions? Let's assume both rows and columns are 1-indexed, and we use the following matrix as an example:
<center>
<table className="text-center">
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>5</td>
<td>6</td>
<td>11</td>
<td>8</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>7</td>
<td>11</td>
<td>9</td>
<td>4</td>
</tr>
<tr>
<td>0</td>
<td>4</td>
<td>6</td>
<td>1</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>0</td>
<td>7</td>
<td>5</td>
<td>4</td>
<td>2</td>
<td>3</td>
</tr>
</tbody>
</table>
</center>
Naively, each sum query would then take $O(NM)$ time, for a total of $O(QNM)$. This is too slow.
Let's take the following example region, which we want to sum:
<center>
<table className="text-center">
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>5</td>
<td>6</td>
<td>11</td>
<td>8</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td className="bg-blue-100">7</td>
<td className="bg-blue-100">11</td>
<td className="bg-blue-100">9</td>
<td>4</td>
</tr>
<tr>
<td>0</td>
<td>4</td>
<td className="bg-blue-100">6</td>
<td className="bg-blue-100">1</td>
<td className="bg-blue-100">3</td>
<td>2</td>
</tr>
<tr>
<td>0</td>
<td>7</td>
<td>5</td>
<td>4</td>
<td>2</td>
<td>3</td>
</tr>
</tbody>
</table>
</center>
Manually summing all the cells, we have a submatrix sum of $7+11+9+6+1+3 = 37$.
The first logical optimization would be to do one-dimensional prefix sums of each row. Then, we'd have the following row-prefix sum matrix. The desired subarray sum of each row in our desired region is simply the green cell minus the red cell in that respective row. We do this for each row to get $(28-1) + (14-4) = 37$.
<center>
<table className="text-center">
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>6</td>
<td>12</td>
<td>23</td>
<td>31</td>
</tr>
<tr>
<td>0</td>
<td className="bg-red-200">1</td>
<td>8</td>
<td>19</td>
<td className="bg-green-200">28</td>
<td>32</td>
</tr>
<tr>
<td>0</td>
<td className="bg-red-200">4</td>
<td>10</td>
<td>11</td>
<td className="bg-green-200">14</td>
<td>16</td>
</tr>
<tr>
<td>0</td>
<td>7</td>
<td>12</td>
<td>16</td>
<td>18</td>
<td>21</td>
</tr>
</tbody>
</table>
</center>
Now, if we wanted to find a submatrix sum, we could break up the submatrix into a subarray for each row, and then add their sums, which would be calculated using the prefix sums method described earlier. Since the matrix has $N$ rows, the time complexity of this is $O(QN)$. This might be fast enough for $Q=10^5$ and $N=10^3$, but we can do better.
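
A minimal sketch of this row-by-row approach is shown below, under the assumption that the grid is stored as a list of rows without the padding row and column of zeros. The function names and the parameter order (top row `a`, left column `b`, bottom row `A`, right column `B`, all 1-indexed and inclusive) are our own conventions for this example.

```python
from itertools import accumulate


def build_row_prefix(grid):
    """One 1D prefix sum array per row, each with a leading 0."""
    return [[0] + list(accumulate(row)) for row in grid]


def submatrix_sum_rows(row_prefix, a, b, A, B):
    """Sum over rows a..A and columns b..B, one O(1) lookup pair per row."""
    total = 0
    for i in range(a, A + 1):
        row = row_prefix[i - 1]  # row i in 1-indexed terms
        total += row[B] - row[b - 1]
    return total


grid = [
    [1, 5, 6, 11, 8],
    [1, 7, 11, 9, 4],
    [4, 6, 1, 3, 2],
    [7, 5, 4, 2, 3],
]
row_prefix = build_row_prefix(grid)
print(submatrix_sum_rows(row_prefix, 2, 2, 3, 4))  # (28 - 1) + (14 - 4) = 37
```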
In fact, we can do two-dimensional prefix sums. In our two dimensional prefix sum array, we have
$$
\texttt{prefix}[a][b]=\sum_{i=1}^{a} \sum_{j=1}^{b} \texttt{arr}[i][j].
$$
This can be calculated as follows for row index $1 \leq i \leq N$ and column index $1 \leq j \leq M$:
$$
\texttt{prefix}[i][j]
= \texttt{prefix}[i-1][j]
+ \texttt{prefix}[i][j-1]
- \texttt{prefix}[i-1][j-1]
+ \texttt{arr}[i][j]
$$
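
As a sketch, the recurrence can be implemented with two nested loops. The function name `build_prefix_2d` is hypothetical, and the input grid is assumed to be a 0-indexed list of rows without padding, so the 1-indexed $\texttt{arr}[i][j]$ is read as `grid[i - 1][j - 1]`.

```python
def build_prefix_2d(grid):
    """prefix[i][j] is the sum of the top-left i-by-j block of grid."""
    n, m = len(grid), len(grid[0])
    # Row 0 and column 0 stay zero, so the recurrence needs no special cases.
    prefix = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prefix[i][j] = (
                prefix[i - 1][j]         # block above
                + prefix[i][j - 1]       # block to the left
                - prefix[i - 1][j - 1]   # overlap, counted twice
                + grid[i - 1][j - 1]     # the cell itself
            )
    return prefix


grid = [
    [1, 5, 6, 11, 8],
    [1, 7, 11, 9, 4],
    [4, 6, 1, 3, 2],
    [7, 5, 4, 2, 3],
]
for row in build_prefix_2d(grid):
    print(row)
```

The subtraction of `prefix[i - 1][j - 1]` is needed because that block is contained in both the block above and the block to the left.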
The submatrix sum between rows $a$ and $A$ and columns $b$ and $B$ can thus be expressed as follows:
$$
\sum_{i=a}^{A} \sum_{j=b}^{B} \texttt{arr}[i][j]=\texttt{prefix}[A][B]
- \texttt{prefix}[a-1][B]
- \texttt{prefix}[A][b-1]
+ \texttt{prefix}[a-1][b-1]
$$
Summing the blue region from above using the 2D prefix sums method, we add the value of the green square, subtract the values of the red squares, and then add the value of the gray square. In this example, we have
$$
65-23-6+1 = 37,
$$
as expected.
<center>
<table className="text-center">
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td className="bg-gray-200">1</td>
<td>6</td>
<td>12</td>
<td className="bg-red-200">23</td>
<td>31</td>
</tr>
<tr>
<td>0</td>
<td>2</td>
<td>14</td>
<td>31</td>
<td>51</td>
<td>63</td>
</tr>
<tr>
<td>0</td>
<td className="bg-red-200">6</td>
<td>24</td>
<td>42</td>
<td className="bg-green-200">65</td>
<td>79</td>
</tr>
<tr>
<td>0</td>
<td>13</td>
<td>36</td>
<td>58</td>
<td>83</td>
<td>100</td>
</tr>
</tbody>
</table>
</center>
No matter the size of the submatrix we are summing, we only need to access four values of the 2D prefix sum array, so this runs in $O(1)$ per query after $O(NM)$ preprocessing.
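
Putting the pieces together, a query might look like the sketch below. The names `build_prefix_2d` and `submatrix_sum` are illustrative, the grid is assumed to be stored without the padding zeros, and the parameters `a`, `b`, `A`, `B` follow the 1-indexed, inclusive convention of the formula above.

```python
def build_prefix_2d(grid):
    """O(NM) preprocessing: prefix[i][j] is the sum of the top-left i-by-j block."""
    n, m = len(grid), len(grid[0])
    prefix = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prefix[i][j] = (prefix[i - 1][j] + prefix[i][j - 1]
                            - prefix[i - 1][j - 1] + grid[i - 1][j - 1])
    return prefix


def submatrix_sum(prefix, a, b, A, B):
    """Sum over rows a..A and columns b..B using exactly four lookups."""
    return (prefix[A][B] - prefix[a - 1][B]
            - prefix[A][b - 1] + prefix[a - 1][b - 1])


grid = [
    [1, 5, 6, 11, 8],
    [1, 7, 11, 9, 4],
    [4, 6, 1, 3, 2],
    [7, 5, 4, 2, 3],
]
prefix = build_prefix_2d(grid)
print(submatrix_sum(prefix, 2, 2, 3, 4))  # 65 - 23 - 6 + 1 = 37
```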
<Problems problems={metadata.problems.cum2} />
## More Complex Applications
Instead of storing just the values themselves, you can also take a prefix sum over $i\cdot a_i$, or $10^i \cdot a_i$, for instance. Some math is usually helpful for these problems; don't be afraid to get dirty with algebra!
For instance, let's see how to quickly answer the following type of query:
> Find $1\cdot a_l+2\cdot a_{l+1}+3\cdot a_{l+2}+\cdots+(r-l+1)\cdot a_{r}$.
First, define the following:
$$
\texttt{ps}[i] = a_1+a_2+a_3+a_4+\cdots+a_i
$$
$$
\texttt{ips}[i] = 1\cdot a_1+2\cdot a_2+\cdots+i\cdot a_i
$$
Then, we have:
$$
l\cdot a_l + (l+1) \cdot a_{l+1} + \cdots + r \cdot a_r = \texttt{ips}[r]-\texttt{ips}[l-1]
$$
$$
(l-1) \cdot a_l + (l-1) \cdot a_{l+1} + \cdots + (l-1) \cdot a_r = (l-1)(\texttt{ps}[r]-\texttt{ps}[l-1])
$$
And so,
$$
1\cdot a_l + 2 \cdot a_{l+1} + \cdots + (r-l+1) \cdot a_r = \texttt{ips}[r]-\texttt{ips}[l-1]-(l-1)(\texttt{ps}[r]-\texttt{ps}[l-1])
$$
Which is what we were looking for!
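
To check the algebra, the sketch below builds both prefix arrays and compares the $O(1)$ formula against a direct summation. The function names and the sample array are chosen only for this example.

```python
def build_ps_ips(a):
    """Return (ps, ips) with ps[i] = a_1 + ... + a_i and ips[i] = 1*a_1 + ... + i*a_i."""
    n = len(a)
    ps = [0] * (n + 1)
    ips = [0] * (n + 1)
    for i in range(1, n + 1):
        ps[i] = ps[i - 1] + a[i - 1]
        ips[i] = ips[i - 1] + i * a[i - 1]
    return ps, ips


def weighted_sum(ps, ips, l, r):
    """1*a_l + 2*a_{l+1} + ... + (r-l+1)*a_r in O(1)."""
    return ips[r] - ips[l - 1] - (l - 1) * (ps[r] - ps[l - 1])


def weighted_sum_naive(a, l, r):
    """Same quantity computed term by term, for checking."""
    return sum((i - l + 1) * a[i - 1] for i in range(l, r + 1))


a = [1, 6, 4, 2, 5, 3]
ps, ips = build_ps_ips(a)
print(weighted_sum(ps, ips, 2, 5))   # 1*6 + 2*4 + 3*2 + 4*5 = 40
print(weighted_sum_naive(a, 2, 5))   # 40
```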
<Problems problems={metadata.problems.complex} />