diff --git a/.ipynb_checkpoints/nn_tutorial-checkpoint.ipynb b/.ipynb_checkpoints/nn_tutorial-checkpoint.ipynb new file mode 100644 index 0000000..d7e71a3 --- /dev/null +++ b/.ipynb_checkpoints/nn_tutorial-checkpoint.ipynb @@ -0,0 +1,1928 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "What is `torch.nn` *really*?\n", + "============================\n", + "by Jeremy Howard, `fast.ai `_. Thanks to Rachel Thomas and Francisco Ingham.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We recommend running this tutorial as a notebook, not a script. To download the notebook (.ipynb) file,\n", + "click the link at the top of the page.\n", + "\n", + "PyTorch provides the elegantly designed modules and classes `torch.nn `_ ,\n", + "`torch.optim `_ ,\n", + "`Dataset `_ ,\n", + "and `DataLoader `_\n", + "to help you create and train neural networks.\n", + "In order to fully utilize their power and customize\n", + "them for your problem, you need to really understand exactly what they're\n", + "doing. To develop this understanding, we will first train a basic neural net\n", + "on the MNIST data set without using any features from these modules; we will\n", + "initially only use the most basic PyTorch tensor functionality. 
Then, we will\n", + "incrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or\n", + "``DataLoader`` at a time, showing exactly what each piece does, and how it\n", + "works to make the code either more concise, or more flexible.\n", + "\n", + "**This tutorial assumes you already have PyTorch installed, and are familiar\n", + "with the basics of tensor operations.** (If you're familiar with Numpy array\n", + "operations, you'll find the PyTorch tensor operations used here nearly identical).\n", + "\n", + "MNIST data setup\n", + "----------------\n", + "\n", + "We will use the classic `MNIST `_ dataset,\n", + "which consists of black-and-white images of hand-drawn digits (between 0 and 9).\n", + "\n", + "We will use `pathlib `_\n", + "for dealing with paths (part of the Python 3 standard library), and will\n", + "download the dataset using\n", + "`requests `_. We will only\n", + "import modules when we use them, so you can see exactly what's being\n", + "used at each point.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import requests\n", + "\n", + "DATA_PATH = Path(\"data\")\n", + "PATH = DATA_PATH / \"mnist\"\n", + "\n", + "PATH.mkdir(parents=True, exist_ok=True)\n", + "\n", + "URL = \"https://github.com/pytorch/tutorials/raw/master/_static/\"\n", + "FILENAME = \"mnist.pkl.gz\"\n", + "\n", + "if not (PATH / FILENAME).exists():\n", + " content = requests.get(URL + FILENAME).content\n", + " (PATH / FILENAME).open(\"wb\").write(content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This dataset is in numpy array format, and has been stored using pickle,\n", + "a python-specific format for serializing data.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false, + "jupyter": { + 
"outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import pickle\n", + "import gzip\n", + "\n", + "with gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n", + " ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each image is 28 x 28, and is being stored as a flattened row of length\n", + "784 (=28x28). Let's take a look at one; we need to reshape it to 2d\n", + "first.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(50000, 784)\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAN8klEQVR4nO3df6jVdZ7H8ddrbfojxzI39iZOrWOEUdE6i9nSyjYRTj8o7FYMIzQ0JDl/JDSwyIb7xxSLIVu6rBSDDtXYMus0UJHFMNVm5S6BdDMrs21qoxjlphtmmv1a9b1/3K9xp+75nOs53/PD+34+4HDO+b7P93zffPHl99f53o8jQgAmvj/rdQMAuoOwA0kQdiAJwg4kQdiBJE7o5sJsc+of6LCI8FjT29qy277C9lu237F9ezvfBaCz3Op1dtuTJP1B0gJJOyW9JGlRROwozMOWHeiwTmzZ50l6JyLejYgvJf1G0sI2vg9AB7UT9hmS/jjq/c5q2p+wvcT2kO2hNpYFoE0dP0EXEeskrZPYjQd6qZ0t+y5JZ4x6/51qGoA+1E7YX5J0tu3v2j5R0o8kbaynLQB1a3k3PiIO2V4q6SlJkyQ9EBFv1NYZgFq1fOmtpYVxzA50XEd+VAPg+EHYgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEi0P2Yzjw6RJk4r1U045paPLX7p0acPaSSedVJx39uzZxfqtt95arN9zzz0Na4sWLSrO+/nnnxfrK1euLNbvvPPOYr0X2gq77fckHZB0WNKhiJhbR1MA6lfHlv3SiPiwhu8B0EEcswNJtBv2kPS07ZdtLxnrA7aX2B6yPdTmsgC0od3d+PkRscv2X0h6xvZ/R8Tm0R+IiHWS1kmS7WhzeQBa1NaWPSJ2Vc97JD0maV4dTQGoX8thtz3Z9pSjryX9QNL2uhoDUK92duMHJD1m++j3/HtE/L6WriaYM888s1g/8cQTi/WLL764WJ8/f37D2tSpU4vzXn/99cV6L+3cubNYX7NmTbE+ODjYsHbgwIHivK+++mqx/sILLxTr/ajlsEfEu5L+qsZeAHQQl96AJAg7kARhB5Ig7EAShB1IwhHd+
1HbRP0F3Zw5c4r1TZs2Feudvs20Xx05cqRYv/nmm4v1Tz75pOVlDw8PF+sfffRRsf7WW2+1vOxOiwiPNZ0tO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kwXX2GkybNq1Y37JlS7E+a9asOtupVbPe9+3bV6xfeumlDWtffvllcd6svz9oF9fZgeQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJhmyuwd69e4v1ZcuWFetXX311sf7KK68U683+pHLJtm3bivUFCxYU6wcPHizWzzvvvIa12267rTgv6sWWHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeS4H72PnDyyScX682GF167dm3D2uLFi4vz3njjjcX6hg0binX0n5bvZ7f9gO09trePmjbN9jO2366eT62zWQD1G89u/K8kXfG1abdLejYizpb0bPUeQB9rGvaI2Czp678HXShpffV6vaRr620LQN1a/W38QEQcHSzrA0kDjT5oe4mkJS0uB0BN2r4RJiKidOItItZJWidxgg7opVYvve22PV2Squc99bUEoBNaDftGSTdVr2+S9Hg97QDolKa78bY3SPq+pNNs75T0c0krJf3W9mJJ70v6YSebnOj279/f1vwff/xxy/PecsstxfrDDz9crDcbYx39o2nYI2JRg9JlNfcCoIP4uSyQBGEHkiDsQBKEHUiCsANJcIvrBDB58uSGtSeeeKI47yWXXFKsX3nllcX6008/Xayj+xiyGUiOsANJEHYgCcIOJEHYgSQIO5AEYQeS4Dr7BHfWWWcV61u3bi3W9+3bV6w/99xzxfrQ0FDD2n333Vect5v/NicSrrMDyRF2IAnCDiRB2IEkCDuQBGEHkiDsQBJcZ09ucHCwWH/wwQeL9SlTprS87OXLlxfrDz30ULE+PDxcrGfFdXYgOcIOJEHYgSQIO5AEYQeSIOxAEoQdSILr7Cg6//zzi/XVq1cX65dd1vpgv2vXri3WV6xYUazv2rWr5WUfz1q+zm77Adt7bG8fNe0O27tsb6seV9XZLID6jWc3/leSrhhj+r9ExJzq8bt62wJQt6Zhj4jNkvZ2oRcAHdTOCbqltl+rdvNPbfQh20tsD9lu/MfIAHRcq2H/haSzJM2RNCxpVaMPRsS6iJgbEXNbXBaAGrQU9ojYHRGHI+KIpF9KmldvWwDq1lLYbU8f9XZQ0vZGnwXQH5peZ7e9QdL3JZ0mabekn1fv50gKSe9J+mlENL25mOvsE8/UqVOL9WuuuaZhrdm98vaYl4u/smnTpmJ9wYIFxfpE1eg6+wnjmHHRGJPvb7sjAF3Fz2WBJAg7kARhB5Ig7EAShB1Igltc0TNffPFFsX7CCeWLRYcOHSrWL7/88oa1559/vjjv8Yw/JQ0kR9iBJAg7kARhB5Ig7EAShB1IgrADSTS96w25XXDBBcX6DTfcUKxfeOGFDWvNrqM3s2PHjmJ98+bNbX3/RMOWHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeS4Dr7BDd79uxifenSpcX6ddddV6yffvrpx9zTeB0+fLhYHx4u//XyI0eO1NnOcY8tO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kwXX240Cza9mLFo010O6IZtfRZ86c2UpLtRgaGirWV6xYUaxv3LixznYmvKZbdttn2H7O9g7bb9i+rZo+zfYztt+unk/tfLsAWjWe3fhDkv4+Is6V9DeSbrV9rqTbJT0bEWdLerZ6D6BPNQ17RAxHxNbq9QFJb0qaIWmhpPXVx9ZLurZDPQKowTEds9ueKel7krZIGoiIoz9O/kDSQIN5lkha0kaPAGow7rPxtr8t6RFJP4uI/aNrMTI65JiDNkbEuoiYGxFz2+oUQFvGFXbb39JI0H8dEY9Wk3fbnl7Vp0va05kWAdSh6W68bUu6X9KbEbF6VGmjpJskrayeH+9IhxPAwMCYRzhfOffcc4v1e++9t1g/55xzjrmnumzZsqVYv/vuuxvWHn+8/E+GW
1TrNZ5j9r+V9GNJr9veVk1brpGQ/9b2YknvS/phRzoEUIumYY+I/5I05uDuki6rtx0AncLPZYEkCDuQBGEHkiDsQBKEHUiCW1zHadq0aQ1ra9euLc47Z86cYn3WrFmttFSLF198sVhftWpVsf7UU08V65999tkx94TOYMsOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0mkuc5+0UUXFevLli0r1ufNm9ewNmPGjJZ6qsunn37asLZmzZrivHfddVexfvDgwZZ6Qv9hyw4kQdiBJAg7kARhB5Ig7EAShB1IgrADSaS5zj44ONhWvR07duwo1p988sli/dChQ8V66Z7zffv2FedFHmzZgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJR0T5A/YZkh6SNCApJK2LiH+1fYekWyT9b/XR5RHxuybfVV4YgLZFxJijLo8n7NMlTY+IrbanSHpZ0rUaGY/9k4i4Z7xNEHag8xqFfTzjsw9LGq5eH7D9pqTe/mkWAMfsmI7Zbc+U9D1JW6pJS22/ZvsB26c2mGeJ7SHbQ+21CqAdTXfjv/qg/W1JL0haERGP2h6Q9KFGjuP/SSO7+jc3+Q5244EOa/mYXZJsf0vSk5KeiojVY9RnSnoyIs5v8j2EHeiwRmFvuhtv25Lul/Tm6KBXJ+6OGpS0vd0mAXTOeM7Gz5f0n5Jel3Skmrxc0iJJczSyG/+epJ9WJ/NK38WWHeiwtnbj60LYgc5reTcewMRA2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEoQdSKLbQzZ/KOn9Ue9Pq6b1o37trV/7kuitVXX29peNCl29n/0bC7eHImJuzxoo6Nfe+rUvid5a1a3e2I0HkiDsQBK9Dvu6Hi+/pF9769e+JHprVVd66+kxO4Du6fWWHUCXEHYgiZ6E3fYVtt+y/Y7t23vRQyO237P9uu1tvR6frhpDb4/t7aOmTbP9jO23q+cxx9jrUW932N5Vrbtttq/qUW9n2H7O9g7bb9i+rZre03VX6Ksr663rx+y2J0n6g6QFknZKeknSoojY0dVGGrD9nqS5EdHzH2DY/jtJn0h66OjQWrb/WdLeiFhZ/Ud5akT8Q5/0doeOcRjvDvXWaJjxn6iH667O4c9b0Yst+zxJ70TEuxHxpaTfSFrYgz76XkRslrT3a5MXSlpfvV6vkX8sXdegt74QEcMRsbV6fUDS0WHGe7ruCn11RS/CPkPSH0e936n+Gu89JD1t+2XbS3rdzBgGRg2z9YGkgV42M4amw3h309eGGe+bddfK8Oft4gTdN82PiL+WdKWkW6vd1b4UI8dg/XTt9BeSztLIGIDDklb1splqmPFHJP0sIvaPrvVy3Y3RV1fWWy/CvkvSGaPef6ea1hciYlf1vEfSYxo57Ognu4+OoFs97+lxP1+JiN0RcTgijkj6pXq47qphxh+R9OuIeLSa3PN1N1Zf3VpvvQj7S5LOtv1d2ydK+pGkjT3o4xtsT65OnMj2ZEk/UP8NRb1R0k3V65skPd7DXv5Evwzj3WiYcfV43fV8+POI6PpD0lUaOSP/P5L+sRc9NOhrlqRXq8cbve5N0gaN7Nb9n0bObSyW9OeSnpX0tqT/kDStj3r7N40M7f2aRoI1vUe9zdfILvprkrZVj6t6ve4KfXVlvfFzWSAJTtABSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBL/DyJ7caZa7LphAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "from matplotlib import pyplot\n", + "import numpy as np\n", + "\n", + "pyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n", + "print(x_train.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to\n", + "convert our data.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor([[0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " ...,\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.]]) tensor([5, 0, 4, ..., 8, 4, 8])\n", + "torch.Size([50000, 784])\n", + "tensor(0) tensor(9)\n" + ] + } + ], + "source": [ + "import torch\n", + "\n", + "x_train, y_train, x_valid, y_valid = map(\n", + " torch.tensor, (x_train, y_train, x_valid, y_valid)\n", + ")\n", + "n, c = x_train.shape\n", + "print(x_train, y_train)\n", + "print(x_train.shape)\n", + "print(y_train.min(), y_train.max())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Neural net from scratch (no torch.nn)\n", + "---------------------------------------------\n", + "\n", + "Let's first create a model using nothing but PyTorch tensor operations. We're assuming\n", + "you're already familiar with the basics of neural networks. (If you're not, you can\n", + "learn them at `course.fast.ai `_).\n", + "\n", + "PyTorch provides methods to create random or zero-filled tensors, which we will\n", + "use to create our weights and bias for a simple linear model. 
These are just regular\n", + "tensors, with one very special addition: we tell PyTorch that they require a\n", + "gradient. This causes PyTorch to record all of the operations done on the tensor,\n", + "so that it can calculate the gradient during back-propagation *automatically*!\n", + "\n", + "For the weights, we set ``requires_grad`` **after** the initialization, since we\n", + "don't want that step included in the gradient. (Note that a trailing ``_`` in\n", + "PyTorch signifies that the operation is performed in-place.)\n", + "\n", + "

Note: We are initializing the weights here with\n", + " `Xavier initialisation `_\n", + " (by multiplying with 1/sqrt(n)).

\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import math\n", + "\n", + "weights = torch.randn(784, 10) / math.sqrt(784)\n", + "weights.requires_grad_()\n", + "bias = torch.zeros(10, requires_grad=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Thanks to PyTorch's ability to calculate gradients automatically, we can\n", + "use any standard Python function (or callable object) as a model! So\n", + "let's just write a plain matrix multiplication and broadcasted addition\n", + "to create a simple linear model. We also need an activation function, so\n", + "we'll write `log_softmax` and use it. Remember: although PyTorch\n", + "provides lots of pre-written loss functions, activation functions, and\n", + "so forth, you can easily write your own using plain Python. PyTorch will\n", + "even create fast GPU or vectorized CPU code for your function\n", + "automatically.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def log_softmax(x):\n", + "    return x - x.exp().sum(-1).log().unsqueeze(-1)\n", + "\n", + "def model(xb):\n", + "    return log_softmax(xb @ weights + bias)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above, the ``@`` stands for the matrix multiplication operation. We will call\n", + "our function on one batch of data (in this case, 64 images). This is\n", + "one *forward pass*. 
Note that our predictions won't be any better than\n", + "random at this stage, since we start with random weights.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor([-2.2680, -1.7434, -2.2746, -2.7562, -2.7793, -2.4086, -2.2656, -2.2761,\n", + " -2.1634, -2.5035], grad_fn=) torch.Size([64, 10])\n" + ] + } + ], + "source": [ + "bs = 64 # batch size\n", + "\n", + "xb = x_train[0:bs] # a mini-batch from x\n", + "preds = model(xb) # predictions\n", + "preds[0], preds.shape\n", + "print(preds[0], preds.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you see, the ``preds`` tensor contains not only the tensor values, but also a\n", + "gradient function. We'll use this later to do backprop.\n", + "\n", + "Let's implement negative log-likelihood to use as the loss function\n", + "(again, we can just use standard Python):\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def nll(input, target):\n", + " return -input[range(target.shape[0]), target].mean()\n", + "\n", + "loss_func = nll" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check our loss with our random model, so we can see if we improve\n", + "after a backprop pass later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.4159, grad_fn=)\n" + ] + } + ], + "source": [ + "yb = y_train[0:bs]\n", + "print(loss_func(preds, yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's also 
implement a function to calculate the accuracy of our model.\n", + "For each prediction, if the index with the largest value matches the\n", + "target value, then the prediction was correct.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def accuracy(out, yb):\n", + " preds = torch.argmax(out, dim=1)\n", + " return (preds == yb).float().mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check the accuracy of our random model, so we can see if our\n", + "accuracy improves as our loss improves.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0781)\n" + ] + } + ], + "source": [ + "print(accuracy(preds, yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now run a training loop. For each iteration, we will:\n", + "\n", + "- select a mini-batch of data (of size ``bs``)\n", + "- use the model to make predictions\n", + "- calculate the loss\n", + "- ``loss.backward()`` updates the gradients of the model, in this case, ``weights``\n", + " and ``bias``.\n", + "\n", + "We now use these gradients to update the weights and bias. We do this\n", + "within the ``torch.no_grad()`` context manager, because we do not want these\n", + "actions to be recorded for our next calculation of the gradient. You can read\n", + "more about how PyTorch's Autograd records operations\n", + "`here `_.\n", + "\n", + "We then set the\n", + "gradients to zero, so that we are ready for the next loop.\n", + "Otherwise, our gradients would record a running tally of all the operations\n", + "that had happened (i.e. 
``loss.backward()`` *adds* the gradients to whatever is\n", + "already stored, rather than replacing them).\n", + "\n", + ".. tip:: You can use the standard Python debugger to step through PyTorch\n", + " code, allowing you to check the various variable values at each step.\n", + " Uncomment ``set_trace()`` below to try it out.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from IPython.core.debugger import set_trace\n", + "\n", + "lr = 0.5 # learning rate\n", + "epochs = 2 # how many epochs to train for\n", + "\n", + "for epoch in range(epochs):\n", + "    for i in range((n - 1) // bs + 1):\n", + "        # set_trace()\n", + "        start_i = i * bs\n", + "        end_i = start_i + bs\n", + "        xb = x_train[start_i:end_i]\n", + "        yb = y_train[start_i:end_i]\n", + "        pred = model(xb)\n", + "        loss = loss_func(pred, yb)\n", + "\n", + "        loss.backward()\n", + "        with torch.no_grad():\n", + "            weights -= weights.grad * lr\n", + "            bias -= bias.grad * lr\n", + "            weights.grad.zero_()\n", + "            bias.grad.zero_()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That's it: we've created and trained a minimal neural network (in this case, a\n", + "logistic regression, since we have no hidden layers) entirely from scratch!\n", + "\n", + "Let's check the loss and accuracy and compare those to what we got\n", + "earlier. 
We expect the loss to have decreased and the accuracy to\n", + "have increased, and they have.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0825, grad_fn=) tensor(1.)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using torch.nn.functional\n", + "------------------------------\n", + "\n", + "We will now refactor our code, so that it does the same thing as before, only\n", + "we'll start taking advantage of PyTorch's ``nn`` classes to make it more concise\n", + "and flexible. At each step from here, we should be making our code one or more\n", + "of: shorter, more understandable, and/or more flexible.\n", + "\n", + "The first and easiest step is to make our code shorter by replacing our\n", + "hand-written activation and loss functions with those from ``torch.nn.functional``\n", + "(which is generally imported into the namespace ``F`` by convention). This module\n", + "contains all the functions in the ``torch.nn`` library (whereas other parts of the\n", + "library contain classes). As well as a wide range of loss and activation\n", + "functions, you'll also find here some convenient functions for creating neural\n", + "nets, such as pooling functions. (There are also functions for doing convolutions,\n", + "linear layers, etc., but as we'll see, these are usually better handled using\n", + "other parts of the library.)\n", + "\n", + "If you're using negative log likelihood loss and log softmax activation,\n", + "then PyTorch provides a single function ``F.cross_entropy`` that combines\n", + "the two. 
So we can even remove the activation function from our model.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import torch.nn.functional as F\n", + "\n", + "loss_func = F.cross_entropy\n", + "\n", + "def model(xb):\n", + " return xb @ weights + bias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that we no longer call ``log_softmax`` in the ``model`` function. Let's\n", + "confirm that our loss and accuracy are the same as before:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0825, grad_fn=) tensor(1.)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using nn.Module\n", + "-----------------------------\n", + "Next up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more\n", + "concise training loop. We subclass ``nn.Module`` (which itself is a class and\n", + "able to keep track of state). In this case, we want to create a class that\n", + "holds our weights, bias, and method for the forward step. ``nn.Module`` has a\n", + "number of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)\n", + "which we will be using.\n", + "\n", + "

Note: ``nn.Module`` (uppercase M) is a PyTorch specific concept, and is a\n", + " class we'll be using a lot. ``nn.Module`` is not to be confused with the Python\n", + " concept of a (lowercase ``m``) `module `_,\n", + " which is a file of Python code that can be imported.

\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch import nn\n", + "\n", + "class Mnist_Logistic(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n", + "        self.bias = nn.Parameter(torch.zeros(10))\n", + "\n", + "    def forward(self, xb):\n", + "        return xb @ self.weights + self.bias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since we're now using an object instead of just using a function, we\n", + "first have to instantiate our model:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "model = Mnist_Logistic()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can calculate the loss in the same way as before. 
Note that\n", + "``nn.Module`` objects are used as if they are functions (i.e. they are\n", + "*callable*), but behind the scenes PyTorch will call our ``forward``\n", + "method automatically.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.1396, grad_fn=)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Previously for our training loop we had to update the values for each parameter\n", + "by name, and manually zero out the grads for each parameter separately, like this:\n", + "::\n", + " with torch.no_grad():\n", + " weights -= weights.grad * lr\n", + " bias -= bias.grad * lr\n", + " weights.grad.zero_()\n", + " bias.grad.zero_()\n", + "\n", + "\n", + "Now we can take advantage of ``model.parameters()`` and ``model.zero_grad()`` (which\n", + "are both defined by PyTorch for ``nn.Module``) to make those steps more concise\n", + "and less prone to the error of forgetting some of our parameters, particularly\n", + "if we had a more complicated model:\n", + "::\n", + " with torch.no_grad():\n", + " for p in model.parameters(): p -= p.grad * lr\n", + " model.zero_grad()\n", + "\n", + "\n", + "We'll wrap our little training loop in a ``fit`` function so we can run it\n", + "again later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def fit():\n", + "    for epoch in range(epochs):\n", + "        for i in range((n - 1) // bs + 1):\n", + "            start_i = i * bs\n", + "            end_i = start_i + bs\n", + "            xb = x_train[start_i:end_i]\n", + "            yb = y_train[start_i:end_i]\n", + "            pred = model(xb)\n", + "            loss = loss_func(pred, yb)\n", + "\n", + "            loss.backward()\n", + "            with torch.no_grad():\n", + "                for p in model.parameters():\n", + "                    p -= p.grad * lr\n", + "                model.zero_grad()\n", + "\n", + "fit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's double-check that our loss has gone down:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0820, grad_fn=)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using nn.Linear\n", + "-------------------------\n", + "\n", + "We continue to refactor our code. Instead of manually defining and\n", + "initializing ``self.weights`` and ``self.bias``, and calculating ``xb @\n", + "self.weights + self.bias``, we will instead use the PyTorch class\n", + "`nn.Linear `_ for a\n", + "linear layer, which does all that for us. 
PyTorch has many types of\n", + "predefined layers that can greatly simplify our code, and often make it\n", + "faster too.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "class Mnist_Logistic(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.lin = nn.Linear(784, 10)\n", + "\n", + "    def forward(self, xb):\n", + "        return self.lin(xb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We instantiate our model and calculate the loss in the same way as before:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.2840, grad_fn=)\n" + ] + } + ], + "source": [ + "model = Mnist_Logistic()\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are still able to use the same ``fit`` function as before.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0798, grad_fn=)\n" + ] + } + ], + "source": [ + "fit()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using optim\n", + "------------------------------\n", + "\n", + "PyTorch also has a package with various optimization algorithms, ``torch.optim``.\n", + "We can use the ``step`` method from our optimizer to take an update step, instead\n", + "of manually updating each parameter.\n", + "\n", + "This will let us replace our previous manually coded optimization step:\n", + "::\n", + " with 
torch.no_grad():\n", + " for p in model.parameters(): p -= p.grad * lr\n", + " model.zero_grad()\n", + "\n", + "and instead use just:\n", + "::\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "(``opt.zero_grad()`` resets the gradients to zero; we need to call it before\n", + "computing the gradients for the next minibatch.)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch import optim" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll define a little function to create our model and optimizer so we\n", + "can reuse it in the future.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.2706, grad_fn=)\n", + "tensor(0.0798, grad_fn=)\n" + ] + } + ], + "source": [ + "def get_model():\n", + "    model = Mnist_Logistic()\n", + "    return model, optim.SGD(model.parameters(), lr=lr)\n", + "\n", + "model, opt = get_model()\n", + "print(loss_func(model(xb), yb))\n", + "\n", + "for epoch in range(epochs):\n", + "    for i in range((n - 1) // bs + 1):\n", + "        start_i = i * bs\n", + "        end_i = start_i + bs\n", + "        xb = x_train[start_i:end_i]\n", + "        yb = y_train[start_i:end_i]\n", + "        pred = model(xb)\n", + "        loss = loss_func(pred, yb)\n", + "\n", + "        loss.backward()\n", + "        opt.step()\n", + "        opt.zero_grad()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using Dataset\n", + "------------------------------\n", + "\n", + "PyTorch has an abstract Dataset class. 
A Dataset can be anything that has\n", + "a ``__len__`` function (called by Python's standard ``len`` function) and\n", + "a ``__getitem__`` function as a way of indexing into it.\n", + "`This tutorial `_\n", + "walks through a nice example of creating a custom ``FacialLandmarkDataset`` class\n", + "as a subclass of ``Dataset``.\n", + "\n", + "PyTorch's `TensorDataset `_\n", + "is a Dataset wrapping tensors. By defining a length and way of indexing,\n", + "this also gives us a way to iterate, index, and slice along the first\n", + "dimension of a tensor. This will make it easier to access both the\n", + "independent and dependent variables in the same line as we train.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch.utils.data import TensorDataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,\n", + "which will be easier to iterate over and slice.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "train_ds = TensorDataset(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Previously, we had to iterate through minibatches of x and y values separately:\n", + "::\n", + " xb = x_train[start_i:end_i]\n", + " yb = y_train[start_i:end_i]\n", + "\n", + "\n", + "Now, we can do these two steps together:\n", + "::\n", + " xb,yb = train_ds[i*bs : i*bs+bs]\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0817, grad_fn=)\n" + ] + 
} + ], + "source": [ + "model, opt = get_model()\n", + "\n", + "for epoch in range(epochs):\n", + " for i in range((n - 1) // bs + 1):\n", + " xb, yb = train_ds[i * bs: i * bs + bs]\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using DataLoader\n", + "------------------------------\n", + "\n", + "PyTorch's ``DataLoader`` is responsible for managing batches. You can\n", + "create a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier\n", + "to iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,\n", + "the ``DataLoader`` gives us each minibatch automatically.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch.utils.data import DataLoader\n", + "\n", + "train_ds = TensorDataset(x_train, y_train)\n", + "train_dl = DataLoader(train_ds, batch_size=bs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Previously, our loop iterated over batches (xb, yb) like this:\n", + "::\n", + " for i in range((n-1)//bs + 1):\n", + " xb,yb = train_ds[i*bs : i*bs+bs]\n", + " pred = model(xb)\n", + "\n", + "Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:\n", + "::\n", + " for xb,yb in train_dl:\n", + " pred = model(xb)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0815, grad_fn=)\n" + ] + } + ], + "source": [ + "model, opt = get_model()\n", + "\n", + "for epoch in range(epochs):\n", + " for xb, yb in train_dl:\n", + "
pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Thanks to PyTorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,\n", + "our training loop is now dramatically smaller and easier to understand. Let's\n", + "now try to add the basic features necessary to create effective models in practice.\n", + "\n", + "Add validation\n", + "-----------------------\n", + "\n", + "So far, we were just trying to get a reasonable training loop set up for\n", + "use on our training data. In reality, you should **always** also have\n", + "a `validation set `_, in order\n", + "to identify whether you are overfitting.\n", + "\n", + "Shuffling the training data is\n", + "`important `_\n", + "to prevent correlation between batches and overfitting. On the other hand, the\n", + "validation loss will be identical whether we shuffle the validation set or not.\n", + "Since shuffling takes extra time, it makes no sense to shuffle the validation data.\n", + "\n", + "We'll use a batch size for the validation set that is twice as large as\n", + "that for the training set. This is because the validation set does not\n", + "need backpropagation and thus takes less memory (it doesn't need to\n", + "store the gradients).
We take advantage of this to use a larger batch\n", + "size and compute the loss more quickly.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "train_ds = TensorDataset(x_train, y_train)\n", + "train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n", + "\n", + "valid_ds = TensorDataset(x_valid, y_valid)\n", + "valid_dl = DataLoader(valid_ds, batch_size=bs * 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will calculate and print the validation loss at the end of each epoch.\n", + "\n", + "(Note that we always call ``model.train()`` before training, and ``model.eval()``\n", + "before inference, because these are used by layers such as ``nn.BatchNorm2d``\n", + "and ``nn.Dropout`` to ensure appropriate behaviour for these different phases.)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 tensor(0.3232)\n", + "1 tensor(0.2736)\n" + ] + } + ], + "source": [ + "model, opt = get_model()\n", + "\n", + "for epoch in range(epochs):\n", + " model.train()\n", + " for xb, yb in train_dl:\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n", + "\n", + " print(epoch, valid_loss / len(valid_dl))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create fit() and get_data()\n", + "----------------------------------\n", + "\n", + "We'll now do a little refactoring of our own. 
Since we go through a similar\n", + "process twice of calculating the loss for both the training set and the\n", + "validation set, let's make that into its own function, ``loss_batch``, which\n", + "computes the loss for one batch.\n", + "\n", + "We pass an optimizer in for the training set, and use it to perform\n", + "backprop. For the validation set, we don't pass an optimizer, so the\n", + "method doesn't perform backprop.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def loss_batch(model, loss_func, xb, yb, opt=None):\n", + " loss = loss_func(model(xb), yb)\n", + "\n", + " if opt is not None:\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + " return loss.item(), len(xb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "``fit`` runs the necessary operations to train our model and compute the\n", + "training and validation losses for each epoch.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n", + " for epoch in range(epochs):\n", + " model.train()\n", + " for xb, yb in train_dl:\n", + " loss_batch(model, loss_func, xb, yb, opt)\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " losses, nums = zip(\n", + " *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n", + " )\n", + " val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n", + "\n", + " print(epoch, val_loss)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "``get_data`` returns dataloaders for the training and validation sets.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + 
"metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def get_data(train_ds, valid_ds, bs):\n", + " return (\n", + " DataLoader(train_ds, batch_size=bs, shuffle=True),\n", + " DataLoader(valid_ds, batch_size=bs * 2),\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, our whole process of obtaining the data loaders and fitting the\n", + "model can be run in 3 lines of code:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.36182342684268953\n", + "1 0.3086622476875782\n" + ] + } + ], + "source": [ + "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", + "model, opt = get_model()\n", + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use these 3 basic lines of code to train a wide variety of models.\n", + "Let's see if we can use them to train a convolutional neural network (CNN)!\n", + "\n", + "Switch to CNN\n", + "-------------\n", + "\n", + "We are now going to build our neural network with three convolutional layers.\n", + "Because none of the functions in the previous section assume anything about\n", + "the model form, we'll be able to use them to train a CNN without any modification.\n", + "\n", + "We will use PyTorch's predefined\n", + "`Conv2d `_ class\n", + "as our convolutional layer. We define a CNN with 3 convolutional layers.\n", + "Each convolution is followed by a ReLU. At the end, we perform an\n", + "average pooling.
(Note that ``view`` is PyTorch's version of numpy's\n", + "``reshape``)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "class Mnist_CNN(nn.Module):\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n", + " self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n", + " self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n", + "\n", + " def forward(self, xb):\n", + " xb = xb.view(-1, 1, 28, 28)\n", + " xb = F.relu(self.conv1(xb))\n", + " xb = F.relu(self.conv2(xb))\n", + " xb = F.relu(self.conv3(xb))\n", + " xb = F.avg_pool2d(xb, 4)\n", + " return xb.view(-1, xb.size(1))\n", + "\n", + "lr = 0.1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Momentum `_ is a variation on\n", + "stochastic gradient descent that takes previous updates into account as well\n", + "and generally leads to faster training.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.30878638651371004\n", + "1 0.25200295938253403\n" + ] + } + ], + "source": [ + "model = Mnist_CNN()\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", + "\n", + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "nn.Sequential\n", + "------------------------\n", + "\n", + "``torch.nn`` has another handy class we can use to simplify our code:\n", + "`Sequential `_ .\n", + "A ``Sequential`` object runs each of the modules contained within it, in a\n", + "sequential manner. 
This is a simpler way of writing our neural network.\n", + "\n", + "To take advantage of this, we need to be able to easily define a\n", + "**custom layer** from a given function. For instance, PyTorch doesn't\n", + "have a `view` layer, and we need to create one for our network. ``Lambda``\n", + "will create a layer that we can then use when defining a network with\n", + "``Sequential``.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "class Lambda(nn.Module):\n", + " def __init__(self, func):\n", + " super().__init__()\n", + " self.func = func\n", + "\n", + " def forward(self, x):\n", + " return self.func(x)\n", + "\n", + "\n", + "def preprocess(x):\n", + " return x.view(-1, 1, 28, 28)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model created with ``Sequential`` is simply:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.32227418100833893\n", + "1 0.2695485789179802\n" + ] + } + ], + "source": [ + "model = nn.Sequential(\n", + " Lambda(preprocess),\n", + " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.AvgPool2d(4),\n", + " Lambda(lambda x: x.view(x.size(0), -1)),\n", + ")\n", + "\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", + "\n", + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wrapping DataLoader\n", + "-----------------------------\n", + "\n", + "Our CNN is fairly concise, but it only 
works with MNIST, because:\n", + " - It assumes the input is a 28\\*28 long vector\n", + " - It assumes that the final CNN grid size is 4\\*4 (since that's the average\n", + "pooling kernel size we used)\n", + "\n", + "Let's get rid of these two assumptions, so our model works with any 2d\n", + "single channel image. First, we can remove the initial Lambda layer by\n", + "moving the data preprocessing into a generator:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def preprocess(x, y):\n", + " return x.view(-1, 1, 28, 28), y\n", + "\n", + "\n", + "class WrappedDataLoader:\n", + " def __init__(self, dl, func):\n", + " self.dl = dl\n", + " self.func = func\n", + "\n", + " def __len__(self):\n", + " return len(self.dl)\n", + "\n", + " def __iter__(self):\n", + " batches = iter(self.dl)\n", + " for b in batches:\n", + " yield (self.func(*b))\n", + "\n", + "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", + "train_dl = WrappedDataLoader(train_dl, preprocess)\n", + "valid_dl = WrappedDataLoader(valid_dl, preprocess)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which\n", + "allows us to define the size of the *output* tensor we want, rather than\n", + "the *input* tensor we have. 
As a result, our model will work with any\n", + "size input.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "model = nn.Sequential(\n", + " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.AdaptiveAvgPool2d(1),\n", + " Lambda(lambda x: x.view(x.size(0), -1)),\n", + ")\n", + "\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try it out:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.3791842395067215\n", + "1 0.26341770286560057\n" + ] + } + ], + "source": [ + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using your GPU\n", + "---------------\n", + "\n", + "If you're lucky enough to have access to a CUDA-capable GPU (you can\n", + "rent one for about $0.50/hour from most cloud providers) you can\n", + "use it to speed up your code. 
First check that your GPU is working in\n", + "PyTorch:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False\n" + ] + } + ], + "source": [ + "print(torch.cuda.is_available())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And then create a device object for it:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "dev = torch.device(\n", + " \"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's update ``preprocess`` to move batches to the GPU:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def preprocess(x, y):\n", + " return x.view(-1, 1, 28, 28).to(dev), y.to(dev)\n", + "\n", + "\n", + "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", + "train_dl = WrappedDataLoader(train_dl, preprocess)\n", + "valid_dl = WrappedDataLoader(valid_dl, preprocess)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can move our model to the GPU.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "model.to(dev)\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should find it runs faster now:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, +
"jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Closing thoughts\n", + "-----------------\n", + "\n", + "We now have a general data pipeline and training loop which you can use for\n", + "training many types of models using PyTorch. To see how simple training a model\n", + "can now be, take a look at the `mnist_sample` notebook.\n", + "\n", + "Of course, there are many things you'll want to add, such as data augmentation,\n", + "hyperparameter tuning, monitoring training, transfer learning, and so forth.\n", + "These features are available in the fastai library, which has been developed\n", + "using the same design approach shown in this tutorial, providing a natural\n", + "next step for practitioners looking to take their models further.\n", + "\n", + "We promised at the start of this tutorial we'd explain through example each of\n", + "``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize\n", + "what we've seen:\n", + "\n", + " - **torch.nn**\n", + "\n", + " + ``Module``: creates a callable which behaves like a function, but can also\n", + " contain state (such as neural net layer weights). It knows which ``Parameter`` objects it\n", + " contains and can zero all their gradients, loop through them for weight updates, etc.\n", + " + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights\n", + " that need updating during backprop.
Only tensors with the ``requires_grad`` attribute set are updated.\n", + " + ``functional``: a module (usually imported into the ``F`` namespace by convention)\n", + " which contains activation functions, loss functions, etc., as well as non-stateful\n", + " versions of layers such as convolutional and linear layers.\n", + " - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights\n", + " of each ``Parameter`` during the optimizer step (``opt.step()``), using the gradients computed in the backward pass\n", + " - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,\n", + " including classes provided with PyTorch such as ``TensorDataset``\n", + " - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/.ipynb_checkpoints/quickstart_tutorial-checkpoint.ipynb b/.ipynb_checkpoints/quickstart_tutorial-checkpoint.ipynb index 3ea080e..3ccf563 100644 --- a/.ipynb_checkpoints/quickstart_tutorial-checkpoint.ipynb +++ b/.ipynb_checkpoints/quickstart_tutorial-checkpoint.ipynb @@ -45,7 +45,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 1, "metadata": { "collapsed": false, "jupyter": { diff --git a/nn_tutorial.ipynb b/nn_tutorial.ipynb new file mode 100644 index 0000000..d7e71a3 --- /dev/null +++ b/nn_tutorial.ipynb @@ -0,0 +1,1928 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type":
"markdown", + "metadata": {}, + "source": [ + "\n", + "What is `torch.nn` *really*?\n", + "============================\n", + "by Jeremy Howard, `fast.ai `_. Thanks to Rachel Thomas and Francisco Ingham.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We recommend running this tutorial as a notebook, not a script. To download the notebook (.ipynb) file,\n", + "click the link at the top of the page.\n", + "\n", + "PyTorch provides the elegantly designed modules and classes `torch.nn `_ ,\n", + "`torch.optim `_ ,\n", + "`Dataset `_ ,\n", + "and `DataLoader `_\n", + "to help you create and train neural networks.\n", + "In order to fully utilize their power and customize\n", + "them for your problem, you need to really understand exactly what they're\n", + "doing. To develop this understanding, we will first train a basic neural net\n", + "on the MNIST data set without using any features from these modules; we will\n", + "initially only use the most basic PyTorch tensor functionality. Then, we will\n", + "incrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or\n", + "``DataLoader`` at a time, showing exactly what each piece does, and how it\n", + "works to make the code either more concise, or more flexible.\n", + "\n", + "**This tutorial assumes you already have PyTorch installed, and are familiar\n", + "with the basics of tensor operations.** (If you're familiar with NumPy array\n", + "operations, you'll find the PyTorch tensor operations used here nearly identical).\n", + "\n", + "MNIST data setup\n", + "----------------\n", + "\n", + "We will use the classic `MNIST `_ dataset,\n", + "which consists of black-and-white images of hand-drawn digits (between 0 and 9).\n", + "\n", + "We will use `pathlib `_\n", + "for dealing with paths (part of the Python 3 standard library), and will\n", + "download the dataset using\n", + "`requests `_.
We will only\n", + "import modules when we use them, so you can see exactly what's being\n", + "used at each point.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import requests\n", + "\n", + "DATA_PATH = Path(\"data\")\n", + "PATH = DATA_PATH / \"mnist\"\n", + "\n", + "PATH.mkdir(parents=True, exist_ok=True)\n", + "\n", + "URL = \"https://github.com/pytorch/tutorials/raw/master/_static/\"\n", + "FILENAME = \"mnist.pkl.gz\"\n", + "\n", + "if not (PATH / FILENAME).exists():\n", + " content = requests.get(URL + FILENAME).content\n", + " (PATH / FILENAME).write_bytes(content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This dataset is in NumPy array format, and has been stored using pickle,\n", + "a Python-specific format for serializing data.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import pickle\n", + "import gzip\n", + "\n", + "with gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n", + " ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each image is 28 x 28, and is being stored as a flattened row of length\n", + "784 (=28x28).
Let's take a look at one; we need to reshape it to 2d\n", + "first.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(50000, 784)\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAN8klEQVR4nO3df6jVdZ7H8ddrbfojxzI39iZOrWOEUdE6i9nSyjYRTj8o7FYMIzQ0JDl/JDSwyIb7xxSLIVu6rBSDDtXYMus0UJHFMNVm5S6BdDMrs21qoxjlphtmmv1a9b1/3K9xp+75nOs53/PD+34+4HDO+b7P93zffPHl99f53o8jQgAmvj/rdQMAuoOwA0kQdiAJwg4kQdiBJE7o5sJsc+of6LCI8FjT29qy277C9lu237F9ezvfBaCz3Op1dtuTJP1B0gJJOyW9JGlRROwozMOWHeiwTmzZ50l6JyLejYgvJf1G0sI2vg9AB7UT9hmS/jjq/c5q2p+wvcT2kO2hNpYFoE0dP0EXEeskrZPYjQd6qZ0t+y5JZ4x6/51qGoA+1E7YX5J0tu3v2j5R0o8kbaynLQB1a3k3PiIO2V4q6SlJkyQ9EBFv1NYZgFq1fOmtpYVxzA50XEd+VAPg+EHYgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEi0P2Yzjw6RJk4r1U045paPLX7p0acPaSSedVJx39uzZxfqtt95arN9zzz0Na4sWLSrO+/nnnxfrK1euLNbvvPPOYr0X2gq77fckHZB0WNKhiJhbR1MA6lfHlv3SiPiwhu8B0EEcswNJtBv2kPS07ZdtLxnrA7aX2B6yPdTmsgC0od3d+PkRscv2X0h6xvZ/R8Tm0R+IiHWS1kmS7WhzeQBa1NaWPSJ2Vc97JD0maV4dTQGoX8thtz3Z9pSjryX9QNL2uhoDUK92duMHJD1m++j3/HtE/L6WriaYM888s1g/8cQTi/WLL764WJ8/f37D2tSpU4vzXn/99cV6L+3cubNYX7NmTbE+ODjYsHbgwIHivK+++mqx/sILLxTr/ajlsEfEu5L+qsZeAHQQl96AJAg7kARhB5Ig7EAShB1IwhHd+1HbRP0F3Zw5c4r1TZs2Feudvs20Xx05cqRYv/nmm4v1Tz75pOVlDw8PF+sfffRRsf7WW2+1vOxOiwiPNZ0tO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kwXX2GkybNq1Y37JlS7E+a9asOtupVbPe9+3bV6xfeumlDWtffvllcd6svz9oF9fZgeQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJhmyuwd69e4v1ZcuWFetXX311sf7KK68U683+pHLJtm3bivUFCxYU6wcPHizWzzvvvIa12267rTgv6sWWHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeS4H72PnDyyScX682GF167dm3D2uLFi4vz3njjjcX6hg0binX0n5bvZ7f9gO09trePmjbN9jO2366eT62zWQD1G89u/K8kXfG1abdLejYizpb0bPU
eQB9rGvaI2Czp678HXShpffV6vaRr620LQN1a/W38QEQcHSzrA0kDjT5oe4mkJS0uB0BN2r4RJiKidOItItZJWidxgg7opVYvve22PV2Squc99bUEoBNaDftGSTdVr2+S9Hg97QDolKa78bY3SPq+pNNs75T0c0krJf3W9mJJ70v6YSebnOj279/f1vwff/xxy/PecsstxfrDDz9crDcbYx39o2nYI2JRg9JlNfcCoIP4uSyQBGEHkiDsQBKEHUiCsANJcIvrBDB58uSGtSeeeKI47yWXXFKsX3nllcX6008/Xayj+xiyGUiOsANJEHYgCcIOJEHYgSQIO5AEYQeS4Dr7BHfWWWcV61u3bi3W9+3bV6w/99xzxfrQ0FDD2n333Vect5v/NicSrrMDyRF2IAnCDiRB2IEkCDuQBGEHkiDsQBJcZ09ucHCwWH/wwQeL9SlTprS87OXLlxfrDz30ULE+PDxcrGfFdXYgOcIOJEHYgSQIO5AEYQeSIOxAEoQdSILr7Cg6//zzi/XVq1cX65dd1vpgv2vXri3WV6xYUazv2rWr5WUfz1q+zm77Adt7bG8fNe0O27tsb6seV9XZLID6jWc3/leSrhhj+r9ExJzq8bt62wJQt6Zhj4jNkvZ2oRcAHdTOCbqltl+rdvNPbfQh20tsD9lu/MfIAHRcq2H/haSzJM2RNCxpVaMPRsS6iJgbEXNbXBaAGrQU9ojYHRGHI+KIpF9KmldvWwDq1lLYbU8f9XZQ0vZGnwXQH5peZ7e9QdL3JZ0mabekn1fv50gKSe9J+mlENL25mOvsE8/UqVOL9WuuuaZhrdm98vaYl4u/smnTpmJ9wYIFxfpE1eg6+wnjmHHRGJPvb7sjAF3Fz2WBJAg7kARhB5Ig7EAShB1Igltc0TNffPFFsX7CCeWLRYcOHSrWL7/88oa1559/vjjv8Yw/JQ0kR9iBJAg7kARhB5Ig7EAShB1IgrADSTS96w25XXDBBcX6DTfcUKxfeOGFDWvNrqM3s2PHjmJ98+bNbX3/RMOWHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeS4Dr7BDd79uxifenSpcX6ddddV6yffvrpx9zTeB0+fLhYHx4u//XyI0eO1NnOcY8tO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kwXX240Cza9mLFo010O6IZtfRZ86c2UpLtRgaGirWV6xYUaxv3LixznYmvKZbdttn2H7O9g7bb9i+rZo+zfYztt+unk/tfLsAWjWe3fhDkv4+Is6V9DeSbrV9rqTbJT0bEWdLerZ6D6BPNQ17RAxHxNbq9QFJb0qaIWmhpPXVx9ZLurZDPQKowTEds9ueKel7krZIGoiIoz9O/kDSQIN5lkha0kaPAGow7rPxtr8t6RFJP4uI/aNrMTI65JiDNkbEuoiYGxFz2+oUQFvGFXbb39JI0H8dEY9Wk3fbnl7Vp0va05kWAdSh6W68bUu6X9KbEbF6VGmjpJskrayeH+9IhxPAwMCYRzhfOffcc4v1e++9t1g/55xzjrmnumzZsqVYv/vuuxvWHn+8/E+GW1TrNZ5j9r+V9GNJr9veVk1brpGQ/9b2YknvS/phRzoEUIumYY+I/5I05uDuki6rtx0AncLPZYEkCDuQBGEHkiDsQBKEHUiCW1zHadq0aQ1ra9euLc47Z86cYn3WrFmttFSLF198sVhftWpVsf7UU08V65999tkx94TOYMsOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0mkuc5+0UUXFevLli0r1ufNm9ewNmPGjJZ6qsunn37asLZmzZrivHfddVexfvDgwZZ6Qv9hyw4kQdiBJAg7kARhB5Ig7EAShB1IgrADSaS5zj44ONhWvR07duwo1p988sli/dChQ8V66Z7zffv2FedFHmzZgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJR0T5A/YZkh6SNCApJK2LiH+1fYekWyT9b/XR5RHxuybfVV4YgLZ
FxJijLo8n7NMlTY+IrbanSHpZ0rUaGY/9k4i4Z7xNEHag8xqFfTzjsw9LGq5eH7D9pqTe/mkWAMfsmI7Zbc+U9D1JW6pJS22/ZvsB26c2mGeJ7SHbQ+21CqAdTXfjv/qg/W1JL0haERGP2h6Q9KFGjuP/SSO7+jc3+Q5244EOa/mYXZJsf0vSk5KeiojVY9RnSnoyIs5v8j2EHeiwRmFvuhtv25Lul/Tm6KBXJ+6OGpS0vd0mAXTOeM7Gz5f0n5Jel3Skmrxc0iJJczSyG/+epJ9WJ/NK38WWHeiwtnbj60LYgc5reTcewMRA2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEoQdSKLbQzZ/KOn9Ue9Pq6b1o37trV/7kuitVXX29peNCl29n/0bC7eHImJuzxoo6Nfe+rUvid5a1a3e2I0HkiDsQBK9Dvu6Hi+/pF9769e+JHprVVd66+kxO4Du6fWWHUCXEHYgiZ6E3fYVtt+y/Y7t23vRQyO237P9uu1tvR6frhpDb4/t7aOmTbP9jO23q+cxx9jrUW932N5Vrbtttq/qUW9n2H7O9g7bb9i+rZre03VX6Ksr663rx+y2J0n6g6QFknZKeknSoojY0dVGGrD9nqS5EdHzH2DY/jtJn0h66OjQWrb/WdLeiFhZ/Ud5akT8Q5/0doeOcRjvDvXWaJjxn6iH667O4c9b0Yst+zxJ70TEuxHxpaTfSFrYgz76XkRslrT3a5MXSlpfvV6vkX8sXdegt74QEcMRsbV6fUDS0WHGe7ruCn11RS/CPkPSH0e936n+Gu89JD1t+2XbS3rdzBgGRg2z9YGkgV42M4amw3h309eGGe+bddfK8Oft4gTdN82PiL+WdKWkW6vd1b4UI8dg/XTt9BeSztLIGIDDklb1splqmPFHJP0sIvaPrvVy3Y3RV1fWWy/CvkvSGaPef6ea1hciYlf1vEfSYxo57Ognu4+OoFs97+lxP1+JiN0RcTgijkj6pXq47qphxh+R9OuIeLSa3PN1N1Zf3VpvvQj7S5LOtv1d2ydK+pGkjT3o4xtsT65OnMj2ZEk/UP8NRb1R0k3V65skPd7DXv5Evwzj3WiYcfV43fV8+POI6PpD0lUaOSP/P5L+sRc9NOhrlqRXq8cbve5N0gaN7Nb9n0bObSyW9OeSnpX0tqT/kDStj3r7N40M7f2aRoI1vUe9zdfILvprkrZVj6t6ve4KfXVlvfFzWSAJTtABSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBL/DyJ7caZa7LphAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "from matplotlib import pyplot\n", + "import numpy as np\n", + "\n", + "pyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n", + "print(x_train.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to\n", + "convert our data.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor([[0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " ...,\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.],\n", + " [0., 0., 0., ..., 0., 0., 0.]]) tensor([5, 0, 4, ..., 8, 4, 8])\n", + "torch.Size([50000, 784])\n", + "tensor(0) tensor(9)\n" + ] + } + ], + "source": [ + "import torch\n", + "\n", + "x_train, y_train, x_valid, y_valid = map(\n", + " torch.tensor, (x_train, y_train, x_valid, y_valid)\n", + ")\n", + "n, c = x_train.shape\n", + "print(x_train, y_train)\n", + "print(x_train.shape)\n", + "print(y_train.min(), y_train.max())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Neural net from scratch (no torch.nn)\n", + "---------------------------------------------\n", + "\n", + "Let's first create a model using nothing but PyTorch tensor operations. We're assuming\n", + "you're already familiar with the basics of neural networks. (If you're not, you can\n", + "learn them at `course.fast.ai `_).\n", + "\n", + "PyTorch provides methods to create random or zero-filled tensors, which we will\n", + "use to create our weights and bias for a simple linear model. 
These are just regular\n", + "tensors, with one very special addition: we tell PyTorch that they require a\n", + "gradient. This causes PyTorch to record all of the operations done on the tensor,\n", + "so that it can calculate the gradient during back-propagation *automatically*!\n", + "\n", + "For the weights, we set ``requires_grad`` **after** the initialization, since we\n", + "don't want that step included in the gradient. (Note that a trailing ``_`` in\n", + "PyTorch signifies that the operation is performed in-place.)\n", + "\n", + "

.. note:: We are initializing the weights here with\n", + " `Xavier initialisation `_\n", + " (by multiplying by ``1/sqrt(n)``).

\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import math\n", + "\n", + "weights = torch.randn(784, 10) / math.sqrt(784)\n", + "weights.requires_grad_()\n", + "bias = torch.zeros(10, requires_grad=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Thanks to PyTorch's ability to calculate gradients automatically, we can\n", + "use any standard Python function (or callable object) as a model! So\n", + "let's just write a plain matrix multiplication and broadcasted addition\n", + "to create a simple linear model. We also need an activation function, so\n", + "we'll write ``log_softmax`` and use it. Remember: although PyTorch\n", + "provides lots of pre-written loss functions, activation functions, and\n", + "so forth, you can easily write your own using plain Python. PyTorch will\n", + "even create fast GPU or vectorized CPU code for your function\n", + "automatically.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def log_softmax(x):\n", + " return x - x.exp().sum(-1).log().unsqueeze(-1)\n", + "\n", + "def model(xb):\n", + " return log_softmax(xb @ weights + bias)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above, the ``@`` stands for the matrix multiplication operation. We will call\n", + "our function on one batch of data (in this case, 64 images). This is\n", + "one *forward pass*. 
Note that our predictions won't be any better than\n", + "random at this stage, since we start with random weights.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor([-2.2680, -1.7434, -2.2746, -2.7562, -2.7793, -2.4086, -2.2656, -2.2761,\n", + " -2.1634, -2.5035], grad_fn=) torch.Size([64, 10])\n" + ] + } + ], + "source": [ + "bs = 64 # batch size\n", + "\n", + "xb = x_train[0:bs] # a mini-batch from x\n", + "preds = model(xb) # predictions\n", + "preds[0], preds.shape\n", + "print(preds[0], preds.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you see, the ``preds`` tensor contains not only the tensor values, but also a\n", + "gradient function. We'll use this later to do backprop.\n", + "\n", + "Let's implement negative log-likelihood to use as the loss function\n", + "(again, we can just use standard Python):\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def nll(input, target):\n", + " return -input[range(target.shape[0]), target].mean()\n", + "\n", + "loss_func = nll" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check our loss with our random model, so we can see if we improve\n", + "after a backprop pass later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.4159, grad_fn=)\n" + ] + } + ], + "source": [ + "yb = y_train[0:bs]\n", + "print(loss_func(preds, yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's also 
implement a function to calculate the accuracy of our model.\n", + "For each prediction, if the index with the largest value matches the\n", + "target value, then the prediction was correct.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def accuracy(out, yb):\n", + " preds = torch.argmax(out, dim=1)\n", + " return (preds == yb).float().mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check the accuracy of our random model, so we can see if our\n", + "accuracy improves as our loss improves.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0781)\n" + ] + } + ], + "source": [ + "print(accuracy(preds, yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now run a training loop. For each iteration, we will:\n", + "\n", + "- select a mini-batch of data (of size ``bs``)\n", + "- use the model to make predictions\n", + "- calculate the loss\n", + "- ``loss.backward()`` updates the gradients of the model, in this case, ``weights``\n", + " and ``bias``.\n", + "\n", + "We now use these gradients to update the weights and bias. We do this\n", + "within the ``torch.no_grad()`` context manager, because we do not want these\n", + "actions to be recorded for our next calculation of the gradient. You can read\n", + "more about how PyTorch's Autograd records operations\n", + "`here `_.\n", + "\n", + "We then set the\n", + "gradients to zero, so that we are ready for the next loop.\n", + "Otherwise, our gradients would record a running tally of all the operations\n", + "that had happened (i.e. 
``loss.backward()`` *adds* the gradients to whatever is\n", + "already stored, rather than replacing them).\n", + "\n", + ".. tip:: You can use the standard python debugger to step through PyTorch\n", + " code, allowing you to check the various variable values at each step.\n", + " Uncomment ``set_trace()`` below to try it out.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from IPython.core.debugger import set_trace\n", + "\n", + "lr = 0.5 # learning rate\n", + "epochs = 2 # how many epochs to train for\n", + "\n", + "for epoch in range(epochs):\n", + " for i in range((n - 1) // bs + 1):\n", + " # set_trace()\n", + " start_i = i * bs\n", + " end_i = start_i + bs\n", + " xb = x_train[start_i:end_i]\n", + " yb = y_train[start_i:end_i]\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " with torch.no_grad():\n", + " weights -= weights.grad * lr\n", + " bias -= bias.grad * lr\n", + " weights.grad.zero_()\n", + " bias.grad.zero_()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That's it: we've created and trained a minimal neural network (in this case, a\n", + "logistic regression, since we have no hidden layers) entirely from scratch!\n", + "\n", + "Let's check the loss and accuracy and compare those to what we got\n", + "earlier. 
We expect the loss to have decreased and the accuracy to\n", + "have increased, and they have.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0825, grad_fn=) tensor(1.)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using torch.nn.functional\n", + "------------------------------\n", + "\n", + "We will now refactor our code, so that it does the same thing as before, only\n", + "we'll start taking advantage of PyTorch's ``nn`` classes to make it more concise\n", + "and flexible. At each step from here, we should be making our code one or more\n", + "of: shorter, more understandable, and/or more flexible.\n", + "\n", + "The first and easiest step is to make our code shorter by replacing our\n", + "hand-written activation and loss functions with those from ``torch.nn.functional``\n", + "(which is generally imported into the namespace ``F`` by convention). This module\n", + "contains all the functions in the ``torch.nn`` library (whereas other parts of the\n", + "library contain classes). As well as a wide range of loss and activation\n", + "functions, you'll also find here some convenient functions for creating neural\n", + "nets, such as pooling functions. (There are also functions for doing convolutions,\n", + "linear layers, etc., but as we'll see, these are usually better handled using\n", + "other parts of the library.)\n", + "\n", + "If you're using negative log likelihood loss and log softmax activation,\n", + "then PyTorch provides a single function ``F.cross_entropy`` that combines\n", + "the two. 
So we can even remove the activation function from our model.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import torch.nn.functional as F\n", + "\n", + "loss_func = F.cross_entropy\n", + "\n", + "def model(xb):\n", + " return xb @ weights + bias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that we no longer call ``log_softmax`` in the ``model`` function. Let's\n", + "confirm that our loss and accuracy are the same as before:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0825, grad_fn=) tensor(1.)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using nn.Module\n", + "-----------------------------\n", + "Next up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more\n", + "concise training loop. We subclass ``nn.Module`` (which itself is a class and\n", + "able to keep track of state). In this case, we want to create a class that\n", + "holds our weights, bias, and method for the forward step. ``nn.Module`` has a\n", + "number of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)\n", + "which we will be using.\n", + "\n", + "

.. note:: ``nn.Module`` (uppercase ``M``) is a PyTorch-specific concept, and is a\n", + " class we'll be using a lot. ``nn.Module`` is not to be confused with the Python\n", + " concept of a (lowercase ``m``) `module `_,\n", + " which is a file of Python code that can be imported.

\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch import nn\n", + "\n", + "class Mnist_Logistic(nn.Module):\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n", + " self.bias = nn.Parameter(torch.zeros(10))\n", + "\n", + " def forward(self, xb):\n", + " return xb @ self.weights + self.bias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since we're now using an object instead of just using a function, we\n", + "first have to instantiate our model:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "model = Mnist_Logistic()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can calculate the loss in the same way as before. 
Note that\n", + "``nn.Module`` objects are used as if they are functions (i.e., they are\n", + "*callable*), but behind the scenes PyTorch will call our ``forward``\n", + "method automatically.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.1396, grad_fn=)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Previously for our training loop we had to update the values for each parameter\n", + "by name, and manually zero out the grads for each parameter separately, like this:\n", + "::\n", + " with torch.no_grad():\n", + " weights -= weights.grad * lr\n", + " bias -= bias.grad * lr\n", + " weights.grad.zero_()\n", + " bias.grad.zero_()\n", + "\n", + "\n", + "Now we can take advantage of ``model.parameters()`` and ``model.zero_grad()`` (which\n", + "are both defined by PyTorch for ``nn.Module``) to make those steps more concise\n", + "and less prone to the error of forgetting some of our parameters, particularly\n", + "if we had a more complicated model:\n", + "::\n", + " with torch.no_grad():\n", + " for p in model.parameters(): p -= p.grad * lr\n", + " model.zero_grad()\n", + "\n", + "\n", + "We'll wrap our little training loop in a ``fit`` function so we can run it\n", + "again later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def fit():\n", + " for epoch in range(epochs):\n", + " for i in range((n - 1) // bs + 1):\n", + " start_i = i * bs\n", + " end_i = start_i + bs\n", + " xb = x_train[start_i:end_i]\n", + " yb = y_train[start_i:end_i]\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " 
with torch.no_grad():\n", + " for p in model.parameters():\n", + " p -= p.grad * lr\n", + " model.zero_grad()\n", + "\n", + "fit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's double-check that our loss has gone down:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0820, grad_fn=)\n" + ] + } + ], + "source": [ + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using nn.Linear\n", + "-------------------------\n", + "\n", + "We continue to refactor our code. Instead of manually defining and\n", + "initializing ``self.weights`` and ``self.bias``, and calculating ``xb @\n", + "self.weights + self.bias``, we will instead use the Pytorch class\n", + "`nn.Linear `_ for a\n", + "linear layer, which does all that for us. 
PyTorch has many types of\n", + "predefined layers that can greatly simplify our code, and often make it\n", + "faster too.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "class Mnist_Logistic(nn.Module):\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.lin = nn.Linear(784, 10)\n", + "\n", + " def forward(self, xb):\n", + " return self.lin(xb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We instantiate our model and calculate the loss in the same way as before:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.2840, grad_fn=)\n" + ] + } + ], + "source": [ + "model = Mnist_Logistic()\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are still able to use our same ``fit`` function as before.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0798, grad_fn=)\n" + ] + } + ], + "source": [ + "fit()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using optim\n", + "------------------------------\n", + "\n", + "PyTorch also has a package with various optimization algorithms, ``torch.optim``.\n", + "We can use the ``step`` method from our optimizer to take an optimization step, instead\n", + "of manually updating each parameter.\n", + "\n", + "This will let us replace our previous manually coded optimization step:\n", + "::\n", + " with 
torch.no_grad():\n", + " for p in model.parameters(): p -= p.grad * lr\n", + " model.zero_grad()\n", + "\n", + "and instead use just:\n", + "::\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "(``opt.zero_grad()`` resets the gradients to 0, and we need to call it before\n", + "computing the gradient for the next minibatch.)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch import optim" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll define a little function to create our model and optimizer so we\n", + "can reuse it in the future.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(2.2706, grad_fn=)\n", + "tensor(0.0798, grad_fn=)\n" + ] + } + ], + "source": [ + "def get_model():\n", + " model = Mnist_Logistic()\n", + " return model, optim.SGD(model.parameters(), lr=lr)\n", + "\n", + "model, opt = get_model()\n", + "print(loss_func(model(xb), yb))\n", + "\n", + "for epoch in range(epochs):\n", + " for i in range((n - 1) // bs + 1):\n", + " start_i = i * bs\n", + " end_i = start_i + bs\n", + " xb = x_train[start_i:end_i]\n", + " yb = y_train[start_i:end_i]\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using Dataset\n", + "------------------------------\n", + "\n", + "PyTorch has an abstract Dataset class. 
A Dataset can be anything that has\n", + "a ``__len__`` function (called by Python's standard ``len`` function) and\n", + "a ``__getitem__`` function as a way of indexing into it.\n", + "`This tutorial `_\n", + "walks through a nice example of creating a custom ``FacialLandmarkDataset`` class\n", + "as a subclass of ``Dataset``.\n", + "\n", + "PyTorch's `TensorDataset `_\n", + "is a Dataset wrapping tensors. By defining a length and way of indexing,\n", + "this also gives us a way to iterate, index, and slice along the first\n", + "dimension of a tensor. This will make it easier to access both the\n", + "independent and dependent variables in the same line as we train.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch.utils.data import TensorDataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,\n", + "which will be easier to iterate over and slice.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "train_ds = TensorDataset(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Previously, we had to iterate through minibatches of x and y values separately:\n", + "::\n", + " xb = x_train[start_i:end_i]\n", + " yb = y_train[start_i:end_i]\n", + "\n", + "\n", + "Now, we can do these two steps together:\n", + "::\n", + " xb,yb = train_ds[i*bs : i*bs+bs]\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0817, grad_fn=)\n" + ] + 
} + ], + "source": [ + "model, opt = get_model()\n", + "\n", + "for epoch in range(epochs):\n", + " for i in range((n - 1) // bs + 1):\n", + " xb, yb = train_ds[i * bs: i * bs + bs]\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refactor using DataLoader\n", + "------------------------------\n", + "\n", + "Pytorch's ``DataLoader`` is responsible for managing batches. You can\n", + "create a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier\n", + "to iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,\n", + "the DataLoader gives us each minibatch automatically.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from torch.utils.data import DataLoader\n", + "\n", + "train_ds = TensorDataset(x_train, y_train)\n", + "train_dl = DataLoader(train_ds, batch_size=bs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Previously, our loop iterated over batches (xb, yb) like this:\n", + "::\n", + " for i in range((n-1)//bs + 1):\n", + " xb,yb = train_ds[i*bs : i*bs+bs]\n", + " pred = model(xb)\n", + "\n", + "Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:\n", + "::\n", + " for xb,yb in train_dl:\n", + " pred = model(xb)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tensor(0.0815, grad_fn=)\n" + ] + } + ], + "source": [ + "model, opt = get_model()\n", + "\n", + "for epoch in range(epochs):\n", + " for xb, yb in train_dl:\n", + " 
pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + "print(loss_func(model(xb), yb))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Thanks to PyTorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,\n", + "our training loop is now dramatically smaller and easier to understand. Let's\n", + "now try to add the basic features necessary to create effective models in practice.\n", + "\n", + "Add validation\n", + "-----------------------\n", + "\n", + "So far, we were just trying to get a reasonable training loop set up for\n", + "use on our training data. In reality, you should **always** also have\n", + "a `validation set `_, in order\n", + "to identify whether you are overfitting.\n", + "\n", + "Shuffling the training data is\n", + "`important `_\n", + "to prevent correlation between batches and overfitting. On the other hand, the\n", + "validation loss will be identical whether we shuffle the validation set or not.\n", + "Since shuffling takes extra time, it makes no sense to shuffle the validation data.\n", + "\n", + "We'll use a batch size for the validation set that is twice as large as\n", + "that for the training set. This is because the validation set does not\n", + "need backpropagation and thus takes less memory (it doesn't need to\n", + "store the gradients). 
We take advantage of this to use a larger batch\n", + "size and compute the loss more quickly.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "train_ds = TensorDataset(x_train, y_train)\n", + "train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n", + "\n", + "valid_ds = TensorDataset(x_valid, y_valid)\n", + "valid_dl = DataLoader(valid_ds, batch_size=bs * 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will calculate and print the validation loss at the end of each epoch.\n", + "\n", + "(Note that we always call ``model.train()`` before training, and ``model.eval()``\n", + "before inference, because these are used by layers such as ``nn.BatchNorm2d``\n", + "and ``nn.Dropout`` to ensure appropriate behaviour for these different phases.)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 tensor(0.3232)\n", + "1 tensor(0.2736)\n" + ] + } + ], + "source": [ + "model, opt = get_model()\n", + "\n", + "for epoch in range(epochs):\n", + " model.train()\n", + " for xb, yb in train_dl:\n", + " pred = model(xb)\n", + " loss = loss_func(pred, yb)\n", + "\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n", + "\n", + " print(epoch, valid_loss / len(valid_dl))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create fit() and get_data()\n", + "----------------------------------\n", + "\n", + "We'll now do a little refactoring of our own. 
Since we go through a similar\n", + "process twice of calculating the loss for both the training set and the\n", + "validation set, let's make that into its own function, ``loss_batch``, which\n", + "computes the loss for one batch.\n", + "\n", + "We pass an optimizer in for the training set, and use it to perform\n", + "backprop. For the validation set, we don't pass an optimizer, so the\n", + "method doesn't perform backprop.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def loss_batch(model, loss_func, xb, yb, opt=None):\n", + " loss = loss_func(model(xb), yb)\n", + "\n", + " if opt is not None:\n", + " loss.backward()\n", + " opt.step()\n", + " opt.zero_grad()\n", + "\n", + " return loss.item(), len(xb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "``fit`` runs the necessary operations to train our model and compute the\n", + "training and validation losses for each epoch.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n", + " for epoch in range(epochs):\n", + " model.train()\n", + " for xb, yb in train_dl:\n", + " loss_batch(model, loss_func, xb, yb, opt)\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " losses, nums = zip(\n", + " *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n", + " )\n", + " val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n", + "\n", + " print(epoch, val_loss)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "``get_data`` returns dataloaders for the training and validation sets.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + 
"metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def get_data(train_ds, valid_ds, bs):\n", + " return (\n", + " DataLoader(train_ds, batch_size=bs, shuffle=True),\n", + " DataLoader(valid_ds, batch_size=bs * 2),\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, our whole process of obtaining the data loaders and fitting the\n", + "model can be run in 3 lines of code:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.36182342684268953\n", + "1 0.3086622476875782\n" + ] + } + ], + "source": [ + "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", + "model, opt = get_model()\n", + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use these basic 3 lines of code to train a wide variety of models.\n", + "Let's see if we can use them to train a convolutional neural network (CNN)!\n", + "\n", + "Switch to CNN\n", + "-------------\n", + "\n", + "We are now going to build our neural network with three convolutional layers.\n", + "Because none of the functions in the previous section assume anything about\n", + "the model form, we'll be able to use them to train a CNN without any modification.\n", + "\n", + "We will use Pytorch's predefined\n", + "`Conv2d `_ class\n", + "as our convolutional layer. We define a CNN with 3 convolutional layers.\n", + "Each convolution is followed by a ReLU. At the end, we perform an\n", + "average pooling. 
(Note that ``view`` is PyTorch's version of numpy's\n", + "``reshape``)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "class Mnist_CNN(nn.Module):\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n", + " self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n", + " self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n", + "\n", + " def forward(self, xb):\n", + " xb = xb.view(-1, 1, 28, 28)\n", + " xb = F.relu(self.conv1(xb))\n", + " xb = F.relu(self.conv2(xb))\n", + " xb = F.relu(self.conv3(xb))\n", + " xb = F.avg_pool2d(xb, 4)\n", + " return xb.view(-1, xb.size(1))\n", + "\n", + "lr = 0.1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Momentum `_ is a variation on\n", + "stochastic gradient descent that takes previous updates into account as well\n", + "and generally leads to faster training.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.30878638651371004\n", + "1 0.25200295938253403\n" + ] + } + ], + "source": [ + "model = Mnist_CNN()\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", + "\n", + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "nn.Sequential\n", + "------------------------\n", + "\n", + "``torch.nn`` has another handy class we can use to simplify our code:\n", + "`Sequential `_ .\n", + "A ``Sequential`` object runs each of the modules contained within it, in a\n", + "sequential manner. 
This is a simpler way of writing our neural network.\n", + "\n", + "To take advantage of this, we need to be able to easily define a\n", + "**custom layer** from a given function. For instance, PyTorch doesn't\n", + "have a `view` layer, and we need to create one for our network. ``Lambda``\n", + "will create a layer that we can then use when defining a network with\n", + "``Sequential``.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "class Lambda(nn.Module):\n", + " def __init__(self, func):\n", + " super().__init__()\n", + " self.func = func\n", + "\n", + " def forward(self, x):\n", + " return self.func(x)\n", + "\n", + "\n", + "def preprocess(x):\n", + " return x.view(-1, 1, 28, 28)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model created with ``Sequential`` is simply:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.32227418100833893\n", + "1 0.2695485789179802\n" + ] + } + ], + "source": [ + "model = nn.Sequential(\n", + " Lambda(preprocess),\n", + " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.AvgPool2d(4),\n", + " Lambda(lambda x: x.view(x.size(0), -1)),\n", + ")\n", + "\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", + "\n", + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wrapping DataLoader\n", + "-----------------------------\n", + "\n", + "Our CNN is fairly concise, but it only 
works with MNIST, because:\n", + " - It assumes the input is a 28\\*28 long vector\n", + " - It assumes that the final CNN grid size is 4\\*4 (since that's the average\n", + "pooling kernel size we used)\n", + "\n", + "Let's get rid of these two assumptions, so our model works with any 2d\n", + "single channel image. First, we can remove the initial Lambda layer by\n", + "moving the data preprocessing into a generator:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def preprocess(x, y):\n", + " return x.view(-1, 1, 28, 28), y\n", + "\n", + "\n", + "class WrappedDataLoader:\n", + " def __init__(self, dl, func):\n", + " self.dl = dl\n", + " self.func = func\n", + "\n", + " def __len__(self):\n", + " return len(self.dl)\n", + "\n", + " def __iter__(self):\n", + " batches = iter(self.dl)\n", + " for b in batches:\n", + " yield (self.func(*b))\n", + "\n", + "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", + "train_dl = WrappedDataLoader(train_dl, preprocess)\n", + "valid_dl = WrappedDataLoader(valid_dl, preprocess)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which\n", + "allows us to define the size of the *output* tensor we want, rather than\n", + "the *input* tensor we have. 
As a result, our model will work with any\n", + "size input.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "model = nn.Sequential(\n", + " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", + " nn.ReLU(),\n", + " nn.AdaptiveAvgPool2d(1),\n", + " Lambda(lambda x: x.view(x.size(0), -1)),\n", + ")\n", + "\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try it out:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 0.3791842395067215\n", + "1 0.26341770286560057\n" + ] + } + ], + "source": [ + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using your GPU\n", + "---------------\n", + "\n", + "If you're lucky enough to have access to a CUDA-capable GPU (you can\n", + "rent one for about $0.50/hour from most cloud providers) you can\n", + "use it to speed up your code. 
First check that your GPU is working in\n", + "PyTorch:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False\n" + ] + } + ], + "source": [ + "print(torch.cuda.is_available())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And then create a device object for it:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "dev = torch.device(\n", + " \"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's update ``preprocess`` to move batches to the GPU:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "def preprocess(x, y):\n", + " return x.view(-1, 1, 28, 28).to(dev), y.to(dev)\n", + "\n", + "\n", + "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", + "train_dl = WrappedDataLoader(train_dl, preprocess)\n", + "valid_dl = WrappedDataLoader(valid_dl, preprocess)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can move our model to the GPU.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "model.to(dev)\n", + "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should find it runs faster now:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + 
"jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Closing thoughts\n", + "-----------------\n", + "\n", + "We now have a general data pipeline and training loop which you can use for\n", + "training many types of models using PyTorch. To see how simple training a model\n", + "can now be, take a look at the `mnist_sample` notebook.\n", + "\n", + "Of course, there are many things you'll want to add, such as data augmentation,\n", + "hyperparameter tuning, monitoring training, transfer learning, and so forth.\n", + "These features are available in the fastai library, which has been developed\n", + "using the same design approach shown in this tutorial, providing a natural\n", + "next step for practitioners looking to take their models further.\n", + "\n", + "We promised at the start of this tutorial we'd explain through example each of\n", + "``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize\n", + "what we've seen:\n", + "\n", + " - **torch.nn**\n", + "\n", + " + ``Module``: creates a callable which behaves like a function, but can also\n", + " contain state (such as neural net layer weights). It knows which ``Parameter`` objects it\n", + " contains and can zero all their gradients, loop through them for weight updates, etc.\n", + " + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights\n", + " that need updating during backprop. 
Only tensors with the `requires_grad` attribute set are updated.\n", + " + ``functional``: a module (usually imported into the ``F`` namespace by convention)\n", + " which contains activation functions, loss functions, etc., as well as non-stateful\n", + " versions of layers such as convolutional and linear layers.\n", + " - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights\n", + " of each ``Parameter`` after the backward step\n", + " - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,\n", + " including classes provided with PyTorch such as ``TensorDataset``\n", + " - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/quickstart_tutorial.ipynb b/quickstart_tutorial.ipynb index 3ea080e..3ccf563 100644 --- a/quickstart_tutorial.ipynb +++ b/quickstart_tutorial.ipynb @@ -45,7 +45,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 1, "metadata": { "collapsed": false, "jupyter": {