{
|
|
"cells": [
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"%matplotlib inline"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"\n",
|
|
"What is `torch.nn` *really*?\n",
|
|
"============================\n",
|
|
"by Jeremy Howard, `fast.ai <https://www.fast.ai>`_. Thanks to Rachel Thomas and Francisco Ingham.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"We recommend running this tutorial as a notebook, not a script. To download the notebook (.ipynb) file,\n",
|
|
"click the link at the top of the page.\n",
|
|
"\n",
|
|
"PyTorch provides the elegantly designed modules and classes `torch.nn <https://pytorch.org/docs/stable/nn.html>`_ ,\n",
|
|
"`torch.optim <https://pytorch.org/docs/stable/optim.html>`_ ,\n",
|
|
"`Dataset <https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset>`_ ,\n",
|
|
"and `DataLoader <https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader>`_\n",
|
|
"to help you create and train neural networks.\n",
|
|
"In order to fully utilize their power and customize\n",
|
|
"them for your problem, you need to really understand exactly what they're\n",
|
|
"doing. To develop this understanding, we will first train basic neural net\n",
|
|
"on the MNIST data set without using any features from these models; we will\n",
|
|
"initially only use the most basic PyTorch tensor functionality. Then, we will\n",
|
|
"incrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or\n",
|
|
"``DataLoader`` at a time, showing exactly what each piece does, and how it\n",
|
|
"works to make the code either more concise, or more flexible.\n",
|
|
"\n",
|
|
"**This tutorial assumes you already have PyTorch installed, and are familiar\n",
|
|
"with the basics of tensor operations.** (If you're familiar with Numpy array\n",
|
|
"operations, you'll find the PyTorch tensor operations used here nearly identical).\n",
|
|
"\n",
|
|
"MNIST data setup\n",
|
|
"----------------\n",
|
|
"\n",
|
|
"We will use the classic `MNIST <http://deeplearning.net/data/mnist/>`_ dataset,\n",
|
|
"which consists of black-and-white images of hand-drawn digits (between 0 and 9).\n",
|
|
"\n",
|
|
"We will use `pathlib <https://docs.python.org/3/library/pathlib.html>`_\n",
|
|
"for dealing with paths (part of the Python 3 standard library), and will\n",
|
|
"download the dataset using\n",
|
|
"`requests <http://docs.python-requests.org/en/master/>`_. We will only\n",
|
|
"import modules when we use them, so you can see exactly what's being\n",
|
|
"used at each point.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"from pathlib import Path\n",
|
|
"import requests\n",
|
|
"\n",
|
|
"DATA_PATH = Path(\"data\")\n",
|
|
"PATH = DATA_PATH / \"mnist\"\n",
|
|
"\n",
|
|
"PATH.mkdir(parents=True, exist_ok=True)\n",
|
|
"\n",
|
|
"URL = \"https://github.com/pytorch/tutorials/raw/master/_static/\"\n",
|
|
"FILENAME = \"mnist.pkl.gz\"\n",
|
|
"\n",
|
|
"if not (PATH / FILENAME).exists():\n",
|
|
" content = requests.get(URL + FILENAME).content\n",
|
|
" (PATH / FILENAME).open(\"wb\").write(content)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"This dataset is in numpy array format, and has been stored using pickle,\n",
|
|
"a python-specific format for serializing data.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"import pickle\n",
|
|
"import gzip\n",
|
|
"\n",
|
|
"with gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n",
|
|
" ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")"
|
|
]
|
|
},
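The file holds a gzip-compressed pickle whose top level is a 3-tuple: a training ``(images, labels)`` pair, a validation pair, and a third element that this tutorial discards with ``_``. A minimal in-memory round-trip sketch of the same ``gzip`` + ``pickle`` pattern, assuming that nested-tuple layout and using a tiny hypothetical stand-in (4 pixels per image instead of 784, plain lists instead of NumPy arrays):

```python
import gzip
import io
import pickle

# Hypothetical tiny stand-in for mnist.pkl.gz: three (images, labels) splits.
train = ([[0.0, 0.5, 0.5, 0.0]], [5])
valid = ([[0.1, 0.2, 0.3, 0.4]], [0])
test = ([[0.9, 0.8, 0.7, 0.6]], [4])

buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as f:
    pickle.dump((train, valid, test), f)

buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as f:
    # Same unpacking as the tutorial: the third split is ignored.
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

print(len(x_train[0]), y_train, y_valid)  # 4 [5] [0]
```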
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[[0. 0. 0. ... 0. 0. 0.]\n",
|
|
" [0. 0. 0. ... 0. 0. 0.]\n",
|
|
" [0. 0. 0. ... 0. 0. 0.]\n",
|
|
" ...\n",
|
|
" [0. 0. 0. ... 0. 0. 0.]\n",
|
|
" [0. 0. 0. ... 0. 0. 0.]\n",
|
|
" [0. 0. 0. ... 0. 0. 0.]]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(x_valid)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Each image is 28 x 28, and is being stored as a flattened row of length\n",
|
|
"784 (=28x28). Let's take a look at one; we need to reshape it to 2d\n",
|
|
"first.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"(50000, 784)\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAz0AAAM2CAYAAADcr+22AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAWJQAAFiUBSVIk8AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de6yldX3v8c/3MFWORAb1VEkvFuFUSLFKQUWhkVtQaasVhRNNbDktmraBKFZNG6sttLWxab0hHiW1hSOeFBtNtbZUTQQExdowFonBGxVKaFELlPvFDvM7f+w17XTce5g9s+ZZm+9+vZKdZ/azLt9fdOVh3vOs9awaYwQAAKCr/7boBQAAAOxJogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoLUNi17AnlBVNyTZN8mNC14KAAAwHwckuWuM8ZTVPrDGGPNfzoJV1W1JHr/odQAAAPM1xqjVPqbr29tuXPQCAACAtaFr9AAAACQRPQAAQHOiBwAAaE30AAAArYkeAACgtYVGT1X9SFX9WVX9S1U9WFU3VtW7qupxi1wXAADQx8K+nLSqDkpyVZInJvl4kq8leXaS1yZ5YVUdPca4bVHrAwAAeljkmZ7/k6Xgec0Y4yVjjN8cYxyf5J1JDk7y1gWuDQAAaKLGGNMPrTowyT9m6UtEDxpjbNnmtscmuSVJJXniGOPeXXj+TUkOn89qAQCAtWKMUat9zKLO9Bw/23562+BJkjHG3Uk+n+QxSZ4z9cIAAIBeFvWZnoNn22+scPs3kzw/yVOTfGalJ5md0VnOIbu+NAAAoJNFnenZONveucLtW/fvN8FaAACAxhZ29baHsfV9ejv8wNEY44hlH+wzPQAAwMyizvRsPZOzcYXb993ufgAAALtkUdHz9dn2qSvc/uOz7Uqf+QEAANgpi4qey2bb51fVf1nD7JLVRye5P8nfTb0wAACgl4VEzxjjH5N8OskBSc7Y7uZzkuyT5IO78h09AAAA21rIl5MmSVUdlOSqJE9M8vEkX01yZJLjsvS2tqPGGLft4nO7kAEAADT0SPpy0q1ne56Z5MIsxc7rkxyU5Nwkz93V4AEAANjWws707EnO9AAAQE+PqDM9AAAAUxA9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAA
CtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWtuw6AUAMI299tpr0nkbN26cdF5HZ5555qTzHvOYx0w26+CDD55s1hlnnDHZrD/+4z+ebNYrXvGKyWYlyQMPPDDZrLe97W2TzTrnnHMmm8XiONMDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArS0seqrqxqoaK/x8e1HrAgAAeln0l5PemeRdy+y/Z+qFAAAAPS06eu4YY5y94DUAAACN+UwPAADQ2qLP9Dy6ql6Z5MlJ7k1ybZIrxhgPLXZZAABAF4uOnv2TXLTdvhuq6pfGGJ99uAdX1aYVbjpkt1cGAAC0sMi3t12Q5IQshc8+SX4yyflJDkjyt1X1jMUtDQAA6GJhZ3rGGOdst+srSX61qu5J8vokZyc5+WGe44jl9s/OAB0+h2UCAACPcGvxQgbvn22ft9BVAAAALazF6PnubLvPQlcBAAC0sBaj57mz7bcWugoAAKCFhURPVR1aVY9fZv+PJTlv9uuHpl0VAADQ0aIuZHBqkt+sqsuS3JDk7iQHJfnZJHsnuSTJHy9obQAAQCOLip7Lkhyc5Key9Ha2fZLckeRzWfrenovGGGNBawMAABpZSPTMvnj0Yb98FAAAYHetxQsZAAAAzI3oAQAAWhM9AABAa6IHAABobVFXbwMaePKTnzzZrEc96lGTzTrqqKMmm/XTP/3Tk83ab7/9JpuVJC972csmnccjy8033zzZrHPPPXeyWSeffPJks+6+++7JZiXJl7/85clmffazrnfFfDnTAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1mqMseg1zF1VbUpy+KLXAVM77LDDJp136aWXTjZr48aNk82C9WrLli2TzfrlX/7lyWbdc889k82a0i233DLpvH/7t3+bbNbXv/71yWbxyDPGqNU+xpkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWNix6AcD83HTTTZPOu+222yabtXHjxslm8cjzxS9+cbJZd9xxx2SzjjvuuMlmJcn3vve9yWZddNFFk80CcKYHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtbVj0AoD5uf322yed98Y3vnGyWT/3cz832ax/+Id/mGzWueeeO9msqV1zzTWTzTrxxBMnm3XvvfdONuvQQw+dbFaSvPa1r510HsBUnOkBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrNcZY9Brmrqo2JTl80esA5mffffedbNbdd9892azzzz9/slmnn376ZLOS5JWvfOVks/78z/98slkALNYYo1b7GGd6AACA1kQPAADQmugBAABaEz0AAE
BrogcAAGhN9AAAAK2JHgAAoLW5RE9VnVJV76mqK6vqrqoaVfWhh3nMUVV1SVXdXlX3VdW1VXVWVe01jzUBAAAkyYY5Pc+bkzwjyT1Jbk5yyI7uXFU/n+SjSR5I8uEktyd5UZJ3Jjk6yalzWhcAALDOzevtba9L8tQk+yb5tR3dsar2TfInSR5KcuwY4/QxxhuTHJbkC0lOqaqXz2ldAADAOjeX6BljXDbG+OYYY+zE3U9J8oNJLh5jXL3NczyQpTNGycOEEwAAwM5axIUMjp9tP7nMbVckuS/JUVX16OmWBAAAdDWvz/SsxsGz7Te2v2GMsbmqbkhyaJIDk3x1R09UVZtWuGmHnykCAADWj0Wc6dk42965wu1b9+83wVoAAIDmFnGm5+HUbPuwnw8aYxyx7BMsnQE6fJ6LAgAAHpkWcaZn65mcjSvcvu929wMAANhli4ier8+2T93+hqrakOQpSTYn+daUiwIAAHpaRPRcOtu+cJnbnpfkMUmuGmM8ON2SAACArhYRPR9JcmuSl1fVM7furKq9k/z+7Nf3LWBdAABAQ3O5kEFVvSTJS2a/7j/bPreqLpz9+dYxxhuSZIxxV1W9Okvxc3lVXZzk9iQvztLlrD+S5MPzWBcAAMC8rt52WJLTttt34OwnSf4pyRu23jDG+FhVHZPkt5K8LMneSa5P8utJzh1jPOyV2wAAAHbGXKJnjHF2krNX+ZjPJ/mZecwHAABYySI+0wMAADAZ0QMAALQmegAAgNZEDwAA0Nq8rt4GsEfdddddi17CHnHnnXcuegl7zKtf/erJZn34w9N908GWLVsmmwXAfDjTAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQWo0xFr2GuauqTUkOX/Q6AB7OPvvsM9msT3ziE5PNSpJjjjlmslknnXTSZLM+/elPTzYLgO83xqjVPsaZHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtFZjjEWvYe6qalOSwxe9DoC15KCDDpp03pe+9KXJZt1xxx2Tzbrssssmm3X11VdPNitJ3vve9042q+PfP4BpjDFqtY9xpgcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK3VGGPRa5i7qtqU5PBFrwNgPTv55JMnm3XBBRdMNuuxj33sZLOm9qY3vWmyWR/84Acnm3XLLbdMNgvY88YYtdrHONMDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWaoyx6DXMXVVtSnL4otcBwDSe9rSnTTbrHe94x2SzTjjhhMlmTe3888+fbNZb3/rWyWb98z//82SzYL0aY9RqH+NMDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAK
A10QMAALQ2l+ipqlOq6j1VdWVV3VVVo6o+tMJ9D5jdvtLPxfNYEwAAQJJsmNPzvDnJM5Lck+TmJIfsxGO+nORjy+z/ypzWBAAAMLfoeV2WYuf6JMckuWwnHnPNGOPsOc0HAABY1lyiZ4zxH5FTVfN4SgAAgLmY15meXfFDVfUrSZ6Q5LYkXxhjXLvA9QAAAA0tMnpOnP38h6q6PMlpY4ybduYJqmrTCjftzGeKAACAdWARl6y+L8nvJTkiyeNmP1s/B3Rsks9U1T4LWBcAANDQ5Gd6xhjfTfLb2+2+oqqen+RzSY5M8qok796J5zpiuf2zM0CH7+ZSAQCABtbMl5OOMTYn+cDs1+ctci0AAEAfayZ6Zv51tvX2NgAAYC7WWvQ8Z7b91kJXAQAAtDF59FTVkVX1qGX2H5+lLzlNkg9NuyoAAKCruVzIoKpekuQls1/3n22fW1UXzv586xjjDbM//2GSQ2eXp755tu/pSY6f/fktY4yr5rEuAACAeV297bAkp22378DZT5L8U5Kt0XNRkpOTPCvJSUl+IMl3kvxFkvPGGFfOaU0AAADziZ4xxtlJzt7J+/5pkj+dx1wAAICHs9YuZAAAADBXogcAAGhN9AAAAK2JHgAAoLUaYyx6DXNXVZuSHL7odQDQz3777TfZrBe96EWTzUqSCy64YLJZVTXZrEsvvXSyWSeeeOJks2C9GmOs+gDiTA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgNdEDAAC0JnoAAIDWRA8AANCa6AEAAFoTPQAAQGs1xlj0GuauqjYlOXzR6wCAR5IHH3xwslkbNmyYbNbmzZsnm/WCF7xgslmXX375ZLNgLRlj1Gof40wPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABa27DoBQDA7nr6058+2axTTjllslnPetazJpuVJBs29PxrwXXXXTfZrCuuuGKyWcDOc6YHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtbVj0AgCYxsEHHzzpvDPPPHOyWS996Usnm7X//vtPNquzhx56aLJZt9xyy2SztmzZMtksYOc50wMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNY2LHoBAGvN/vvvP9msV7ziFZPNOvPMMyeblSQHHHDApPPYfVdfffVks9761rdONuuv/uqvJpsFrE3O9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrux09VfWEqnpVVf1lVV1fVfdX1Z1V9bmqOr2qlp1RVUdV1SVVdXtV3VdV11bVWVW11+6uCQAAYKt5fDnpqUnel+SWJJcluSnJk5K8NMkHkpxUVaeOMcbWB1TVzyf5aJIHknw4ye1JXpTknUmOnj0nAADAbptH9H
wjyYuT/M0YY8vWnVX1piR/n+RlWQqgj87275vkT5I8lOTYMcbVs/1vSXJpklOq6uVjjIvnsDYAAGCd2+23t40xLh1jfGLb4Jnt/3aS989+PXabm05J8oNJLt4aPLP7P5DkzbNff2131wUAAJDs+QsZ/Ptsu3mbfcfPtp9c5v5XJLkvyVFV9eg9uTAAAGB9mMfb25ZVVRuS/OLs120D5+DZ9hvbP2aMsbmqbkhyaJIDk3z1YWZsWuGmQ1a3WgAAoKs9eabnbUmeluSSMcanttm/cba9c4XHbd2/355aGAAAsH7skTM9VfWaJK9P8rUkv7Dah8+2Y4f3SjLGOGKF+ZuSHL7KuQAAQENzP9NTVWckeXeS65IcN8a4fbu7bD2TszHL23e7+wEAAOyyuUZPVZ2V5LwkX8lS8Hx7mbt9fbZ96jKP35DkKVm68MG35rk2AABgfZpb9FTVb2Tpy0WvyVLwfHeFu146275wmduel+QxSa4aYzw4r7UBAADr11yiZ/bFom9LsinJCWOMW3dw948kuTXJy6vqmds8x95Jfn/26/vmsS4AAIDdvpBBVZ2W5HeTPJTkyiSvqart73bjGOPCJBlj3FVVr85S/FxeVRcnuT3Ji7N0OeuPJPnw7q4LAAAgmc/V254y2+6V5KwV7vPZJBdu/WWM8bGqOibJbyV5WZK9k1yf5NeTnDvGeNgrtwEAAOyM3Y6eMcbZSc7ehcd9PsnP7O58AACAHdmTX04KAACwcKIHAABoTfQAAACtiR4AAKC1eVy9DVinnvSkJ0026yd+4icmm3XeeedNNuuQQw6ZbBbz8cUvfnGyWX/0R3802awk+fjHPz7ZrC1btkw2C8CZHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1jYsegHQ3eMf//jJZp1//vmTzUqSww47bLJZBx544GSzmI+rrrpqsllvf/vbJ5v1qU99arJZ999//2SzADpzpgcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK1tWPQCYKsjjzxysllvfOMbJ5v17Gc/e7JZP/zDPzzZLObjvvvum2zWueeeO9msJPmDP/iDyWbde++9k80C4JHHmR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALS2YdELgK1OPvnklrM6u+666yab9dd//deTzdq8efNks97+9rdPNuuOO+6YbBYArCXO9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoLUaYyx6DXNXVZuSHL7odQAAAPM1xqjVPsaZHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhtt6Onqp5QVa+qqr+squur6v6qurOqPldVp1fVf9
vu/gdU1djBz8W7uyYAAICtNszhOU5N8r4ktyS5LMlNSZ6U5KVJPpDkpKo6dXz/t6B+OcnHlnm+r8xhTQAAAEnmEz3fSPLiJH8zxtiydWdVvSnJ3yd5WZYC6KPbPe6aMcbZc5gPAACwot1+e9sY49Ixxie2DZ7Z/m8nef/s12N3dw4AAMCumMeZnh3599l28zK3/VBV/UqSJyS5LckXxhjX7uH1AAAA68wei56q2pDkF2e/fnKZu5w4+9n2MZcnOW2McdNOzti0wk2H7OQyAQCA5vbkJavfluRpSS4ZY3xqm/33Jfm9JEckedzs55gsXQTh2CSfqap99uC6AACAdaS+/6Jqc3jSqtckeXeSryU5eoxx+048ZkOSzyU5MslZY4x378b8TUkO39XHAwAAa9MYo1b7mLmf6amqM7IUPNclOW5ngidJxhibs3SJ6yR53rzXBQAArE9zjZ6qOivJeVn6rp3jZldwW41/nW29vQ0AAJiLuUVPVf1GkncmuSZLwfPdXXia58y235rXugAAgPVtLtFTVW/J0oULNiU5YYxx6w7ue2RVPWqZ/ccned3s1w/NY10AAAC7fcnqqjotye8meSjJlUleU/V9ny26cYxx4ezPf5jk0NnlqW+e7Xt6kuNnf37LGOOq3V0XAABAMp/v6XnKbLtXkrNWuM9nk1w4+/NFSU5O8qwkJyX5gSTfSfIXSc4bY1w5hzUBAAAk2UOXrF40l6wGAICe1sQlqwEAANYS0QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0JroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALQmegAAgNZEDwAA0Co+k40AAAd+SURBVJroAQAAWhM9AABAa6IHAABoTfQAAACtiR4AAKA10QMAALTWNXoOWPQCAACAtWHDohewh9w12964isccMtt+bb5LoQmvD3bE64Md8fpgR7w+2BGvj//qgPzn3/NXpcYY813KI1RVbUqSMcYRi14La4/XBzvi9cGOeH2wI14f7IjXx/x0fXsbAABAEtEDAAA0J3oAAIDWRA8AANCa6AEAAFpz9TYAAKA1Z3oAAIDWRA8AANCa6AEAAFoTPQAAQGuiBwAAaE30AAAArYkeAACgtXUfPVX1I1X1Z1X1L1X1YFXdWFXvqqrHLXptLN7s9TBW+Pn2otfHnldVp1TVe6rqyqq6a/b//Yce5jFHVdUlVXV7Vd1XVddW1VlVtddU62Yaq3l9VNUBOziejKq6eOr1s+dU1ROq6lVV9ZdVdX1V3V9Vd1bV56rq9Kpa9u9gjh/rw2pfH44fu2/DohewSFV1UJKrkjwxyceTfC3Js5O8NskLq+roMcZtC1wia8OdSd61zP57pl4IC/HmJM/I0v/fNyc5ZEd3rqqfT/LRJA8k+XCS25O8KMk7kxyd5NQ9uVgmt6rXx8yXk3xsmf1fmeO6WLxTk7wvyS1JLktyU5InJXlpkg8kOamqTh3bfEu848e6surXx4zjxy6q7//fcv2oqk8leX6S14wx3rPN/nckeV2S88cYv7qo9bF4VXVjkowxDljsSliUqjouS3+ZvT7JMVn6j9P/G2O8cpn77ju738YkR48xrp7t3zvJpUmem+QVYwz/ItfEKl8fByS5Icn/HWP87+lWySJU1fFJ9knyN2OMLdvs3z/J3yf50SSnjDE+Otvv+LGO7MLr44A4fuyWdf
v2tqo6MEvBc2OS92538+8kuTfJL1TVPhMvDVhDxhiXjTG+ucy/ti3nlCQ/mOTirX9hmT3HA1k6I5Akv7YHlsmCrPL1wToyxrh0jPGJbf9CO9v/7STvn/167DY3OX6sI7vw+mA3ree3tx0/2356mRfc3VX1+SxF0XOSfGbqxbGmPLqqXpnkyVmK4WuTXDHGeGixy2IN2npc+eQyt12R5L4kR1XVo8cYD063LNaYH6qqX0nyhCS3JfnCGOPaBa+Jaf37bLt5m32OH2y13OtjK8ePXbSeo+fg2fYbK9z+zSxFz1Mjeta7/ZNctN2+G6rql8YYn13EglizVjyujDE2V9UNSQ5NcmCSr065MNaUE2c//6GqLk9y2hjjpoWsiMlU1YYkvzj7ddvAcfxgR6+PrRw/dtG6fXtblt4zmyx9SH05W/fvN8FaWLsuSHJClsJnnyQ/meT8JAck+duqesbilsYa5LjCjtyX5PeSHJHkcbOfrZ8DOjbJZ7ylel14W5KnJblkjPGpbfY7fpCs/Ppw/NhN6zl6Hk7Ntt6nvY6NMc6Zve/2O2OM+8YYX5ld3OIdSf57krMXu0IeYRxX1rExxnfHGL89xvjSGOOO2c8VWXpXwReT/M8kr1rsKtmTquo1SV6fpavF/sJqHz7bOn40taPXh+PH7lvP0bP1X0w2rnD7vtvdD7a19UOGz1voKlhrHFdYtTHG5ixdojZxTGmrqs5I8u4k1yU5boxx+3Z3cfxYx3bi9bEsx4+dt56j5+uz7VNXuP3HZ9uVPvPD+vbd2dapZLa14nFl9j7tp2Tpg6nfmnJRPCL862zrmNJQVZ2V5LwsfZfKcbMrdG3P8WOd2snXx444fuyE9Rw9l822z1/mW28fm6UvAbs/yd9NvTAeEZ472/qPD9u6dLZ94TK3PS/JY5Jc5cpLLOM5s61jSjNV9RtZ+nLRa7L0F9rvrnBXx491aBWvjx1x/NgJ6zZ6xhj/mOTTWfpA+hnb3XxOlmr5g2OMeydeGmtEVR1aVY9fZv+PZelfZJLkQ9OuijXuI0luTfLyqnrm1p2zLxf8/dmv71vEwli8qjqyqh61zP7js/SF2IljSitV9ZYsfTB9U5ITxhi37uDujh/rzGpeH44fu6/W8/epVdVBSa5K8sQkH8/SJSCPTHJclt7WdtQY47bFrZBFqqqzk/xmls4K3pDk7iQHJfnZJHsnuSTJyWOM7y1qjex5VfWSJC+Z/bp/khdk6V/Trpztu3WM8Ybt7v+RJA8kuTjJ7UlenKXL0X4kyf/yRZZ9rOb1Mbus7KFJLk9y8+z2p+c/v5/lLWOMrX+55RGuqk5LcmGSh5K8J8t/FufGMcaF2zzG8WOdWO3rw/Fj963r6EmSqvrRJL+bpdPJT0hyS5KPJTlnZz9ERk9VdUySX03yU/nPS1bfkaVT0Bcluch/fPqbxe/v7OAu/zTGOGC7xxyd5Ley9DbIvZNcn+TPkpzrS217Wc3ro6pOT3Jyli5H+z+S/ECS7yT5QpLzxhhXrvQkPPLsxGsjST47xjh2u8c5fqwDq319OH7svnUfPQAAQG/r9jM9AADA+iB6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtCZ6AACA1kQPAADQmugBAABaEz0AAEBrogcAAGhN9AAAAK2JHgAAoDXRAwAAtPb/AZlgB+Ge770sAAAAAElFTkSuQmCC",
|
|
"text/plain": [
|
|
"<Figure size 864x504 with 1 Axes>"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {
|
|
"image/png": {
|
|
"height": 411,
|
|
"width": 414
|
|
},
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from matplotlib import pyplot\n",
|
|
"import numpy as np\n",
|
|
"\n",
|
|
"pyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n",
|
|
"print(x_train.shape)"
|
|
]
|
|
},
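Each stored row is the 28 x 28 image laid out row by row, so pixel ``(row, col)`` sits at flat index ``row * 28 + col`` and ``reshape((28, 28))`` recovers the grid. A quick sketch with a synthetic tensor (not MNIST data) whose values encode their own position:

```python
import torch

# Synthetic 28x28 "image" whose value at each pixel is its row-major position.
image = torch.arange(28 * 28).reshape(28, 28)

flat = image.reshape(-1)  # flatten to a length-784 row, like the stored data
row, col = 3, 7
print(flat.shape)                                            # torch.Size([784])
print(flat[row * 28 + col].item(), image[row, col].item())  # 91 91

restored = flat.reshape(28, 28)  # the same reshape done before pyplot.imshow
```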
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to\n",
|
|
"convert our data.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"tensor([[0., 0., 0., ..., 0., 0., 0.],\n",
|
|
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
|
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
|
" ...,\n",
|
|
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
|
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
|
" [0., 0., 0., ..., 0., 0., 0.]]) tensor([5, 0, 4, ..., 8, 4, 8])\n",
|
|
"torch.Size([50000, 784])\n",
|
|
"tensor(0) tensor(9)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import torch\n",
|
|
"\n",
|
|
"x_train, y_train, x_valid, y_valid = map(\n",
|
|
" torch.tensor, (x_train, y_train, x_valid, y_valid)\n",
|
|
")\n",
|
|
"n, c = x_train.shape\n",
|
|
"print(x_train, y_train)\n",
|
|
"print(x_train.shape)\n",
|
|
"print(y_train.min(), y_train.max())"
|
|
]
|
|
},
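``torch.tensor`` infers each tensor's dtype from the NumPy array it is given and copies the data, while ``n, c = x_train.shape`` simply unpacks the number of examples and the number of pixels per example. A small sketch with hypothetical stand-in arrays (the dtypes chosen here are assumptions for illustration, not read from the MNIST file):

```python
import numpy as np
import torch

# Hypothetical stand-ins shaped like the MNIST arrays (2 images instead of 50000).
x_np = np.zeros((2, 784), dtype=np.float32)
y_np = np.array([5, 0], dtype=np.int64)

x_t, y_t = map(torch.tensor, (x_np, y_np))
n, c = x_t.shape  # rows (examples) and columns (pixels)
print(n, c, x_t.dtype, y_t.dtype)  # 2 784 torch.float32 torch.int64

# torch.tensor copies, so later edits to the NumPy array do not leak in.
x_np[0, 0] = 1.0
print(x_t[0, 0].item())  # 0.0
```

If you want a zero-copy view instead, ``torch.from_numpy`` shares memory with the original array.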
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Neural net from scratch (no torch.nn)\n",
|
|
"---------------------------------------------\n",
|
|
"\n",
|
|
"Let's first create a model using nothing but PyTorch tensor operations. We're assuming\n",
|
|
"you're already familiar with the basics of neural networks. (If you're not, you can\n",
|
|
"learn them at `course.fast.ai <https://course.fast.ai>`_).\n",
|
|
"\n",
|
|
"PyTorch provides methods to create random or zero-filled tensors, which we will\n",
|
|
"use to create our weights and bias for a simple linear model. These are just regular\n",
|
|
"tensors, with one very special addition: we tell PyTorch that they require a\n",
|
|
"gradient. This causes PyTorch to record all of the operations done on the tensor,\n",
|
|
"so that it can calculate the gradient during back-propagation *automatically*!\n",
|
|
"\n",
|
|
"For the weights, we set ``requires_grad`` **after** the initialization, since we\n",
|
|
"don't want that step included in the gradient. (Note that a trailing ``_`` in\n",
|
|
"PyTorch signifies that the operation is performed in-place.)\n",
|
|
"\n",
|
|
"<div class=\"alert alert-info\"><h4>Note</h4><p>We are initializing the weights here with\n",
|
|
" `Xavier initialisation <http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf>`_\n",
|
|
" (by multiplying with 1/sqrt(n)).</p></div>\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"import math\n",
|
|
"\n",
|
|
"weights = torch.randn(784, 10) / math.sqrt(784)\n",
|
|
"weights.requires_grad_()\n",
|
|
"bias = torch.zeros(10, requires_grad=True)"
|
|
]
|
|
},
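Setting ``requires_grad`` only after the division keeps ``weights`` a *leaf* tensor, which is the kind of tensor whose ``.grad`` autograd fills in during ``backward()``. A short sketch contrasting the two orders (the seed and the name ``w_bad`` are only for illustration):

```python
import math
import torch

torch.manual_seed(0)

# Tutorial order: scale first, then ask autograd to track the result.
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
print(weights.is_leaf, weights.grad_fn)  # True None

# Other order: the division is recorded in the graph, so the result is not
# a leaf and backward() would not populate its .grad.
w_bad = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(w_bad.is_leaf)  # False

# Scaling by 1/sqrt(784) = 1/28 gives a standard deviation near 0.0357.
print(weights.std().item())
```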
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Thanks to PyTorch's ability to calculate gradients automatically, we can\n",
|
|
"use any standard Python function (or callable object) as a model! So\n",
|
|
"let's just write a plain matrix multiplication and broadcasted addition\n",
|
|
"to create a simple linear model. We also need an activation function, so\n",
|
|
"we'll write `log_softmax` and use it. Remember: although PyTorch\n",
|
|
"provides lots of pre-written loss functions, activation functions, and\n",
|
|
"so forth, you can easily write your own using plain python. PyTorch will\n",
|
|
"even create fast GPU or vectorized CPU code for your function\n",
|
|
"automatically.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def log_softmax(x):\n",
|
|
" return x - x.exp().sum(-1).log().unsqueeze(-1)\n",
|
|
"\n",
|
|
"def model(xb):\n",
|
|
" return log_softmax(xb @ weights + bias)"
|
|
]
|
|
},
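The hand-written ``log_softmax`` evaluates ``x - log(sum(exp(x)))`` directly. That is fine for the small scores in this tutorial, but ``exp`` overflows float32 once a score exceeds roughly 88. A sketch comparing it with a log-sum-exp formulation and with ``torch.nn.functional.log_softmax``; the name ``log_softmax_stable`` is ours, not part of the tutorial:

```python
import torch
import torch.nn.functional as F

def log_softmax(x):
    # The tutorial's direct formula.
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def log_softmax_stable(x):
    # Same quantity; torch.logsumexp is numerically stabilized.
    return x - x.logsumexp(-1, keepdim=True)

torch.manual_seed(0)
x = torch.randn(4, 10)
assert torch.allclose(log_softmax(x), log_softmax_stable(x), atol=1e-6)
assert torch.allclose(log_softmax_stable(x), F.log_softmax(x, dim=-1), atol=1e-6)

big = x + 1000.0  # exp(1000) overflows float32 to inf
print(torch.isinf(log_softmax(big)).all().item())            # True: every entry is -inf
print(torch.isfinite(log_softmax_stable(big)).all().item())  # True
```

In practice the built-in ``F.log_softmax`` (or ``F.cross_entropy``, used later in this tutorial) avoids the overflow.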
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"In the above, the ``@`` stands for the dot product operation. We will call\n",
|
|
"our function on one batch of data (in this case, 64 images). This is\n",
|
|
"one *forward pass*. Note that our predictions won't be any better than\n",
|
|
"random at this stage, since we start with random weights.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"bs = 64 # batch size\n",
|
|
"\n",
|
|
"xb = x_train[0:bs] # a mini-batch from x\n",
|
|
"preds = model(xb) # predictions\n",
|
|
"preds[0], preds.shape\n",
|
|
"print(preds[0], preds.shape)"
|
|
]
|
|
},
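Shapes are worth checking here: a batch of 64 flattened images times the ``784 x 10`` weight matrix gives one row of 10 scores per image, and the length-10 ``bias`` is broadcast across all 64 rows. A sketch with random stand-in inputs (not the MNIST batch), using ``Tensor.log_softmax`` in place of the hand-written function:

```python
import math
import torch

torch.manual_seed(0)
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

xb = torch.rand(64, 784)          # hypothetical stand-in for x_train[0:64]
logits = xb @ weights + bias      # (64, 784) @ (784, 10) -> (64, 10); bias broadcasts
preds = logits.log_softmax(-1)

print(preds.shape)                # torch.Size([64, 10])
print(preds.grad_fn is not None)  # True: autograd recorded how preds was computed
print(preds.exp().sum(-1)[:3])    # each row's probabilities sum to 1 (up to rounding)
```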
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"As you see, the ``preds`` tensor contains not only the tensor values, but also a\n",
|
|
"gradient function. We'll use this later to do backprop.\n",
|
|
"\n",
|
|
"Let's implement negative log-likelihood to use as the loss function\n",
|
|
"(again, we can just use standard Python):\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def nll(input, target):\n",
|
|
" return -input[range(target.shape[0]), target].mean()\n",
|
|
"\n",
|
|
"loss_func = nll"
|
|
]
|
|
},
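The indexing ``input[range(n), target]`` pairs row ``i`` with column ``target[i]``, so it picks out the log-probability the model assigned to the correct class of each example. A small hand-checkable sketch with made-up probabilities, also compared against ``F.nll_loss``:

```python
import torch
import torch.nn.functional as F

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.1, 0.8]]))
target = torch.tensor([0, 2])

picked = log_probs[range(2), target]  # row 0 -> column 0, row 1 -> column 2
print(picked.exp())                   # the probabilities 0.7 and 0.8
loss = nll(log_probs, target)
print(loss.item())                    # -(ln 0.7 + ln 0.8) / 2, about 0.2899
```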
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Let's check our loss with our random model, so we can see if we improve\n",
|
|
"after a backprop pass later.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"yb = y_train[0:bs]\n",
|
|
"print(loss_func(preds, yb))"
|
|
]
|
|
},
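For a sense of scale: a model that gives all 10 digits equal probability has a negative log-likelihood of exactly ln(10), about 2.303, on every example, so the loss of a freshly initialized model typically lands near that value. A sketch with uniform log-probabilities and random hypothetical labels:

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
uniform = torch.full((64, 10), math.log(1 / 10))  # log-probabilities of a uniform guess
yb = torch.randint(0, 10, (64,))                  # hypothetical labels
loss = F.nll_loss(uniform, yb)
print(loss.item(), math.log(10))  # both about 2.3026
```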
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Let's also implement a function to calculate the accuracy of our model.\n",
|
|
"For each prediction, if the index with the largest value matches the\n",
|
|
"target value, then the prediction was correct.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def accuracy(out, yb):\n",
|
|
" preds = torch.argmax(out, dim=1)\n",
|
|
" return (preds == yb).float().mean()"
|
|
]
|
|
},
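``torch.argmax(out, dim=1)`` returns the index of the largest score in each row; comparing it with the labels gives a boolean tensor whose mean is the fraction of correct predictions. A hand-checkable sketch with made-up scores:

```python
import torch

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

out = torch.tensor([[0.1, 2.0, 0.3],   # argmax 1
                    [1.5, 0.2, 0.1],   # argmax 0
                    [0.0, 0.1, 0.9],   # argmax 2
                    [0.3, 0.2, 0.1]])  # argmax 0
yb = torch.tensor([1, 0, 1, 2])
print(accuracy(out, yb))  # tensor(0.5000): rows 0 and 1 are correct
```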
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Let's check the accuracy of our random model, so we can see if our\n",
|
|
"accuracy improves as our loss improves.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"print(accuracy(preds, yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"We can now run a training loop. For each iteration, we will:\n",
|
|
"\n",
|
|
"- select a mini-batch of data (of size ``bs``)\n",
|
|
"- use the model to make predictions\n",
|
|
"- calculate the loss\n",
|
|
"- ``loss.backward()`` updates the gradients of the model, in this case, ``weights``\n",
|
|
" and ``bias``.\n",
|
|
"\n",
|
|
"We now use these gradients to update the weights and bias. We do this\n",
|
|
"within the ``torch.no_grad()`` context manager, because we do not want these\n",
|
|
"actions to be recorded for our next calculation of the gradient. You can read\n",
|
|
"more about how PyTorch's Autograd records operations\n",
|
|
"`here <https://pytorch.org/docs/stable/notes/autograd.html>`_.\n",
|
|
"\n",
|
|
"We then set the\n",
|
|
"gradients to zero, so that we are ready for the next loop.\n",
|
|
"Otherwise, our gradients would record a running tally of all the operations\n",
|
|
"that had happened (i.e. ``loss.backward()`` *adds* the gradients to whatever is\n",
|
|
"already stored, rather than replacing them).\n",
|
|
"\n",
|
|
".. tip:: You can use the standard python debugger to step through PyTorch\n",
|
|
" code, allowing you to check the various variable values at each step.\n",
|
|
" Uncomment ``set_trace()`` below to try it out.\n",
|
|
"\n",
|
|
"\n"
|
|
]
|
|
},
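Both behaviours described above can be seen on a single scalar parameter: calling ``backward()`` twice accumulates into ``.grad``, and an in-place update of a leaf that requires grad is rejected unless it runs under ``torch.no_grad()``. A minimal sketch (the tensor ``w`` and the step size 0.1 are illustrative):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3).backward()
print(w.grad)  # tensor(3.)
(w * 3).backward()
print(w.grad)  # tensor(6.): the second backward() added to the stored gradient

# Updating a leaf that requires grad must happen outside autograd's recording.
try:
    w -= 0.1 * w.grad
except RuntimeError as err:
    print("RuntimeError:", err)

with torch.no_grad():
    w -= 0.1 * w.grad  # 2.0 - 0.1 * 6.0
print(w)  # tensor(1.4000, requires_grad=True)

w.grad.zero_()  # reset before the next iteration
print(w.grad)  # tensor(0.)
```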
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"from IPython.core.debugger import set_trace\n",
|
|
"\n",
|
|
"lr = 0.5 # learning rate\n",
|
|
"epochs = 2 # how many epochs to train for\n",
|
|
"\n",
|
|
"for epoch in range(epochs):\n",
|
|
" for i in range((n - 1) // bs + 1):\n",
|
|
" # set_trace()\n",
|
|
" start_i = i * bs\n",
|
|
" end_i = start_i + bs\n",
|
|
" xb = x_train[start_i:end_i]\n",
|
|
" yb = y_train[start_i:end_i]\n",
|
|
" pred = model(xb)\n",
|
|
" loss = loss_func(pred, yb)\n",
|
|
"\n",
|
|
" loss.backward()\n",
|
|
" with torch.no_grad():\n",
|
|
" weights -= weights.grad * lr\n",
|
|
" bias -= bias.grad * lr\n",
|
|
" weights.grad.zero_()\n",
|
|
" bias.grad.zero_()"
|
|
]
|
|
},
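The inner loop runs ``(n - 1) // bs + 1`` times, which is integer ceiling division: enough batches to cover every example even when ``n`` is not a multiple of ``bs``. With the 50,000 training images and ``bs = 64`` that is 782 batches, the last holding only 16 examples, so the ``xb`` and ``yb`` left over after the loop (evaluated in the next cell) are that short final batch. A sketch of the arithmetic:

```python
import math
import torch

n, bs = 50000, 64                 # training-set size and batch size used above
num_batches = (n - 1) // bs + 1   # ceiling division using integers only
print(num_batches, math.ceil(n / bs))  # 782 782

last_start = (num_batches - 1) * bs
print(n - last_start)             # 16 examples in the final batch

# Slicing past the end of a tensor simply returns the rows that exist.
data = torch.arange(10)
print(data[8:12])                 # tensor([8, 9])
```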
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"That's it: we've created and trained a minimal neural network (in this case, a\n",
|
|
"logistic regression, since we have no hidden layers) entirely from scratch!\n",
|
|
"\n",
|
|
"Let's check the loss and accuracy and compare those to what we got\n",
|
|
"earlier. We expect that the loss will have decreased and accuracy to\n",
|
|
"have increased, and they have.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"print(loss_func(model(xb), yb), accuracy(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Using torch.nn.functional\n",
|
|
"------------------------------\n",
|
|
"\n",
|
|
"We will now refactor our code, so that it does the same thing as before, only\n",
|
|
"we'll start taking advantage of PyTorch's ``nn`` classes to make it more concise\n",
|
|
"and flexible. At each step from here, we should be making our code one or more\n",
|
|
"of: shorter, more understandable, and/or more flexible.\n",
|
|
"\n",
|
|
"The first and easiest step is to make our code shorter by replacing our\n",
|
|
"hand-written activation and loss functions with those from ``torch.nn.functional``\n",
|
|
"(which is generally imported into the namespace ``F`` by convention). This module\n",
|
|
"contains all the functions in the ``torch.nn`` library (whereas other parts of the\n",
|
|
"library contain classes). As well as a wide range of loss and activation\n",
|
|
"functions, you'll also find here some convenient functions for creating neural\n",
|
|
"nets, such as pooling functions. (There are also functions for doing convolutions,\n",
|
|
"linear layers, etc, but as we'll see, these are usually better handled using\n",
|
|
"other parts of the library.)\n",
|
|
"\n",
|
|
"If you're using negative log likelihood loss and log softmax activation,\n",
|
|
"then Pytorch provides a single function ``F.cross_entropy`` that combines\n",
|
|
"the two. So we can even remove the activation function from our model.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"import torch.nn.functional as F\n",
|
|
"\n",
|
|
"loss_func = F.cross_entropy\n",
|
|
"\n",
|
|
"def model(xb):\n",
|
|
" return xb @ weights + bias"
|
|
]
|
|
},
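``F.cross_entropy`` takes raw scores (logits) and applies ``log_softmax`` itself before the negative log-likelihood, which is why ``model`` can now return ``xb @ weights + bias`` directly. Accuracy is unaffected too, because ``log_softmax`` subtracts one constant per row and so never changes which class has the largest score. A sketch checking both points on random stand-in logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)          # stand-in for xb @ weights + bias
target = torch.randint(0, 10, (8,))  # hypothetical labels

combined = F.cross_entropy(logits, target)
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(combined.item(), two_step.item())  # the same value

same_argmax = torch.equal(logits.argmax(dim=1), F.log_softmax(logits, dim=1).argmax(dim=1))
print(same_argmax)  # True
```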
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Note that we no longer call ``log_softmax`` in the ``model`` function. Let's\n",
|
|
"confirm that our loss and accuracy are the same as before:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"print(loss_func(model(xb), yb), accuracy(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Refactor using nn.Module\n",
|
|
"-----------------------------\n",
|
|
"Next up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more\n",
|
|
"concise training loop. We subclass ``nn.Module`` (which itself is a class and\n",
|
|
"able to keep track of state). In this case, we want to create a class that\n",
|
|
"holds our weights, bias, and method for the forward step. ``nn.Module`` has a\n",
|
|
"number of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)\n",
|
|
"which we will be using.\n",
|
|
"\n",
|
|
"<div class=\"alert alert-info\"><h4>Note</h4><p>``nn.Module`` (uppercase M) is a PyTorch-specific concept, and is a\n",
|
|
" class we'll be using a lot. ``nn.Module`` is not to be confused with the Python\n",
|
|
" concept of a (lowercase ``m``) `module <https://docs.python.org/3/tutorial/modules.html>`_,\n",
|
|
" which is a file of Python code that can be imported.</p></div>\n",
|
|
"\n"
|
|
]
|
|
},
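{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To see what keeping track of state means in practice, here is a minimal sketch with a\n",
"hypothetical ``TinyAffine`` module (not from the tutorial). Attributes wrapped in\n",
"``nn.Parameter`` are registered automatically and show up in ``.parameters()`` and\n",
"``.named_parameters()``. A plain tensor attribute is not registered.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"class TinyAffine(nn.Module):\n",
"    # Hypothetical module used only for this illustration\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.weight = nn.Parameter(torch.zeros(3, 2))  # registered as a parameter\n",
"        self.scale = torch.ones(2)  # plain tensor attribute: NOT registered\n",
"\n",
"    def forward(self, x):\n",
"        return (x @ self.weight) * self.scale\n",
"\n",
"tiny = TinyAffine()\n",
"tiny_param_names = [name for name, _ in tiny.named_parameters()]\n",
"print(tiny_param_names)  # ['weight']\n",
"print(tiny.weight.requires_grad)  # True"
]
},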
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"from torch import nn\n",
|
|
"\n",
|
|
"class Mnist_Logistic(nn.Module):\n",
|
|
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n",
"        self.bias = nn.Parameter(torch.zeros(10))\n",
"\n",
"    def forward(self, xb):\n",
"        return xb @ self.weights + self.bias"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Since we're now using an object instead of just using a function, we\n",
|
|
"first have to instantiate our model:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model = Mnist_Logistic()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Now we can calculate the loss in the same way as before. Note that\n",
|
|
"``nn.Module`` objects are used as if they are functions (i.e., they are\n",
"*callable*), but behind the scenes PyTorch will call our ``forward``\n",
|
|
"method automatically.\n",
|
|
"\n"
|
|
]
|
|
},
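{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here is a small sketch of what happens behind the scenes, using a hypothetical ``Doubler``\n",
"module. Calling the module goes through ``nn.Module.__call__``, which runs ``forward`` and\n",
"also any registered hooks. In this example there is one hook, added with\n",
"``register_forward_hook``. Calling ``forward`` directly gives the same result but skips that\n",
"machinery. That is why the module itself is normally called instead.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"class Doubler(nn.Module):\n",
"    # Hypothetical module used only for this illustration\n",
"    def forward(self, x):\n",
"        return 2 * x\n",
"\n",
"doubler = Doubler()\n",
"hook_calls = []\n",
"# The hook records the output shape each time it fires; it returns None, so the output is unchanged\n",
"doubler.register_forward_hook(lambda module, inputs, output: hook_calls.append(tuple(output.shape)))\n",
"\n",
"x_demo = torch.ones(3)\n",
"out_call = doubler(x_demo)            # nn.Module.__call__ -> forward + hooks\n",
"out_direct = doubler.forward(x_demo)  # forward only, hooks are skipped\n",
"print(torch.equal(out_call, out_direct), len(hook_calls))  # True 1"
]
},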
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Previously for our training loop we had to update the values for each parameter\n",
|
|
"by name, and manually zero out the grads for each parameter separately, like this:\n",
|
|
"::\n",
"\n",
"    with torch.no_grad():\n",
"        weights -= weights.grad * lr\n",
"        bias -= bias.grad * lr\n",
"        weights.grad.zero_()\n",
"        bias.grad.zero_()\n",
|
|
"\n",
|
|
"\n",
|
|
"Now we can take advantage of ``model.parameters()`` and ``model.zero_grad()`` (which\n",
|
|
"are both defined by PyTorch for ``nn.Module``) to make those steps more concise\n",
|
|
"and less prone to the error of forgetting some of our parameters, particularly\n",
|
|
"if we had a more complicated model:\n",
|
|
"::\n",
"\n",
"    with torch.no_grad():\n",
"        for p in model.parameters(): p -= p.grad * lr\n",
"        model.zero_grad()\n",
|
|
"\n",
|
|
"\n",
|
|
"We'll wrap our little training loop in a ``fit`` function so we can run it\n",
|
|
"again later.\n",
|
|
"\n"
|
|
]
|
|
},
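{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Before wrapping it in ``fit``, the update described above can be tried on its own. This sketch\n",
"uses a hypothetical helper ``manual_sgd_step``, a tiny ``nn.Linear`` model and a made-up\n",
"squared-output loss. It applies ``p -= p.grad * lr`` to every parameter returned by\n",
"``parameters()`` and then clears the gradients with ``zero_grad()``.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"def manual_sgd_step(module, lr):\n",
"    # Hypothetical helper: the same update rule as in the text\n",
"    with torch.no_grad():\n",
"        for p in module.parameters():\n",
"            p -= p.grad * lr\n",
"    module.zero_grad()\n",
"\n",
"lin_demo = nn.Linear(2, 1)\n",
"x_step = torch.tensor([[1.0, 2.0]])\n",
"\n",
"loss_before = (lin_demo(x_step) ** 2).sum()  # made-up loss: squared output\n",
"loss_before.backward()\n",
"manual_sgd_step(lin_demo, lr=0.1)\n",
"loss_after = (lin_demo(x_step) ** 2).sum()\n",
"\n",
"print(loss_before.item(), loss_after.item())  # the loss should decrease\n",
"# After zero_grad() the gradients are None (recent PyTorch) or zero tensors (older versions)\n",
"print([p.grad for p in lin_demo.parameters()])"
]
},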
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def fit():\n",
|
|
"    for epoch in range(epochs):\n",
"        for i in range((n - 1) // bs + 1):\n",
"            start_i = i * bs\n",
"            end_i = start_i + bs\n",
"            xb = x_train[start_i:end_i]\n",
"            yb = y_train[start_i:end_i]\n",
"            pred = model(xb)\n",
"            loss = loss_func(pred, yb)\n",
"\n",
"            loss.backward()\n",
"            with torch.no_grad():\n",
"                for p in model.parameters():\n",
"                    p -= p.grad * lr\n",
"                model.zero_grad()\n",
|
|
"\n",
|
|
"fit()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Let's double-check that our loss has gone down:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Refactor using nn.Linear\n",
|
|
"-------------------------\n",
|
|
"\n",
|
|
"We continue to refactor our code. Instead of manually defining and\n",
|
|
"initializing ``self.weights`` and ``self.bias``, and calculating ``xb @\n",
|
|
"self.weights + self.bias``, we will instead use the PyTorch class\n",
"`nn.Linear <https://pytorch.org/docs/stable/nn.html#linear-layers>`_ for a\n",
"linear layer, which does all that for us. PyTorch has many types of\n",
"predefined layers that can greatly simplify our code, and they often make it\n",
"faster too.\n",
|
|
"\n"
|
|
]
|
|
},
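{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"As a rough check of what ``nn.Linear`` does for us, this sketch compares its output with the\n",
"hand-written ``xb @ weights + bias`` form. Note that ``nn.Linear`` stores its weight with shape\n",
"``(out_features, in_features)``, so the equivalent manual expression uses the transpose\n",
"``weight.T``. The input is random data used only for the comparison.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"lin_check = nn.Linear(784, 10)\n",
"x_lin = torch.randn(5, 784)  # random stand-in for a batch of 5 flattened images\n",
"\n",
"# weight has shape (out_features, in_features), hence the transpose\n",
"manual_out = x_lin @ lin_check.weight.T + lin_check.bias\n",
"\n",
"print(lin_check.weight.shape)  # torch.Size([10, 784])\n",
"print(lin_check.bias.shape)    # torch.Size([10])\n",
"print(torch.allclose(lin_check(x_lin), manual_out, atol=1e-5))  # True"
]
},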
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"class Mnist_Logistic(nn.Module):\n",
|
|
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.lin = nn.Linear(784, 10)\n",
"\n",
"    def forward(self, xb):\n",
"        return self.lin(xb)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"We instantiate our model and calculate the loss in the same way as before:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model = Mnist_Logistic()\n",
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"We are still able to use our same ``fit`` method as before.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"fit()\n",
|
|
"\n",
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Refactor using optim\n",
|
|
"------------------------------\n",
|
|
"\n",
|
|
"PyTorch also has a package with various optimization algorithms, ``torch.optim``.\n",
"We can use the ``step`` method from our optimizer to take an optimization step\n",
"(updating every parameter from its gradient), instead of manually updating\n",
"each parameter.\n",
|
|
"\n",
|
|
"This will let us replace our previous manually coded optimization step:\n",
|
|
"::\n",
"\n",
"    with torch.no_grad():\n",
"        for p in model.parameters(): p -= p.grad * lr\n",
"        model.zero_grad()\n",
|
|
"\n",
|
|
"and instead use just:\n",
|
|
"::\n",
"\n",
"    opt.step()\n",
"    opt.zero_grad()\n",
|
|
"\n",
|
|
"(``opt.zero_grad()`` clears the gradients, and we need to call it before\n",
"computing the gradient for the next minibatch, because ``backward()`` adds to\n",
"any gradients that are already stored rather than overwriting them.)\n",
|
|
"\n"
|
|
]
|
|
},
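{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To check that ``opt.step()`` does the same thing as the manual loop for plain SGD without\n",
"momentum, this sketch trains two identical copies of a small ``nn.Linear`` model for one step.\n",
"One copy uses ``optim.SGD`` and the other is updated by hand. The sketch then compares the\n",
"resulting parameters. The mean-squared-output loss is an arbitrary choice for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn, optim\n",
"\n",
"lin_a = nn.Linear(3, 1)\n",
"lin_b = nn.Linear(3, 1)\n",
"lin_b.load_state_dict(lin_a.state_dict())  # start both copies from identical weights\n",
"x_opt = torch.randn(4, 3)\n",
"lr_demo = 0.1\n",
"\n",
"# Version 1: torch.optim performs the update\n",
"opt_demo = optim.SGD(lin_a.parameters(), lr=lr_demo)\n",
"lin_a(x_opt).pow(2).mean().backward()\n",
"opt_demo.step()\n",
"opt_demo.zero_grad()\n",
"\n",
"# Version 2: the hand-written update from the previous sections\n",
"lin_b(x_opt).pow(2).mean().backward()\n",
"with torch.no_grad():\n",
"    for p in lin_b.parameters():\n",
"        p -= p.grad * lr_demo\n",
"lin_b.zero_grad()\n",
"\n",
"same_params = all(torch.allclose(pa, pb) for pa, pb in zip(lin_a.parameters(), lin_b.parameters()))\n",
"print(same_params)  # True"
]
},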
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"from torch import optim"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"We'll define a little function to create our model and optimizer so we\n",
|
|
"can reuse it in the future.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def get_model():\n",
|
|
"    model = Mnist_Logistic()\n",
"    return model, optim.SGD(model.parameters(), lr=lr)\n",
|
|
"\n",
|
|
"model, opt = get_model()\n",
|
|
"print(loss_func(model(xb), yb))\n",
|
|
"\n",
|
|
"for epoch in range(epochs):\n",
|
|
"    for i in range((n - 1) // bs + 1):\n",
"        start_i = i * bs\n",
"        end_i = start_i + bs\n",
"        xb = x_train[start_i:end_i]\n",
"        yb = y_train[start_i:end_i]\n",
"        pred = model(xb)\n",
"        loss = loss_func(pred, yb)\n",
"\n",
"        loss.backward()\n",
"        opt.step()\n",
"        opt.zero_grad()\n",
|
|
"\n",
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Refactor using Dataset\n",
|
|
"------------------------------\n",
|
|
"\n",
|
|
"PyTorch has an abstract Dataset class. A Dataset can be anything that has\n",
|
|
"a ``__len__`` function (called by Python's standard ``len`` function) and\n",
|
|
"a ``__getitem__`` function as a way of indexing into it.\n",
|
|
"`This tutorial <https://pytorch.org/tutorials/beginner/data_loading_tutorial.html>`_\n",
|
|
"walks through a nice example of creating a custom ``FacialLandmarkDataset`` class\n",
|
|
"as a subclass of ``Dataset``.\n",
|
|
"\n",
|
|
"PyTorch's `TensorDataset <https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html#TensorDataset>`_\n",
|
|
"is a Dataset wrapping tensors. By defining a length and a way of indexing,\n",
|
|
"this also gives us a way to iterate, index, and slice along the first\n",
|
|
"dimension of a tensor. This will make it easier to access both the\n",
|
|
"independent and dependent variables in the same line as we train.\n",
|
|
"\n"
|
|
]
|
|
},
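{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here is a minimal sketch of both ideas. ``SquaresDataset`` is a hypothetical toy class that only\n",
"needs ``__len__`` and ``__getitem__``. The second half shows that a ``TensorDataset`` returns\n",
"matching slices of every tensor it wraps, which is what lets us fetch ``xb`` and ``yb`` together.\n",
"The tiny tensors are made up for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch.utils.data import Dataset, TensorDataset\n",
"\n",
"class SquaresDataset(Dataset):\n",
"    # Hypothetical toy dataset: item i is the pair (i, i**2)\n",
"    def __init__(self, n):\n",
"        self.n = n\n",
"\n",
"    def __len__(self):\n",
"        return self.n\n",
"\n",
"    def __getitem__(self, i):\n",
"        return torch.tensor(float(i)), torch.tensor(float(i * i))\n",
"\n",
"sq_ds = SquaresDataset(5)\n",
"print(len(sq_ds), sq_ds[3])  # 5 (tensor(3.), tensor(9.))\n",
"\n",
"# TensorDataset indexes every wrapped tensor along its first dimension\n",
"xs_demo = torch.arange(6.0).view(3, 2)\n",
"ys_demo = torch.tensor([0, 1, 2])\n",
"td_demo = TensorDataset(xs_demo, ys_demo)\n",
"xb_demo, yb_demo = td_demo[0:2]\n",
"print(xb_demo.shape, yb_demo)  # torch.Size([2, 2]) tensor([0, 1])"
]
},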
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"from torch.utils.data import TensorDataset"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,\n",
|
|
"which will be easier to iterate over and slice.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"train_ds = TensorDataset(x_train, y_train)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Previously, we had to iterate through minibatches of x and y values separately:\n",
|
|
"::\n",
"\n",
"    xb = x_train[start_i:end_i]\n",
"    yb = y_train[start_i:end_i]\n",
|
|
"\n",
|
|
"\n",
|
|
"Now, we can do these two steps together:\n",
|
|
"::\n",
"\n",
"    xb,yb = train_ds[i*bs : i*bs+bs]\n",
|
|
"\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model, opt = get_model()\n",
|
|
"\n",
|
|
"for epoch in range(epochs):\n",
|
|
"    for i in range((n - 1) // bs + 1):\n",
"        xb, yb = train_ds[i * bs: i * bs + bs]\n",
"        pred = model(xb)\n",
"        loss = loss_func(pred, yb)\n",
"\n",
"        loss.backward()\n",
"        opt.step()\n",
"        opt.zero_grad()\n",
|
|
"\n",
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Refactor using DataLoader\n",
|
|
"------------------------------\n",
|
|
"\n",
|
|
"PyTorch's ``DataLoader`` is responsible for managing batches. You can\n",
|
|
"create a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier\n",
|
|
"to iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,\n",
|
|
"the DataLoader gives us each minibatch automatically.\n",
|
|
"\n"
|
|
]
|
|
},
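{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here is a quick sketch of what a ``DataLoader`` yields, using a made-up dataset of 10 items and\n",
"``batch_size=4``. It produces ``ceil(10 / 4) = 3`` batches, and the last batch is smaller unless\n",
"``drop_last=True`` is passed.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch.utils.data import DataLoader, TensorDataset\n",
"\n",
"toy_ds = TensorDataset(torch.arange(10.0), torch.arange(10))\n",
"\n",
"toy_dl = DataLoader(toy_ds, batch_size=4)\n",
"print(len(toy_dl), [len(xb_) for xb_, _ in toy_dl])  # 3 [4, 4, 2]\n",
"\n",
"# drop_last=True discards the incomplete final batch\n",
"toy_dl_drop = DataLoader(toy_ds, batch_size=4, drop_last=True)\n",
"print(len(toy_dl_drop), [len(xb_) for xb_, _ in toy_dl_drop])  # 2 [4, 4]"
]
},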
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"from torch.utils.data import DataLoader\n",
|
|
"\n",
|
|
"train_ds = TensorDataset(x_train, y_train)\n",
|
|
"train_dl = DataLoader(train_ds, batch_size=bs)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Previously, our loop iterated over batches (xb, yb) like this:\n",
|
|
"::\n",
"\n",
"    for i in range((n-1)//bs + 1):\n",
"        xb,yb = train_ds[i*bs : i*bs+bs]\n",
"        pred = model(xb)\n",
|
|
"\n",
|
|
"Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:\n",
|
|
"::\n",
"\n",
"    for xb,yb in train_dl:\n",
"        pred = model(xb)\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model, opt = get_model()\n",
|
|
"\n",
|
|
"for epoch in range(epochs):\n",
|
|
"    for xb, yb in train_dl:\n",
"        pred = model(xb)\n",
"        loss = loss_func(pred, yb)\n",
"\n",
"        loss.backward()\n",
"        opt.step()\n",
"        opt.zero_grad()\n",
|
|
"\n",
|
|
"print(loss_func(model(xb), yb))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Thanks to PyTorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,\n",
|
|
"our training loop is now dramatically smaller and easier to understand. Let's\n",
|
|
"now try to add the basic features necessary to create effective models in practice.\n",
|
|
"\n",
|
|
"Add validation\n",
|
|
"-----------------------\n",
|
|
"\n",
|
|
"In section 1, we were just trying to get a reasonable training loop set up for\n",
|
|
"use on our training data. In reality, you **always** should also have\n",
|
|
"a `validation set <https://www.fast.ai/2017/11/13/validation-sets/>`_, in order\n",
|
|
"to identify if you are overfitting.\n",
|
|
"\n",
|
|
"Shuffling the training data is\n",
|
|
"`important <https://www.quora.com/Does-the-order-of-training-data-matter-when-training-neural-networks>`_\n",
|
|
"to prevent correlation between batches and overfitting. On the other hand, the\n",
|
|
"validation loss will be identical whether we shuffle the validation set or not.\n",
|
|
"Since shuffling takes extra time, it makes no sense to shuffle the validation data.\n",
|
|
"\n",
|
|
"We'll use a batch size for the validation set that is twice as large as\n",
|
|
"that for the training set. This is because the validation set does not\n",
|
|
"need backpropagation and thus takes less memory (it doesn't need to\n",
"store gradients, or the intermediate results that backpropagation relies on).\n",
"We take advantage of this to use a larger batch size and compute the loss\n",
"more quickly.\n",
|
|
"\n"
|
|
]
|
|
},
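{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The memory saving comes from autograd not recording anything inside ``torch.no_grad()``.\n",
"This small sketch with a throwaway ``nn.Linear`` layer and random input shows the difference.\n",
"Only the output computed outside the context carries an autograd graph, so only that output\n",
"has ``requires_grad`` set to ``True``.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"lin_nograd = nn.Linear(4, 2)\n",
"x_ng = torch.randn(8, 4)\n",
"\n",
"out_tracked = lin_nograd(x_ng)        # autograd records the ops needed for backward\n",
"with torch.no_grad():\n",
"    out_untracked = lin_nograd(x_ng)  # nothing is recorded\n",
"\n",
"print(out_tracked.requires_grad, out_untracked.requires_grad)  # True False\n",
"print(out_untracked.grad_fn)  # None"
]
},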
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"train_ds = TensorDataset(x_train, y_train)\n",
|
|
"train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n",
|
|
"\n",
|
|
"valid_ds = TensorDataset(x_valid, y_valid)\n",
|
|
"valid_dl = DataLoader(valid_ds, batch_size=bs * 2)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"We will calculate and print the validation loss at the end of each epoch.\n",
|
|
"\n",
|
|
"(Note that we always call ``model.train()`` before training, and ``model.eval()``\n",
|
|
"before inference, because these are used by layers such as ``nn.BatchNorm2d``\n",
|
|
"and ``nn.Dropout`` to ensure appropriate behaviour for these different phases.)\n",
|
|
"\n"
|
|
]
|
|
},
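{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To illustrate why the mode matters, this sketch feeds a vector of ones through a standalone\n",
"``nn.Dropout(p=0.5)`` layer (an arbitrary choice for the demo). In training mode, roughly half\n",
"the entries are zeroed and the survivors are scaled by ``1 / (1 - p) = 2``. In eval mode, the\n",
"layer returns its input unchanged.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"dropout_demo = nn.Dropout(p=0.5)\n",
"ones_in = torch.ones(1000)\n",
"\n",
"dropout_demo.train()\n",
"drop_train_out = dropout_demo(ones_in)  # random zeros, survivors scaled to 2.0\n",
"dropout_demo.eval()\n",
"drop_eval_out = dropout_demo(ones_in)   # identity in eval mode\n",
"\n",
"print(sorted(set(drop_train_out.tolist())))  # [0.0, 2.0]\n",
"print(torch.equal(drop_eval_out, ones_in))   # True"
]
},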
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model, opt = get_model()\n",
|
|
"\n",
|
|
"for epoch in range(epochs):\n",
|
|
"    model.train()\n",
"    for xb, yb in train_dl:\n",
"        pred = model(xb)\n",
"        loss = loss_func(pred, yb)\n",
"\n",
"        loss.backward()\n",
"        opt.step()\n",
"        opt.zero_grad()\n",
"\n",
"    model.eval()\n",
"    with torch.no_grad():\n",
"        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n",
"\n",
"    print(epoch, valid_loss / len(valid_dl))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Create fit() and get_data()\n",
|
|
"----------------------------------\n",
|
|
"\n",
|
|
"We'll now do a little refactoring of our own. Since we calculate the loss in a\n",
"similar way twice, once for the training set and once for the validation set,\n",
"let's make that into its own function, ``loss_batch``, which computes the loss\n",
"for one batch.\n",
"\n",
"We pass an optimizer in for the training set, and ``loss_batch`` then performs\n",
"backprop and an optimizer step. For the validation set, we don't pass an\n",
"optimizer, so the function only computes the loss and doesn't perform backprop.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def loss_batch(model, loss_func, xb, yb, opt=None):\n",
|
|
"    loss = loss_func(model(xb), yb)\n",
"\n",
"    if opt is not None:\n",
"        loss.backward()\n",
"        opt.step()\n",
"        opt.zero_grad()\n",
"\n",
"    return loss.item(), len(xb)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"``fit`` runs the necessary operations to train our model and compute the\n",
|
|
"training and validation losses for each epoch.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"import numpy as np\n",
|
|
"\n",
|
|
"def fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n",
|
|
"    for epoch in range(epochs):\n",
"        model.train()\n",
"        for xb, yb in train_dl:\n",
"            loss_batch(model, loss_func, xb, yb, opt)\n",
"\n",
"        model.eval()\n",
"        with torch.no_grad():\n",
"            losses, nums = zip(\n",
"                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n",
"            )\n",
"        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n",
"\n",
"        print(epoch, val_loss)"
|
|
]
|
|
},
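{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The ``np.multiply(losses, nums)`` step in ``fit`` weights each batch's mean loss by its batch\n",
"size, so a smaller final batch does not count as much as a full one. The sketch below splits\n",
"made-up per-example losses into batches of 2, 2 and 1. It compares a plain mean of batch means\n",
"with the weighted version. Only the weighted average matches the mean over all examples.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"\n",
"# Made-up per-example losses for 5 validation examples\n",
"per_example = torch.tensor([0.1, 0.3, 0.5, 0.7, 2.0])\n",
"batches = [per_example[0:2], per_example[2:4], per_example[4:5]]\n",
"\n",
"batch_means = [b.mean().item() for b in batches]\n",
"batch_sizes = [len(b) for b in batches]\n",
"\n",
"naive_avg = sum(batch_means) / len(batch_means)  # every batch counts equally\n",
"weighted_avg = sum(m * n for m, n in zip(batch_means, batch_sizes)) / sum(batch_sizes)\n",
"true_avg = per_example.mean().item()\n",
"\n",
"# The last two agree; the first is inflated here by the small, high-loss final batch\n",
"print(naive_avg, weighted_avg, true_avg)"
]
},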
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"``get_data`` returns dataloaders for the training and validation sets.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def get_data(train_ds, valid_ds, bs):\n",
|
|
"    return (\n",
"        DataLoader(train_ds, batch_size=bs, shuffle=True),\n",
"        DataLoader(valid_ds, batch_size=bs * 2),\n",
"    )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Now, our whole process of obtaining the data loaders and fitting the\n",
|
|
"model can be run in 3 lines of code:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n",
|
|
"model, opt = get_model()\n",
|
|
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"You can use these basic 3 lines of code to train a wide variety of models.\n",
|
|
"Let's see if we can use them to train a convolutional neural network (CNN)!\n",
|
|
"\n",
|
|
"Switch to CNN\n",
|
|
"-------------\n",
|
|
"\n",
|
|
"We are now going to build our neural network with three convolutional layers.\n",
"Because none of the functions in the previous section assume anything about\n",
"the model form, we'll be able to use them to train a CNN without any modification.\n",
"\n",
"We will use PyTorch's predefined\n",
"`Conv2d <https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d>`_ class\n",
"as our convolutional layer. Each convolution is followed by a ReLU, and at the\n",
"end we perform an average pooling. (Note that ``view`` is PyTorch's version of\n",
"NumPy's ``reshape``.)\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"class Mnist_CNN(nn.Module):\n",
|
|
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n",
"        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n",
"        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n",
"\n",
"    def forward(self, xb):\n",
"        xb = xb.view(-1, 1, 28, 28)\n",
"        xb = F.relu(self.conv1(xb))\n",
"        xb = F.relu(self.conv2(xb))\n",
"        xb = F.relu(self.conv3(xb))\n",
"        xb = F.avg_pool2d(xb, 4)\n",
"        return xb.view(-1, xb.size(1))\n",
|
|
"\n",
|
|
"lr = 0.1"
|
|
]
|
|
},
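{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"It can help to trace the tensor shapes through this network. This sketch rebuilds the same\n",
"three ``Conv2d`` layers outside the ``Mnist_CNN`` class, purely for inspection, and pushes a\n",
"random batch of two flattened 28x28 images through them. Each stride-2 convolution with\n",
"``kernel_size=3`` and ``padding=1`` roughly halves the grid: 28 to 14 to 7 to 4. That is why\n",
"``F.avg_pool2d(xb, 4)`` leaves a 1x1 grid with 10 channels, one per digit class.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"import torch.nn.functional as F\n",
"\n",
"convs = [\n",
"    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n",
"    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n",
"    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n",
"]\n",
"\n",
"act = torch.randn(2, 784).view(-1, 1, 28, 28)  # random stand-in for 2 MNIST images\n",
"shapes = [tuple(act.shape)]\n",
"for conv in convs:\n",
"    act = F.relu(conv(act))\n",
"    shapes.append(tuple(act.shape))\n",
"act = F.avg_pool2d(act, 4)\n",
"shapes.append(tuple(act.shape))\n",
"\n",
"print(shapes)\n",
"# [(2, 1, 28, 28), (2, 16, 14, 14), (2, 16, 7, 7), (2, 10, 4, 4), (2, 10, 1, 1)]"
]
},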
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"`Momentum <https://cs231n.github.io/neural-networks-3/#sgd>`_ is a variation on\n",
|
|
"stochastic gradient descent that takes previous updates into account as well\n",
|
|
"and generally leads to faster training.\n",
|
|
"\n"
|
|
]
|
|
},
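{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To make the idea concrete, this sketch applies ``optim.SGD`` with ``momentum=0.9`` to a single\n",
"made-up parameter whose gradient is always 1. It compares the result with a hand-written version\n",
"of the update that PyTorch documents for SGD with momentum (no dampening, no Nesterov):\n",
"``buf = momentum * buf + grad``, then ``p = p - lr * buf``. The steps grow from 0.1 to 0.19 to\n",
"0.271, because past gradients keep contributing.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import optim\n",
"\n",
"# One scalar parameter; the loss w_param * 1.0 has a constant gradient of 1\n",
"w_param = torch.zeros(1, requires_grad=True)\n",
"opt_momentum = optim.SGD([w_param], lr=0.1, momentum=0.9)\n",
"torch_history = []\n",
"for _ in range(3):\n",
"    opt_momentum.zero_grad()\n",
"    (w_param * 1.0).sum().backward()\n",
"    opt_momentum.step()\n",
"    torch_history.append(w_param.item())\n",
"\n",
"# Hand-written momentum update for the same setting\n",
"w_manual, buf = 0.0, 0.0\n",
"manual_history = []\n",
"for _ in range(3):\n",
"    buf = 0.9 * buf + 1.0  # the gradient is always 1.0\n",
"    w_manual -= 0.1 * buf\n",
"    manual_history.append(w_manual)\n",
"\n",
"print(torch_history)   # approximately [-0.1, -0.29, -0.561]\n",
"print(manual_history)  # approximately [-0.1, -0.29, -0.561]"
]
},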
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model = Mnist_CNN()\n",
|
|
"opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n",
|
|
"\n",
|
|
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"nn.Sequential\n",
|
|
"------------------------\n",
|
|
"\n",
|
|
"``torch.nn`` has another handy class we can use to simplify our code:\n",
|
|
"`Sequential <https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential>`_ .\n",
|
|
"A ``Sequential`` object runs each of the modules contained within it, in a\n",
|
|
"sequential manner. This is a simpler way of writing our neural network.\n",
|
|
"\n",
|
|
"To take advantage of this, we need to be able to easily define a\n",
|
|
"**custom layer** from a given function. For instance, PyTorch doesn't\n",
|
|
"have a ``view`` layer, and we need to create one for our network. ``Lambda``\n",
|
|
"will create a layer that we can then use when defining a network with\n",
|
|
"``Sequential``.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"class Lambda(nn.Module):\n",
|
|
"    def __init__(self, func):\n",
"        super().__init__()\n",
"        self.func = func\n",
"\n",
"    def forward(self, x):\n",
"        return self.func(x)\n",
"\n",
"\n",
"def preprocess(x):\n",
"    return x.view(-1, 1, 28, 28)"
|
|
]
|
|
},
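{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here is a small sketch of ``Lambda`` used inside ``Sequential``. To keep the cell self-contained,\n",
"it defines ``LambdaDemo``, a copy of the ``Lambda`` class above under a hypothetical different\n",
"name so the tutorial's own class is not overwritten. The input is a made-up batch of two 4-element\n",
"vectors that are reshaped to 2x2 images, average-pooled and flattened back. Because ``Lambda``\n",
"stores only a function, it contributes no parameters.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"class LambdaDemo(nn.Module):\n",
"    # Same idea as Lambda above: wrap an arbitrary function as a layer\n",
"    def __init__(self, func):\n",
"        super().__init__()\n",
"        self.func = func\n",
"\n",
"    def forward(self, x):\n",
"        return self.func(x)\n",
"\n",
"seq_demo = nn.Sequential(\n",
"    LambdaDemo(lambda x: x.view(-1, 1, 2, 2)),     # (N, 4) -> (N, 1, 2, 2)\n",
"    nn.AvgPool2d(2),                                # (N, 1, 2, 2) -> (N, 1, 1, 1)\n",
"    LambdaDemo(lambda x: x.view(x.size(0), -1)),   # (N, 1, 1, 1) -> (N, 1)\n",
")\n",
"\n",
"seq_out = seq_demo(torch.arange(8.0).view(2, 4))\n",
"print(seq_out.shape)                     # torch.Size([2, 1])\n",
"print(seq_out.flatten().tolist())        # [1.5, 5.5]\n",
"print(len(list(seq_demo.parameters())))  # 0"
]
},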
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"The model created with ``Sequential`` is simply:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model = nn.Sequential(\n",
|
|
"    Lambda(preprocess),\n",
"    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.AvgPool2d(4),\n",
"    Lambda(lambda x: x.view(x.size(0), -1)),\n",
|
|
")\n",
|
|
"\n",
|
|
"opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n",
|
|
"\n",
|
|
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Wrapping DataLoader\n",
|
|
"-----------------------------\n",
|
|
"\n",
|
|
"Our CNN is fairly concise, but it only works with MNIST, because:\n",
|
|
" - It assumes the input is a vector of length 28\\*28\n",
" - It assumes that the final CNN grid size is 4\\*4 (since that's the average\n",
"   pooling kernel size we used)\n",
"\n",
"Let's get rid of these two assumptions, so our model works with any 2D\n",
"single-channel image. First, we can remove the initial Lambda layer by\n",
"moving the data preprocessing into a generator:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def preprocess(x, y):\n",
|
|
"    return x.view(-1, 1, 28, 28), y\n",
"\n",
"\n",
"class WrappedDataLoader:\n",
"    def __init__(self, dl, func):\n",
"        self.dl = dl\n",
"        self.func = func\n",
"\n",
"    def __len__(self):\n",
"        return len(self.dl)\n",
"\n",
"    def __iter__(self):\n",
"        batches = iter(self.dl)\n",
"        for b in batches:\n",
"            yield (self.func(*b))\n",
|
|
"\n",
|
|
"train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n",
|
|
"train_dl = WrappedDataLoader(train_dl, preprocess)\n",
|
|
"valid_dl = WrappedDataLoader(valid_dl, preprocess)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which\n",
|
|
"allows us to define the size of the *output* tensor we want, rather than\n",
|
|
"the *input* tensor we have. As a result, our model will work with any\n",
|
|
"size input.\n",
|
|
"\n"
|
|
]
|
|
},
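{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The difference is easy to see on random feature maps of a few arbitrary sizes, chosen only for\n",
"illustration. ``nn.AdaptiveAvgPool2d(1)`` always produces a 1x1 grid. The fixed\n",
"``nn.AvgPool2d(4)`` produces a 2x2 grid on an 8x8 input, which the final flattening step would\n",
"turn into 40 features instead of 10.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"adaptive = nn.AdaptiveAvgPool2d(1)\n",
"fixed = nn.AvgPool2d(4)\n",
"\n",
"for height, width in [(4, 4), (7, 7), (8, 8), (5, 9)]:\n",
"    feats = torch.randn(2, 10, height, width)  # random stand-in for the last conv output\n",
"    print((height, width), tuple(adaptive(feats).shape))  # always (2, 10, 1, 1)\n",
"\n",
"print(tuple(fixed(torch.randn(2, 10, 8, 8)).shape))  # (2, 10, 2, 2)"
]
},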
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model = nn.Sequential(\n",
|
|
"    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.AdaptiveAvgPool2d(1),\n",
"    Lambda(lambda x: x.view(x.size(0), -1)),\n",
|
|
")\n",
|
|
"\n",
|
|
"opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Let's try it out:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Using your GPU\n",
|
|
"---------------\n",
|
|
"\n",
|
|
"If you're lucky enough to have access to a CUDA-capable GPU (you can\n",
|
|
"rent one for about $0.50/hour from most cloud providers), you can\n",
"use it to speed up your code. First check that your GPU is working in\n",
"PyTorch:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"print(torch.cuda.is_available())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"And then create a device object for it:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"dev = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Let's update ``preprocess`` to move batches to the GPU:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"def preprocess(x, y):\n",
|
|
"    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)\n",
|
|
"\n",
|
|
"\n",
|
|
"train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n",
|
|
"train_dl = WrappedDataLoader(train_dl, preprocess)\n",
|
|
"valid_dl = WrappedDataLoader(valid_dl, preprocess)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Finally, we can move our model to the GPU.\n",
|
|
"\n"
|
|
]
|
|
},
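{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"One detail about ``.to(dev)`` is worth knowing. ``Module.to`` moves the module's parameters in\n",
"place and returns the module, which is why ``model.to(dev)`` works without reassignment.\n",
"``Tensor.to`` returns a tensor on the target device and leaves the original where it was, which\n",
"is why ``preprocess`` returns the moved tensors. The sketch below checks this on a throwaway\n",
"``nn.Linear`` layer. It falls back to the CPU when no GPU is available.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"demo_dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"net_demo = nn.Linear(4, 2)\n",
"net_demo.to(demo_dev)  # modules are moved in place: no reassignment needed\n",
"print(next(net_demo.parameters()).device.type == demo_dev.type)  # True\n",
"\n",
"x_cpu = torch.randn(3, 4)\n",
"x_moved = x_cpu.to(demo_dev)  # tensors: .to returns a tensor on demo_dev; x_cpu stays put\n",
"print(x_cpu.device.type, x_moved.device.type)\n",
"\n",
"out_dev = net_demo(x_moved)\n",
"print(out_dev.device.type == demo_dev.type)  # True"
]
},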
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"model.to(dev)\n",
|
|
"opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"You should find it runs faster now:\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 0,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
],
|
|
"source": [
|
|
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"Closing thoughts\n",
|
|
"-----------------\n",
|
|
"\n",
|
|
"We now have a general data pipeline and training loop which you can use for\n",
|
|
"training many types of models using PyTorch. To see how simple training a model\n",
"can now be, take a look at the ``mnist_sample`` sample notebook.\n",
|
|
"\n",
|
|
"Of course, there are many things you'll want to add, such as data augmentation,\n",
|
|
"hyperparameter tuning, monitoring training, transfer learning, and so forth.\n",
|
|
"These features are available in the fastai library, which has been developed\n",
|
|
"using the same design approach shown in this tutorial, providing a natural\n",
|
|
"next step for practitioners looking to take their models further.\n",
|
|
"\n",
|
|
"We promised at the start of this tutorial we'd explain through example each of\n",
|
|
"``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize\n",
|
|
"what we've seen:\n",
|
|
"\n",
|
|
" - **torch.nn**\n",
|
|
"\n",
|
|
" + ``Module``: creates a callable which behaves like a function, but can also\n",
|
|
" contain state(such as neural net layer weights). It knows what ``Parameter`` (s) it\n",
|
|
" contains and can zero all their gradients, loop through them for weight updates, etc.\n",
|
|
" + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights\n",
|
|
" that need updating during backprop. Only tensors with the `requires_grad` attribute set are updated\n",
|
|
" + ``functional``: a module(usually imported into the ``F`` namespace by convention)\n",
|
|
" which contains activation functions, loss functions, etc, as well as non-stateful\n",
|
|
" versions of layers such as convolutional and linear layers.\n",
|
|
" - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights\n",
|
|
" of ``Parameter`` during the backward step\n",
|
|
" - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,\n",
|
|
" including classes provided with Pytorch such as ``TensorDataset``\n",
|
|
" - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.\n",
|
|
"\n"
|
|
]
|
|
}
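The ``torch.nn`` and ``torch.optim`` points above fit into a few lines of code. The sketch below defines a hypothetical ``TinyLinear`` module and trains it for one step on random data, which stands in for MNIST. It shows four things:

- attributes wrapped in ``nn.Parameter`` are registered with the module, while a plain tensor attribute is not;
- ``F.linear`` supplies the stateless computation;
- ``loss.backward()`` only fills in gradients;
- the weights change only when ``opt.step()`` runs.

```python
import torch
import torch.nn.functional as F
from torch import nn, optim


class TinyLinear(nn.Module):
    """Hypothetical minimal module: one weight matrix and one bias vector."""

    def __init__(self, n_in, n_out):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them with the Module.
        self.weight = nn.Parameter(torch.randn(n_out, n_in) / n_in ** 0.5)
        self.bias = nn.Parameter(torch.zeros(n_out))
        # A plain tensor attribute is not registered, so parameters() skips it.
        self.scale = torch.ones(1)

    def forward(self, x):
        # F.linear is the stateless counterpart of nn.Linear: x @ weight.T + bias
        return F.linear(x, self.weight, self.bias)


torch.manual_seed(0)
model = TinyLinear(4, 3)
opt = optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)            # random stand-in inputs
y = torch.randint(0, 3, (8,))    # random stand-in class labels

before = model.weight.detach().clone()
loss = F.cross_entropy(model(x), y)
loss.backward()                  # fills .grad on each Parameter; weights unchanged
after_backward = model.weight.detach().clone()
opt.step()                       # the optimizer applies the update here
opt.zero_grad()                  # clear gradients before the next batch

print([name for name, _ in model.named_parameters()])
```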
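Because ``Dataset`` is just an interface, any object with ``__len__`` and ``__getitem__`` satisfies it. The plain-Python sketch below shows such an object, plus a deliberately simplified stand-in for what a ``DataLoader`` does with shuffling turned off: index the dataset in order and group the items into batches. ``SquaresDataset`` and ``simple_batches`` are hypothetical names. The real ``DataLoader`` also collates items into tensors and supports shuffling, samplers, and worker processes.

```python
class SquaresDataset:
    """Hypothetical map-style dataset: item i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i, i * i  # (input, target)


def simple_batches(ds, bs):
    """Yield (inputs, targets) lists of up to `bs` items, in index order.

    A simplified stand-in for iterating a DataLoader with shuffle=False.
    """
    for start in range(0, len(ds), bs):
        items = [ds[i] for i in range(start, min(start + bs, len(ds)))]
        xs, ys = zip(*items)
        yield list(xs), list(ys)


# prints [0, 1] [0, 1], then [2, 3] [4, 9], then [4] [16]
for xb, yb in simple_batches(SquaresDataset(5), bs=2):
    print(xb, yb)
```

The last batch is smaller when the dataset size is not a multiple of ``bs``. The real ``DataLoader`` behaves the same way by default, and its ``drop_last`` option discards that partial batch instead.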
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (Ubuntu Linux)",
"language": "python",
"name": "python3-ubuntu",
"resource_dir": "/usr/local/share/jupyter/kernels/python3-ubuntu"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}