# Train the network
{:.no_toc}

* TOC
{:toc}

## Top

Training the weights and biases of the network.

Questions to David Rotermund
## An example setup

### Network
```python
import torch

# Some parameters
input_number_of_channel: int = 1
input_dim_x: int = 24
input_dim_y: int = 24

number_of_output_channels_conv1: int = 32
number_of_output_channels_conv2: int = 64
number_of_output_channels_flatten1: int = 576
number_of_output_channels_full1: int = 10

kernel_size_conv1: tuple[int, int] = (5, 5)
kernel_size_pool1: tuple[int, int] = (2, 2)
kernel_size_conv2: tuple[int, int] = (5, 5)
kernel_size_pool2: tuple[int, int] = (2, 2)

stride_conv1: tuple[int, int] = (1, 1)
stride_pool1: tuple[int, int] = (2, 2)
stride_conv2: tuple[int, int] = (1, 1)
stride_pool2: tuple[int, int] = (2, 2)

padding_conv1: int = 0
padding_pool1: int = 0
padding_conv2: int = 0
padding_pool2: int = 0

# Two Conv2d -> ReLU -> MaxPool2d blocks, then Flatten -> Linear -> Softmax
network = torch.nn.Sequential(
    torch.nn.Conv2d(
        in_channels=input_number_of_channel,
        out_channels=number_of_output_channels_conv1,
        kernel_size=kernel_size_conv1,
        stride=stride_conv1,
        padding=padding_conv1,
    ),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(
        kernel_size=kernel_size_pool1, stride=stride_pool1, padding=padding_pool1
    ),
    torch.nn.Conv2d(
        in_channels=number_of_output_channels_conv1,
        out_channels=number_of_output_channels_conv2,
        kernel_size=kernel_size_conv2,
        stride=stride_conv2,
        padding=padding_conv2,
    ),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(
        kernel_size=kernel_size_pool2, stride=stride_pool2, padding=padding_pool2
    ),
    torch.nn.Flatten(
        start_dim=1,
    ),
    torch.nn.Linear(
        in_features=number_of_output_channels_flatten1,
        out_features=number_of_output_channels_full1,
        bias=True,
    ),
    torch.nn.Softmax(dim=1),
)
```
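As a quick sanity check (not part of the original setup), you can push a dummy batch through the network to confirm that `number_of_output_channels_flatten1 = 576` is consistent with the chosen kernel sizes and strides: a 24x24 input shrinks to 20x20 after conv1 (5x5, stride 1, no padding), to 10x10 after pool1, to 6x6 after conv2, and to 3x3 after pool2, which gives 64 * 3 * 3 = 576 features.

```python
# Sanity-check sketch: verify the shapes flowing through the network.
# 24x24 -> conv1 (5x5) -> 20x20 -> pool1 (2x2) -> 10x10
#       -> conv2 (5x5) ->  6x6  -> pool2 (2x2) ->  3x3
# => 64 channels * 3 * 3 = 576 features entering the Linear layer.
dummy_input = torch.zeros((1, input_number_of_channel, input_dim_x, input_dim_y))
dummy_output = network(dummy_input)
print(dummy_output.shape)  # -> torch.Size([1, 10])
```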
### Data augmentation
```python
import torchvision

test_processing_chain = torchvision.transforms.Compose(
    transforms=[torchvision.transforms.CenterCrop((24, 24))],
)

train_processing_chain = torchvision.transforms.Compose(
    transforms=[torchvision.transforms.RandomCrop((24, 24))],
)
```
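These processing chains are attached to a dataset via its `transform` argument. The following is only a minimal sketch, assuming torchvision's MNIST (28x28, one channel) as a stand-in for whatever dataset is actually used; `ToTensor` is added so the cropped images arrive as tensors at the network.

```python
import torch
import torchvision

# Minimal sketch (MNIST is assumed as a stand-in dataset): the training data
# gets the RandomCrop chain (small random shifts as augmentation), the test
# data the deterministic CenterCrop chain.
train_dataset = torchvision.datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose(
        [torchvision.transforms.ToTensor(), train_processing_chain]
    ),
)
test_dataset = torchvision.datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=torchvision.transforms.Compose(
        [torchvision.transforms.ToTensor(), test_processing_chain]
    ),
)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False)
```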
## What makes it learn?

### Optimizer Algorithms
This is only a small selection of the available optimizers (i.e. the algorithms that update the weights and biases based on a loss). Typically, Adam or SGD will be the first algorithm you try; a minimal usage sketch follows the table.
| Optimizer | Description |
|---|---|
| Adagrad | Implements Adagrad algorithm. |
| Adam | Implements Adam algorithm. |
| ASGD | Implements Averaged Stochastic Gradient Descent. |
| RMSprop | Implements RMSprop algorithm. |
| Rprop | Implements the resilient backpropagation algorithm. |
| [SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD) | Implements stochastic gradient descent (optionally with momentum). |
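The sketch below shows how an optimizer is attached to the network's parameters and used in one update step. The choice of Adam, the learning rate of 1e-3, the cross-entropy loss, and the dummy batch are all illustrative assumptions, not prescribed by this tutorial.

```python
import torch

# Minimal sketch: attach an optimizer to the parameters of the network defined
# above and perform one update step on a dummy batch. Adam, lr=1e-3 and the
# cross-entropy loss are illustrative choices.
optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)
loss_function = torch.nn.CrossEntropyLoss()

dummy_input = torch.rand((100, 1, 24, 24))                 # batch of 100 "images"
dummy_target = torch.randint(low=0, high=10, size=(100,))  # class labels 0..9

output = network(dummy_input)  # forward pass (softmax probabilities)
loss = loss_function(output, dummy_target)

optimizer.zero_grad()  # clear gradients from the previous step
loss.backward()        # back-propagate the loss
optimizer.step()       # update the weights and biases
```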
### Learning rate scheduler

"`torch.optim.lr_scheduler` provides several methods to adjust the learning rate based on the number of epochs."
Why do you want to reduce the learning rate? Typically you start with a large learning rate so the optimizer can jump over local minima, but later you anneal it, because otherwise the optimizer will overshoot or oscillate around the minimum.
A non-representative selection:

| Scheduler | Description |
|---|---|
| lr_scheduler.StepLR | Decays the learning rate of each parameter group by gamma every step_size epochs. |
| lr_scheduler.MultiStepLR | Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. |
| lr_scheduler.ConstantLR | Decays the learning rate of each parameter group by a small constant factor until the number of epochs reaches a pre-defined milestone: total_iters. |
| lr_scheduler.LinearLR | Decays the learning rate of each parameter group by linearly changing a small multiplicative factor until the number of epochs reaches a pre-defined milestone: total_iters. |
| lr_scheduler.ExponentialLR | Decays the learning rate of each parameter group by gamma every epoch. |
| lr_scheduler.ReduceLROnPlateau | Reduces the learning rate when a metric has stopped improving. |
However, typically I only use `lr_scheduler.ReduceLROnPlateau`.
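A minimal sketch of how `ReduceLROnPlateau` is wired into a training loop follows; it reuses the loader, loss function, and optimizer from the sketches above. The factor, patience, number of epochs, and the choice of monitoring the summed training loss are illustrative assumptions.

```python
import torch

number_of_epochs: int = 25  # illustrative

optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)
loss_function = torch.nn.CrossEntropyLoss()

# Halve the learning rate if the monitored value has not improved for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=5
)

for epoch in range(number_of_epochs):
    epoch_loss: float = 0.0
    for image, target in train_loader:  # train_loader from the sketch above
        output = network(image)
        loss = loss_function(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    # ReduceLROnPlateau needs the metric it monitors (here: summed train loss)
    scheduler.step(epoch_loss)
    print(
        f"epoch {epoch}: loss = {epoch_loss:.4f}, "
        f"lr = {optimizer.param_groups[0]['lr']}"
    )
```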