Welcome to torch-train’s documentation!¶
torch-train provides a wrapper around torch.nn.Module that adds scikit-learn-like fit() and predict() methods.
Installation¶
The most straightforward way of installing torch-train is via pip
pip install torch-train
If you wish to stay up to date with the latest development version, you can instead download the source code. In this case, make sure that you have all the required dependencies installed.
Dependencies¶
torch-train requires the following Python packages to be installed:
- pytorch: https://pytorch.org/
All dependencies should be automatically downloaded if you install torch-train via pip. However, should you want to install these libraries manually, you can install the dependencies using the requirements.txt file
pip install -r requirements.txt
Or you can install these libraries yourself
pip install -U torch
Usage¶
This section gives a high-level overview of the modules implemented by torch-train. We also include a working example to guide users through the code. For detailed documentation of individual methods, we refer to the Reference guide.
Overview¶
This section explains the design of torch-train on a high level.
torch-train is implemented as an extension of torch.nn.Module.
This means it can be trained and used like any neural network module in the pytorch library.
In addition, we provide automatic methods to train and predict on given data tensors using our Module extension.
This follows a scikit-learn approach with fit(), predict() and fit_predict() methods.
We refer to the Reference for a detailed description.
Code¶
To use torch-train in your own project, use it in place of torch.nn.Module. Here we show some simple examples of how to use the torch-train Module in your own Python code. For complete documentation we refer to the Reference guide.
Working example¶
In this example, we create a basic torch Module and use its fit() and predict() methods to train and test.
First we import torch, torch.optim and the torchtrain Module
# imports
import torch
import torch.nn as nn
import torch.optim as optim
from torchtrain import Module
Next we create our simple network consisting of 2 layers and a log-softmax output function.
Note
We extend torchtrain.Module instead of torch.nn.Module as you normally would.
Furthermore, we implement the forward() method to propagate through the network.
class MyNetwork(Module):

    def __init__(self, size_input, size_hidden, size_output):
        """Create simple network"""
        # Initialise super
        super().__init__()

        # Set layers
        self.layer_1 = nn.Linear(size_input , size_hidden)
        self.layer_2 = nn.Linear(size_hidden, size_output)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, X):
        """Forward through network"""
        # Propagate layer 1
        out = self.layer_1(X)
        # Propagate layer 2
        out = self.layer_2(out)
        # Propagate softmax layer
        out = self.softmax(out)
        # Return result
        return out
Now that we have created our network, we generate some random training and testing data.
# Generate random data
X_train = torch.rand((1024, 10))
y_train = (torch.rand(1024)*10).to(torch.int64)
X_test = torch.rand((1024, 10))
y_test = (torch.rand(1024)*10).to(torch.int64)
Finally, we create the network and invoke its fit() and predict() methods.
# Create network
net = MyNetwork(10, 128, 10)

# Fit network
net.fit(X_train, y_train,
    epochs        = 10,
    batch_size    = 32,
    learning_rate = 0.01,
    criterion     = nn.NLLLoss(),
    optimizer     = optim.SGD,
    variable      = False,
    verbose       = True
)

# Predict network
y_pred = net.predict(X_test,
    batch_size = 32,
    variable   = False,
    verbose    = True
)
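Note that predict() here returns the raw forward output of the network, i.e. the log-probabilities produced by the LogSoftmax layer (see the Reference). As a minimal sketch of evaluating the result, assuming y_pred therefore has shape (1024, 10), we can take the most likely class per sample and compare it to y_test:

# Convert log-probabilities to predicted class labels
labels_pred = y_pred.argmax(dim=1)

# Compute accuracy against the (random) test labels
accuracy = (labels_pred == y_test).float().mean().item()
print("Test accuracy: {:.2%}".format(accuracy))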
Reference¶
This is the reference documentation for the classes and methods provided by the torch-train module.
Module¶
The Module class is an extension of the torch.nn.Module object.
This class implements scikit-learn-like fit() and predict() methods to automatically use nn.Module objects for training and predicting labels.
This module also automatically keeps track of the progress during fitting and predicting.
class module.Module(*args, **kwargs)[source]¶
Extension of nn.Module that adds fit and predict methods. Can be used for automatic training.

progress¶
Used to track progress of fit and predict methods.
Type: Progress()
Initialization¶
Fit¶
To train a Module, we use the fit() method.
Module.fit(X, y, epochs=10, batch_size=32, learning_rate=0.01, criterion=nn.NLLLoss(), optimizer=optim.SGD, variable=False, verbose=True, **kwargs)[source]¶
Train the module with given parameters.
Parameters: - X (torch.Tensor) – Tensor to train with
- y (torch.Tensor) – Target tensor
- epochs (int, default=10) – Number of epochs to train with
- batch_size (int, default=32) – Default batch size to use for training
- learning_rate (float, default=0.01) – Learning rate to use for optimizer
- criterion (nn.Loss, default=nn.NLLLoss()) – Loss function to use
- optimizer (optim.Optimizer, default=optim.SGD) – Optimizer to use for training
- variable (boolean, default=False) – If True, accept inputs of variable length
- verbose (boolean, default=True) – If True, prints training progress
Returns: result – Returns self
Return type: self
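As a short sketch of these parameters (assuming, as the default optim.SGD above suggests, that optimizer expects an optimizer class which fit() instantiates internally with the given learning_rate), the network from the working example could be trained with the Adam optimizer instead:

# Train with the Adam optimizer instead of the default SGD
# (optim.Adam is passed as a class; this is an assumption based on the
#  default value optim.SGD shown in the signature above)
net = MyNetwork(10, 128, 10)
net.fit(X_train, y_train,
    epochs        = 10,
    batch_size    = 32,
    learning_rate = 0.001,
    criterion     = nn.NLLLoss(),
    optimizer     = optim.Adam,
)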
Predict¶
A module can also predict() outputs for given inputs.
Usually this method is called after fitting.
Module.predict(X, batch_size=32, variable=False, verbose=True, **kwargs)[source]¶
Make a prediction based on input data X. The default implementation simply uses the module's forward(X) method; often the predict method will be overridden to fit the specific needs of the module.
Parameters: - X (torch.Tensor) – Tensor from which to make prediction
- batch_size (int, default=32) – Batch size in which to predict items in X
- variable (boolean, default=False) – If True, accept inputs of variable length
- verbose (boolean, default=True) – If True, print progress of prediction
Returns: result – Resulting prediction
Return type: torch.Tensor
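Since the default implementation returns the raw forward(X) output, a subclass may override predict() to post-process it. Below is a hypothetical sketch, reusing MyNetwork from the working example, that returns class labels instead of log-probabilities:

class MyClassifier(MyNetwork):
    """Hypothetical subclass that post-processes predictions."""

    def predict(self, X, batch_size=32, variable=False, verbose=True, **kwargs):
        """Predict class labels instead of log-probabilities."""
        # Get log-probabilities from the default implementation
        log_probs = super().predict(X,
            batch_size = batch_size,
            variable   = variable,
            verbose    = verbose,
            **kwargs
        )
        # Return the most likely class per sample
        return log_probs.argmax(dim=1)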
Fit-Predict¶
Sometimes we want to fit() and predict() on the same data.
This can easily be achieved using the fit_predict() method.
Module.fit_predict(X, y, epochs=10, batch_size=32, learning_rate=0.01, criterion=nn.NLLLoss(), optimizer=optim.SGD, variable=False, verbose=True, **kwargs)[source]¶
Train the module with given parameters and return the resulting prediction.
Parameters: - X (torch.Tensor) – Tensor to train with
- y (torch.Tensor) – Target tensor
- epochs (int, default=10) – Number of epochs to train with
- batch_size (int, default=32) – Default batch size to use for training
- learning_rate (float, default=0.01) – Learning rate to use for optimizer
- criterion (nn.Loss, default=nn.NLLLoss()) – Loss function to use
- optimizer (optim.Optimizer, default=optim.SGD) – Optimizer to use for training
- variable (boolean, default=False) – If True, accept inputs of variable length
- verbose (boolean, default=True) – If True, prints training progress
Returns: result – Resulting prediction
Return type: torch.Tensor
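As a short usage sketch, reusing the network and training data from the working example, fitting and predicting on the same data in one call might look like:

# Fit on the training data and immediately predict on it
y_pred_train = net.fit_predict(X_train, y_train,
    epochs     = 10,
    batch_size = 32,
)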
Progress¶
The Progress class is used to track the progress of training and prediction.
Initialization¶
Reset¶
To restart the Progress, we use the reset() method.
This sets the number of items we expect to train with and the number of epochs we use for training.
Update¶
A module will update using the update() method, which automatically prints the progress.
When we move to the next epoch, we use the update_epoch() method.
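The exact signatures of these methods are not documented here; the following is a rough, hypothetical sketch (all parameter names are assumptions) of how a training loop could drive the progress attribute of a Module:

# Hypothetical sketch -- reset()/update()/update_epoch() signatures are
# assumed for illustration and may differ from the actual implementation.
progress = net.progress

# Tell Progress how many items and epochs to expect (assumed signature)
progress.reset(1024, 10)

for epoch in range(10):
    for batch in range(1024 // 32):
        # ... train on one batch ...
        progress.update(32)   # assumed: advance by the batch size and print
    # Move the progress tracker on to the next epoch
    progress.update_epoch()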
Variable Data Loader¶
The VariableDataLoader class is an alternative implementation of the iterable torch.utils.data.DataLoader class that can handle inputs of variable length.
class variable_data_loader.VariableDataLoader(X, y, index=False, batch_size=1, shuffle=True)[source]¶
Load data from variable length inputs.

lengths¶
Dictionary of input-length -> input samples.
Type: dict()

index¶
If True, also return the original index.
Type: boolean, default=False

batch_size¶
Size of each batch to output.
Type: int, default=1

shuffle¶
If True, shuffle the data randomly; each yielded batch contains only input items of the same length.
Type: boolean, default=True
Initialization¶
VariableDataLoader.__init__(X, y, index=False, batch_size=1, shuffle=True)[source]¶
Load data from variable length inputs.
Parameters: - X (iterable of shape=(n_samples,)) – Input sequences. Each item in the iterable should be a sequence (of variable length)
- y (iterable of shape=(n_samples,)) – Labels corresponding to X
- index (boolean, default=False) – If True, also returns original index
- batch_size (int, default=1) – Size of each batch to output
- shuffle (boolean, default=True) – If True, shuffle the data randomly; each yielded batch contains only input items of the same length
Iterable¶
The VariableDataLoader is an iterable object that iterates through the entire dataset.
The same object can be used multiple times thanks to the reset method, which automatically resets the iterable after a complete iteration.
Note that it can also be reset manually using the reset() method.
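As a short sketch of using the loader (assuming each yielded batch is an (X_batch, y_batch) pair of same-length inputs, and assuming VariableDataLoader can be imported from the torchtrain package), iterating over variable-length sequences might look like:

import torch
from torchtrain import VariableDataLoader  # assumed import location

# Three input sequences of different lengths, with one label each
X = [torch.rand(5, 10), torch.rand(8, 10), torch.rand(5, 10)]
y = torch.tensor([0, 1, 0])

# Create the loader; batches only ever mix items of the same length
loader = VariableDataLoader(X, y, batch_size=2, shuffle=True)

for X_batch, y_batch in loader:
    # X_batch contains inputs of a single length, y_batch the matching labels
    print(X_batch.shape, y_batch.shape)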