nerva_numpy.datasets
In-memory data loader helpers and one-hot conversions.
The DataLoader defined here mirrors a small subset of the PyTorch DataLoader API but operates on in-memory tensors loaded from .npz files.
Functions
| create_npz_dataloaders | Creates training and test data loaders from an .npz file containing a dictionary with Xtrain, Ttrain, Xtest and Ttest tensors. |
| from_one_hot | Convert one-hot encoded rows to class index tensor. |
| infer_num_classes | Infer total number of classes from targets. |
| max_ | Return the maximum element of X as a Python scalar. |
| to_one_hot | Convert class index tensor to one-hot matrix with num_classes columns. |
Classes
| DataLoader | A minimal in-memory data loader with an interface similar to torch.utils.data.DataLoader. |
- nerva_numpy.datasets.to_one_hot(x: numpy.ndarray, num_classes: int)[source]
Convert class index tensor to one-hot matrix with num_classes columns.
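The conversion described above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper name to_one_hot_sketch, the float32 output dtype, and the acceptance of both (N,) and (N,1) inputs are assumptions made for illustration.

```python
import numpy as np

def to_one_hot_sketch(x: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical stand-in for to_one_hot (not the library's code)."""
    # Flatten so that both (N,) and (N, 1) index arrays are accepted (assumption).
    indices = np.asarray(x).reshape(-1).astype(int)
    one_hot = np.zeros((indices.shape[0], num_classes), dtype=np.float32)
    # Set exactly one entry per row: column = class index.
    one_hot[np.arange(indices.shape[0]), indices] = 1.0
    return one_hot

labels = np.array([0, 2, 1])
encoded = to_one_hot_sketch(labels, num_classes=3)
print(encoded)
```

Each row of the result has num_classes columns regardless of which classes actually occur in x, which is why the column count is passed explicitly.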
- nerva_numpy.datasets.from_one_hot(one_hot: numpy.ndarray) numpy.ndarray [source]
Convert one-hot encoded rows to class index tensor.
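The inverse conversion is commonly done with an argmax over each row. The sketch below assumes that approach; from_one_hot_sketch is a hypothetical name and the library may implement this differently.

```python
import numpy as np

def from_one_hot_sketch(one_hot: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for from_one_hot (not the library's code)."""
    # The position of the largest entry in each row is the class index.
    return np.argmax(one_hot, axis=1)

one_hot = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [0, 1, 0]], dtype=np.float32)
indices = from_one_hot_sketch(one_hot)
print(indices)
```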
- class nerva_numpy.datasets.DataLoader(Xdata: numpy.ndarray, Tdata: numpy.ndarray, batch_size: int, num_classes=0)[source]
Bases: object
A minimal in-memory data loader with an interface similar to torch.utils.data.DataLoader.
Notes / Warning:
When Tdata contains class indices (shape (N,) or (N,1)), this loader will one-hot encode the labels. If num_classes is not provided, it will be inferred as max(Tdata) + 1.
On small datasets or subsets where some classes are absent, this inference can underestimate the true number of classes and produce one-hot targets with too few columns. This may cause dimension mismatches with the model output during training/evaluation.
To avoid this, pass num_classes explicitly whenever you know the total number of classes.
- property dataset_size
Total number of examples.
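The warning in the class notes can be reproduced without the library. The sketch below only applies the documented rule max(Tdata) + 1 to a toy label array; the variable names and the use of np.eye for encoding are illustrative assumptions.

```python
import numpy as np

# Toy labels: the full dataset has 5 classes (0..4).
full_labels = np.array([0, 1, 2, 3, 4])
# A subset in which classes 3 and 4 happen to be absent.
subset_labels = np.array([0, 1, 2])

# The inference rule from the notes: max(Tdata) + 1.
inferred_full = int(full_labels.max()) + 1
inferred_subset = int(subset_labels.max()) + 1

# One-hot targets built from the subset get only 3 columns,
# which would not match a model with 5 outputs.
targets_subset = np.eye(inferred_subset)[subset_labels]
print(inferred_full, inferred_subset, targets_subset.shape)
```

Passing num_classes=5 explicitly when constructing the loader for the subset avoids the mismatch.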
- nerva_numpy.datasets.max_(X: numpy.ndarray) int | float [source]
Return the maximum element of X as a Python scalar.
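One way to obtain a Python scalar rather than a NumPy scalar is ndarray.item(). The sketch below assumes that approach; max_sketch is a hypothetical name and not necessarily how max_ is written.

```python
import numpy as np

def max_sketch(X: np.ndarray):
    """Hypothetical stand-in for max_ (not the library's code)."""
    # .max() yields a NumPy scalar; .item() converts it to int or float.
    return X.max().item()

int_max = max_sketch(np.array([[1, 5], [3, 2]]))
float_max = max_sketch(np.array([0.5, 2.5, 1.0]))
print(type(int_max).__name__, int_max, type(float_max).__name__, float_max)
```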
- nerva_numpy.datasets.infer_num_classes(Ttrain: numpy.ndarray, Ttest: numpy.ndarray) int [source]
Infer total number of classes from targets.
If either Ttrain or Ttest is one-hot encoded (2D with width > 1), use that width.
Otherwise assume class indices and return the maximum over both arrays, plus 1.
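The two rules above can be sketched as follows. This follows the description only; infer_num_classes_sketch is a hypothetical name, and details such as which array is checked first are assumptions.

```python
import numpy as np

def infer_num_classes_sketch(Ttrain: np.ndarray, Ttest: np.ndarray) -> int:
    """Hypothetical stand-in for infer_num_classes (not the library's code)."""
    # Rule 1: a 2D array with more than one column is treated as one-hot,
    # and its width is the number of classes.
    for T in (Ttrain, Ttest):
        if T.ndim == 2 and T.shape[1] > 1:
            return int(T.shape[1])
    # Rule 2: otherwise treat both arrays as class indices.
    return int(max(Ttrain.max(), Ttest.max())) + 1

# Class indices: the largest label (4) appears only in the test targets.
n_from_indices = infer_num_classes_sketch(np.array([0, 1, 2]), np.array([4, 1]))

# One-hot training targets with 4 columns, index-encoded test targets.
n_from_one_hot = infer_num_classes_sketch(np.eye(4)[[0, 1]], np.array([0, 1]))
print(n_from_indices, n_from_one_hot)
```

Taking the maximum over both arrays reduces the risk of underestimating the class count when one split is missing the highest labels.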
- nerva_numpy.datasets.create_npz_dataloaders(filename: str, batch_size: int = True) Tuple[DataLoader, DataLoader] [source]
Creates training and test data loaders from an .npz file containing a dictionary with Xtrain, Ttrain, Xtest and Ttest tensors.
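A file with the four expected keys can be produced with numpy.savez. The sketch below writes a toy archive and reads it back; the array shapes, the file name toy.npz, and the integer label encoding are illustrative assumptions, and the call to create_npz_dataloaders is shown only as a comment because it is not executed here.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 8 training and 4 test examples with 4 features and 3 classes.
data = {
    "Xtrain": rng.normal(size=(8, 4)),
    "Ttrain": rng.integers(0, 3, size=8),
    "Xtest": rng.normal(size=(4, 4)),
    "Ttest": rng.integers(0, 3, size=4),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.npz")
    # Each keyword becomes a named array inside the archive.
    np.savez(path, **data)
    # With nerva_numpy installed, the file could then be passed on, e.g.:
    #   loaders = create_npz_dataloaders(path, batch_size=4)
    with np.load(path) as archive:
        keys = sorted(archive.files)
        xtrain_shape = archive["Xtrain"].shape

print(keys, xtrain_shape)
```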