nerva_numpy.optimizers

Optimizers used to adjust the model’s parameters based on their gradients.

Only SGD, Momentum and Nesterov variants are provided. The parser creates factory callables from textual specifications like “Momentum(mu=0.9)”.

Functions

parse_optimizer(text)

Parse a textual optimizer specification into a factory function.

Classes

CompositeOptimizer(optimizers)

Combines multiple optimizers to update different parameter groups.

GradientDescentOptimizer(x, Dx)

Standard gradient descent optimizer: x -= eta * grad.

MomentumOptimizer(x, Dx, mu)

Gradient descent with momentum for accelerated convergence.

NesterovOptimizer(x, Dx, mu)

Nesterov accelerated gradient descent optimizer.

Optimizer()

Minimal optimizer interface used by layers to update parameters.

class nerva_numpy.optimizers.Optimizer

Bases: object

Minimal optimizer interface used by layers to update parameters.

update(eta)

Update the parameters using the learning rate eta.

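A minimal sketch of how this interface might be used, assuming a layer simply holds an optimizer and calls update(eta) on it without knowing which concrete optimizer it is. Layer and RecordingOptimizer are hypothetical names for illustration only, and abc.ABC is used here just to make the abstract method explicit; the documented class derives from plain object.

```python
from abc import ABC, abstractmethod
from typing import List


class Optimizer(ABC):
    """Sketch of the interface: a single method, update(eta)."""

    @abstractmethod
    def update(self, eta: float) -> None:
        """Update the parameters this optimizer was constructed with."""


class RecordingOptimizer(Optimizer):
    """Hypothetical optimizer that only records the learning rates it receives."""

    def __init__(self) -> None:
        self.etas: List[float] = []

    def update(self, eta: float) -> None:
        self.etas.append(eta)


class Layer:
    """Hypothetical layer: it stores an optimizer but never inspects its type."""

    def __init__(self, optimizer: Optimizer) -> None:
        self.optimizer = optimizer

    def optimize(self, eta: float) -> None:
        # The layer delegates the parameter update entirely to the optimizer.
        self.optimizer.update(eta)


opt = RecordingOptimizer()
layer = Layer(opt)
for eta in (0.1, 0.05):
    layer.optimize(eta)
```

After the loop, opt.etas is [0.1, 0.05]: the layer only forwards the learning rate, so any optimizer implementing update(eta) can be swapped in.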
class nerva_numpy.optimizers.CompositeOptimizer(optimizers: List[Optimizer])

Bases: Optimizer

Combines multiple optimizers to update different parameter groups.

update(eta)

Update all contained optimizers with the given learning rate.
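A sketch of the composite pattern described above, assuming each child optimizer holds references to its own parameter array and gradient. The weight/bias pairing is a hypothetical example of "different parameter groups", and the small GradientDescentOptimizer here is included only to keep the block self-contained.

```python
from typing import List

import numpy as np


class Optimizer:
    def update(self, eta: float) -> None:
        raise NotImplementedError


class GradientDescentOptimizer(Optimizer):
    def __init__(self, x: np.ndarray, Dx: np.ndarray) -> None:
        self.x, self.Dx = x, Dx

    def update(self, eta: float) -> None:
        self.x -= eta * self.Dx  # in-place step on the shared array


class CompositeOptimizer(Optimizer):
    """Sketch: forward a single update(eta) call to every child optimizer."""

    def __init__(self, optimizers: List[Optimizer]) -> None:
        self.optimizers = optimizers

    def update(self, eta: float) -> None:
        for optimizer in self.optimizers:
            optimizer.update(eta)


# Hypothetical linear layer with a weight matrix W and a bias vector b.
W, DW = np.ones((2, 2)), np.full((2, 2), 2.0)
b, Db = np.zeros(2), np.ones(2)
opt = CompositeOptimizer([GradientDescentOptimizer(W, DW),
                          GradientDescentOptimizer(b, Db)])
opt.update(0.25)
```

One call updates both groups: every entry of W becomes 0.5 and every entry of b becomes -0.25. Since each child keeps its own state, weights and biases could in principle use different optimizer types inside one composite.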

class nerva_numpy.optimizers.GradientDescentOptimizer(x, Dx)

Bases: Optimizer

Standard gradient descent optimizer: x -= eta * grad.

update(eta)

Apply gradient descent update step.
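A minimal sketch of the step x -= eta * Dx, assuming x and Dx are NumPy arrays shared by reference with the owning layer (an assumption; the page does not say how the arrays are passed). The comment highlights why the subtraction should be in place.

```python
import numpy as np


class Optimizer:
    def update(self, eta: float) -> None:
        raise NotImplementedError


class GradientDescentOptimizer(Optimizer):
    """Sketch of plain gradient descent: x -= eta * Dx."""

    def __init__(self, x: np.ndarray, Dx: np.ndarray) -> None:
        self.x = x    # parameter array, assumed shared with the layer
        self.Dx = Dx  # gradient array, assumed shared with the layer

    def update(self, eta: float) -> None:
        # In-place subtraction mutates the array the layer also holds.
        # Writing `self.x = self.x - eta * self.Dx` would instead bind a new
        # array to self.x and leave the layer's weights unchanged.
        self.x -= eta * self.Dx


W = np.array([1.0, 2.0, 3.0])
DW = np.array([0.5, -1.0, 2.0])
opt = GradientDescentOptimizer(W, DW)
opt.update(0.1)
```

After the update W holds approximately [0.95, 2.1, 2.8]. Under the same reference-sharing assumption, the layer would also have to write fresh gradients into DW in place (for example DW[...] = new_grad) so the optimizer sees them.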

class nerva_numpy.optimizers.MomentumOptimizer(x, Dx, mu)

Bases: GradientDescentOptimizer

Gradient descent with momentum for accelerated convergence.

update(eta)

Apply momentum update step.
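A sketch of classical (heavy-ball) momentum, following the documented inheritance from GradientDescentOptimizer. The exact update used by nerva_numpy is not stated on this page; the velocity formula below is the standard textbook form, and the attribute name delta_x is an assumption.

```python
import numpy as np


class Optimizer:
    def update(self, eta: float) -> None:
        raise NotImplementedError


class GradientDescentOptimizer(Optimizer):
    def __init__(self, x: np.ndarray, Dx: np.ndarray) -> None:
        self.x, self.Dx = x, Dx

    def update(self, eta: float) -> None:
        self.x -= eta * self.Dx


class MomentumOptimizer(GradientDescentOptimizer):
    """Sketch of classical momentum:
        delta_x <- mu * delta_x - eta * Dx
        x       <- x + delta_x
    """

    def __init__(self, x: np.ndarray, Dx: np.ndarray, mu: float) -> None:
        super().__init__(x, Dx)
        self.mu = mu
        self.delta_x = np.zeros_like(x)  # velocity starts at zero

    def update(self, eta: float) -> None:
        self.delta_x = self.mu * self.delta_x - eta * self.Dx
        self.x += self.delta_x  # in place, so the caller's array changes


x = np.array([1.0])
Dx = np.array([1.0])  # constant gradient, for illustration
opt = MomentumOptimizer(x, Dx, mu=0.5)
for _ in range(3):
    opt.update(0.1)
```

With a constant gradient the steps are -0.1, -0.15 and -0.175, so x ends at 0.575; the step size grows toward eta / (1 - mu), here 0.2, which is where the acceleration comes from.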

class nerva_numpy.optimizers.NesterovOptimizer(x, Dx, mu)

Bases: MomentumOptimizer

Nesterov accelerated gradient descent optimizer.

update(eta)

Apply Nesterov accelerated gradient update step.
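The page does not give the Nesterov formula, so the following is only a sketch of one widely used reformulation of Nesterov momentum that evaluates the gradient at the current parameters instead of at a look-ahead point. It assumes the velocity attribute delta_x from a momentum base class, as in the documented inheritance; the attribute names are assumptions.

```python
import numpy as np


class Optimizer:
    def update(self, eta: float) -> None:
        raise NotImplementedError


class GradientDescentOptimizer(Optimizer):
    def __init__(self, x: np.ndarray, Dx: np.ndarray) -> None:
        self.x, self.Dx = x, Dx

    def update(self, eta: float) -> None:
        self.x -= eta * self.Dx


class MomentumOptimizer(GradientDescentOptimizer):
    def __init__(self, x: np.ndarray, Dx: np.ndarray, mu: float) -> None:
        super().__init__(x, Dx)
        self.mu = mu
        self.delta_x = np.zeros_like(x)

    def update(self, eta: float) -> None:
        self.delta_x = self.mu * self.delta_x - eta * self.Dx
        self.x += self.delta_x


class NesterovOptimizer(MomentumOptimizer):
    """Sketch of the reformulated Nesterov step:
        delta_x <- mu * delta_x - eta * Dx
        x       <- x - mu * delta_x_prev + (1 + mu) * delta_x
    """

    def update(self, eta: float) -> None:
        # No copy is needed: the next line binds self.delta_x to a new array.
        delta_x_prev = self.delta_x
        self.delta_x = self.mu * self.delta_x - eta * self.Dx
        self.x += -self.mu * delta_x_prev + (1 + self.mu) * self.delta_x


x = np.array([1.0])
Dx = np.array([1.0])  # constant gradient, for illustration
opt = NesterovOptimizer(x, Dx, mu=0.5)
for _ in range(2):
    opt.update(0.1)
```

With these numbers x moves 1.0 → 0.85 → 0.675, slightly further per step than the classical momentum update because the velocity is applied with weight (1 + mu) after the correction term.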

nerva_numpy.optimizers.parse_optimizer(text: str) → Callable[[Any, Any], Optimizer]

Parse a textual optimizer specification into a factory function.

Returns a callable that takes (x, Dx) and produces an Optimizer. Supported names: GradientDescent, Momentum(mu=…), Nesterov(mu=…).
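A sketch of how such a parser could work, assuming specifications of the form Name or Name(key=value, ...) and treating every argument value as a float. The regular expression, error messages and the compact optimizer classes are illustrative assumptions, not the module's actual code; only the supported names and the (x, Dx) → Optimizer factory shape come from the description above.

```python
import re
from typing import Any, Callable, Dict

import numpy as np


class Optimizer:
    def update(self, eta: float) -> None:
        raise NotImplementedError


class GradientDescentOptimizer(Optimizer):
    def __init__(self, x: Any, Dx: Any) -> None:
        self.x, self.Dx = x, Dx

    def update(self, eta: float) -> None:
        self.x -= eta * self.Dx


class MomentumOptimizer(GradientDescentOptimizer):
    def __init__(self, x: Any, Dx: Any, mu: float) -> None:
        super().__init__(x, Dx)
        self.mu = mu
        self.delta_x = np.zeros_like(x)

    def update(self, eta: float) -> None:
        self.delta_x = self.mu * self.delta_x - eta * self.Dx
        self.x += self.delta_x


class NesterovOptimizer(MomentumOptimizer):
    def update(self, eta: float) -> None:
        delta_x_prev = self.delta_x
        self.delta_x = self.mu * self.delta_x - eta * self.Dx
        self.x += -self.mu * delta_x_prev + (1 + self.mu) * self.delta_x


def parse_optimizer(text: str) -> Callable[[Any, Any], Optimizer]:
    """Sketch: turn 'Name' or 'Name(key=value, ...)' into a factory (x, Dx) -> Optimizer."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\((.*)\))?\s*", text)
    if match is None:
        raise ValueError(f"cannot parse optimizer specification: {text!r}")
    name, arg_text = match.group(1), match.group(2) or ""

    # Comma-separated key=value pairs; all values are parsed as floats here.
    kwargs: Dict[str, float] = {}
    for item in filter(None, (part.strip() for part in arg_text.split(","))):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        kwargs[key.strip()] = float(value)

    if name == "GradientDescent":
        return lambda x, Dx: GradientDescentOptimizer(x, Dx)
    if name == "Momentum":
        mu = kwargs["mu"]
        return lambda x, Dx: MomentumOptimizer(x, Dx, mu)
    if name == "Nesterov":
        mu = kwargs["mu"]
        return lambda x, Dx: NesterovOptimizer(x, Dx, mu)
    raise ValueError(f"unknown optimizer: {name!r}")


make_optimizer = parse_optimizer("Momentum(mu=0.9)")
W, DW = np.ones(3), np.ones(3)
opt = make_optimizer(W, DW)
opt.update(0.1)
```

Returning a factory rather than an optimizer lets the specification be parsed once and then applied to every layer's own (x, Dx) pair; here opt is a MomentumOptimizer with mu = 0.9 and one update moves each entry of W to 0.9.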