SGD in PyTorch

hyperkai

Super Kai (Kazuya Ito)

Posted on October 4, 2024


*Memos:

  • My post explains CGD (Classic Gradient Descent), Momentum and Nesterov's Momentum.
  • My post explains Module().

SGD() can do basic gradient descent with or without Momentum or Nesterov's Momentum, as shown below. *Despite its name, SGD() in PyTorch implements Classic (Basic) Gradient Descent (CGD), not Stochastic Gradient Descent (SGD); the stochasticity comes from the randomly sampled mini-batches you feed it, not from the optimizer itself:

*Memos:

  • The 1st argument for initialization is params(Required-Type:generator).
  • The 2nd argument for initialization is lr(Optional-Default:0.001-Type:int or float). *It must be 0 <= x.
  • The 3rd argument for initialization is momentum(Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 4th argument for initialization is dampening(Optional-Default:0-Type:int or float).
  • The 5th argument for initialization is weight_decay(Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 6th argument for initialization is nesterov(Optional-Default:False-Type:bool). *If it's True, Nesterov's Momentum is used, which requires momentum > 0 and dampening = 0, while if it's False and momentum > 0, plain Momentum is used. *The exact update rule is sketched right after this list.
  • There is a maximize argument for initialization(Optional-Default:False-Type:bool). *maximize= must be used.
  • There is a foreach argument for initialization(Optional-Default:None-Type:bool). *foreach= must be used.
  • There is a differentiable argument for initialization(Optional-Default:False-Type:bool). *differentiable= must be used.
  • There is a fused argument for initialization(Optional-Default:None-Type:bool). *fused= must be used.
  • foreach and fused cannot both be True.
  • differentiable and fused cannot both be True.
  • step() updates the parameters with the currently stored gradients.
  • zero_grad() resets the gradients. *A full training loop using step() and zero_grad() is sketched at the end of this post.
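
Before the usage example, here is a minimal sketch of the update rule that SGD() applies to each parameter, following the pseudocode in the PyTorch documentation. *The function sgd_update and the buffer argument buf are hypothetical names used only for illustration; they are not part of the PyTorch API:

def sgd_update(param, grad, buf, lr=0.001, momentum=0.0, dampening=0.0,
               weight_decay=0.0, nesterov=False, maximize=False):
    # param, grad and buf are torch.Tensor(buf is None before the 1st step).
    if weight_decay != 0:               # L2 penalty added to the gradient
        grad = grad + weight_decay * param
    if momentum != 0:
        if buf is None:                 # 1st step: buffer starts as the gradient
            buf = grad.clone()
        else:                           # later steps: accumulate with dampening
            buf = momentum * buf + (1 - dampening) * grad
        if nesterov:                    # look-ahead(momentum > 0, dampening = 0)
            grad = grad + momentum * buf
        else:                           # plain Momentum
            grad = buf
    if maximize:                        # gradient ascent
        param = param + lr * grad
    else:                               # gradient descent(default)
        param = param - lr * grad
    return param, buf

Basic usage of SGD():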
from torch import nn
from torch import optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=4, out_features=5)

    def forward(self, x):
        return self.linear_layer(x)

mymodel = MyModel()

sgd = optim.SGD(params=mymodel.parameters())
sgd
# SGD (
# Parameter Group 0
#     dampening: 0
#     differentiable: False
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     momentum: 0
#     nesterov: False
#     weight_decay: 0
# )

sgd.state_dict()
# {'state': {},
#  'param_groups': [{'lr': 0.001,
#    'momentum': 0,
#    'dampening': 0,
#    'weight_decay': 0,
#    'nesterov': False,
#    'maximize': False,
#    'foreach': None,
#    'differentiable': False,
#    'fused': None,
#    'params': [0, 1]}]}
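#  *'params': [0, 1] are the indices of the two tensors in this
#  parameter group: the weight and the bias of linear_layer.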

sgd.step()      # no gradients have been computed yet, so nothing is updated
sgd.zero_grad() # returns None
# None

sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0, 
            dampening=0, weight_decay=0, nesterov=False, maximize=False, 
            foreach=None, differentiable=False, fused=None)
sgd
# SGD (
# Parameter Group 0
#     dampening: 0
#     differentiable: False
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     momentum: 0
#     nesterov: False
#     weight_decay: 0
# )
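
As a usage example, here is a minimal training loop that wires SGD() together with step() and zero_grad(). *The loss function and the input and target tensors below are made-up for illustration:

import torch
from torch import nn
from torch import optim

mymodel = MyModel()  # the model defined above
sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0.9)
loss_fn = nn.MSELoss()

inputs = torch.randn(8, 4)   # made-up batch: 8 samples, 4 features
targets = torch.randn(8, 5)  # made-up targets matching out_features=5

for epoch in range(10):
    preds = mymodel(inputs)        # forward pass
    loss = loss_fn(preds, targets) # compute the loss
    sgd.zero_grad()                # reset the gradients
    loss.backward()                # compute new gradients
    sgd.step()                     # update the parameters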