Gradient Accumulation
By default the weights are updated after every batch, but you can also accumulate gradients over N batches and update once, which gives the effect of training with a batch size N times larger. This is useful when memory is limited and the per-step batch size has to be small:
from pytorch_lightning import Trainer

# accumulate gradients over 4 batches before each optimizer step (the default is 1, i.e. no accumulation)
trainer = Trainer(accumulate_grad_batches=4)
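For intuition, accumulate_grad_batches is roughly equivalent to the following manual loop in plain PyTorch (a minimal sketch; train_with_accumulation and its arguments are illustrative names, not Lightning API):

def train_with_accumulation(model, optimizer, criterion, dataloader, accumulation_steps=4):
    # plain-PyTorch equivalent of Trainer(accumulate_grad_batches=4)
    optimizer.zero_grad()
    for i, (x, y) in enumerate(dataloader):
        loss = criterion(model(x), y)
        (loss / accumulation_steps).backward()  # scale so the accumulated gradient matches one large batch
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()                    # one weight update every accumulation_steps batches
            optimizer.zero_grad()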
ModelCheckpoint
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath='checkpoints/',
    filename='my_model_{epoch:02d}_step{step:05d}_{val_loss:.4f}',
    save_top_k=1,          # keep only the best checkpoint
    monitor='val_loss',    # metric used to rank checkpoints
    mode='min'             # smaller val_loss is better
)

trainer = Trainer(callbacks=[checkpoint_callback])
trainer.fit(model)
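The monitored key 'val_loss' must actually be logged by the LightningModule, otherwise ModelCheckpoint has nothing to compare. A minimal sketch (MyModel and the cross-entropy loss are placeholders for your own module):

import torch.nn.functional as F
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)  # placeholder loss
        self.log('val_loss', loss)          # exposes 'val_loss' to ModelCheckpoint's monitor and filename template
        return loss

After fit() finishes, checkpoint_callback.best_model_path holds the path of the best checkpoint that was saved.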
Resuming Training
from pytorch_lightning import Trainer

trainer = Trainer(resume_from_checkpoint='checkpoints/my_model_09_val_loss_0.4567.ckpt')
trainer.fit(model)
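Note that resume_from_checkpoint was removed from the Trainer in Lightning 2.x (see the version table below); there the checkpoint path is passed to fit() instead (model is assumed to be defined as above):

from pytorch_lightning import Trainer

trainer = Trainer()
# Lightning >= 2.0: pass the checkpoint path to fit() rather than to the Trainer
trainer.fit(model, ckpt_path='checkpoints/my_model_09_val_loss_0.4567.ckpt')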
Versions
lightning | torch (supported range)
----------|------------------------
2.2       | [1.13, 2.2]
2.1       | [1.12, 2.1]
2.0       | [1.11, 2.0]
1.4.2     | [1.6, 1.9]