pytorch_pfn_extras.nn.parallel.DistributedDataParallel
- class pytorch_pfn_extras.nn.parallel.DistributedDataParallel(module, broadcast_buffers=True, negotiate_grads=True, process_group=None, reduce_function=None, broadcast_function=None, **kwargs)
Module for distributed data parallelism.
This class synchronizes the gradients and the buffers after backward computations. A minimal construction sketch follows the parameter list below.
- Parameters
module (torch.nn.modules.module.Module) – torch.nn.Module object to be trained
broadcast_buffers (bool) – Boolean flag to broadcast buffers after backward computations. Broadcasting buffers may be helpful when the module includes a batch normalization layer; however, it degrades training throughput. (default: True)
negotiate_grads (bool) – Boolean flag to negotiate which gradients will be sent before the all-reduce. This flag is necessary when the computation graph of the module is dynamic, i.e. not all parameters receive a gradient in every iteration. (default: True)
process_group (Optional[torch._C._distributed_c10d.ProcessGroup]) – Process group used for broadcasting and reducing. (default: torch.distributed.group.WORLD)
reduce_function (Optional[Callable[[Sequence[torch.Tensor], Optional[torch._C._distributed_c10d.ProcessGroup]], None]]) – All-reduce function
broadcast_function (Optional[Callable[[Sequence[torch.Tensor], Optional[torch._C._distributed_c10d.ProcessGroup]], None]]) – Broadcast function
kwargs (Any) –
- Return type
None
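A minimal construction sketch, not an authoritative recipe: the gloo backend, the toy torch.nn.Linear model, and launching via torchrun are assumptions, and a default process group must already be initialized.

    import torch
    import torch.distributed as dist
    from pytorch_pfn_extras.nn.parallel import DistributedDataParallel

    dist.init_process_group(backend="gloo")  # e.g. launched with torchrun

    module = torch.nn.Linear(10, 1)  # toy model for illustration
    ddp_model = DistributedDataParallel(
        module,
        broadcast_buffers=True,  # sync buffers (e.g. BatchNorm statistics) after backward
        negotiate_grads=True,    # agree on which gradients exist (dynamic graphs)
    )

    out = ddp_model(torch.randn(4, 10))
    out.sum().backward()  # gradients are all-reduced across ranks here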
- __init__(module, broadcast_buffers=True, negotiate_grads=True, process_group=None, reduce_function=None, broadcast_function=None, **kwargs)
This module receives keyword arguments for compatibility with torch.nn.parallel.DistributedDataParallel and shows a warning when any of those ignored arguments is set. A short sketch of this behavior follows the parameter list below.
- Parameters
module (torch.nn.modules.module.Module) –
broadcast_buffers (bool) –
negotiate_grads (bool) –
process_group (Optional[torch._C._distributed_c10d.ProcessGroup]) –
reduce_function (Optional[Callable[[Sequence[torch.Tensor], Optional[torch._C._distributed_c10d.ProcessGroup]], None]]) –
broadcast_function (Optional[Callable[[Sequence[torch.Tensor], Optional[torch._C._distributed_c10d.ProcessGroup]], None]]) –
kwargs (Any) –
- Return type
None
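A short sketch of the warning behavior described above; device_ids is an assumed example of a torch.nn.parallel.DistributedDataParallel argument that this class accepts but ignores.

    import torch
    import torch.distributed as dist
    from pytorch_pfn_extras.nn.parallel import DistributedDataParallel

    dist.init_process_group(backend="gloo")

    # device_ids is accepted for torch DDP compatibility but has no effect
    # here; constructing the wrapper warns that the argument is ignored.
    ddp_model = DistributedDataParallel(torch.nn.Linear(10, 1), device_ids=[0])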
Methods
- __init__(module[, broadcast_buffers, …]): This module receives keyword arguments for compatibility with torch.nn.parallel.DistributedDataParallel.
- add_module(name, module): Adds a child module to the current module.
- apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self.
- bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
- buffers([recurse]): Returns an iterator over module buffers.
- children(): Returns an iterator over immediate children modules.
- cpu(): Moves all model parameters and buffers to the CPU.
- cuda([device]): Moves all model parameters and buffers to the GPU.
- double(): Casts all floating point parameters and buffers to double datatype.
- eval(): Sets the module in evaluation mode.
- extra_repr(): Set the extra representation of the module.
- float(): Casts all floating point parameters and buffers to float datatype.
- forward(*args, **kwargs): Defines the computation performed at every call.
- get_buffer(target): Returns the buffer given by target if it exists, otherwise throws an error.
- get_parameter(target): Returns the parameter given by target if it exists, otherwise throws an error.
- get_submodule(target): Returns the submodule given by target if it exists, otherwise throws an error.
- half(): Casts all floating point parameters and buffers to half datatype.
- load_state_dict(state_dict[, strict]): Copies parameters and buffers from state_dict into this module and its descendants.
- modules(): Returns an iterator over all modules in the network.
- named_buffers([prefix, recurse]): Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- named_children(): Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- named_modules([memo, prefix, remove_duplicate]): Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- named_parameters([prefix, recurse]): Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- no_sync(): A context manager to disable synchronization after backward (see the sketch after this list).
- parameters([recurse]): Returns an iterator over module parameters.
- register_backward_hook(hook): Registers a backward hook on the module.
- register_buffer(name, tensor[, persistent]): Adds a buffer to the module.
- register_comm_hook(hook): Registers a hook function.
- register_forward_hook(hook): Registers a forward hook on the module.
- register_forward_pre_hook(hook): Registers a forward pre-hook on the module.
- register_full_backward_hook(hook): Registers a backward hook on the module.
- register_parameter(name, param): Adds a parameter to the module.
- requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
- share_memory(): See torch.Tensor.share_memory_().
- state_dict(): Returns a dictionary containing a whole state of the module.
- to(*args, **kwargs): Moves and/or casts the parameters and buffers.
- to_empty(*, device): Moves the parameters and buffers to the specified device without copying storage.
- train([mode]): Sets the module in training mode.
- type(dst_type): Casts all parameters and buffers to dst_type.
- xpu([device]): Moves all model parameters and buffers to the XPU.
- zero_grad([set_to_none]): Sets gradients of all model parameters to zero.
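no_sync() enables gradient accumulation across iterations. A minimal sketch, assuming the same semantics as torch.nn.parallel.DistributedDataParallel.no_sync() (synchronization is skipped inside the context and performed by the first backward after it); ddp_model is the wrapped module from the construction sketch above, and the optimizer and batch tensors are illustrative.

    import torch

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    batches = [torch.randn(4, 10) for _ in range(4)]

    with ddp_model.no_sync():
        for batch in batches[:-1]:
            ddp_model(batch).sum().backward()  # no all-reduce inside the context

    ddp_model(batches[-1]).sum().backward()  # this backward all-reduces the
    optimizer.step()                         # accumulated gradients once
    optimizer.zero_grad()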
Attributes
- T_destination: alias of TypeVar('T_destination', bound=Mapping[str, torch.Tensor])
- dump_patches: This allows better BC support for load_state_dict().