Chainer: a flexible framework of neural networks for deep learning

v4.0.0b2
niboshi released this
This is the release of v4.0.0b2. See here for the complete list of solved issues and merged PRs.

In this release, you can set up an optimizer with a simpler syntax.
In previous versions, the code would be written as `optimizer = chainer.optimizers.SGD()` followed by `optimizer.setup(model)`. We now also allow it to be written more concisely as `optimizer = chainer.optimizers.SGD(link=model)`.
The `link` argument should be specified as a keyword argument. Otherwise, some optimizers could wrongly interpret it as a hyperparameter (e.g. `lr`). We will enforce the keyword argument from the next release.
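The pitfall can be illustrated with a toy stand-in class (this is not Chainer's actual implementation; the class below just mimics an optimizer whose first positional parameter is a hyperparameter):

```python
# Hypothetical sketch: an optimizer whose first positional parameter is the
# learning rate. Passing the model positionally silently binds it to `lr`,
# which is why the `link` argument must be given as a keyword.
class SGD:
    def __init__(self, lr=0.01, link=None):
        self.lr = lr
        self.target = None
        if link is not None:
            self.setup(link)

    def setup(self, link):
        self.target = link
        return self

model = object()          # stands in for a chainer.Link
opt = SGD(link=model)     # correct: model is bound to `link`
assert opt.target is model and opt.lr == 0.01

bad = SGD(model)          # positional: model is misread as `lr`
assert bad.lr is model and bad.target is None
```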
We introduced a check for mixed use of CuPy arrays and NumPy arrays in outputs returned from functions. Although this has always been forbidden, such functions may previously have worked without any errors. With the introduction of this check, those functions may now start raising errors.
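The idea behind the check can be sketched in plain Python, with dummy classes standing in for `numpy.ndarray` and `cupy.ndarray` (the function name and classes below are illustrative, not Chainer's internals):

```python
# Toy stand-ins for the two array types.
class CPUArray:
    pass

class GPUArray:
    pass

def check_outputs_consistent(outputs):
    """Raise if outputs mix different array types (the gist of the new check)."""
    kinds = {type(a) for a in outputs}
    if len(kinds) > 1:
        raise TypeError('mixed use of CPU and GPU arrays in outputs: %r' % kinds)

check_outputs_consistent([CPUArray(), CPUArray()])  # consistent: passes silently

ok = True
try:
    check_outputs_consistent([CPUArray(), GPUArray()])
    ok = False
except TypeError:
    pass  # mixed outputs now raise
assert ok
```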
Known Issues
- Grouped convolution/deconvolution does not work in CPU mode with NumPy 1.9 (#4081). This issue is planned to be resolved in the next release.
Changes without compatibility
- Check for mixed use of CuPy/NumPy ndarrays in functions (#4029)
New Features
- Overlap data transfer and GPU kernels (#3336, thanks @anaruse!)
- Add early stopping (#3351, thanks @himkt!)
- Enable optimizer model setup with instantiation (#3488)
- Grouped convolution (#3494, thanks @anaruse!)
- Add `extensions` as a trainer argument (#3528, thanks @nekanat!)
- Support parameter update in FP32 (#3708, thanks @anaruse!)
- Add `sign` function (#3678)
- Add more functions with double-backprop support: maximum (#3533), im2col (#3587), batch_l2_norm_squared (#3642), expm1 (#3644), linear_interpolate (#3663), mean_absolute_error (#3672), squared_error (#3691), sigmoid_cross_entropy (#3705), absolute_error (#3707), Gaussian (#3759), det, batch_det (#3767), cross_covariance (#3866), normalize (#3870), bilinear (#3917), negative_sampling (#3992)
Improvements
- User-friendly error checks for pooling input (#3555)
- Verbose error messages in gradient check (#3833)
- Verbose error messages in basic_math (#3839)
- Refactor (de)convolution_2d (#3848)
- Allow `to_cpu` and `to_gpu` to accept list, tuple and `None` (#3850)
- Move `should_use_cudnn` and `should_use_cudnn_tensor_core` to `chainer.cuda` (#3851)
- Skip unnecessary array util ops (#3932)
- Avoid unnecessary `hasattr` (#3952)
- Add `chainer.backends` subpackage (#3974)
- Fix cuda import path (#4036)
- Remove an unused function (#4051)
Bug Fixes
- Fix backprop dim for unused LSTM states on GPU (#3042, thanks @andreasgrv!)
- Forget inputs as Variables (#3788)
- Skip cuDNN in deconvolution_2d if dilate != 1 and deterministic (#3875)
- Fix debug_print() with empty Variable (#4018)
- Avoid mixing cupy.ndarray and numpy.ndarray in n_step_xxx links (#4030)
- Fix F.convolution_2d and F.deconvolution_2d to work without cuDNN (#4062)
- Fix test failure with cuDNN v6 (#4078)
Examples
- Add `noplot` option in MNIST example (#3925)
Documentation
- Add word2vec tutorial (#3040)
- Add ptb tutorial (#3073)
- Replace array in type lists with numpy or cupy ndarray (#3259)
- Improve documents of the debug mode (#3347)
- Function references in docs to point to `FunctionNode` (#3626)
- Fix the example in the documentation of `Reporter` (#3795)
- Improve documentation of `GRU` (#3858)
- Fix documentation in `n_step_gru`, `n_step_bigru`, `n_step_bilstm`, `n_step_rnn` and `n_step_birnn` (#3859)
- Fix CuPy requirement version (#3899)
- Add `expm1` to the documentation (#3900)
- Fix a formula in the tutorial (#3909, thanks @keisukenakata!)
- Fix typo (#3914, thanks @okayu9!)
- Fix example code in the trainer tutorial (#3926, thanks @keisukenakata!)
- Fix doctest in the trainer tutorial (#3942)
- Fix `GlorotUniform` documentation (#3953, thanks @FTag!)
- Add `StatefulZoneoutLSTM` to documentation (#3957)
- Small fix for the seq2seq example (#3964)
- Add `ConcatWithAsyncTransfer` to the reference manual (#3975, #3979)
- Fix a code fragment in contribution guide (#3982, thanks @anaruse!)
- Fix documentation of negative sampling (#3988)
- Add dilate argument to documentation (#4011)
- Fix broken link in `chainer.function.pad` documentation (#4028)
Tests
- Add ability to check non-differentiable inputs in `gradient_check.numerical_grad` (#3551, #4003)
- Refactor unit tests for various backend configurations (#3862)
- Catch all exceptions in parameterized test (#3876)
- Avoid NaN in test of `F.classification_summary` (#3927)
- Avoid NaN error in `test_pad_sequence` in debug mode (#3946)
- Show original traceback in `testing.parameterized` (#3954)
- Fix macOS test in Travis (#3990)
- Simplify `to_gpu` in RNN tests (#4046)
- Adjust numerical tolerances: `convolution_nd` (#3910), `im2col` (#3933), `triplet` (#3939), `linear_interpolate` (#3944)
This is a minor release. See here for the complete list of solved issues and merged PRs.
Spotlight features
- A lot of new double-backproppable functions have been added.
- Autotuner for cuDNN convolution functions is now available. Just add the single line `chainer.global_config.autotune = True` to optimize your ConvNets.
New Features
- New functions: F.fix (#3834)
- Functions with double-backprop support: where (#3505), softplus (#3593), clipped_relu (#3594), broadcast (#3650), hstack (#3666), dstack (#3890), square (#3681), ELU (#3730), minmax (#3732), log, log2, log10 (#3733), abs (#3734), sqrt (#3738), inv, batch_inv (#3743), div (#3750), pow (#3804), clip (#3805), resize_images (#3806), PReLU (#3814), minimum (#3815), triplet (#3817), floor (#3819), squared_difference (#3823), fliplr (#3827), flipud (#3828), fmod (#3834), pad_sequence (#3835), log1p (#3847), hard_sigmoid (#3849), CReLU (#3852), rdiv (#3857), ceil (#3860), logsumexp (#3877), cosh, sinh (#3879), depth2space, space2depth (#3880), sin, cos, tan, arcsin, arccos, arctan, arctan2 (#3881), tile (#3825), pad (#3855)
- cuDNN convolution functions autotuner (#3841)
Improvements
- Relax int type restriction (#3700)
- Allow to_gpu and to_cpu to accept NumPy scalars (#3748)
- Support file-like object in npz serializer (#3758, #3882)
- Avoid zero-division warning in F.r2_score (#3777)
- Raise user-friendly error when FunctionNode is used like a Function object (#3780)
- Fix F.inv to raise an exception when input has singular matrices (#3784)
- Remove unnecessary branch in minimum forward (#3836)
- Check too small eps in Adam and RMSprop optimizers (#3783)
Bug fixes
- Prevent ZeroDivisionError in softmax_cross_entropy when input size is 0 (#3656, thanks @knorth55!)
- Fix xxx_pooling_nd causing CUDNN_STATUS_NOT_SUPPORTED for dims > 3 (#3722)
- Fix LSTM bias initialization (#3731)
- Fix the problem with resuming training when switching the frozen layers (#3800, thanks @jinjiren!)
- Avoid zero-division error in linear init call (#3885)
Documents
- Add tutorials
- Improve TupleDataset documentation (#3438)
- Documentation fix in FunctionNode (#3444)
- Improve docs of n_step_lstm (#3471)
- Fix dead links to modules in tutorial (#3501)
- Improve doc of sum (#3502, thanks @akitotakeki!)
- Add get_conv_outsize and get_deconv_outsize to doc (#3597)
- Improve docs of huber_loss (#3605, thanks @naoto0804!)
- Improve docs of sigmoid_cross_entropy (#3606, thanks @naoto0804!)
- Improve docs of contrastive and triplet (#3607, thanks @naoto0804!)
- Fix documentation error in Function (#3637)
- Add experimental warning in docstring (#3648)
- Add a note to the doc of Evaluator.evaluate (#3667)
- Fix CuPy intersphinx mapping (#3687)
- Document get_svhn (#3690)
- Fix CuPy overview link not working (#3695)
- Add CUDAProfileHook and CupyMemoryProfileHook to the reference (#3709, thanks @ronekko!)
- Fix split_axis documentation (#3712)
- Improve doc of context managers (#3719)
- Improve doc of configuration flags (#3720)
- Fix contribution guide for test framework change (#3726)
- Fix case in doc (#3749)
- Fix doc in Forget (#3773)
- Improve docs of F.forget (#3791)
- Document initializer criteria (#3801)
- Sort out navigation menu (#3812)
- Fix doctest failure in trainer tutorial (#3888)
- Fix typos (#3635, #3638, #3639)
- Fix doctests (#3647, #3651)
Tests
- Move to PyTest
- Use Python 3.4.4 on Travis OSX Python 3.4 case (#3629)
- Fix test_init_docstring (#3636)
- Fix math function testing helper to support new-style functions (#3665)
- Run OS X test only on master/stable branch to avoid delay (#3676)
- Always cast all inputs to given dtype in gradient check (#3679)
- Fix decorators to allow users to filter test cases by number of GPUs (#3683)
- Fix to skip GPU tests on AppVeyor (#3693)
- Fix math function test helper to support double backward test of linear functions (#3706)
- Richer gradient check output (#3713)
- Check deprecation warning in Travis (#3721)
- Fix decorators to allow users to filter test cases by number of GPUs (#3723)
- Use Python 3.5 for doctest (#3727)
- Fix normalization warning in F.average test (#3729)
- Fix F.inv test that does not test type error as expected (#3775)
- Directional derivative (#3790)
- Add double-backward test for F.inv and F.batch_inv (#3820)
- Fix test condition in function tutorial (#3873)
- Setup random of Python library in testing/random (#3655)
- Fix coveragerc to measure branch coverage and only target chainer module (#3710)
- Test stability fixes
Others
- Warn about vecLib on Mac OS X (#3692)
- Update stable version link in README (#3746)
- Improve version embedding (#3739)
- Rename plot -> plt (#3714, thanks @Hakuyume!)
Install
- Remove requirements for unit testing (#3682)
This is a major release of Chainer v3.0.0. All the updates from the previous major version (v2.0.0) are found in the release notes below:
- v3.0.0a1 (https://github.com/chainer/chainer/releases/tag/v3.0.0a1)
- v3.0.0b1 (https://github.com/chainer/chainer/releases/tag/v3.0.0b1)
- v3.0.0rc1 (https://github.com/chainer/chainer/releases/tag/v3.0.0rc1)
- v3.0.0 (this document)
The biggest change is the introduction of new-style differentiable functions and the resulting support for double backward (gradient of gradient) in many functions. The details are linked below:
- The new-style differentiable function (see the details in the v3.0.0b1 release notes)
- Double backward support for many functions (see the list of almost all functions which support double backward in the v3.0.0rc1 release notes; the others are listed below.)
As for backward compatibility, most users of v2.x are not affected by the introduction of the new-style function `FunctionNode`, because the conventional `Function` is still supported in v3 (and in future versions). Even if you are using custom functions written with `Function`, you can continue running the same code with Chainer v3.0.0. You need to rewrite such custom functions only when you want to use new features added to the new-style function, e.g. double backprop.
The backward compatibility of the overall APIs is slightly broken, though most users are not affected. See the above release notes for the details of broken compatibility.
Examples of grad of grad in Chainer
Usage of the `grad` function
You can calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the `chainer.grad` function with the `enable_double_backprop=True` option.
# Both x and y are chainer.Variable objects
y = x * x * x / 3 # Construct a computational graph
gx, = chainer.grad([y], [x], enable_double_backprop=True)
ggx, = chainer.grad([gx], [x], enable_double_backprop=True)
Here, the above calculation of `ggx` is equal to:
gx.backward()
x.grad_var # => This is equal to the above ggx
Of course, one more differentiation gives us 2:
gggx, = chainer.grad([ggx], [x], enable_double_backprop=True)
print(gggx) #=> variable([ 2.])
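The chain of gradients above can be sanity-checked numerically without Chainer. The sketch below uses plain central finite differences to confirm what calculus predicts for y = x**3 / 3, namely dy/dx = x**2, d2y/dx2 = 2*x, and d3y/dx3 = 2:

```python
def central_diff(f, x, eps=1e-4):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def y(x):
    return x ** 3 / 3

def gx(x):      # numerically ~ x**2
    return central_diff(y, x)

def ggx(x):     # numerically ~ 2*x
    return central_diff(gx, x)

def gggx(x):    # numerically ~ 2 (a constant, as printed above)
    return central_diff(ggx, x)

assert abs(gx(1.5) - 2.25) < 1e-6
assert abs(ggx(1.5) - 3.0) < 1e-4
assert abs(gggx(1.5) - 2.0) < 1e-2
```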
The loss function of WGAN-GP
WGAN-GP (which stands for Wasserstein GAN with Gradient Penalty [1]) is one example of a GAN that uses gradients of gradients when calculating the loss. It penalizes the gradient norm to enforce the Lipschitz constraint. The gradient norm is computed at a random interpolation `x_hat` between a generated point `x_tilde` and a real example `x`. Then, the loss including the penalty term is further differentiated w.r.t. trainable parameters in the model, so it actually performs double backward for the discriminator. The code below shows how to implement it using the `backward()` method with the `enable_double_backprop=True` option:
# G (generator) and D (discriminator) should be implemented somewhere else
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)
# 1st diff
D(x_hat).backward(enable_double_backprop=True)
gradient_penalty = lam * (x_hat.grad_var - 1) ** 2  # lam: penalty coefficient
loss = D(x_tilde) - D(x) + gradient_penalty
model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
You can also implement it using `grad()`, which may be faster because it omits the computation of gradients w.r.t. parameters.
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)
# 1st diff
gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)
gradient_penalty = lam * (gx_hat - 1) ** 2  # lam: penalty coefficient
loss = D(x_tilde) - D(x) + gradient_penalty
model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
[1]: I. Gulrajani, et al. "Improved Training of Wasserstein GANs," https://arxiv.org/abs/1704.00028
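The interpolation and penalty arithmetic in the snippets above can be checked in plain Python (scalars stand in for arrays; `lam` is spelled out because `lambda` is a Python keyword, and `grad_norm` is a made-up stand-in for the gradient magnitude at `x_hat`):

```python
import random

# Random interpolation between a real example x and a generated point x_tilde.
x, x_tilde = 2.0, 5.0
u = random.random()                    # u ~ U(0, 1)
x_hat = x + u * (x_tilde - x)          # lies between x and x_tilde
assert min(x, x_tilde) <= x_hat <= max(x, x_tilde)

# Gradient penalty: lam * (|grad D(x_hat)| - 1) ** 2.
lam = 10.0
grad_norm = 0.7                        # stand-in value for the gradient norm
gradient_penalty = lam * (grad_norm - 1) ** 2
assert abs(gradient_penalty - 0.9) < 1e-12   # 10 * (0.7 - 1) ** 2 == 0.9
```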
Here are some simple comparisons of grad of grad in Chainer and other frameworks:
https://gist.github.com/delta2323/9bbca950ee32c523c7aec2e02ad7f85a
New features
- Add `F.flip` function (#3532)
- Functions with double-backprop support: `F.swapaxes` (#3480), `F.permutate` (#3481), `F.transpose_sequence` (#3525)
Bug fixes
- Workaround for NumPy dot operation bug on non-contiguous arrays (#3478)
- Fix `KeyError` when using evaluator without target 'main' (#3460)
- Fix `AttributeError` for missing `inv_std` in `F.fixed_batch_normalization` backward (#3479, thanks @zaburoch!)
Improvements
- Remove unused `invoke_before_training` argument from `Trainer.extend` (#3516)
- Improve performance of `MultiprocessIterator` for non tuple/dict datasets (#3413, thanks @yuyu2172!)
- Type check in `chainer.grad` (#3514)
Documentation
- Document deprecation of stream option of `to_gpu` (#3519)
- Add documentation for `ParameterStatistics` extension (#3323)
- Fix typos (#3414, thanks @knorth55!; #3455, thanks @HusainZafar!)
- Fix source links for functions defined with `contextlib.contextmanager` (#3567)
- Improve or fix documentation: `F.swapaxes`, `F.squeeze`, `F.transpose` (#3415, thanks @naoto0804!), `F.separate`, `F.select_item`, and `F.permutate` (#3417, thanks @naoto0804!), Constant initializer (#3560), `init_scope` (#3520), `F.reshape` (#3515), ConvNet tutorial (#3509)
- Add documentation of links for framework compatibility (#3476)
- Fix documentation warnings (#3490)
- Introduce docstring checker and fix markup of "returns" sections (#3510)
- Remove obsolete statement about copy between devices in `to_gpu` (#3517)
- Fix typecheck reference (#3521)
- Improve style of deprecation notification (#3522)
- Avoid horizontal scroll of tables (#3538)
- Add/modify supported versions of dependencies in the installation guide (#3580)
Tests
- Skip multiprocess interrupt tests (#3412)
- Add tests for `__delattr__` in `Link` and `Chain` (#3416, thanks @naoto0804!)
- Improve `numerical_grad` accuracy (#3495)
- Improve test mode of VAE example (#3431)
- Delete redundant test settings for `F.get_item` (#3469, thanks @yuyu2172!)
- Avoid unwanted output of `assert_allclose` failure (#3518)
- Stabilization of stochastic numerical errors
This is the release candidate (RC) of v3.0.0. See here for the complete list of solved issues and merged PRs.
CuPy has also been updated to v2.0.0 RC. Please see the release notes for CuPy.
Changes that break compatibility
- The `use_cudnn` argument is removed from `spatial_transformer_grid` and `spatial_transformer_sampler` (#2955). You can use `chainer.using_config('use_cudnn', 'auto')` to enable cuDNN in these functions.

Almost no users will be affected by the following changes.

- The code for supporting protobuf 2 is removed (#3090). Note that the support of protobuf 2 has already been removed in Chainer v2.
- `Variable.__hash__` is removed (#2961). Note that `Variable` does not support `__eq__`, so it was already not hashable.
- `cache_download` now raises `OSError` instead of `RuntimeError` on a file system error (#2839, thanks @Hakuyume!)
New features
- New-style functions with double backprop support:
  - Array: `transpose` (#3144), `reshape`, `expand_dims`, `broadcast_to`, `sum` (#3188), `concat`, `split_axis` (#3189), `flatten` (#3190), `cast` (#3145), `rollaxis` (#3306), `select_item` (#3308), `__getitem__` (#3243)
  - Connection: `linear` (#3099), `convolution_2d`, `deconvolution_2d` (#3163), `embed_id` (#3183), `lstm` (#3206)
  - Activation: `sigmoid` (#3119), `relu` (#3175), `leaky_relu` (#3177), `softmax` (#3213), `log_softmax` (#3217)
  - Pooling: `max_pooling_2d`, `average_pooling_2d`, `upsampling_2d`, `unpooling_2d`, `spatial_pyramid_pooling_2d` (#3257)
  - Math: unary (#3142), binary (#3143), `tanh` (#3200), `exp` (#3254)
  - Loss: `mean_squared_error` (#3194), `softmax_cross_entropy` (#3296)
  - Noise: `dropout` (#3356, thanks @bonprosoft!)
  - Normalization: `layer_normalization` (#3219), `batch_normalization` and `fixed_batch_normalization` (#3275)
 New core features
- `chainer.as_variable()` is added (#3218). It can be used to enforce the type of a value to be `Variable`.
- `Variable.array` property is added (#3223). It is equivalent to `Variable.data`, but `.array` is safer; when you mix up `Variable` with `ndarray`, `.array` immediately raises an error while `.data` does not.
- `chainer.FunctionHook`, which is an alias to `chainer.function_hook.FunctionHook`, is added (#3152, #3153)
- `grad` function (#3015). This function takes input and output variables and computes the gradient of the outputs w.r.t. the inputs.
- `check_double_backward` utility (#3096, #3268). It can be used to numerically check if the double backprop is consistent with the first-order gradient.
Other features

- The `axis` argument of `average` now supports tuple values (#3118)
- The performance of `numerical_grad` is improved (#2966). It now performs a numerical check of a randomly chosen directional derivative instead of the full gradient check. This change reduces the number of forward computations run for the numerical gradient to a constant independent of the input dimensionality.
- Make double backprop support optional in `Variable.backward()` (#3298). To enable double backprop, you have to explicitly pass `enable_double_backprop=True`. Note that when you do not need double backprop, it is better to turn off this option; `backward()` then skips constructing the computational graph of backpropagation, saving the performance overhead (esp. the memory consumption).
- CuPy memory profiler with CuPy memory hook (#2979)
- Add `rgb_format` option to `get_mnist` (#3263)
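The directional-derivative idea behind the faster `numerical_grad` can be sketched in plain Python (this is an illustration of the technique, not Chainer's implementation): instead of checking every partial derivative, pick a random direction d and compare the analytic gradient projected onto d with a single finite difference along d.

```python
import random

def f(v):
    """Example function: f(x) = sum(x_i ** 2)."""
    return sum(t * t for t in v)

def grad_f(v):
    """Analytic gradient of f: 2 * x."""
    return [2 * t for t in v]

x = [0.5, -1.0, 2.0]
d = [random.uniform(-1, 1) for _ in x]   # random direction
eps = 1e-5

# One central finite difference along d ...
fd = (f([a + eps * b for a, b in zip(x, d)])
      - f([a - eps * b for a, b in zip(x, d)])) / (2 * eps)
# ... should match the analytic gradient projected onto d.
analytic = sum(g * b for g, b in zip(grad_f(x), d))
assert abs(fd - analytic) < 1e-6
```

Only two forward evaluations are needed regardless of the input dimensionality, which is the point of the change.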
Bug fixes
- Dynamically import `matplotlib.pyplot` in `PlotReport` (#2740)
- Fix `_make_npz` for `ResNetLayers` (#3062, thanks @Hakuyume!)
- Support non-contiguous arrays in cuDNN path of `softmax` (#3072) and `log_softmax` (#3310, thanks @knorth55!)
- Deny assigning links in `ChainList.init_scope()` (#3129)
- Avoid running certain hooks on uninitialized params (#3170)
- Call `VariableNode.data` from `Parameter.initialize` (#3204, thanks @bonprosoft!)
- Fix NaN check (#3208)
- Fix `DictDataset` to work in Python 3 (#3237, thanks @bonprosoft!)
- Always return variable in `dropout` (#3239, thanks @naoto0804!)
- Fix `CaffeFunction` to take BatchNorm scaling factor into account (#3261, thanks @hvy!)
- Fix `params` option of `check_double_backward` (#3268, see the previous section)
- Check the input type of `to_gpu` (#3269)
- Fix `fix_random` (#3330)
- Use test mode in predict methods of ResNet, VGG, GoogLeNet (#3201)
Improvements
- Improve `MultiprocessIterator` performance, functionality and stability, using `Pool` (#3076, thanks @grafitt!)
- Check the range of the dropout ratio (#3100)
- Add function name in the debug message of NaN check (#3161)
- Fix typo in the name of a kernel used in `roi_pooling_2d` (#3185, thanks @knorth55!)
- Make `F.cast` skip `FunctionNode` application if no cast is needed (#3191)
- Fix `Variable.backward` for manually edited `requires_grad` (#3192)
- Avoid using deprecated stream option in to_gpu (#3278)
- Always raise warning when `stream` option is specified in `to_gpu` (#3282)
- Speed up `upsampling_2d` on CPU (#3316)
- Remove unnecessary use of `enumerate` (#3326)
- Optimize backward of `log2` and `log10` (#3352)
- Fix warning message for cuDNN (#3227)
- Reduce copy in check_backward (#3312)
- Small improvement for `transpose` backward (#3154)
- Modules related to `IntervalTrigger` are slightly reorganized (#2990, thanks @Hakuyume!)
Examples
- New example of machine translation with seq2seq (#2070)
- Avoid importing matplotlib to set its backend Agg in code (#3043)
- Remove deprecated `get_device` from examples (#3122, thanks @naoto0804!)
Documentation
- Improve "Introduction to Chainer" (#1879)
- Add "How to write a training loop in Chainer" tutorial (#2736)
- Add minor version policy and feature backport policy (#3297)
- Update coding guidelines on shortcut aliases (#3198)
- Add warnings about preprocessing for datasets with both grayscale and RGB images (#3093, thanks @jinjiren!)
- Hide source link for alien objects (#3110)
- Add missing items to the reference manual: `FunctionNode`, `FunctionAdapter` (#3117), `initializers.NaN` (#3293)
- Remove "Edit on GitHub" link (#3080)
- Treat Sphinx warnings as errors (#3069)
- Fix example code: RNN tutorial (#3149, thanks @fiarabbit!), fixed doctest failures (#3114, #3247)
- Fix typos: README (#3156, thanks @lc0!), `gradient_check` (#3158), Configuration documentation (#3166, thanks @kristofbc!), `Variable.grad` (#3265), `Hyperparameter` (#3248), `BatchNormalization` (#3137)
- Fix docs: `clipped_relu` (#3178), `leaky_relu` (#3179), `Variable.__getitem__` (#3180), `linear` (#3224, thanks @bonprosoft!), link document (#3240), `GRU` stateless/stateful (#3340), `Updater` (#3084, thanks @fiarabbit!), backslash escaping (#3174), summary markup with periods (#3235), fix for warnings (#3068)
- Improve docs: `dropout` (#3184, thanks @fiarabbit!) (#3116, thanks @naoto0804!), `where` (#3301, thanks @naoto0804!), `transpose` (#3302, thanks @naoto0804!), `GRU` (#3089), doctest code in training loop tutorial (#3249), `hinge` (#3108), `softmax_cross_entropy` (#3105), `LSTM` (#3104), `Linear` (#3103), `binary_accuracy` (#3102), `embed_id` (#3091)
Test
- Check `DeprecationWarning` in `Variable` (#2932)
- Dump more info on `assert_allclose` failure (#2936)
- Check deprecated method in tests of `Link` (#3155)
- Avoid using deprecated `stream` option in `to_gpu` (#3278)
- Insert `assert_warns` to ignore warnings (#3280)
- Stabilize numerical tests: `relu` (#3299), `tanh` (#3305), exponentials (#3354), `unpooling_2d` (#3341), `local_response_normalization` (#3355)
- Ignore warnings for `to_gpu` (#3322)
- Improve activation function tests (#3332)
- Replace `get_device` (#3363)
This is the v3 beta release. See here for the complete list of solved issues and merged PRs.
CuPy has also been updated to v2.0.0b1. Please see the release notes for CuPy. In particular, the updates on memory allocator may be relevant to many existing users of Chainer.
Changes without compatibility
The new-style differentiable function (#2970)
This change provides the core API support for writing functions that support:
- Differentiable backprop (a.k.a. gradient of gradients, higher-order differentiation)
- Economical backprop (i.e., `backward` can skip computation of unnecessary input gradients)
You can write your own function node by implementing a subclass of `FunctionNode`. The following is a simple example of writing an element-wise multiplication function (which is already provided by this beta version):
class ElementwiseMul(chainer.FunctionNode):
    def check_type_forward(self, in_types):
        ...

    def forward(self, inputs):
        lhs, rhs = inputs
        self.retain_inputs((0, 1))  # New-style function does not retain inputs by default!!!
        return lhs * rhs,

    def backward(self, indexes, grad_outputs):
        grad_out, = grad_outputs
        lhs, rhs = self.get_retained_inputs()
        return rhs * grad_out, lhs * grad_out
There are mainly three differences from the conventional definition using `Function`.
- The `indexes` argument (or `target_input_indexes` as the full name) is added. It indicates the set of inputs for which gradients are required. There are two ways to return gradients from `backward`: gradients for all inputs, or gradients for inputs selected by `indexes`. In the latter case, you can skip computing the gradients for inputs not listed in `indexes`.
- The `backward` method implements computation on top of `Variable` instead of `ndarray` so that the resulting gradients can be further backpropagated. The `grad_outputs` is a tuple of `Variable`s, and the new `get_retained_inputs()` and `get_retained_outputs()` methods return a tuple of `Variable`s corresponding to retained inputs/outputs. Note that the inputs are not retained by default (which is also different from `Function`).
- The forward computation is invoked by the `apply()` method instead of the `__call__()` operator.
There is also a variant of the `backward()` method named `backward_accumulate()`, which includes the accumulation of input gradients into existing ones. It enables us to improve the performance in some cases.
This change also provides the following changes.
- A new class `FunctionAdapter` provides an implementation of the `FunctionNode` interface on top of the `Function` interface. It can be used to convert a `Function` into a new-style function node. Note that it does not mean the converted function supports differentiable backprop; it is required to rewrite the implementation with `FunctionNode` directly to support it.
- `Function.__call__` is updated so that users do not need to update their implementations of custom Function definitions; it automatically creates a `FunctionAdapter` object, lets the adapter wrap the `Function` object itself, and inserts the adapter object (which implements `FunctionNode`) into the computational graph.
- Currently, only element-wise addition and multiplication (`+` and `*`) and `F.identity` (which exists just for testing purposes) support differentiable (and economical) backprop. We are planning to widen the set of functions with differentiable backprop support in the upcoming releases.
- Note that this change breaks the object structure of the computational graph; now `FunctionNode` objects act as function nodes in the computational graph, and `Function` is just an object referenced by a `FunctionAdapter` object (which implements `FunctionNode`).
New features
- When using `Trainer`, any exceptions raised during training are now immediately shown before entering the finalization procedures. It helps users to know the cause of the error before waiting for the finalization, which sometimes hangs up (esp. when using multiprocessing) (#2216)
- Support a mask pattern shared among examples within each batch in `F.simplified_dropconnect` (#2534, thanks @fukatani!)
- Enable strict option in `load_npz` to skip non-existing entries (#2599, #2601)
- `L.Classifier` is extended so that users can feed multiple input features. An argument that should be treated as the ground truth labels is specified by the `label_key` option. Keyword arguments are also supported. (#2834)
- Add `print_report()` and `summary()` methods to `TimerHook` (#2927)
- Support all float types in `F.upsampling_2d` (#2978)
- Add `chainer.testing.fix_random` decorator to make tests deterministic (#2985)
- Add `F.selu` activation function (#2989)
- Add `iter_per_epoch` option to `get_trainer_with_mock_updater` (#2913, thanks @Hakuyume!)
Improvements
- Automatically import submodules in `chainer.training` (#3032)
- Remove redundant type checking in backward implementations and improve the performance of type equality checking (#2891)
- Remove code for old cuDNN (v2 and v3) support (#2920)
- Use `np.einsum` in `forward_cpu` of negative sampling (#2931)
- Direct initialization of `Variable` on device (#2983)
- Select the best-resolution timer function (#2991)
- Improve numerical grad performance on GPU (#3018)
- Make `TimerHook` reentrant (#3019)
- Fix the way to copy arrays along with cupy/cupy#159 (#3047, #3054)
Bug fixes
- Fix incorrect firing of `interval_trigger` on resuming the training procedure (#2244, #2484, thanks @Hakuyume!)
- Support correct serialization of `ManualScheduleTrigger` (#2988, thanks @Hakuyume!)
- Use `np.zeros` for the initialization of arrays to return in the CPU-mode `F.roi_pooling_2d` (#2872, thanks @yuyu2172!)
- Add `mock` as an installation dependency (#2973, #2992)
Examples
- Deep reinforcement learning examples (#1991). These include example code for DQN, DoubleDQN, and DDPG using OpenAI Gym environments.
Document
- Update "Comparison with Other Frameworks" (#2717, thanks @jekbradbury!)
- Add documentation for `Updater` (#3012, thanks @fiarabbit!)
- Improve the documentation for `MultiprocessParallelUpdater` (#3038)
- Add links to the Slack archive to README (#2998)
- Fix GitHub link (#2984, #2999)
- Fix typo in the Upgrade Guide for volatile mode changes (#3005, thanks @evdcush!)
- Wording in contribution guide (#3067)
- Fix the file permission of `conf.py` (#3064)
Test
- Reduce the test time (e.g. by removing redundant test cases or parameterization) (#2948)
- Add tests for deep-copying `Link` (#2974)
- Use `get_trainer_with_mock_updater` in tests of `ManualScheduleTrigger` (#2987, thanks @Hakuyume!)
- Remove debug print (#2997)
- Remove Cython coding style check to speed up tests (#3008)
v4.0.0b3
kmaehashi released this
Jan 23, 2018
This is the release of v4.0.0b3. See here for the complete list of solved issues and merged PRs.
Highlights
- The `Adam` optimizer has been updated to support AdamW. See #4050 for details.
- Loss scaling is supported in `Variable.backward`. You can also use it via Updaters.

Changes without Compatibility
- `Optimizer.setup` returns `self` to enable method chaining (#4141)

New Features
- Add `shift` function (#4041)
- `Optimizer.setup` returns self to enable method chaining (#4141)
- Functions with double-backprop support: `matmul` (#3768), `huber_loss` (#3867)

Improvements
- Move `StandardUpdater` and `ParallelUpdater` under the chainer.training.updaters namespace (#3037)
- Raise `RuntimeError` when `Adam.lr` is evaluated before updating starts (#3931)
- Add `get_training_length` to `IntervalTrigger` (#4079, thanks @himkt!)
- Improvements to `F.linear` (#4093, thanks @jzhoulon!)
- Improvements to `F.identity` (#4154)
- Improvements to `check_backward` (#4156)
- Fix `GoogLeNet` to define the network using init_scope (#4171)
- Use `log1p` in CTC (`F.connectionist_temporal_classification`) for stable computation (#4194)
- Improvements to `F.separate` (#4195)
- Improvements to CTC (`F.connectionist_temporal_classification`) (#4201)

Bug Fixes
- Update `VariableNode.data` if new data is assigned (#3869)
- Add `serialize` to `Summary` (#4005, thanks @Hakuyume!)
- Fix `gradient_check` (#4015)
- Fix `t` of `UpdateRule` (#4026)
- Fix `GradientMethod` not to raise `AttributeError` caused by new optimizer setup (#4077)
- Avoid `np.stack` in grouped convolution/deconvolution in CPU mode (#4085)
- Avoid `np.stack` in examples (#4087)
- Fix `out1` in `inc4c` and `inc4d` (#4121, thanks @takaaki82!)

Examples
Documentation
- Documentation of `Variable.backward()` (#3496)
- Documentation of `huber_loss` (#3950)

Installation
Tests
- Fix `gradient_check` (#4015)
- Fix `test_init_docstring` to use importlib to find package (#4091)