Chainer v4.0.0b2: a flexible neural network framework for deep learning
niboshi released this
This is the release of v4.0.0b2. See here for the complete list of solved issues and merged PRs.

In this release, you can set up an optimizer with a simpler syntax.
In previous versions, the code would be written as:
optimizer = chainer.optimizers.SGD()
optimizer.setup(model)
We now also allow it to be written more concisely as:
optimizer = chainer.optimizers.SGD(link=model)
The `link` argument should be specified as a keyword argument. Otherwise, some optimizers could wrongly interpret it as a hyperparameter (e.g. `lr`). We will enforce the keyword argument from the next release.
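To illustrate the pitfall, here is a minimal pure-Python sketch (not Chainer's actual implementation; the class below is a hypothetical stand-in) showing why a positionally passed model would be bound to the learning-rate parameter:

```python
# Hypothetical sketch of why `link` must be a keyword argument.
# The first positional parameter of an SGD-style optimizer is the learning
# rate, so a positionally passed model would be silently bound to `lr`.

class SGD:
    def __init__(self, lr=0.01, link=None):
        self.lr = lr
        self.target = None
        if link is not None:
            self.setup(link)

    def setup(self, link):
        self.target = link

model = object()       # stands in for a chainer.Link

opt = SGD(link=model)  # correct: the model is set up
assert opt.target is model

opt = SGD(model)       # wrong: the model is silently bound to lr!
assert opt.lr is model and opt.target is None
```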
We introduced a check for mixed use of CuPy arrays and NumPy arrays in the outputs returned from functions. Although this has always been forbidden, such functions may have worked without errors before. With the introduction of this check, those functions may now start raising errors.
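The idea behind such a check can be sketched in a few lines of plain Python (an illustration only; `check_output_consistency` and `FakeCupyArray` are hypothetical stand-ins, not Chainer's actual code):

```python
# Illustrative sketch of a mixed-array-type check. In Chainer the two array
# types would be numpy.ndarray and cupy.ndarray; here we compare the types
# of the output objects directly.

def check_output_consistency(outputs):
    """Raise TypeError if the outputs mix different array types."""
    types = {type(a) for a in outputs}
    if len(types) > 1:
        raise TypeError(
            'outputs mix different array types: {}'.format(
                sorted(t.__name__ for t in types)))

class FakeCupyArray(list):  # stand-in for cupy.ndarray
    pass

# Consistent outputs pass silently.
check_output_consistency([[1.0], [2.0]])

# Mixed outputs raise an error, as the new check introduced in #4029 does.
try:
    check_output_consistency([[1.0], FakeCupyArray([2.0])])
except TypeError as e:
    print('caught:', e)
```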
Known Issues
 Grouped convolution/deconvolution does not work in CPU mode with NumPy 1.9 (#4081). This issue is planned to be resolved in the next release.
Changes without compatibility
 Check for mixed use of CuPy/NumPy ndarrays in functions (#4029)
New Features
 Overlap data transfer and GPU kernels (#3336, thanks @anaruse!)
 Add early stopping (#3351, thanks @himkt!)
 Enable optimizer model setup with instantiation (#3488)
 Grouped convolution (#3494, thanks @anaruse!)
 Add `extensions` as a trainer argument (#3528, thanks @nekanat!)
 Support parameter update in FP32 (#3708, thanks @anaruse!)
 Add `sign` function (#3678)
 Add more functions with double-backprop support: maximum (#3533), im2col (#3587), batch_l2_norm_squared (#3642), expm1 (#3644), linear_interpolate (#3663), mean_absolute_error (#3672), squared_error (#3691), sigmoid_cross_entropy (#3705), absolute_error (#3707), Gaussian (#3759), det, batch_det (#3767), cross_covariance (#3866), normalize (#3870), bilinear (#3917), negative_sampling (#3992)
Improvements
 User-friendly error checks for pooling input (#3555)
 Verbose error messages in gradient check (#3833)
 Verbose error messages in basic_math (#3839)
 Refactor (de)convolution_2d (#3848)
 Allow `to_cpu` and `to_gpu` to accept list, tuple and `None` (#3850)
 Move `should_use_cudnn` and `should_use_cudnn_tensor_core` to `chainer.cuda` (#3851)
 Skip unnecessary array util ops (#3932)
 Avoid unnecessary `hasattr` (#3952)
 Add `chainer.backends` subpackage (#3974)
 Fix cuda import path (#4036)
 Remove an unused function (#4051)
Bug Fixes
 Fix backprop dim for unused lstm states on gpu (#3042, thanks @andreasgrv!)
 Forget inputs as Variables (#3788)
 Skip cuDNN in deconvolution_2d if dilate != 1 and deterministic (#3875)
 Fix debug_print() with empty Variable (#4018)
 Avoid mixing cupy.ndarray and numpy.ndarray in n_step_xxx links (#4030)
 Fix F.convolution_2d and F.deconvolution_2d to work without cuDNN (#4062)
 Fix test failure with cuDNN v6 (#4078)
Examples
 Add `noplot` option in MNIST example (#3925)
Documentation
 Add word2vec tutorial (#3040)
 Add ptb tutorial (#3073)
 Replace array in type lists with numpy or cupy ndarray (#3259)
 Improve documents of the debug mode (#3347)
 Function references in docs to point to `FunctionNode` (#3626)
 Fix the example in the documentation of `Reporter` (#3795)
 Improve documentation of `GRU` (#3858)
 Fix documentation in `n_step_gru`, `n_step_bigru`, `n_step_bilstm`, `n_step_rnn` and `n_step_birnn` (#3859)
 Fix CuPy requirement version (#3899)
 Add `expm1` to the documentation (#3900)
 Fix a formula in the tutorial (#3909, thanks @keisukenakata!)
 Fix typo (#3914, thanks @okayu9!)
 Fix example code in the trainer tutorial (#3926, thanks @keisukenakata!)
 Fix doctest in the trainer tutorial (#3942)
 Fix `GlorotUniform` documentation (#3953, thanks @FTag!)
 Add `StatefulZoneoutLSTM` to documentation (#3957)
 Small fix for the seq2seq example (#3964)
 Add `ConcatWithAsyncTransfer` to the reference manual (#3975, #3979)
 Fix a code fragment in contribution guide (#3982, thanks @anaruse!)
 Fix documentation of negative sampling (#3988)
 Add dilate argument to documentation (#4011)
 Fix broken link in `chainer.functions.pad` documentation (#4028)
Tests
 Add ability to check non-differentiable inputs in `gradient_check.numerical_grad` (#3551, #4003)
 Refactor unit tests for various backend configurations (#3862)
 Catch all exceptions in parameterized test (#3876)
 Avoid NaN in test of `F.classification_summary` (#3927)
 Avoid NaN error in `test_pad_sequence` in debug mode (#3946)
 Show original traceback in `testing.parameterized` (#3954)
 Fix macOS test in Travis (#3990)
 Simplify `to_gpu` in RNN tests (#4046)
 Adjust numerical tolerances: `convolution_nd` (#3910), `im2col` (#3933), `triplet` (#3939), `linear_interpolate` (#3944)
This is a minor release. See here for the complete list of solved issues and merged PRs.
Spotlight features
 A lot of new double-backpropable functions have been added.
 An autotuner for cuDNN convolution functions is now available. Just add the one line `chainer.global_config.autotune = True` to optimize your ConvNets.
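For intuition, here is a minimal sketch of how a global-config object with scoped overrides can work, loosely in the spirit of `chainer.global_config` and `chainer.using_config` (simplified and hypothetical; the real implementation also handles thread-local state and layered defaults):

```python
# Minimal sketch of a global-config flag with a scoped override, in the
# spirit of chainer.global_config / chainer.using_config. Not Chainer's
# actual implementation.

import contextlib

class GlobalConfig:
    def __init__(self):
        self._values = {}

    def __setattr__(self, name, value):
        if name.startswith('_'):
            super().__setattr__(name, value)
        else:
            self._values[name] = value  # config entries live in a dict

    def __getattr__(self, name):
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

global_config = GlobalConfig()
global_config.autotune = False  # default

@contextlib.contextmanager
def using_config(name, value):
    # Temporarily override a config entry, restoring it on exit.
    old = getattr(global_config, name)
    setattr(global_config, name, value)
    try:
        yield
    finally:
        setattr(global_config, name, old)

global_config.autotune = True   # the one-liner from the release notes
assert global_config.autotune is True
with using_config('autotune', False):
    assert global_config.autotune is False
assert global_config.autotune is True
```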
New Features
 New functions: F.fix (#3834)
 Functions with double-backprop support: where (#3505), softplus (#3593), clipped_relu (#3594), broadcast (#3650), hstack (#3666), dstack (#3890), square (#3681), ELU (#3730), min, max (#3732), log, log2, log10 (#3733), abs (#3734), sqrt (#3738), inv, batch_inv (#3743), div (#3750), pow (#3804), clip (#3805), resize_images (#3806), PReLU (#3814), minimum (#3815), triplet (#3817), floor (#3819), squared_difference (#3823), fliplr (#3827), flipud (#3828), fmod (#3834), pad_sequence (#3835), log1p (#3847), hard_sigmoid (#3849), CReLU (#3852), rdiv (#3857), ceil (#3860), logsumexp (#3877), cosh, sinh (#3879), depth2space, space2depth (#3880), sin, cos, tan, arcsin, arccos, arctan, arctan2 (#3881), tile (#3825), pad (#3855)
 cuDNN Convolution functions Autotuner (#3841)
Improvements
 Relax int type restriction (#3700)
 Allow to_gpu and to_cpu to accept NumPy scalars (#3748)
 Support file-like objects in npz serializer (#3758, #3882)
 Avoid zero-division warning in F.r2_score (#3777)
 Raise user-friendly error when FunctionNode is used like a Function object (#3780)
 Fix F.inv to raise an exception when the input has singular matrices (#3784)
 Remove unnecessary branch in minimum forward. (#3836)
 Check too small eps in Adam and RMSprop optimizers (#3783)
Bug fixes
 Prevent ZeroDivisionError in softmax_cross_entropy when input size is 0 (#3656, thanks @knorth55!)
 Fix xxx_pooling_nd causing CUDNN_STATUS_NOT_SUPPORTED for dims > 3 (#3722)
 Fix LSTM bias initialization (#3731)
 Fix the problem with resuming training when switching the freezing layers (#3800, thanks @jinjiren!)
 Avoid zero division error in linear init call (#3885)
Documents
 Add tutorials
 Improve TupleDataset documentation (#3438)
 Documentation fix in FunctionNode (#3444)
 Improve docs of n_step_lstm (#3471)
 Fix dead links to modules in tutorial (#3501)
 Improve doc of sum (#3502, thanks @akitotakeki!)
 Add get_conv_outsize and get_deconv_outsize to doc (#3597)
 Improve docs of huber_loss (#3605, thanks @naoto0804!)
 Improve docs of sigmoid_cross_entropy (#3606, thanks @naoto0804!)
 Improve docs of contrastive and triplet (#3607, thanks @naoto0804!)
 Fix documentation error in Function (#3637)
 Add experimental warning in docstring (#3648)
 Add a note to the doc of Evaluator.evaluate (#3667)
 Fix CuPy intersphinx mapping (#3687)
 Document get_svhn (#3690)
 Fix CuPy overview link not working (#3695)
 Add CUDAProfileHook and CupyMemoryProfileHook to the reference (#3709, thanks @ronekko!)
 Fix split_axis documentation (#3712)
 Improve doc of context managers (#3719)
 Improve doc of configuration flags (#3720)
 Fix contribution guide for test framework change (#3726)
 Fix case in doc (#3749)
 Fix doc in Forget (#3773)
 Improve docs of F.forget (#3791)
 Document initializer criteria (#3801)
 Sort out navigation menu (#3812)
 Fix doctest failure in trainer tutorial (#3888)
 Fix typos (#3635, #3638, #3639)
 Fix doctest (#3647, #3651)
Tests
 Move to PyTest
 Use Python 3.4.4 on Travis OSX Python 3.4 case (#3629)
 Fix test_init_docstring (#3636)
 Fix math function testing helper to support new-style functions (#3665)
 Run OS X test only on master/stable branch to avoid delay (#3676)
 Always cast all inputs to given dtype in gradient check (#3679)
 Fix decorators to allow users to filter test cases by number of GPUs (#3683)
 Fix to skip GPU tests on AppVeyor (#3693)
 Fix math function test helper to support double backward test of linear functions (#3706)
 Richer gradient check output (#3713)
 Check deprecation warning in Travis (#3721)
 Fix decorators to allow users to filter test cases by number of GPUs (#3723)
 Use Python 3.5 for doctest (#3727)
 Fix normalization warning in F.average test (#3729)
 Fix F.inv test does not test type error as expected (#3775)
 Directional derivative (#3790)
 Add doublebackward test for F.inv and F.batch_inv (#3820)
 Fix test condition in function tutorial (#3873)
 Setup random of Python library in testing/random (#3655)
 Fix coveragerc to measure branch coverage and only target chainer module (#3710)
 Test stability fix
Others
 Warn about vecLib on Mac OS X (#3692)
 Update stable version link in README (#3746)
 Improve version embedding (#3739)
 Rename `plot` to `plt` (#3714, thanks @Hakuyume!)
Install
 Remove requirements for unit testing (#3682)
Downloads
This is a major release of Chainer v3.0.0. All the updates from the previous major version (v2.0.0) are found in the release notes below:
 v3.0.0a1 (https://github.com/chainer/chainer/releases/tag/v3.0.0a1)
 v3.0.0b1 (https://github.com/chainer/chainer/releases/tag/v3.0.0b1)
 v3.0.0rc1 (https://github.com/chainer/chainer/releases/tag/v3.0.0rc1)
 v3.0.0 (this document)
The biggest change is the introduction of new-style differentiable functions and the resulting support for double backward (gradient of gradient) in many functions. The details are linked below:
 The new-style differentiable function (see the details in the v3.0.0b1 release notes)
 Double backward support for many functions (see the list of almost all functions which support double backward in the v3.0.0rc1 release notes; the others are listed below)
As for backward compatibility, most users of v2.x are not affected by the introduction of the new-style function `FunctionNode`, because the conventional `Function` is still supported in v3 (and in future versions). Even if you are using custom functions written with `Function`, you can continue running the same code with Chainer v3.0.0. You need to rewrite such custom functions only when you want to use new features added to new-style functions, e.g. double backprop.
The backward compatibility of the overall APIs is slightly broken, though most users are not affected. See the above release notes for the details of broken compatibility.
Examples of grad of grad in Chainer
Usage of the `grad` function
You can calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the `chainer.grad` function with the `enable_double_backprop=True` option.
# Both x and y are chainer.Variable objects
y = x * x * x / 3 # Construct a computational graph
gx, = chainer.grad([y], [x], enable_double_backprop=True)
ggx, = chainer.grad([gx], [x], enable_double_backprop=True)
Here, the above calculation of `ggx` is equivalent to:
gx.backward()
x.grad_var # => This is equal to the above ggx
Of course, one more differentiation gives us 2:
gggx, = chainer.grad([ggx], [x], enable_double_backprop=True)
print(gggx) #=> variable([ 2.])
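These analytic results can be sanity-checked without Chainer using plain central finite differences (an illustrative check, not part of the release): for y = x**3 / 3, the derivatives are x**2, 2*x, and the constant 2.

```python
# Pure-Python sanity check of the derivatives above. Each level of
# chainer.grad is approximated by a central finite difference.

def central_diff(f, x, h=1e-3):
    return (f(x + h) - f(x - h)) / (2 * h)

y = lambda x: x ** 3 / 3.0

x0 = 1.5
gx = central_diff(y, x0)                              # ~= x0**2
ggx = central_diff(lambda x: central_diff(y, x), x0)  # ~= 2*x0
gggx = central_diff(
    lambda x: central_diff(lambda t: central_diff(y, t), x), x0)  # ~= 2

assert abs(gx - x0 ** 2) < 1e-4
assert abs(ggx - 2 * x0) < 1e-3
assert abs(gggx - 2.0) < 1e-2
```

For a cubic, the central difference is exact at every level up to floating-point rounding, which is why the tolerances can be tight.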
The loss function of WGAN-GP
WGAN-GP (which stands for Wasserstein GAN with Gradient Penalty [1]) is one example of a GAN that uses gradients of gradients when calculating the loss. It penalizes the gradient norm to enforce the Lipschitz constraint. The gradient norm is computed at a random interpolation `x_hat` between a generated point `x_tilde` and a real example `x`. Then, the loss including the penalty term is further differentiated w.r.t. the trainable parameters in the model, so it actually performs double backward for the discriminator. The code below shows how to implement it using the `backward()` method with the `enable_double_backprop=True` option:
# G (generator) and D (discriminator) should be implemented somewhere else
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)
# 1st diff
D(x_hat).backward(enable_double_backprop=True)
gradient_penalty = lam * (x_hat.grad_var - 1) ** 2  # lam: penalty coefficient ("lambda" is reserved in Python)
loss = D(x_tilde) - D(x) + gradient_penalty
model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
You can also implement it using `grad()`, which may be faster because it omits the computation of gradients w.r.t. parameters.
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)
# 1st diff
gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)
gradient_penalty = lam * (gx_hat - 1) ** 2  # lam: penalty coefficient
loss = D(x_tilde) - D(x) + gradient_penalty
model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
[1]: I. Gulrajani et al., “Improved Training of Wasserstein GANs,” https://arxiv.org/abs/1704.00028
Here are some simple comparisons of grad of grad in Chainer and other frameworks:
https://gist.github.com/delta2323/9bbca950ee32c523c7aec2e02ad7f85a
New features
 Add `F.flip` function (#3532)
 Functions with double-backprop support: `F.swapaxes` (#3480), `F.permutate` (#3481), `F.transpose_sequence` (#3525)
Bug fixes
 Workaround for NumPy dot operation bug on non-contiguous arrays (#3478)
 Fix `KeyError` when using evaluator without target 'main' (#3460)
 Fix `AttributeError` for missing `inv_std` in `F.fixed_batch_normalization` backward (#3479, thanks @zaburoch!)
Improvements
 Remove unused `invoke_before_training` argument from `Trainer.extend` (#3516)
 Improve performance of `MultiprocessIterator` for non-tuple/dict datasets (#3413, thanks @yuyu2172!)
 Type check in `chainer.grad` (#3514)
Documentation
 Document deprecation of stream option of `to_gpu` (#3519)
 Add documentation for `ParameterStatistics` extension (#3323)
 Fix typos (#3414, thanks @knorth55!) (#3455, thanks @HusainZafar!)
 Fix source links for functions defined with `contextlib.contextmanager` (#3567)
 Improve or fix documentation: `F.swapaxes`, `F.squeeze`, `F.transpose` (#3415, thanks @naoto0804!), `F.separate`, `F.select_item`, and `F.permutate` (#3417, thanks @naoto0804!), Constant initializer (#3560), `init_scope` (#3520), `F.reshape` (#3515), ConvNet tutorial (#3509)
 Add documentation of links for framework compatibility (#3476)
 Fix documentation warnings (#3490)
 Introduce docstring checker and fix markup of “returns” sections (#3510)
 Remove obsolete statement about copy between devices in `to_gpu` (#3517)
 Fix typecheck reference (#3521)
 Improve style of deprecation notification (#3522)
 Avoid horizontal scroll of tables (#3538)
 Add/modify supported versions of dependencies in the installation guide (#3580)
Tests
 Skip multiprocess interrupt tests (#3412)
 Add tests for `__delattr__` in `Link` and `Chain` (#3416, thanks @naoto0804!)
 Improve `numerical_grad` accuracy (#3495)
 Improve test mode of VAE example (#3431)
 Delete redundant test settings for `F.get_item` (#3469, thanks @yuyu2172!)
 Avoid unwanted output of `assert_allclose` failure (#3518)
 Stabilization of stochastic numerical errors
Downloads
This is the release candidate (RC) of v3.0.0. See here for the complete list of solved issues and merged PRs.
CuPy has also been updated to v2.0.0 RC. Please see the release notes for CuPy.
Changes that break compatibility
 The `use_cudnn` argument is removed from `spatial_transformer_grid` and `spatial_transformer_sampler` (#2955). You can use `chainer.using_config('use_cudnn', 'auto')` to enable cuDNN in these functions.
Almost no users will be affected by the following changes.
 The code for supporting protobuf 2 is removed (#3090). Note that support for protobuf 2 was already removed in Chainer v2.
 `Variable.__hash__` is removed (#2961). Note that `Variable` does not support `__eq__`, so it was already not hashable.
 `cache_download` now raises `OSError` instead of `RuntimeError` on a file system error (#2839, thanks @Hakuyume!)
New features
 New Functions and Links
 New-style functions with double backprop support:
 Array: `transpose` (#3144), `reshape`, `expand_dims`, `broadcast_to`, `sum` (#3188), `concat`, `split_axis` (#3189), `flatten` (#3190), `cast` (#3145), `rollaxis` (#3306), `select_item` (#3308), `__getitem__` (#3243)
 Connection: `linear` (#3099), `convolution_2d`, `deconvolution_2d` (#3163), `embed_id` (#3183), `lstm` (#3206)
 Activation: `sigmoid` (#3119), `relu` (#3175), `leaky_relu` (#3177), `softmax` (#3213), `log_softmax` (#3217)
 Pooling: `max_pooling_2d`, `average_pooling_2d`, `upsampling_2d`, `unpooling_2d`, `spatial_pyramid_pooling_2d` (#3257)
 Math: unary (#3142), binary (#3143), `tanh` (#3200), `exp` (#3254)
 Loss: `mean_squared_error` (#3194), `softmax_cross_entropy` (#3296)
 Noise: `dropout` (#3356, thanks @bonprosoft!)
 Normalization: `layer_normalization` (#3219), `batch_normalization` and `fixed_batch_normalization` (#3275)
 New core features
 `chainer.as_variable()` is added (#3218). It can be used to enforce the type of a value to be `Variable`.
 `Variable.array` property is added (#3223). It is equivalent to `Variable.data`, but `.array` is safer; when you mix up `Variable` with `ndarray`, `.array` immediately raises an error while `.data` does not.
 `chainer.FunctionHook`, which is an alias to `chainer.function_hook.FunctionHook`, is added (#3152, #3153)
 `grad` function (#3015). This function takes input and output variables and computes the gradient of the outputs w.r.t. the inputs.
 `check_double_backward` utility (#3096, #3268). It can be used to numerically check whether the double backprop is consistent with the first-order gradient.
 Other features
 The `axis` argument of `average` now supports tuple values (#3118)
 The performance of `numerical_grad` is improved (#2966). It now performs a numerical check of a randomly chosen directional derivative instead of the full gradient check. This change reduces the number of forward computations for the numerical gradient to a constant independent of the input dimensionality.
 Make double backprop support optional in `Variable.backward()` (#3298). To enable double backprop, you have to explicitly pass `enable_double_backprop=True`. Note that when you do not need double backprop, it is better to turn off this option; `backward()` then skips constructing the computational graph of backpropagation, saving performance overhead (esp. memory consumption).
 CuPy memory profiler with CuPy memory hook (#2979)
 Add `rgb_format` option to `get_mnist` (#3263)
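The directional-derivative idea behind the `numerical_grad` improvement (#2966) can be sketched in plain Python (simplified and hypothetical; Chainer's actual implementation differs): perturb the input once along a random direction and compare the finite-difference slope with the analytic gradient projected onto that direction.

```python
# Simplified sketch of a directional-derivative gradient check: one pair of
# forward passes along a random direction d, instead of one pair per input
# dimension.

import random

def f(x):                 # example function: f(x) = sum(x_i**2)
    return sum(v * v for v in x)

def analytic_grad(x):     # its gradient: 2*x
    return [2.0 * v for v in x]

def check_directional(f, grad, x, eps=1e-4, tol=1e-4):
    d = [random.gauss(0.0, 1.0) for _ in x]
    x_plus = [v + eps * dv for v, dv in zip(x, d)]
    x_minus = [v - eps * dv for v, dv in zip(x, d)]
    numerical = (f(x_plus) - f(x_minus)) / (2 * eps)  # one forward pair
    analytic = sum(g * dv for g, dv in zip(grad(x), d))
    assert abs(numerical - analytic) < tol * max(1.0, abs(analytic))
    return numerical, analytic

random.seed(0)
check_directional(f, analytic_grad, [0.5, -1.0, 2.0])
```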
Bug fixes
 Dynamically import `matplotlib.pyplot` in `PlotReport` (#2740)
 Fix `_make_npz` for `ResNetLayers` (#3062, thanks @Hakuyume!)
 Support non-contiguous arrays in the cuDNN path of `softmax` (#3072) and `log_softmax` (#3310, thanks @knorth55!)
 Deny assigning links in `ChainList.init_scope()` (#3129)
 Avoid running certain hooks on uninitialized params (#3170)
 Call `VariableNode.data` from `Parameter.initialize` (#3204, thanks @bonprosoft!)
 Fix NaN check (#3208)
 Fix `DictDataset` to work in Python 3 (#3237, thanks @bonprosoft!)
 Always return a variable in `dropout` (#3239, thanks @naoto0804!)
 Fix `CaffeFunction` to take the BatchNorm scaling factor into account (#3261, thanks @hvy!)
 Fix `params` option of `check_double_backward` (#3268, see the previous section)
 Check the input type of `to_gpu` (#3269)
 Fix `fix_random` (#3330)
 Use test mode in predict methods of ResNet, VGG, GoogLeNet (#3201)
Improvements
 Improve `MultiprocessIterator` performance, functionality and stability using `Pool` (#3076, thanks @grafitt!)
 Check the range of the dropout ratio (#3100)
 Add function name in the debug message of NaN check (#3161)
 Fix typo in the name of a kernel used in `roi_pooling_2d` (#3185, thanks @knorth55!)
 Make `F.cast` skip `FunctionNode` application if no cast is needed (#3191)
 Fix `Variable.backward` for manually edited `requires_grad` (#3192)
 Avoid using deprecated stream option in `to_gpu` (#3278)
 Always raise a warning when the `stream` option is specified in `to_gpu` (#3282)
 Speed up `upsampling_2d` on CPU (#3316)
 Remove unnecessary use of `enumerate` (#3326)
 Optimize backward of `log2` and `log10` (#3352)
 Fix warning message for cuDNN (#3227)
 Reduce copies in `check_backward` (#3312)
 Small improvement for `transpose` backward (#3154)
 Modules related to `IntervalTrigger` are slightly reorganized (#2990, thanks @Hakuyume!)
Examples
 New example of machine translation with seq2seq (#2070)
 Avoid importing matplotlib only to set its Agg backend in example code (#3043)
 Remove deprecated `get_device` from examples (#3122, thanks @naoto0804!)
Documentation
 Improve "Introduction to Chainer" (#1879)
 Add "How to write a training loop in Chainer" tutorial (#2736)
 Add minor version policy and feature backport policy (#3297)
 Update coding guidelines on shortcut aliases (#3198)
 Add warnings about preprocessing for dataset with both grayscale and RGB images. (#3093, thanks @jinjiren!)
 Hide source link for alien objects (#3110)
 Add missing items to the reference manual: `FunctionNode`, `FunctionAdapter` (#3117), `initializers.NaN` (#3293)
 Remove “Edit on GitHub” link (#3080)
 Treat Sphinx warnings as errors (#3069)
 Fix example code: RNN tutorial (#3149, thanks @fiarabbit!), fixed doctest failures (#3114, #3247)
 Fix typos: README (#3156, thanks @lc0!), `gradient_check` (#3158), configuration documentation (#3166, thanks @kristofbc!), `Variable.grad` (#3265), `Hyperparameter` (#3248), `BatchNormalization` (#3137)
 Fix docs: `clipped_relu` (#3178), `leaky_relu` (#3179), `Variable.__getitem__` (#3180), `linear` (#3224, thanks @bonprosoft!), link document (#3240), `GRU` stateless/stateful (#3340), `Updater` (#3084, thanks @fiarabbit!), backslash escaping (#3174), summary markup with periods (#3235), fix for warnings (#3068)
 Improve docs: `dropout` (#3184, thanks @fiarabbit!) (#3116, thanks @naoto0804!), `where` (#3301, thanks @naoto0804!), `transpose` (#3302, thanks @naoto0804!), `GRU` (#3089), doctest code in training loop tutorial (#3249), `hinge` (#3108), `softmax_cross_entropy` (#3105), `LSTM` (#3104), `Linear` (#3103), `binary_accuracy` (#3102), `embed_id` (#3091)
Test
 Check `DeprecationWarning` in `Variable` (#2932)
 Dump more info on `assert_allclose` failure (#2936)
 Check deprecated method in tests of `Link` (#3155)
 Avoid using deprecated `stream` option in `to_gpu` (#3278)
 Insert `assert_warns` to ignore warnings (#3280)
 Stabilize numerical tests: `relu` (#3299), `tanh` (#3305), exponentials (#3354), `unpooling_2d` (#3341), `local_response_normalization` (#3355)
 Ignore warnings for `to_gpu` (#3322)
 Improve activation function tests (#3332)
 Replace `get_device` (#3363)
Others
Downloads
This is the v3 beta release. See here for the complete list of solved issues and merged PRs.
CuPy has also been updated to v2.0.0b1. Please see the release notes for CuPy. In particular, the updates on memory allocator may be relevant to many existing users of Chainer.
Changes without compatibility
The new-style differentiable function (#2970)
This change provides the core API support for writing functions that support:
 Differentiable backprop (a.k.a. gradient of gradients, higher-order differentiation)
 Economical backprop (i.e., `backward` can skip computation of unnecessary input gradients)
You can write your own function node by implementing a subclass of `FunctionNode`. The following is a simple example of writing an elementwise multiplication function (which is already provided by this beta version):
class ElementwiseMul(chainer.FunctionNode):
    def check_type_forward(self, in_types): ...
    def forward(self, inputs):
        lhs, rhs = inputs
        self.retain_inputs((0, 1))  # new-style functions do not retain inputs by default!
        return lhs * rhs,
    def backward(self, indexes, grad_outputs):
        grad_out, = grad_outputs
        lhs, rhs = self.get_retained_inputs()
        return rhs * grad_out, lhs * grad_out
There are mainly three differences from the conventional definition using `Function`.
 The `indexes` argument (`target_input_indexes` as the full name) is added to `backward`. It indicates the set of inputs for which gradients are required. There are two ways to return gradients from `backward`: gradients for all inputs, or gradients only for the inputs selected by `indexes`. In the latter case, you can skip computing the gradients for inputs not listed in `indexes`.
 The `backward` method implements computation on top of `Variable` instead of `ndarray` so that the resulting gradients can be further backpropagated. `grad_outputs` is a tuple of `Variable`s, and the new `get_retained_inputs()` and `get_retained_outputs()` methods return a tuple of `Variable`s corresponding to retained inputs/outputs. Note that the inputs are not retained by default (which is also different from `Function`).
 The forward computation is invoked by the `apply()` method instead of the `__call__()` operator.
There is also a variant of the `backward()` method named `backward_accumulate()`, which includes the accumulation of input gradients into existing ones. It enables us to improve performance in some cases.
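The economical-backward contract can be illustrated with a toy pure-Python mock (hypothetical classes, not Chainer's API): `backward` receives the indexes of the inputs whose gradients are requested and may skip the rest.

```python
# Simplified mock of the "economical backward" idea. MulNode only mimics
# the shape of the FunctionNode contract; it is not Chainer code.

class MulNode:
    """Toy multiply node: backward computes only the requested gradients."""

    def apply(self, lhs, rhs):
        self.retained = (lhs, rhs)  # new-style nodes retain inputs explicitly
        return lhs * rhs

    def backward(self, indexes, grad_output):
        lhs, rhs = self.retained
        grads = {}
        if 0 in indexes:            # gradient w.r.t. lhs only if asked
            grads[0] = rhs * grad_output
        if 1 in indexes:            # gradient w.r.t. rhs only if asked
            grads[1] = lhs * grad_output
        return grads

node = MulNode()
out = node.apply(3.0, 4.0)
assert out == 12.0

# Only the gradient w.r.t. the first input is requested; the gradient for
# the second input is never computed.
grads = node.backward((0,), 1.0)
assert grads == {0: 4.0}
```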
This change also brings the following updates.
 A new class `FunctionAdapter` provides an implementation of the `FunctionNode` interface on top of the `Function` interface. It can be used to convert a `Function` into a new-style function node. Note that this does not mean the converted function supports differentiable backprop; you must rewrite the implementation with `FunctionNode` directly to support it.
 `Function.__call__` is updated so that users do not need to update their implementations of custom Function definitions; it automatically creates a `FunctionAdapter` object, lets the adapter wrap the `Function` object itself, and inserts the adapter object (which implements `FunctionNode`) into the computational graph.
 Currently, only elementwise addition and multiplication (`+` and `*`) and `F.identity` (which exists just for testing purposes) support differentiable (and economical) backprop. We are planning to widen the set of functions with differentiable backprop support in the upcoming releases.
 Note that this change breaks the object structure of the computational graph; now `FunctionNode` objects act as function nodes in the computational graph, and `Function` is just an object referenced by a `FunctionAdapter` object (which implements `FunctionNode`).
New features
 When using `Trainer`, any exceptions raised during training are now immediately shown before entering the finalization procedures. This helps users know the cause of the error without waiting for the finalization, which sometimes hangs (esp. when using multiprocessing) (#2216)
 Support a mask pattern shared among examples within each batch in `F.simplified_dropconnect` (#2534, thanks @fukatani!)
 Enable the strict option in `load_npz` to skip non-existing entries (#2599, #2601)
 `L.Classifier` is extended so that users can feed multiple input features. The argument that should be treated as the ground-truth labels is specified by the `label_key` option. Keyword arguments are also supported. (#2834)
 Add `print_report()` and `summary()` methods to `TimerHook` (#2927)
 Support all float types in `F.upsampling_2d` (#2978)
 Add `chainer.testing.fix_random` decorator to make tests deterministic (#2985)
 Add `F.selu` activation function (#2989)
 Add `iter_per_epoch` option to `get_trainer_with_mock_updater` (#2913, thanks @Hakuyume!)
Improvements
 Automatically import submodules in `chainer.training` (#3032)
 Remove redundant type checking in backward implementations and improve the performance of type equality checking (#2891)
 Remove code for old cuDNN (v2 and v3) support (#2920)
 Use `np.einsum` in `forward_cpu` of negative sampling (#2931)
 Direct initialization of `Variable` on device (#2983)
 Select the best-resolution timer function (#2991)
 Improve numerical grad performance on GPU (#3018)
 Make `TimerHook` reentrant (#3019)
 Fix the way arrays are copied, following cupy/cupy#159 (#3047, #3054)
Bug fixes
 Fix incorrect firing of `interval_trigger` on resuming the training procedure (#2244, #2484, thanks @Hakuyume!)
 Support correct serialization of `ManualScheduleTrigger` (#2988, thanks @Hakuyume!)
 Use `np.zeros` to initialize the arrays returned by the CPU-mode `F.roi_pooling_2d` (#2872, thanks @yuyu2172!)
 Add `mock` as an installation dependency (#2973, #2992)
Examples
 Deep reinforcement learning examples (#1991)
 It includes example code for DQN, DoubleDQN, and DDPG using OpenAI Gym environments.
Document
 Update “Comparison with Other Frameworks” (#2717, thanks @jekbradbury!)
 Add documentation for `Updater` (#3012, thanks @fiarabbit!)
 Improve the documentation for `MultiprocessParallelUpdater` (#3038)
 Add links to the Slack archive to README (#2998)
 Fix GitHub link (#2984 #2999)
 Fix typo in the Upgrade Guide for volatile mode changes (#3005, thanks @evdcush!)
 Wording in contribution guide (#3067)
 Fix the file permission of `conf.py` (#3064)
Test
 Reduce the test time (e.g. by removing redundant test cases or parameterization) (#2948)
 Add tests for deepcopying `Link` (#2974)
 Use `get_trainer_with_mock_updater` in tests of `ManualScheduleTrigger` (#2987, thanks @Hakuyume!)
 Remove debug print (#2997)
 Remove Cython coding style check to speed up tests (#3008)
Others
Downloads
This revision release contains bug fixes and improvements to the documentation and installation procedure. See here for the complete list of solved issues and merged PRs.
Enhancements
 Stop using INFINITY in `MaxPoolingND` (#2917)
 Stop using `get_device`, which is deprecated (#2924)
 Use `init_scope` instead of deprecated methods to register links and parameters (#2947)
 Use `cleargrads` instead of `zerograds` (#2956)
Bug fixes
 Fix `F.pad_sequence` error on 64-bit Windows GPU (#2867, thanks @ronekko!)
 Fix trainer mock to call `update_core()` (#2878)
 Fix resuming issue of *Shift extensions (#2879, thanks @Hakuyume!)
 Make vision models copyable (#2885)
 Restore changes unexpectedly overwritten in `get_trainer_with_mock_updater` (#2887, thanks @Hakuyume!)
 Change the type of several hidden variables in Link and Chain (#2901)
 Fix `Variable` `repr` and `str` failure when data is `None` (#2902)
 Use a sorted list of link parameters in gather and scatter functions of `MultiProcessUpdater` (#2914)
 Fix a bug dependent on glibc version (#2959, thanks @kennakanishi!)
 Fix TrainerTest where elapsed time had been zero with an imprecise clock (#2878)
Documentation
 Fix a typo (#2844, thanks @levelfour!)
 Add `F.pad_sequence` to the reference (#2884, thanks @YamaneSasuke!)
 Other document improvements (#2848, #2850, #2868, #2883, #2888, #2889, #2899, #2915, #2916, #2951)
Examples
 Add the `n_layers` argument of `ResNetLayers` in the ResNet example (#2882)
 Use `Evaluator` instead of `TestModeEvaluator` in the data-parallel example (#2886)
Others
Downloads
This is the second major version. See here for the complete list of solved issues and merged PRs (the list only shows the difference from v2.0.0b1; see the Release Notes section below for the difference from v1.24.0).
Announcements
 CuPy has been separated from Chainer into an independent package: CuPy.
 It means you need to install CuPy if you want to enable GPU for Chainer.
 Following this installation guide is recommended to enable GPU.
 Related to the CuPy separation, we have dropped support for some old versions of CUDA and cuDNN. The following versions are supported in Chainer v2.0.0 and CuPy 1.0.0:
 CUDA 7.0 or later
 cuDNN 4.0 or later
 The repository of Chainer has moved from pfnet/chainer to chainer/chainer. The old URL still works with git; any operations on it are redirected to the new one.
 For Chainer v1.x Users:
 Here is the Upgrade Guide that describes the details of differences from v1.
 For contributors:
 We strongly recommend reading the Contribution Guide again, as it contains many updates.
 As explained in the Contribution Guide, we have changed the development and release cycle. The main development will continue on the master branch, which corresponds to the next pre-releases of v3 (including alpha, beta, and RC). Maintenance of v2 will be done on the v2 branch.
 If you want to send a pull request, please send it to the master branch unless you have a special reason.
Release Notes
Note that these release notes contain only the differences from v2.0.0b1. See the release notes of v2.0.0a1 and v2.0.0b1 for the full set of changes from v1.
New Features and Changed APIs
- Add `L.StatelessGRU` and change the implementation of `L.GRU` (#2769)
- Make input size/channels optional (#2159, #2045)
- Aggressive buffer release (#2368, #2586, thanks @anaruse!)
  - Related to the buffer release, the following functions release inputs:
- `chainer.config.cudnn_deterministic`: cuDNN deterministic mode (#2574, #2710)
- Remove `wscale` option from `L.MLPConvolution2D` (#2690)
- Add new APIs of parameter/link registration to Link/Chain (#1970, #2657)
- Purge the graph when reporting a variable (#2054, #2640)
- Add `Extension.initialize` and remove `invoke_before_training` (#2639, #2611)
- Make `None` serializable (#2638)
- Raise an error when an obsolete argument is given (#2556)
- Use `cleargrads` instead of `zerograds` by default (#2521, #2549)
- Fix the inconsistent naming convention between LSTM and GRU (#2285, #2510, #2537)
- Add `requires_grad` property to `Variable` (#2493)
- Support NumPy-like `repr` of `Variable` (#2455, thanks @fukatani!)
- Clean up APIs of `L.Linear` and convolution-like links related to the bias argument (#2180, #2185)
- Remove deprecated methods of `Optimizer` (#2509, #2404)
- Enable the bias vector by default in `L.ConvolutionND` and `L.DeconvolutionND` (#2018)
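One of the changes above, making the input size/channels optional (#2159, #2045), lets a link defer weight allocation until the first forward pass, when the input size becomes known. A minimal NumPy sketch of that deferred-initialization idea (the `LazyLinear` class below is hypothetical, not Chainer's implementation):

```python
import numpy as np

class LazyLinear:
    """Toy linear layer that infers its input size on the first call."""

    def __init__(self, out_size):
        self.out_size = out_size
        self.W = None  # allocated lazily, once the input size is known

    def __call__(self, x):
        if self.W is None:
            in_size = x.shape[1]
            # Scaled Gaussian initialization, a stand-in for Chainer's default
            self.W = np.random.randn(self.out_size, in_size).astype(np.float32)
            self.W *= np.sqrt(1.0 / in_size)
        return x @ self.W.T

layer = LazyLinear(5)                       # only the output size is given
y = layer(np.ones((2, 7), dtype=np.float32))
print(y.shape)                              # (2, 5); in_size=7 was inferred
```

The real links additionally validate that later inputs match the inferred size and register the weight as a trainable parameter.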
Enhancement
- Remove unnecessary imports from `functions` and `links` (#2755)
- Check old arguments that are not supported in v2 and show an error message (#2641)
- Raise an error when the `volatile` flag is given (#2718)
Bug fixes
- Fix a bug of Hyperparameter on deep copy (or, strictly speaking, on unpickling) in Py3.6 (#2761)
- Fix `Copy.backward` to check input device (#2668)
- Fix `AlexFp16` example (#2637)
- Fix `VariableNode` to add `creator` setter (#2770)
- Fix for the environment without cuDNN (#2790)
- Check h5py version when serializing `None` (#2789, #2791)
- Fix the initial weight of `EmbedID` (#2694, thanks @odanado!)
- Fix `DebugPrint` extension to support removed inputs (#2667)
The following PR was sent to v1.24.0 and merged, but we mistakenly failed to include it in the previous release note, so we list it here. We appreciate @himkt's contribution!
Documentation
- Fix the location of `get_device_from_id` and `get_device_from_array` (#2759)
- Remove unnecessary sentence from `L.Convolution2D` (#2757)
- Improve doc of `F.softmax` (#2751, thanks @tma15!)
- Write the Upgrade Guide (#2741)
- Fix documentation errors (#2760)
- Update the Installation Guide for v2.0.0 (#2729)
- Renew the readme (#2692)
- Remove an obsolete document in `L.DilatedConvolution2D` (#2689)
- Remove `use_cleargrads` from tutorial (#2645)
- Fix a mistake in grammar (#2571, thanks @soramichi!)
- Update API Compatibility Policy (#2778)
- Remove the license for CuPy (#2786)
- Update the Contribution Guide (#2773)
- Update the tutorial (#2762)
- Fix several typos in tutorial (#2737, thanks @PeterTeng!)
Examples
Tests
This is a minor release. See here for the complete list of solved issues and merged PRs.
Announcements
- This is the final regular release of Chainer v1.x. No further changes will be made to Chainer v1 except for critical bug fixes.
- We will soon merge the current `_v2` branch into `master`. Many PRs targeting the current master are expected to become obsolete (i.e., they will conflict with the v2 source tree).
- We have decided to postpone the release of v2.0.0 to May 30. We will work hard to finish the planned changes and documentation, so please wait for the release date!
- We apologize that we could not fulfill for v2 the compatibility-breaking steps declared in our compatibility policy. In particular, many APIs that will be partially changed in v2 do not emit any warnings in v1.24.0.
- Instead, we are preparing an upgrade guide that lists which parts of existing user code should be updated for compatibility with v2.0.0. We believe this upgrade guide will help all users update their code properly.
New features
Summary
- `MultiprocessParallelUpdater` is added. It is an updater for `Trainer` that accumulates the gradients computed by multiple processes using `multiprocessing` and NCCL.
- A `reduce` option is added to loss functions. By passing `reduce='no'`, we can make a loss function return per-example loss values instead of aggregating them over the mini-batch.
- Many differentiable functions and links are added. In particular, depthwise convolution and spatial transformer networks are supported.
- QR and SVD decompositions are added to CuPy.
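The `reduce` option can be illustrated with a toy NumPy loss. The `squared_error` helper below is hypothetical and only mimics the interface; Chainer's loss functions are differentiable and support further options:

```python
import numpy as np

def squared_error(x, t, reduce='mean'):
    """Toy loss with a Chainer-style ``reduce`` switch.

    reduce='mean' aggregates over the mini-batch;
    reduce='no' returns one loss value per example.
    """
    loss = ((x - t) ** 2).sum(axis=1)  # per-example loss
    if reduce == 'no':
        return loss
    return loss.mean()

x = np.array([[0.0, 1.0], [2.0, 3.0]])
t = np.zeros((2, 2))
print(squared_error(x, t, reduce='no'))  # prints [ 1. 13.] (one loss per example)
print(squared_error(x, t))               # prints 7.0 (aggregated over the batch)
```

Keeping the per-example values is useful, for instance, when you want to weight or mask individual examples before aggregating.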
Chainer
- Differentiable advanced indexing (indexing by integer arrays and boolean arrays) for Variable (#2203, thanks @yuyu2172!)
  - NOTE: This feature was actually included in the previous version. We apologize that this big feature was missed in the previous release note.
- Add `MultiprocessParallelUpdater`: a new version of parallel updater using multiprocessing and NCCL (#2213, #2724, thanks @jekbradbury (#1924) and @anaruse (#1895)!)
- `ParameterStatistics` extension that accumulates various statistics of parameter arrays (#2166, thanks @hvy!)
- Add `reduce` option to the following loss functions. You can use these loss functions without taking the summation/average over the mini-batch by passing `reduce='no'`:
  - `F.softmax_cross_entropy` (#2325, #2357, thanks @Hakuyume!)
  - `F.gaussian_kl_divergence` (#2519)
  - `F.bernoulli_nll` (#2525)
  - `F.gaussian_nll` (#2526)
  - `F.crf1d` (#2559)
  - `F.huber_loss` (#2560)
  - `F.hinge_loss` (#2577)
  - `F.black_out` (#2600)
  - `F.contrastive` (#2603)
  - `F.connectionist_temporal_classification` (#2658)
  - `F.triplet` (#2681)
  - `F.cross_covariance` (#2697)
  - `F.decov` (#2698)
  - `F.negative_sampling` and `L.NegativeSampling` (#2704)
- One-dimensional integer array indexing (fancy indexing) support for `DatasetMixin` (#2427)
- Add `keepdims` option to `F.average` and `F.mean` (#2508)
- Add `TransformDataset`: dataset wrapper to transform each data point by an arbitrary callable (#2513)
- Support array inputs in `F.gaussian_kl_divergence`, `F.bernoulli_nll`, and `F.gaussian_nll` (#2520)
- New Functions and Links
  - `F.simplified_dropconnect` and `L.SimplifiedDropconnect`: simplified version of DropConnect (#1754, thanks @fukatani!)
  - `F.depthwise_convolution_2d` and `L.DepthwiseConvolution2D`: depthwise convolution layer used in separable convolution (#2067, thanks @fukatani!)
  - `F.spatial_transformer_sampler`: 2-D image differentiable sampler from "Spatial Transformer Networks" (#2272, thanks @yuyu2172!)
  - `F.spatial_transformer_grid`: function to generate sampling points of STN (#2458, thanks @yuyu2172!)
  - `L.GoogLeNet`: pretrained GoogLeNet (#2424, thanks @ronekko!)
  - `F.im2col`: differentiable version of im2col (#2466, thanks @yuyu2172!)
  - cuDNN-accelerated N-step RNNs and bidirectional RNNs (thanks @aonotas!)
  - `F.squared_error` and `F.absolute_error`: element-wise squared/absolute error (#2566, thanks @Hakuyume!)
- `F.softmax` supports `axis` option (#2536, #2538, thanks @sergeantwizard!)
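As background for the new `F.im2col` (#2466): im2col rearranges image patches into matrix rows, which is what lets a convolution be computed as one matrix multiplication. A minimal NumPy sketch for a single-channel image with stride 1 and no padding (Chainer's function additionally supports batches, channels, strides, padding, and backprop):

```python
import numpy as np

def im2col(img, kh, kw):
    """Toy im2col: each output row is one flattened (kh, kw) patch."""
    h, w = img.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((out_h * out_w, kh * kw), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

img = np.arange(16, dtype=np.float32).reshape(4, 4)
cols = im2col(img, 3, 3)
print(cols.shape)  # (4, 9): four 3x3 patches fit in a 4x4 image
```

With this layout, convolving with a flattened kernel `k` of shape `(9,)` reduces to `cols @ k`.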
CuPy
- Some linalg methods are supported (QR decomposition: #2412, singular value decomposition: #2481)
- `cupy.sum` supports `keepdims` argument (#2507)
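CuPy follows the NumPy interface, so the new linalg routines and the `keepdims` argument are used exactly as in NumPy. The snippet below uses NumPy so it runs without a GPU; swapping `numpy` for `cupy` should run the same code on the GPU (assuming CuPy is installed):

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(3, 2)

# QR decomposition: a == q @ r, with q having orthonormal columns
q, r = np.linalg.qr(a)
print(np.allclose(q @ r, a))        # True

# Singular value decomposition (thin form)
u, s, vt = np.linalg.svd(a, full_matrices=False)
print(np.allclose(u * s @ vt, a))   # True

# sum with keepdims keeps the reduced axis as length 1
print(a.sum(axis=0, keepdims=True).shape)  # (1, 2)
```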
Bug fixes
- Redundant dropout just after the input layer in `F.NStepLSTM` is removed (#2504)
- Some functions now work correctly with non-contiguous arrays:
  - Pooling functions (#2512, #2564)
  - `F.batch_normalization` (#2582, thanks @soramichi!)
  - Deconvolution functions (#2666, thanks @soramichi!)
  - `F.spatial_transformer_sampler` (#2676, thanks @yuyu2172!)
- Fixed `cupy.fuse` behavior for `*args` (#2594, thanks @jekbradbury!, #2598)
- Fixed resuming behavior of extensions (ExponentialShift: #2686, thanks @Hakuyume!; LinearShift: #2721)
- Fixed ResNet101Layers to load the pretrained model (#2608, #2609, thanks @yuyu2172!)
- `Variable.transpose` can be called without arguments (#2614, thanks @ronekko!, #2635)
- Added support for broadcasting in `SoftmaxCrossEntropy` on numpy==1.9 (#2719)
- Fixed reverse indexing for an empty dimension (#2696)
- Softmax cross entropy now works correctly when `ignore_label` is not -1 (#2715, thanks @musyoku!, #2716)
- Treat NumPy scalars correctly in `cupy.ndarray.fill` (#2723)
- Fixed duplicated test case name (#2605)
- Remove debug print (#2610)
- Fixed convnet tutorial (#2615)
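Several of these fixes concern non-contiguous inputs. Such arrays arise naturally from transposes and slicing, as the NumPy snippet below shows; when a routine requires contiguous memory, `np.ascontiguousarray` forces a contiguous copy:

```python
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)
print(x.flags['C_CONTIGUOUS'])    # True: freshly allocated row-major data

xt = x.T                          # a transpose is a view with swapped strides
print(xt.flags['C_CONTIGUOUS'])   # False: same memory, non-contiguous layout

xc = np.ascontiguousarray(xt)     # force a contiguous copy when a kernel needs one
print(xc.flags['C_CONTIGUOUS'])   # True
print(np.array_equal(xc, xt))     # True: values are unchanged
```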
Improvements
- Support arrays over 2GB (#2530, thanks @kmaehashi!)
- Check output size of pooling function (#2589, thanks @soramichi!)
- Stop importing theano automatically (#2570, thanks @mfxss and @jekbradbury!, #2619)
- `split_axis` function works when its result has zero-dimensional arrays (#2524)
- Improved DatasetMixin performance (#2427)
- Check maximum supported version of cuDNN (#2479, #2480)
- Refactored CIFAR dataset (#1516)
- Refactor `F.DilatedConvolution2DFunction` (#2665, thanks @soramichi!)
- Refactor `chainer.Link` (#2711, #2712, thanks @ysekky!)
Documents
- Modify the nccl wrapper for cupy-no-cuda (#2724)
- Add observe_value and observe_lr to extension.rst (#2713)
- Improve docs:
  - Modify docstring in connection (#2642, thanks @ysekky!)
  - Fix some mistakes in ConvNet tutorial (#2615)
  - Add reduce option to F.black_out (#2600)
  - Move Information to top (#2591)
  - Add index page for examples (#2587)
  - Add Information in README (#2565)
  - Add special members to the document (#2552)
  - Update install.rst (#2543)
  - Add ConvNet tutorials (#2337)
- Fix typos (#2597, thanks @PeterTeng!; #2596, thanks @kdnk!; #2595, thanks @hvy!; #2714)
- Remove TOC from the readme of the examples (#2731)
Examples
- Added new example that uses a custom training loop (#2339)
- Added `model` argument in PTB example to specify a model file (#2617)
- Removed outdated comment from word2vec example (#2643, thanks @ysekky!)
Tests
- Fixed `epoch_detail` behavior of mocked trainer (#2472, thanks @Hakuyume!)
- Fixed LSTM Dropout test (#2504)
- Fixed coding style in init docstring test (#2588)
- Fixed `contrastive` test (#2604)
- Fixed test case name of `gaussian_kl_divergence` test (#2605)
- Fixed numerical instability in `Highway` test (#2650)
- Added tests for `show_name` functionality of the computational graph (#2517, thanks @sergeantwizard!)
- Added corner case in `F.stack` test (#2532)
- Use `chainer.functions` alias in tests (#2541)
- Retry in unstable dropout test (#2542)
- Skip external classes in init docstring test (#2583)
- Improved test for `max_pooling_2d` in GPU cases (#2589, thanks @soramichi!)
- Improved tests of manual schedule trigger (#2557 and #2568, thanks @Hakuyume!)
Others
v4.0.0b3
kmaehashi released this on Jan 23, 2018
This is the release of v4.0.0b3. See here for the complete list of solved issues and merged PRs.
Highlights
- `Adam` optimizer has been updated to support AdamW. See #4050 for details.
- `Variable.backward`. You can also use it via Updaters.
Changes without Compatibility
- `Optimizer.setup` returns `self` to enable method chaining (#4141)
New Features
- `shift` function (#4041)
- `Optimizer.setup` returns `self` to enable method chaining (#4141)
- `matmul` (#3768), `huber_loss` (#3867)
Improvements
- `StandardUpdater` and `ParallelUpdater` under `chainer.training.updaters` namespace (#3037)
- Raise `RuntimeError` when `Adam.lr` is evaluated before updating starts (#3931)
- Add `get_training_length` to `IntervalTrigger` (#4079, thanks @himkt!)
- `F.linear` (#4093, thanks @jzhoulon!)
- `F.identity` (#4154)
- `check_backward` (#4156)
- `GoogLeNet` to define the network using `init_scope` (#4171)
- Use `log1p` in CTC (`F.connectionist_temporal_classification`) for stable computation (#4194)
- `F.separate` (#4195)
- `F.connectionist_temporal_classification` (#4201)
Bug Fixes
- `VariableNode.data` if new data is assigned (#3869)
- Add `serialize` to `Summary` (#4005, thanks @Hakuyume!)
- `gradient_check` (#4015)
- `t` of `UpdateRule` (#4026)
- Fix `GradientMethod` not to raise `AttributeError` caused by new optimizer setup (#4077)
- `np.stack` in grouped convolution/deconvolution in CPU mode (#4085)
- `np.stack` in examples (#4087)
- Fix `out1` in `inc4c` and `inc4d` (#4121, thanks @takaaki82!)
Examples
Documentation
- `Variable.backward()` (#3496)
- `huber_loss` (#3950)
Installation
Tests
- `gradient_check` (#4015)
- `test_init_docstring` to use importlib to find package (#4091)