- paper “Channel Pruning for Accelerating Very Deep Neural Networks” by Face++: eliminate unnecessary channels in convolution layers without a large increase in the final loss. For each layer, a coefficient is assigned to each channel, and the MSE (mean squared error) against the layer’s original output signal measures the quality of the approximation. A zero coefficient indicates that the corresponding channel is trivial and can be removed. A second step re-estimates the weights of the remaining channels for a better approximation. This solution generates the layer outputs from a small subset of the training data rather than the whole set, so the resulting channel selection may not fit the whole training dataset, and the approximation accuracy on the full dataset is not guaranteed.
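A toy numpy sketch of the two steps (score channels by coefficient magnitude, then re-fit the surviving weights). The `prune_channels` helper and its plain least-squares formulation are my own simplification; the paper itself solves the channel selection with LASSO.

```python
import numpy as np

def prune_channels(contribs, target, keep):
    """Toy channel-pruning sketch (hypothetical helper, not the paper's code).

    contribs: (C, N) per-channel contributions to the layer output,
    target:   (N,)  original output signal of the layer,
    keep:     number of channels to retain.
    Returns indices of kept channels and re-fitted weights that minimize
    the MSE to the original output.
    """
    # Fit one coefficient per channel by least squares (paper uses LASSO).
    beta, *_ = np.linalg.lstsq(contribs.T, target, rcond=None)
    # Channels with (near-)zero coefficients are treated as trivial.
    kept = np.argsort(-np.abs(beta))[:keep]
    # Re-fit the weights of the remaining channels for a better approximation.
    w, *_ = np.linalg.lstsq(contribs[kept].T, target, rcond=None)
    return kept, w
```

Because the fit uses only the sampled outputs, the pruning decision inherits exactly the limitation noted above: accuracy on data outside the sample is not guaranteed.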
Fast Disparity Solver
- paper “Fast Bilateral-Space Stereo for Synthetic Defocus” by Google: the core idea is to transform a dense two-dimensional space (the image) into a sparse five-dimensional space (bilateral space). A third-party open-source project provides a preliminary implementation; on top of it, I implemented the multi-scale bilateral pyramid to speed up iterative convergence. Note that the multi-scale part is not well documented in the original paper and its supplement; the formulations need to be rewritten for better understanding, which the author confirmed.
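A minimal sketch of the dense-to-sparse idea, reduced to a 3-D grid (x, y, luminance) over a grayscale image for brevity; the actual method splats into a 5-D bilateral space. The `splat` helper and its parameters are illustrative, not the paper's or the open-source project's code.

```python
import numpy as np

def splat(image, sigma_xy=8, sigma_l=16):
    """Simplified bilateral-grid splat (grayscale, 3-D instead of 5-D).

    Each pixel of the dense 2-D image is accumulated into a coarse grid
    indexed by (grid_y, grid_x, luminance bin); most grid cells stay
    empty, which is what makes the bilateral space sparse.
    Returns accumulated values and per-cell pixel counts.
    """
    h, w = image.shape
    gx = np.arange(w) // sigma_xy
    gy = np.arange(h) // sigma_xy
    gl = (image // sigma_l).astype(int)
    shape = (h // sigma_xy + 1, w // sigma_xy + 1, 256 // sigma_l + 1)
    vals = np.zeros(shape)
    counts = np.zeros(shape)
    for y in range(h):
        for x in range(w):
            vals[gy[y], gx[x], gl[y, x]] += image[y, x]
            counts[gy[y], gx[x], gl[y, x]] += 1
    return vals, counts
```

Solving the stereo optimization per occupied grid cell instead of per pixel is what yields the speedup; the multi-scale pyramid then coarsens this grid further to accelerate convergence.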
- paper “Burst photography for high dynamic range and low-light imaging on mobile cameras” by Google: produce a high-quality picture from multiple underexposed RAW files. Alignment is block matching, in fact. This work is quite fantastic; the only obstacle is that the whole ISP pipeline is complicated. A third-party open-source implementation, https://github.com/timothybrooks/hdr-plus.git, is incompatible with the latest Halide, the image-processing and computational-photography language the HDR+ project uses, and it also contains bugs. Worse, it does not implement the Wiener filter for robust merging, which in turn relies on the FFT found in the example directory of the Halide project.
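A numpy sketch of the missing piece, Wiener-style frequency-domain merging of a reference frame with one alternate frame, in the spirit of the HDR+ robust merge. The `c_sigma2` constant stands in for the paper's c·σ² noise term; the values and the whole-image (rather than tiled) handling are illustrative, not the official pipeline.

```python
import numpy as np

def wiener_merge(ref, alt, c_sigma2=100.0):
    """Pairwise Wiener-style merge of two aligned frames in the Fourier
    domain. Per frequency, the alternate frame is shrunk back toward the
    reference in proportion to how much they disagree relative to noise."""
    R, A = np.fft.fft2(ref), np.fft.fft2(alt)
    D = R - A
    shrink = np.abs(D) ** 2 / (np.abs(D) ** 2 + c_sigma2)
    # |D|^2 >> noise: shrink -> 1, the alternate is rejected (robustness);
    # |D|^2 << noise: shrink -> 0, the frames average (denoising).
    merged = A + shrink * D
    return np.real(np.fft.ifft2((R + merged) / 2))
```

The robustness comes from the shrinkage factor: misaligned or moving content produces large frequency differences and is pulled back to the reference instead of ghosting.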
- paper “Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring”: this project is built atop the Lua language, whose principle is that people should not be limited by the tools they use. The work deblurs a blurry image with a deep convolutional network, and the deblurring results are remarkable. But the program consumes a lot of memory: running it on a TITAN X (12 GB), the maximum input size is only about 2500×2000. Moreover, the work does not help remove noise (yes, I tried to explore its impact on denoising, but got little).
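The multi-scale scheme can be sketched as a coarse-to-fine loop: deblur the coarsest level first, upsample the estimate, and feed it alongside the next finer level. `deblur_net` is a hypothetical stand-in for the paper's CNN; the nearest-neighbor upsampling here replaces its learned upconvolution.

```python
import numpy as np

def coarse_to_fine(blurry, deblur_net, levels=3):
    """Coarse-to-fine multi-scale deblurring sketch.

    blurry:     (H, W) image, deblur_net(img, prev) -> deblurred img
                (hypothetical stand-in for the multi-scale CNN).
    Each level refines the upsampled estimate from the coarser level.
    """
    pyramid = [blurry[::2 ** k, ::2 ** k] for k in reversed(range(levels))]
    prev = np.zeros_like(pyramid[0])
    for img in pyramid:
        if prev.shape != img.shape:
            # Nearest-neighbor upsample of the coarser estimate (the paper
            # uses a learned upconvolution here), cropped to this level.
            prev = np.repeat(np.repeat(prev, 2, axis=0), 2, axis=1)
            prev = prev[:img.shape[0], :img.shape[1]]
        prev = deblur_net(img, prev)
    return prev
```

Since every level keeps full feature maps in memory, the memory cost above scales with the finest resolution, consistent with the ~2500×2000 ceiling observed on a 12 GB card.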
- paper “Learning to See in the Dark” by Intel: this work generates a JPG from a DNG file with a deep convolutional network. The paper claims very fancy results, but according to my experiments, the final result is no better than what Photoshop produces from the same DNG, unless the DNG is severely underexposed, e.g., shot at 0.2 lux.
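The network's input preparation can be sketched as packing the Bayer mosaic of the RAW file into a 4-channel, half-resolution tensor; this follows the paper's described preprocessing, though the RGGB offsets below assume one particular CFA layout.

```python
import numpy as np

def pack_bayer(raw):
    """Pack an RGGB Bayer mosaic (H, W) into a (H/2, W/2, 4) tensor,
    the input format the SID network consumes. Other CFA layouts
    (e.g. BGGR) would need different offsets."""
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G1
                     raw[1::2, 0::2],   # G2
                     raw[1::2, 1::2]],  # B
                    axis=-1)
```

In the paper's pipeline this packed tensor is then black-level subtracted and scaled by the desired exposure ratio before entering the network.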
to be continued …
Deconvolution: transposed convolution
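A 1-D sketch of why "deconvolution" is really a transposed (fractionally strided) convolution: insert stride−1 zeros between input samples, then run an ordinary convolution.

```python
import numpy as np

def transposed_conv1d(x, k, stride=2):
    """Transposed convolution, 1-D sketch: zero-insertion upsampling
    followed by a full convolution. This is equivalent to the gradient
    of a strided convolution with respect to its input."""
    up = np.zeros((len(x) - 1) * stride + 1)
    up[::stride] = x  # place inputs stride apart, zeros in between
    return np.convolve(up, k)
```

The output length `(len(x) - 1) * stride + len(k)` is why transposed convolutions upsample, the opposite of a strided convolution.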
FlowNet: the disparity between left and right images is represented as channels (FlowNetC). Given a maximum displacement d, for each location x1 we compute correlations c(x1, x2) only in a neighborhood of size D := 2d + 1 by limiting the range of x2. Since x1 and x2 are two-dimensional, this results in D^2 outputs.
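A numpy sketch of that correlation layer; for brevity it correlates single pixels, whereas FlowNetC correlates small patches around x1 and x2.

```python
import numpy as np

def correlation(f1, f2, d=1):
    """FlowNetC-style correlation sketch.

    f1, f2: (H, W, C) feature maps. For each x1 in f1, compare against
    the (2d+1)^2 displaced positions x2 in f2, so the output has
    D^2 = (2d+1)^2 channels, one per displacement."""
    H, W, C = f1.shape
    D = 2 * d + 1
    out = np.zeros((H, W, D * D))
    pad = np.pad(f2, ((d, d), (d, d), (0, 0)))  # zero-pad the borders
    for i in range(D):
        for j in range(D):
            shifted = pad[i:i + H, j:j + W]
            out[..., i * D + j] = (f1 * shifted).sum(-1) / C
    return out
```

Channel i·D + j of the output holds the correlation at displacement (i − d, j − d); the zero displacement sits in the center channel.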
Xception: pointwise conv.
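The pointwise (1×1) convolution is one half of Xception's depthwise-separable block; a numpy sketch of the pair, with 'valid' padding and a fixed 3×3 depthwise kernel for brevity:

```python
import numpy as np

def depthwise_separable(x, dw, pw):
    """Depthwise-separable convolution sketch.

    x: (H, W, C) input, dw: (3, 3, C) per-channel depthwise filters,
    pw: (C, C_out) pointwise (1x1) filters that mix channels."""
    H, W, C = x.shape
    out = np.zeros((H - 2, W - 2, C))
    for c in range(C):  # depthwise: each channel filtered independently
        for i in range(H - 2):
            for j in range(W - 2):
                out[i, j, c] = (x[i:i + 3, j:j + 3, c] * dw[:, :, c]).sum()
    return out @ pw  # pointwise: 1x1 conv mixes channels across C
```

Splitting spatial filtering from channel mixing is what cuts the parameter and FLOP count versus a full C×C_out×3×3 convolution.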
ShuffleNet: channel shuffle layers
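The shuffle itself is just a reshape–transpose–reshape that interleaves channels from different groups, so information can flow between the grouped convolutions:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle. x: (H, W, C) with C divisible by
    groups. Reshape C -> (groups, C // groups), transpose the two
    group axes, and flatten back, interleaving the groups."""
    H, W, C = x.shape
    return (x.reshape(H, W, groups, C // groups)
             .transpose(0, 1, 3, 2)
             .reshape(H, W, C))
```

For example, with 6 channels and 2 groups, channel order [0 1 2 | 3 4 5] becomes [0 3 1 4 2 5].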
Visualizing and Understanding Convolutional Networks: