Saurabh Gupta , David Fouhey , Sergey Levine , Jitendra Malik

Comments: Project page with videos: this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Robotics (cs.RO)

This works presents a formulation for visual navigation that unifies map

based spatial reasoning and path planning, with landmark based robust plan

execution in noisy environments. Our proposed formulation is learned from data

and is thus able to leverage statistical regularities of the world. This allows

it to efficiently navigate in novel environments given only a sparse set of

registered images as input for building representations for space. Our

formulation is based on three key ideas: a learned path planner that outputs

path plans to reach the goal, a feature synthesis engine that predicts features

for locations along the planned path, and a learned goal-driven closed loop

controller that can follow plans given these synthesized features. We test our

approach for goal-driven navigation in simulated real world environments and

report performance gains over competitive baseline approaches.

### Learning Intelligent Dialogs for Bounding Box Annotation

Ksenia Konyushkova , Jasper Uijlings , Christoph Lampert , Vittorio Ferrari **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

We introduce Intelligent Annotation Dialogs for bounding box annotation. We

train an agent to automatically choose a sequence of actions for a human

annotator to produce a bounding box in a minimal amount of time. Specifically,

we consider two actions: box verification [37], where the annotator verifies a

box generated by an object detector, and manual box drawing. We explore two

kinds of agents, one based on predicting the probability that a box will be

positively verified, and the other based on reinforcement learning. We

demonstrate that (1) our agents are able to learn efficient annotation

strategies in several scenarios, automatically adapting to the difficulty of an

input image, the desired quality of the boxes, the strength of the detector,

and other factors; (2) in all scenarios the resulting annotation dialogs speed

up annotation compared to manual box drawing alone and box verification alone,

while also out- performing any fixed combination of verification and draw- ing

in most scenarios; (3) in a realistic scenario where the detector is

iteratively re-trained, our agents evolve a series of strategies that reflect

the shifting trade-off between verification and drawing as the detector grows

stronger.

### Siamese Neural Networks for One-shot detection of Railway Track Switches

Dattaraj J Rao , Shruti Mittal , S. Ritika

Comments: 6 pages and 7 figures

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Deep Learning methods have been extensively used to analyze video data to

extract valuable information by classifying image frames and detecting objects.

We describe a unique approach for using video feed from a moving Locomotive to

continuously monitor the Railway Track and detect significant assets like

Switches on the Track. The technique used here is called Siamese Networks,

which uses 2 identical networks to learn the similarity between of 2 images.

Here we will use a Siamese network to continuously compare Track images and

detect any significant difference in the Track. Switch will be one of those

images that will be different and we will find a mapping that clearly

distinguishes the Switch from other possible Track anomalies. The same method

will then be extended to detect any abnormalities on the Railway Track. Railway

Transportation is unique in the sense that is has wheeled vehicles, Trains

pulled by Locomotives, running on guided Rails at very high speeds nearing 200

mph. Multiple Tracks on the Rail network are connected to each other using an

equipment called Switch or a Turnout. Switch is either operated manually or

automatically through command from a Control center and it governs the movement

of Trains on different Tracks of the network. Accurate location of these

Switches is very important for the railroad and getting a true picture of their

state in field is important. Modern trains use high definition video cameras

facing the Track that continuously record video from track. Using a Siamese

network and comparing to benchmark images, we describe a method to monitor the

Track and highlight anomalies.

### Human Action Recognition: Pose-based Attention draws focus to Hands

Fabien Baradel , Christian Wolf , Julien Mille

Comments: ICCV 2017 Workshop “Hands in action”. arXiv admin note: text overlap with arXiv:1703.10106

Journal-ref: ICCV 2017

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

We propose a new spatio-temporal attention based mechanism for human action

recognition able to automatically attend to the hands most involved into the

studied action and detect the most discriminative moments in an action.

Attention is handled in a recurrent manner employing Recurrent Neural Network

(RNN) and is fully-differentiable. In contrast to standard soft-attention based

mechanisms, our approach does not use the hidden RNN state as input to the

attention model. Instead, attention distributions are extracted using external

information: human articulated pose. We performed an extensive ablation study

to show the strengths of this approach and we particularly studied the

conditioning aspect of the attention mechanism. We evaluate the method on the

largest currently available human action recognition dataset, NTU-RGB+D, and

report state-of-the-art results. Other advantages of our model are certain

aspects of explanability, as the spatial and temporal attention distributions

at test time allow to study and verify on which parts of the input data the

method focuses.

### Encoding CNN Activations for Writer Recognition

Vincent Christlein , Andreas Maier

Comments: (revised) DAS2018 submission

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

The encoding of local features is an essential part for writer identification

and writer retrieval. While CNN activations have already been used as local

features in related works, the encoding of these features has attracted little

attention so far. In this work, we compare the established VLAD encoding with

triangulation embedding. We further investigate generalized max pooling as an

alternative to sum pooling and the impact of decorrelation and Exemplar SVMs.

With these techniques, we set new standards on two publicly available datasets

(ICDAR13, KHATT).

### Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking

Aljoša Ošep , Wolfgang Mehner , Paul Voigtlaender , Bastian Leibe

Comments: ICRA’18 submission

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

The most common paradigm for vision-based multi-object tracking is

tracking-by-detection, due to the availability of reliable detectors for

several important object categories such as cars and pedestrians. However,

future mobile systems will need a capability to cope with rich human-made

environments, in which obtaining detectors for every possible object category

would be infeasible. In this paper, we propose a model-free multi-object

tracking approach that uses a category-agnostic image segmentation method to

track objects. We present an efficient segmentation mask-based tracker which

associates pixel-precise masks reported by the segmentation. Our approach can

utilize semantic information whenever it is available for classifying objects

at the track level, while retaining the capability to track generic unknown

objects in the absence of such information. We demonstrate experimentally that

our approach achieves performance comparable to state-of-the-art

tracking-by-detection methods for popular object categories such as cars and

pedestrians. Additionally, we show that the proposed method can discover and

robustly track a large variety of other objects.

Comments: To appear in the Proceedings of the 2018 IEEE International Symposium on Biomedical Imaging (ISBI 2018)

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Ultrasound imaging makes use of backscattering of waves during their

interaction with scatterers present in biological tissues. Simulation of

synthetic ultrasound images is a challenging problem on account of inability to

completely model various factors of which some include intra-/inter scanline

interference, transducer to surface coupling, artifacts on transducer elements,

inhomogeneous shadowing and nonlinear attenuation. While various approaches to

ultrasound simulation has been developed, approaches that produce

patho-realistic images typically solve wave space equations making it

computationally expensive and slow to operate. We propose a generative

adversarial network (GAN) inspired approach for fast simulation of

patho-realistic ultrasound images. We apply the framework to intravascular

ultrasound (IVUS) simulation. A stage 0 simulation is done from the

echogenicity map of the tissue obtained from the ground truth label of

ultrasound image using an off the shelf pseudo B-mode ultrasound image

simulator . The images obtained are adversarially refined using stacked GAN.

The stage I GAN generates low resolution images from the images generated by

the initial simulation. The stage II GAN further refines the output of the

stage I GAN and generates high resolution images which are patho-realistic. We

demonstrate that the network generates realistic appearing images evaluated

with a visual Turing test indicating an equivocal confusion in discriminating

simulated from real. We also quantify the shift in tissue specific intensity

distributions of the real and simulated images to prove their similarity.

### Exploring Models and Data for Remote Sensing Image Caption Generation

Xiaoqiang Lu , Binqiang Wang , Xiangtao Zheng , Xuelong Li

Comments: 14 pages, 8 figures

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Inspired by recent development of artificial satellite, remote sensing images

have attracted extensive attention. Recently, noticeable progress has been made

in scene classification and target detection.However, it is still not clear how

to describe the remote sensing image content with accurate and concise

sentences. In this paper, we investigate to describe the remote sensing images

with accurate and flexible sentences. First, some annotated instructions are

presented to better describe the remote sensing images considering the special

characteristics of remote sensing images. Second, in order to exhaustively

exploit the contents of remote sensing images, a large-scale aerial image data

set is constructed for remote sensing image caption. Finally, a comprehensive

review is presented on the proposed data set to fully advance the task of

remote sensing caption. Extensive experiments on the proposed data set

demonstrate that the content of the remote sensing image can be completely

described by generating language descriptions. The data set is available at

### Deep learning for predicting refractive error from retinal fundus images

Avinash V. Varadarajan , Ryan Poplin , Katy Blumer , Christof Angermueller , Joe Ledsam , Reena Chopra , Pearse A. Keane , Greg S. Corrado , Lily Peng , Dale R. Webster **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Refractive error, one of the leading cause of visual impairment, can be

corrected by simple interventions like prescribing eyeglasses. We trained a

deep learning algorithm to predict refractive error from the fundus photographs

from participants in the UK Biobank cohort, which were 45 degree field of view

images and the AREDS clinical trial, which contained 30 degree field of view

images. Our model use the “attention” method to identify features that are

correlated with refractive error. Mean absolute error (MAE) of the algorithm’s

prediction compared to the refractive error obtained in the AREDS and UK

Biobank. The resulting algorithm had a MAE of 0.56 diopters (95% CI: 0.55-0.56)

for estimating spherical equivalent on the UK Biobank dataset and 0.91 diopters

(95% CI: 0.89-0.92) for the AREDS dataset. The baseline expected MAE (obtained

by simply predicting the mean of this population) was 1.81 diopters (95% CI:

1.79-1.84) for UK Biobank and 1.63 (95% CI: 1.60-1.67) for AREDS. Attention

maps suggested that the foveal region was one of the most important areas used

by the algorithm to make this prediction, though other regions also contribute

to the prediction. The ability to estimate refractive error with high accuracy

from retinal fundus photos has not been previously known and demonstrates that

deep learning can be applied to make novel predictions from medical images.

Given that several groups have recently shown that it is feasible to obtain

retinal fundus photos using mobile phones and inexpensive attachments, this

work may be particularly relevant in regions of the world where autorefractors

may not be readily available.

### Context-Aware Semantic Inpainting

Haofeng Li , Guanbin Li , Liang Lin , Yizhou Yu **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Recently image inpainting has witnessed rapid progress due to generative

adversarial networks (GAN) that are able to synthesize realistic contents.

However, most existing GAN-based methods for semantic inpainting apply an

auto-encoder architecture with a fully connected layer, which cannot accurately

maintain spatial information. In addition, the discriminator in existing GANs

struggle to understand high-level semantics within the image context and yield

semantically consistent content. Existing evaluation criteria are biased

towards blurry results and cannot well characterize edge preservation and

visual authenticity in the inpainting results. In this paper, we propose an

improved generative adversarial network to overcome the aforementioned

limitations. Our proposed GAN-based framework consists of a fully convolutional

design for the generator which helps to better preserve spatial structures and

a joint loss function with a revised perceptual loss to capture high-level

semantics in the context. Furthermore, we also introduce two novel measures to

better assess the quality of image inpainting results. Experimental results

demonstrate that our method outperforms the state of the art under a wide range

of criteria.

### Automatic Estimation of Ice Bottom Surfaces from Radar Imagery

Mingze Xu , David J Crandall , Geoffrey C Fox , John D Paden

Comments: 5 pages, 3 figures, published in ICIP 2017

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Ground-penetrating radar on planes and satellites now makes it practical to

collect 3D observations of the subsurface structure of the polar ice sheets,

providing crucial data for understanding and tracking global climate change.

But converting these noisy readings into useful observations is generally done

by hand, which is impractical at a continental scale. In this paper, we propose

a computer vision-based technique for extracting 3D ice-bottom surfaces by

viewing the task as an inference problem on a probabilistic graphical model. We

first generate a seed surface subject to a set of constraints, and then

incorporate additional sources of evidence to refine it via discrete energy

minimization. We evaluate the performance of the tracking algorithm on 7

topographic sequences (each with over 3000 radar images) collected from the

Canadian Arctic Archipelago with respect to human-labeled ground truth.

### Enhance Visual Recognition under Adverse Conditions via Deep Networks

Ding Liu , Bowen Cheng , Zhangyang Wang , Haichao Zhang , Thomas S. Huang **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Visual recognition under adverse conditions is a very important and

challenging problem of high practical value, due to the ubiquitous existence of

quality distortions during image acquisition, transmission, or storage. While

deep neural networks have been extensively exploited in the techniques of

low-quality image restoration and high-quality image recognition tasks

respectively, few studies have been done on the important problem of

recognition from very low-quality images. This paper proposes a deep learning

based framework for improving the performance of image and video recognition

models under adverse conditions, using robust adverse pre-training or its

aggressive variant. The robust adverse pre-training algorithms leverage the

power of pre-training and generalizes conventional unsupervised pre-training

and data augmentation methods. We further develop a transfer learning approach

to cope with real-world datasets of unknown adverse conditions. The proposed

framework is comprehensively evaluated on a number of image and video

recognition benchmarks, and obtains significant performance improvements under

various single or mixed adverse conditions. Our visualization and analysis

further add to the explainability of results.

### An Order Preserving Bilinear Model for Person Detection in Multi-Modal Data

Oytun Ulutan , Benjamin S. Riggan , Nasser M. Nasrabadi , B.S. Manjunath **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

We propose a new order preserving bilinear framework that exploits

low-resolution video for person detection in a multi-modal setting using deep

neural networks. In this setting cameras are strategically placed such that

less robust sensors, e.g. geophones that monitor seismic activity, are located

within the field of views (FOVs) of cameras. The primary challenge is being

able to leverage sufficient information from videos where there are less than

40 pixels on targets, while also taking advantage of less discriminative

information from other modalities, e.g. seismic. Unlike state-of-the-art

methods, our bilinear framework retains spatio-temporal order when computing

the vector outer products between pairs of features. Despite the high

dimensionality of these outer products, we demonstrate that our order

preserving bilinear framework yields better performance than recent orderless

bilinear models and alternative fusion methods.

### Adversarial Synthesis Learning Enables Segmentation Without Target Modality Ground Truth

Yuankai Huo , Zhoubing Xu , Shunxing Bao , Albert Assad , Richard G. Abramson , Bennett A. Landman

Comments: IEEE International Symposium on Biomedical Imaging (ISBI) 2018

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

A lack of generalizability is one key limitation of deep learning based

segmentation. Typically, one manually labels new training images when

segmenting organs in different imaging modalities or segmenting abnormal organs

from distinct disease cohorts. The manual efforts can be alleviated if one is

able to reuse manual labels from one modality (e.g., MRI) to train a

segmentation network for a new modality (e.g., CT). Previously, two stage

methods have been proposed to use cycle generative adversarial networks

(CycleGAN) to synthesize training images for a target modality. Then, these

efforts trained a segmentation network independently using synthetic images.

However, these two independent stages did not use the complementary information

between synthesis and segmentation. Herein, we proposed a novel end-to-end

synthesis and segmentation network (EssNet) to achieve the unpaired MRI to CT

image synthesis and CT splenomegaly segmentation simultaneously without using

manual labels on CT. The end-to-end EssNet achieved significantly higher median

Dice similarity coefficient (0.9188) than the two stages strategy (0.8801), and

even higher than canonical multi-atlas segmentation (0.9125) and ResNet method

(0.9107), which used the CT manual labels.

### A Deep Learning Interpretable Classifier for Diabetic Retinopathy Disease Grading

Jordi de la Torre , Aida Valls , Domenec Puig

Comments: Submitted to Elsevier

**Subjects**

:

Artificial Intelligence (cs.AI)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Deep neural network models have been proven to be very successful in image

classification tasks, also for medical diagnosis, but their main concern is its

lack of interpretability. They use to work as intuition machines with high

statistical confidence but unable to give interpretable explanations about the

reported results. The vast amount of parameters of these models make difficult

to infer a rationale interpretation from them. In this paper we present a

diabetic retinopathy interpretable classifier able to classify retine images

into the different levels of disease severity and of explaining its results by

assigning a score for every point in the hidden and input space, evaluating its

contribution to the final classification in a linear way. The generated visual

maps can be interpreted by an expert in order to compare its own knowledge with

the interpretation given by the model.

### AVEID: Automatic Video System for Measuring Engagement In Dementia

Viral Parekh , Pin Sym Foong , Shendong Zhao , Ramanathan Subramanian **Subjects** : Human-Computer Interaction (cs.HC) ; Computer Vision and Pattern Recognition (cs.CV)

Engagement in dementia is typically measured using behavior observational

scales (BOS) that are tedious and involve intensive manual labor to annotate,

and are therefore not easily scalable. We propose AVEID, a low cost and

easy-to-use video-based engagement measurement tool to determine the engagement

level of a person with dementia (PwD) during digital interaction. We show that

the objective behavioral measures computed via AVEID correlate well with

subjective expert impressions for the popular MPES and OME BOS, confirming its

viability and effectiveness. Moreover, AVEID measures can be obtained for a

variety of engagement designs, thereby facilitating large-scale studies with

PwD populations.

### Note on Attacking Object Detectors with Adversarial Stickers

Kevin Eykholt , Ivan Evtimov , Earlence Fernandes , Bo Li , Dawn Song , Tadayoshi Kohno , Amir Rahmati , Atul Prakash , Florian Tramer

Comments: Short Note

**Subjects**

:

Cryptography and Security (cs.CR)

; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep learning has proven to be a powerful tool for computer vision and has

seen widespread adoption for numerous tasks. However, deep learning algorithms

are known to be vulnerable to adversarial examples. These adversarial inputs

are created such that, when provided to a deep learning algorithm, they are

very likely to be mislabeled. This can be problematic when deep learning is

used to assist in safety critical decisions. Recent research has shown that

classifiers can be attacked by physical adversarial examples under various

physical conditions. Given the fact that state-of-the-art objection detection

algorithms are harder to be fooled by the same set of adversarial examples,

here we show that these detectors can also be attacked by physical adversarial

examples. In this note, we briefly show both static and dynamic test results.

We design an algorithm that produces physical adversarial inputs, which can

fool the YOLO object detector and can also attack Faster-RCNN with relatively

high success rate based on transferability. Furthermore, our algorithm can

compress the size of the adversarial inputs to stickers that, when attached to

the targeted object, result in the detector either mislabeling or not detecting

the object a high percentage of the time. This note provides a small set of

results. Our upcoming paper will contain a thorough evaluation on other object

detectors, and will present the algorithm.

### Wolf in Sheep's Clothing – The Downscaling Attack Against Deep Learning Applications

Qixue Xiao , Kang Li , Deyue Zhang , Yier Jin **Subjects** : Cryptography and Security (cs.CR) ; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

This paper considers security risks buried in the data processing pipeline in

common deep learning applications. Deep learning models usually assume a fixed

scale for their training and input data. To allow deep learning applications to

handle a wide range of input data, popular frameworks, such as Caffe,

TensorFlow, and Torch, all provide data scaling functions to resize input to

the dimensions used by deep learning models. Image scaling algorithms are

intended to preserve the visual features of an image after scaling. However,

common image scaling algorithms are not designed to handle human crafted

images. Attackers can make the scaling outputs look dramatically different from

the corresponding input images.

This paper presents a downscaling attack that targets the data scaling

process in deep learning applications. By carefully crafting input data that

mismatches with the dimension used by deep learning models, attackers can

create deceiving effects. A deep learning application effectively consumes data

that are not the same as those presented to users. The visual inconsistency

enables practical evasion and data poisoning attacks to deep learning

applications. This paper presents proof-of-concept attack samples to popular

deep-learning-based image classification applications. To address the

downscaling attacks, the paper also suggests multiple potential mitigation

strategies.

### Deep metric learning for multi-labelled radiographs

Mauro Annarumma , Giovanni Montana

Comments: SAC 2018

**Subjects**

:

Machine Learning (stat.ML)

; Computer Vision and Pattern Recognition (cs.CV)

Many radiological studies can reveal the presence of several co-existing

abnormalities, each one represented by a distinct visual pattern. In this

article we address the problem of learning a distance metric for plain

radiographs that captures a notion of “radiological similarity”: two chest

radiographs are considered to be similar if they share similar abnormalities.

Deep convolutional neural networks (DCNs) are used to learn a low-dimensional

embedding for the radiographs that is equipped with the desired metric. Two

loss functions are proposed to deal with multi-labelled images and potentially

noisy labels. We report on a large-scale study involving over 745,000 chest

radiographs whose labels were automatically extracted from free-text

radiological reports through a natural language processing system. Using 4,500

validated exams, we demonstrate that the methodology performs satisfactorily on

clustering and image retrieval tasks. Remarkably, the learned metric separates

normal exams from those having radiological abnormalities.

## Artificial Intelligence

### A Deep Learning Interpretable Classifier for Diabetic Retinopathy Disease Grading

Jordi de la Torre , Aida Valls , Domenec Puig

Comments: Submitted to Elsevier

**Subjects**

:

Artificial Intelligence (cs.AI)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Deep neural network models have been proven to be very successful in image

classification tasks, also for medical diagnosis, but their main concern is its

lack of interpretability. They use to work as intuition machines with high

statistical confidence but unable to give interpretable explanations about the

reported results. The vast amount of parameters of these models make difficult

to infer a rationale interpretation from them. In this paper we present a

diabetic retinopathy interpretable classifier able to classify retine images

into the different levels of disease severity and of explaining its results by

assigning a score for every point in the hidden and input space, evaluating its

contribution to the final classification in a linear way. The generated visual

maps can be interpreted by an expert in order to compare its own knowledge with

the interpretation given by the model.

Mario Lezcano Casado , Atilim Gunes Baydin , David Martinez Rubio , Tuan Anh Le , Frank Wood , Lukas Heinrich , Gilles Louppe , Kyle Cranmer , Karen Ng , Wahid Bhimji , Prabhat

Comments: 7 pages, 2 figures

**Subjects**

:

Artificial Intelligence (cs.AI)

; Data Analysis, Statistics and Probability (physics.data-an)

We consider the problem of Bayesian inference in the family of probabilistic

models implicitly defined by stochastic generative models of data. In

scientific fields ranging from population biology to cosmology, low-level

mechanistic components are composed to create complex generative models. These

models lead to intractable likelihoods and are typically non-differentiable,

which poses challenges for traditional approaches to inference. We extend

previous work in “inference compilation”, which combines universal

probabilistic programming and deep learning methods, to large-scale scientific

simulators, and introduce a C++ based probabilistic programming library called

CPProb. We successfully use CPProb to interface with SHERPA, a large code-base

used in particle physics. Here we describe the technical innovations realized

and planned for this library.

### A Deep Policy Inference Q-Network for Multi-Agent Systems

Zhang-Wei Hong , Shih-Yang Su , Tzu-Yun Shann , Yi-Hsiang Chang , Chun-Yi Lee **Subjects** : Artificial Intelligence (cs.AI)

We present DPIQN, a deep policy inference Q-network that targets multi-agent

systems composed of controllable agents, collaborators, and opponents that

interact with each other. We focus on one challenging issue in such

systems—modeling agents with varying strategies—and propose to employ

“policy features” learned from raw observations (e.g., raw images) of

collaborators and opponents by inferring their policies. DPIQN incorporates the

learned policy features as a hidden vector into its own deep Q-network (DQN),

such that it is able to predict better Q values for the controllable agents

than the state-of-the-art deep reinforcement learning models. We further

propose an enhanced version of DPIQN, called deep recurrent policy inference

Q-network (DRPIQN), for handling partial observability. Both DPIQN and DRPIQN

are trained by an adaptive training procedure, which adjusts the network’s

attention to learn the policy features and its own Q-values at different phases

of the training process. We present a comprehensive analysis of DPIQN and

DRPIQN, and highlight their effectiveness and generalizability in various

multi-agent settings. Our models are evaluated in a classic soccer game

involving both competitive and collaborative scenarios. Experimental results

performed on 1 vs. 1 and 2 vs. 2 games show that DPIQN and DRPIQN demonstrate

superior performance to the baseline DQN and deep recurrent Q-network (DRQN)

models. We also explore scenarios in which collaborators or opponents

dynamically change their policies, and show that DPIQN and DRPIQN do lead to

better overall performance in terms of stability and mean scores.

Rajesh Chidambaram **Subjects** : Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)

Artificial Intelligence (AI), is once again in the phase of drastic

advancements. Unarguably, the technology itself can revolutionize the way we

live our everyday life. But the exponential growth of technology poses a

daunting task for policy researchers and law makers in making amendments to the

existing norms. In addition, not everyone in the society is studying the

potential socio-economic intricacies and cultural drifts that AI can bring

about. It is prudence to reflect from our historical past to propel the

development of technology in the right direction. To benefit the society of the

present and future, I scientifically explore the societal impact of AI. While

there are many public and private partnerships working on similar aspects, here

I describe the necessity for an Unanimous International Regulatory Body for all

applications of AI (UIRB-AI). I also discuss the benefits and drawbacks of such

an organization. To combat any drawbacks in the formation of an UIRB-AI, both

idealistic and pragmatic perspectives are discussed alternatively. The paper

further advances the discussion by proposing novel policies on how such

organization should be structured and how it can bring about a win-win

situation for everyone in the society.

### Pseudorehearsal in actor-critic agents with neural network function approximation

Vladimir Marochko , Leonard Johard , Manuel Mazzara , Luca Longo **Subjects** : Artificial Intelligence (cs.AI)

Catastrophic forgetting has a significant negative impact in reinforcement

learning. The purpose of this study is to investigate how pseudorehearsal can

change performance of an actor-critic agent with neural-network function

approximation. We tested agent in a pole balancing task and compared different

pseudorehearsal approaches. We have found that pseudorehearsal can assist

learning and decrease forgetting.

### Multiagent-based Participatory Urban Simulation through Inverse Reinforcement Learning

Soma Suzuki **Subjects** : Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)

The multiagent-based participatory simulation features prominently in urban

planning as the acquired model is considered as the hybrid system of the domain

and the local knowledge. However, the key problem of generating realistic

agents for particular social phenomena invariably remains. The existing models

have attempted to dictate the factors involving human behavior, which appeared

to be intractable. In this paper, Inverse Reinforcement Learning (IRL) is

introduced to address this problem. IRL is developed for computational modeling

of human behavior and has achieved great successes in robotics, psychology and

machine learning. The possibilities presented by this new style of modeling are

drawn out as conclusions, and the relative challenges with this modeling are

highlighted.

### Geometrical Insights for Implicit Generative Modeling

Leon Bottou , Martin Arjovsky , David Lopez-Paz , Maxime Oquab **Subjects** : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Learning (cs.LG)

Learning algorithms for implicit generative models can optimize a variety of

criteria that measure how the data distribution differs from the implicit model

distribution, including the Wasserstein distance, the Energy distance, and the

Maximum Mean Discrepancy criterion. A careful look at the geometries induced by

these distances on the space of probability measures reveals interesting

differences. In particular, we can establish surprising approximate global

convergence guarantees for the (1)-Wasserstein distance,even when the

parametric generator has a nonconvex parametrization.

### Bit-Vector Model Counting using Statistical Estimation

Seonmo Kim , Stephen McCamant **Subjects** : Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)

Approximate model counting for bit-vector SMT formulas (generalizing #SAT)

has many applications such as probabilistic inference and quantitative

information-flow security, but it is computationally difficult. Adding random

parity constraints (XOR streamlining) and then checking satisfiability is an

effective approximation technique, but it requires a prior hypothesis about the

model count to produce useful results. We propose an approach inspired by

statistical estimation to continually refine a probabilistic estimate of the

model count for a formula, so that each XOR-streamlined query yields as much

information as possible. We implement this approach, with an approximate

probability model, as a wrapper around an off-the-shelf SMT solver or SAT

solver. Experimental results show that the implementation is faster than the

most similar previous approaches which used simpler refinement strategies. The

technique also lets us model count formulas over floating-point constraints,

which we demonstrate with an application to a vulnerability in differential

privacy mechanisms.

## Information Retrieval

### Overview of the Triple Scoring Task at the WSDM Cup 2017

Hannah Bast , Björn Buchhold , Elmar Haussmann **Subjects** : Information Retrieval (cs.IR)

This paper provides an overview of the triple scoring task at the WSDM Cup

2017, including a description of the task and the dataset, an overview of the

participating teams and their results, and a brief account of the methods

employed. In a nutshell, the task was to compute relevance scores for

knowledge-base triples from relations, where such scores make sense. Due to the

way the ground truth was constructed, scores were required to be integers from

the range 0..7. For example, reasonable scores for the triples “Tim Burton

profession Director” and “Tim Burton profession Actor” would be 7 and 2,

respectively, because Tim Burton is well-known as a director, but he acted only

in a few lesser known movies.

The triple scoring task attracted considerable interest, with 52 initial

registrations and 21 teams who submitted a valid run before the deadline. The

winning team achieved an accuracy of 87%, that is, for that fraction of the

triples from the test set (which was revealed only after the deadline) the

difference to the score from the ground truth was at most 2. The best result

for the average difference from the test set scores was 1.50.

### PERS: A Personalized and Explainable POI Recommender System

Ramesh Baral , Tao Li **Subjects** : Information Retrieval (cs.IR)

The Location-Based Social Networks (LBSN) (e.g., Facebook) have many factors

(for instance, ratings, check-in time, etc.) that play a crucial role for the

Point-of-Interest (POI) recommendations. Unlike ratings, the reviews can help

users to elaborate their opinion and share the extent of consumption experience

in terms of the relevant factors of interest (aspects). Though some of the

existing recommendation systems have been using the user reviews, most of them

are less transparent and non-interpretable. These reasons have induced

considerable attention towards explainable and interpretable recommendation. To

the best of our knowledge, this is the first paper to exploit the user reviews

to incorporate the sentiment and opinions on different aspects for the

personalized and explainable POI recommendation. In this paper, we propose a

model termed as PERS (Personalized Explainable POI Recommender System) which

models the review-aspect category correlation by exploiting deep neural

network, formulates the user-aspect category bipartite relation as a bipartite

graph, and models the explainable recommendation using bipartite core-based and

ranking-based methods. The major contributions of this paper are: (i) it models

users and locations based on the aspects posted by user via reviews, (ii) it

exploits a deep neural network to model the review-aspect category correlation,

(iii) it provisions the incorporation of multiple contexts (e.g., categorical,

spatial, etc.) in the POI recommendation model, (iv) it formulates the

preference of users’ on aspect category as a bipartite relation, represents it

as a location-aspect category bipartite graph, and models the explainable

recommendation with the notion of ordered dense subgraph extraction using

bipartite core-based and ranking-based approaches, and (v) it evaluates the

generated recommendation with three real-world datasets.

### Inferring User Interests in Microblogging Social Networks: A Survey

Guangyuan Piao , John G. Breslin

Comments: journal submission

**Subjects**

:

Information Retrieval (cs.IR)

With the popularity of microblogging services such as Twitter in recent

years, an increasing number of users use these services in their daily lives.

The huge volume of information generated by users raises new opportunities in

various applications and areas. Inferring user interests plays a significant

role in providing personalized recommendations on microblogging services, and

third-party applications providing social logins via these services, especially

in cold-start situations. In this survey, we review user modeling strategies

with respect to inferring user interests in previous studies. To this end, we

focus on four dimensions of inferring user interest profiles: (1) data

collection, (2) representation of user interest profiles, (3) construction and

enhancement of user interest profiles, and (4) the evaluation of the

constructed profiles. Through this survey, we aim to provide an overview of

state-of-the-art user modeling strategies for inferring user interest profiles

on microblogging social networks with respect to the four dimensions. For each

dimension, we review and summarize previous studies based on specified

criteria. Finally, we discuss some challenges and opportunities for future work

in this research domain.

### Multiview Deep Learning for Predicting Twitter Users' Location

Tien Huu Do , Duc Minh Nguyen , Evaggelia Tsiligianni , Bruno Cornelis , Nikos Deligiannis

Comments: Submitted to the IEEE Transactions on Big Data

**Subjects**

:

Learning (cs.LG)

; Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

The problem of predicting the location of users on large social networks like

Twitter has emerged from real-life applications such as social unrest detection

and online marketing. Twitter user geolocation is a difficult and active

research topic with a vast literature. Most of the proposed methods follow

either a content-based or a network-based approach. The former exploits

user-generated content while the latter utilizes the connection or interaction

between Twitter users. In this paper, we introduce a novel method combining the

strength of both approaches. Concretely, we propose a multi-entry neural

network architecture named MENET leveraging the advances in deep learning and

multiview learning. The generalizability of MENET enables the integration of

multiple data representations. In the context of Twitter user geolocation, we

realize MENET with textual, network, and metadata features. Considering the

natural distribution of Twitter users across the concerned geographical area,

we subdivide the surface of the earth into multi-scale cells and train MENET

with the labels of the cells. We show that our method outperforms the state of

the art by a large margin on three benchmark datasets.

## Computation and Language

The Character Thinks Ahead: creative writing with deep learning nets and its stylistic assessment

Comments: A 2 page paper in press in Leonardo Vol 51, 2018. Yet to be copy-edited

**Subjects**

:

Computation and Language (cs.CL)

We discuss how to control outputs from deep learning models of text corpora

so as to create contemporary poetic works. We assess whether these controls are

successful in the immediate sense of creating stylo- metric distinctiveness.

The specific context is our piece The Character Thinks Ahead (2016/17); the

potential applications are broad.

### Context-aware Path Ranking for Knowledge Base Completion

Journal-ref: Published in IJCAI 2017

**Subjects**

:

Computation and Language (cs.CL)

Knowledge base (KB) completion aims to infer missing facts from existing ones

in a KB. Among various approaches, path ranking (PR) algorithms have received

increasing attention in recent years. PR algorithms enumerate paths between

entity pairs in a KB and use those paths as features to train a model for

missing fact prediction. Due to their good performances and high model

interpretability, several methods have been proposed. However, most existing

methods suffer from scalability (high RAM consumption) and feature explosion

(trains on an exponentially large number of features) problems. This paper

proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems

by introducing a selective path exploration strategy. C-PR learns global

semantics of entities in the KB using word embedding and leverages the

knowledge of entity semantics to enumerate contextually relevant paths using

bidirectional random walk. Experimental results on three large KBs show that

the path features (fewer in number) discovered by C-PR not only improve

predictive performance but also are more interpretable than existing baselines.

## Distributed, Parallel, and Cluster Computing

### Renaissance: Self-Stabilizing Distributed SDN Control Plane

Marco Canini (1), Iosif Salem (2), Liron Schiff (3), Elad Michael Schiller (2), Stefan Schmid (4 and 5) ((1) Université catholique de Louvain, (2) Chalmers University of Technology, (3) GuardiCore Labs, (4) University of Vienna, (5) Aalborg University) **Subjects** : Networking and Internet Architecture (cs.NI) ; Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

By introducing programmability, automated verification, and innovative

debugging tools, Software-Defined Networks (SDNs) are poised to meet the

increasingly stringent dependability requirements of today’s communication

networks. However, the design of fault-tolerant SDNs remains an open challenge.

This paper considers the design of dependable SDNs through the lenses of

self-stabilization – a very strong notion of fault-tolerance. In particular, we

develop algorithms for an in-band and distributed control plane for SDNs,

called Renaissance, which tolerate a wide range of (concurrent) controller,

link, and communication failures. Our self-stabilizing algorithms ensure that

after the occurrence of an arbitrary combination of failures, (i) every

non-faulty SDN controller can eventually reach any switch in the network within

a bounded communication delay (in the presence of a bounded number of

concurrent failures) and (ii) every switch is managed by at least one

non-faulty controller. We evaluate Renaissance through a rigorous worst-case

analysis as well as a prototype implementation (based on OVS and Floodlight),

and we report on our experiments using Mininet.

## Learning

### Multiview Deep Learning for Predicting Twitter Users' Location

Tien Huu Do , Duc Minh Nguyen , Evaggelia Tsiligianni , Bruno Cornelis , Nikos Deligiannis

Comments: Submitted to the IEEE Transactions on Big Data

**Subjects**

:

Learning (cs.LG)

; Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

The problem of predicting the location of users on large social networks like

Twitter has emerged from real-life applications such as social unrest detection

and online marketing. Twitter user geolocation is a difficult and active

research topic with a vast literature. Most of the proposed methods follow

either a content-based or a network-based approach. The former exploits

user-generated content while the latter utilizes the connection or interaction

between Twitter users. In this paper, we introduce a novel method combining the

strength of both approaches. Concretely, we propose a multi-entry neural

network architecture named MENET leveraging the advances in deep learning and

multiview learning. The generalizability of MENET enables the integration of

multiple data representations. In the context of Twitter user geolocation, we

realize MENET with textual, network, and metadata features. Considering the

natural distribution of Twitter users across the concerned geographical area,

we subdivide the surface of the earth into multi-scale cells and train MENET

with the labels of the cells. We show that our method outperforms the state of

the art by a large margin on three benchmark datasets.

### Maximally Distant Cross Domain Generators for Estimating Per-Sample Error

Sagie Benaim , Tomer Galanti , Lior Wolf

Comments: The first and second authors contributed equally

**Subjects**

:

Learning (cs.LG)

While in supervised learning, the validation error is an unbiased estimator

of the generalization (test) error and complexity-based generalization bounds

are abundant, no such bounds exist for learning a mapping in an unsupervised

way. As a result, when training GANs and specifically when using GANs for

learning to map between domains in a completely unsupervised way, one is forced

to select the hyperparameters and the stopping epoch by subjectively examining

multiple options. We propose a novel bound for predicting the success of

unsupervised cross domain mapping methods, which is motivated by the recently

proposed simplicity hypothesis. The bound can be applied both in expectation,

for comparing hyperparameters, or per sample, in order to predict the success

of a specific cross-domain translation. The utility of the bound is

demonstrated in an extensive set of experiments employing multiple recent

algorithms.

### DropMax: Adaptive Stochastic Softmax

Hae Beom Lee , Juho Lee , Eunho Yang , Sung Ju Hwang **Subjects** : Learning (cs.LG)

We propose DropMax, a stochastic version of softmax classifier which at each

iteration drops non-target classes with some probability, for each instance.

Specifically, we overlay binary masking variables over class output

probabilities, which are learned based on the input via regularized variational

inference. This stochastic regularization has an effect of building an ensemble

classifier out of exponential number of classifiers with different decision

boundaries. Moreover, the learning of dropout probabilities for non-target

classes on each instance allows the classifier to focus more on classification

against the most confusing classes. We validate our model on multiple public

datasets for classification, on which it obtains improved accuracy over regular

softmax classifier and other baselines. Further analysis of the learned dropout

masks shows that our model indeed selects confusing classes more often when it

performs classification.

### Deep Unsupervised Clustering Using Mixture of Autoencoders

Dejiao Zhang , Yifan Sun , Brian Eriksson , Laura Balzano

Comments: 8 pages, 7 figures

**Subjects**

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Unsupervised clustering is one of the most fundamental challenges in machine

learning. A popular hypothesis is that data are generated from a union of

low-dimensional nonlinear manifolds; thus an approach to clustering is

identifying and separating these manifolds. In this paper, we present a novel

approach to solve this problem by using a mixture of autoencoders. Our model

consists of two parts: 1) a collection of autoencoders where each autoencoder

learns the underlying manifold of a group of similar objects, and 2) a mixture

assignment neural network, which takes the concatenated latent vectors from the

autoencoders as input and infers the distribution over clusters. By jointly

optimizing the two parts, we simultaneously assign data to clusters and learn

the underlying manifolds of each cluster.

### Unifying Map and Landmark Based Representations for Visual Navigation

Saurabh Gupta , David Fouhey , Sergey Levine , Jitendra Malik

Comments: Project page with videos: this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Robotics (cs.RO)

This works presents a formulation for visual navigation that unifies map

based spatial reasoning and path planning, with landmark based robust plan

execution in noisy environments. Our proposed formulation is learned from data

and is thus able to leverage statistical regularities of the world. This allows

it to efficiently navigate in novel environments given only a sparse set of

registered images as input for building representations for space. Our

formulation is based on three key ideas: a learned path planner that outputs

path plans to reach the goal, a feature synthesis engine that predicts features

for locations along the planned path, and a learned goal-driven closed loop

controller that can follow plans given these synthesized features. We test our

approach for goal-driven navigation in simulated real world environments and

report performance gains over competitive baseline approaches.

### Profit Driven Decision Trees for Churn Prediction

Sebastiaan Höppner , Eugen Stripling , Bart Baesens , Seppe vanden Broucke , Tim Verdonck **Subjects** : Machine Learning (stat.ML) ; Learning (cs.LG); Applications (stat.AP)

Customer retention campaigns increasingly rely on predictive models to detect

potential churners in a vast customer base. From the perspective of machine

learning, the task of predicting customer churn can be presented as a binary

classification problem. Using data on historic behavior, classification

algorithms are built with the purpose of accurately predicting the probability

of a customer defecting. The predictive churn models are then commonly selected

based on accuracy related performance measures such as the area under the ROC

curve (AUC). However, these models are often not well aligned with the core

business requirement of profit maximization, in the sense that, the models fail

to take into account not only misclassification costs, but also the benefits

originating from a correct classification. Therefore, the aim is to construct

churn prediction models that are profitable and preferably interpretable too.

The recently developed expected maximum profit measure for customer churn

(EMPC) has been proposed in order to select the most profitable churn model. We

present a new classifier that integrates the EMPC metric directly into the

model construction. Our technique, called ProfTree, uses an evolutionary

algorithm for learning profit driven decision trees. In a benchmark study with

real-life data sets from various telecommunication service providers, we show

that ProfTree achieves significant profit improvements compared to classic

accuracy driven tree-based methods.

### Note on Attacking Object Detectors with Adversarial Stickers

Kevin Eykholt , Ivan Evtimov , Earlence Fernandes , Bo Li , Dawn Song , Tadayoshi Kohno , Amir Rahmati , Atul Prakash , Florian Tramer

Comments: Short Note

**Subjects**

:

Cryptography and Security (cs.CR)

; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep learning has proven to be a powerful tool for computer vision and has

seen widespread adoption for numerous tasks. However, deep learning algorithms

are known to be vulnerable to adversarial examples. These adversarial inputs

are created such that, when provided to a deep learning algorithm, they are

very likely to be mislabeled. This can be problematic when deep learning is

used to assist in safety critical decisions. Recent research has shown that

classifiers can be attacked by physical adversarial examples under various

physical conditions. Given the fact that state-of-the-art objection detection

algorithms are harder to be fooled by the same set of adversarial examples,

here we show that these detectors can also be attacked by physical adversarial

examples. In this note, we briefly show both static and dynamic test results.

We design an algorithm that produces physical adversarial inputs, which can

fool the YOLO object detector and can also attack Faster-RCNN with relatively

high success rate based on transferability. Furthermore, our algorithm can

compress the size of the adversarial inputs to stickers that, when attached to

the targeted object, result in the detector either mislabeling or not detecting

the object a high percentage of the time. This note provides a small set of

results. Our upcoming paper will contain a thorough evaluation on other object

detectors, and will present the algorithm.

### Non-convex Optimization for Machine Learning

Prateek Jain , Purushottam Kar

Comments: The official publication is available from now publishers via this http URL

Journal-ref: Foundations and Trends in Machine Learning: Vol. 10: No. 3-4, pp

142-336 (2017)

**Subjects**

:

Machine Learning (stat.ML)

; Learning (cs.LG); Optimization and Control (math.OC)

A vast majority of machine learning algorithms train their models and perform

inference by solving optimization problems. In order to capture the learning

and prediction problems accurately, structural constraints such as sparsity or

low rank are frequently imposed or else the objective itself is designed to be

a non-convex function. This is especially true of algorithms that operate in

high-dimensional spaces or that train non-linear models such as tensor models

and deep networks.

The freedom to express the learning problem as a non-convex optimization

problem gives immense modeling power to the algorithm designer, but often such

problems are NP-hard to solve. A popular workaround to this has been to relax

non-convex problems to convex ones and use traditional methods to solve the

(convex) relaxed optimization problems. However this approach may be lossy and

nevertheless presents significant challenges for large scale optimization.

On the other hand, direct approaches to non-convex optimization have met with

resounding success in several domains and remain the methods of choice for the

practitioner, as they frequently outperform relaxation-based techniques –

popular heuristics include projected gradient descent and alternating

minimization. However, these are often poorly understood in terms of their

convergence and other properties.

This monograph presents a selection of recent advances that bridge a

long-standing gap in our understanding of these heuristics. The monograph will

lead the reader through several widely used non-convex optimization techniques,

as well as applications thereof. The goal of this monograph is to both,

introduce the rich literature in this area, as well as equip the reader with

the tools and techniques needed to analyze these simple procedures for

non-convex problems.

### Geometrical Insights for Implicit Generative Modeling

Leon Bottou , Martin Arjovsky , David Lopez-Paz , Maxime Oquab **Subjects** : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Learning (cs.LG)

Learning algorithms for implicit generative models can optimize a variety of

criteria that measure how the data distribution differs from the implicit model

distribution, including the Wasserstein distance, the Energy distance, and the

Maximum Mean Discrepancy criterion. A careful look at the geometries induced by

these distances on the space of probability measures reveals interesting

differences. In particular, we can establish surprising approximate global

convergence guarantees for the (1)-Wasserstein distance,even when the

parametric generator has a nonconvex parametrization.

### Indoor Sound Source Localization with Probabilistic Neural Network

Yingxiang Sun , Jiajia Chen , Chau Yuen , Susanto Rahardja

Comments: 10 pages, accepted by IEEE Transactions on Industrial Electronics

**Subjects**

:

Sound (cs.SD)

; Learning (cs.LG); Audio and Speech Processing (eess.AS)

It is known that adverse environments such as high reverberation and low

signal-to-noise ratio (SNR) pose a great challenge to indoor sound source

localization. To address this challenge, in this paper, we propose a sound

source localization algorithm based on probabilistic neural network, namely

Generalized cross correlation Classification Algorithm (GCA). Experimental

results for adverse environments with high reverberation time T60 up to 600ms

and low SNR such as -10dB show that, the average azimuth angle error and

elevation angle error by GCA are only 4.6 degrees and 3.1 degrees respectively.

Compared with three recently published algorithms, GCA has increased the

success rate on direction of arrival estimation significantly with good

robustness to environmental changes. These results show that the proposed GCA

can localize accurately and robustly for diverse indoor applications where the

site acoustic features can be studied prior to the localization stage.

### Multi-dimensional Graph Fourier Transform

Takashi Kurokawa , Taihei Oki , Hiromichi Nagao **Subjects** : Methodology (stat.ME) ; Learning (cs.LG); Machine Learning (stat.ML)

Many signals on Cartesian product graphs appear in the real world, such as

digital images, sensor observation time series, and movie ratings on Netflix.

These signals are “multi-dimensional” and have directional characteristics

along each factor graph. However, the existing graph Fourier transform does not

distinguish these directions, and assigns 1-D spectra to signals on product

graphs. Further, these spectra are often multi-valued at some frequencies. Our

main result is a multi-dimensional graph Fourier transform that solves such

problems associated with the conventional GFT. Using algebraic properties of

Cartesian products, the proposed transform rearranges 1-D spectra obtained by

the conventional GFT into the multi-dimensional frequency domain, of which each

dimension represents a directional frequency along each factor graph. Thus, the

multi-dimensional graph Fourier transform enables directional frequency

analysis, in addition to frequency analysis with the conventional GFT.

Moreover, this rearrangement resolves the multi-valuedness of spectra in some

cases. The multi-dimensional graph Fourier transform is a foundation of novel

filterings and stationarities that utilize dimensional information of graph

signals, which are also discussed in this study. The proposed methods are

applicable to a wide variety of data that can be regarded as signals on

Cartesian product graphs. This study also notes that multivariate graph signals

can be regarded as 2-D univariate graph signals. This correspondence provides

natural definitions of the multivariate graph Fourier transform and the

multivariate stationarity based on their 2-D univariate versions.

### Wolf in Sheep's Clothing – The Downscaling Attack Against Deep Learning Applications

Qixue Xiao , Kang Li , Deyue Zhang , Yier Jin **Subjects** : Cryptography and Security (cs.CR) ; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

This paper considers security risks buried in the data processing pipeline in

common deep learning applications. Deep learning models usually assume a fixed

scale for their training and input data. To allow deep learning applications to

handle a wide range of input data, popular frameworks, such as Caffe,

TensorFlow, and Torch, all provide data scaling functions to resize input to

the dimensions used by deep learning models. Image scaling algorithms are

intended to preserve the visual features of an image after scaling. However,

common image scaling algorithms are not designed to handle human crafted

images. Attackers can make the scaling outputs look dramatically different from

the corresponding input images.

This paper presents a downscaling attack that targets the data scaling

process in deep learning applications. By carefully crafting input data that

mismatches with the dimension used by deep learning models, attackers can

create deceiving effects. A deep learning application effectively consumes data

that are not the same as those presented to users. The visual inconsistency

enables practical evasion and data poisoning attacks to deep learning

applications. This paper presents proof-of-concept attack samples to popular

deep-learning-based image classification applications. To address the

downscaling attacks, the paper also suggests multiple potential mitigation

strategies.

### Towards a Deep Improviser: a prototype deep learning post-tonal free music generator

Comments: 13 pages, 1 Figure, 3 Tables

**Subjects**

:

Sound (cs.SD)

; Learning (cs.LG); Audio and Speech Processing (eess.AS)

Two modest-sized symbolic corpora of post-tonal and post-metric keyboard

music have been constructed, one algorithmic, the other improvised. Deep

learning models of each have been trained and largely optimised. Our purpose is

to obtain a model with sufficient generalisation capacity that in response to a

small quantity of separate fresh input seed material, it can generate outputs

that are distinctive, rather than recreative of the learned corpora or the seed

material. This objective has been first assessed statistically, and as judged

by k-sample Anderson-Darling and Cramer tests, has been achieved. Music has

been generated using the approach, and informal judgements place it roughly on

a par with algorithmic and composed music in related forms. Future work will

aim to enhance the model such that it can be evaluated in relation to

expression, meaning and utility in real-time performance.

## Information Theory

### A Deep Reinforcement Learning-Based Framework for Content Caching

Chen Zhong , M. Cenk Gursoy , Senem Velipasalar

Comments: 6 pages, 3 figures

**Subjects**

:

Information Theory (cs.IT)

Content caching at the edge nodes is a promising technique to reduce the data

traffic in next-generation wireless networks. Inspired by the success of Deep

Reinforcement Learning (DRL) in solving complicated control problems, this work

presents a DRL-based framework with Wolpertinger architecture for content

caching at the base station. The proposed framework is aimed at maximizing the

long-term cache hit rate, and it requires no knowledge of the content

popularity distribution. To evaluate the proposed framework, we compare the

performance with other caching algorithms, including Least Recently Used (LRU),

Least Frequently Used (LFU), and First-In First-Out (FIFO) caching strategies.

Meanwhile, since the Wolpertinger architecture can effectively limit the action

space size, we also compare the performance with Deep Q-Network to identify the

impact of dropping a portion of the actions. Our results show that the proposed

framework can achieve improved short-term cache hit rate and improved and

stable long-term cache hit rate in comparison with LRU, LFU, and FIFO schemes.

Additionally, the performance is shown to be competitive in comparison to Deep

Q-learning, while the proposed framework can provide significant savings in

runtime.

Optimal Error Correcting Delivery Scheme for Coded Caching with Symmetric Batch Prefetching

Nujoom Sageer Karat , Anoop Thomas , B. Sundar Rajan

Comments: 9 pages and 4 figures

**Subjects**

:

Information Theory (cs.IT)

Coded caching is used to reduce network congestion during peak hours. A

single server is connected to a set of users through a bottleneck link, which

generally is assumed to be error-free. During non-peak hours, all the users

have full access to the files and they fill their local cache with portions of

the files available. During delivery phase, each user requests a file and the

server delivers coded transmissions to meet the demands taking into

consideration their cache contents. In this paper we assume that the shared

link is error prone. A new delivery scheme is required to meet the demands of

each user even after receiving finite number of transmissions in error. We

characterize the minimum average rate and minimum peak rate for this problem.

We find closed form expressions of these rates for a particular caching scheme

namely extit{symmetric batch prefetching}. We also propose an optimal error

correcting delivery scheme for coded caching problem with symmetric batch

prefetching.

Libo Wang , Baofeng Wu **Subjects** : Information Theory (cs.IT)

Recently, there has been a lot of work on constructions of permutation

polynomials of the form ((x^{2^m}+x+delta)^{s}+x) over the finite field

(F_{2^{2m}}), especially in the case when (s) is of the form (s=i(2^m-1)+1)

(Niho exponent). In this paper, we further investigate permutation polynomials

with this form. Instead of seeking for sporadic constructions of the parameter

(i), we give a general sufficient condition on (i) such that

((x^{2^m}+x+delta)^{i(2^m-1)+1}+x) permutes (F_{2^{2m}}), that is, ((2^k+1)i

equiv 1 ~ extrm{or}~ 2^k~( extrm{mod}~ 2^m+1)), where (1 leq k leq m-1) is

any integer. This generalizes a recent result obtained by Gupta and Sharma who

actually dealt with the case (k=2). It turns out that most of previous

constructions of the parameter (i) are covered by our result, and it yields

many new classes of permutation polynomials as well.

### Bounds on the Entropy of a Function of a Random Variable and their Applications

Ferdinando Cicalese , Luisa Gargano , Ugo Vaccaro

Comments: Accepted for publications to IEEE Transactions on Information Theory

**Subjects**

:

Information Theory (cs.IT)

; Data Structures and Algorithms (cs.DS)

It is well known that the entropy (H(X)) of a discrete random variable (X) is

always greater than or equal to the entropy (H(f(X))) of a function (f) of (X),

with equality if and only if (f) is one-to-one. In this paper, we give tight

bounds on (H(f(X))) when the function (f) is not one-to-one, and we illustrate

a few scenarios where this matters. As an intermediate step towards our main

result, we derive a lower bound on the entropy of a probability distribution,

when only a bound on the ratio between the maximal and minimal probabilities is

known. The lower bound improves on previous results in the literature, and it

could find applications outside the present scenario.

### On the Information Dimension of Multivariate Gaussian Processes

Bernhard C. Geiger , Tobias Koch

Comments: This work will be presented in part at the 2018 International Zurich Seminar on Information and Communication

**Subjects**

:

Information Theory (cs.IT)

The authors have recently defined the R’enyi information dimension rate

(d({X_t})) of a stationary stochastic process ({X_t,,tinmathbb{Z}}) as

the entropy rate of the uniformly-quantized process divided by minus the

logarithm of the quantizer step size (1/m) in the limit as (m oinfty) (B.

Geiger and T. Koch, “On the information dimension rate of stochastic

processes,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June

2017). For Gaussian processes with a given spectral distribution function

(F_X), they showed that the information dimension rate equals the Lebesgue

measure of the set of harmonics where the derivative of (F_X) is positive. This

paper extends this result to multivariate Gaussian processes with a given

matrix-valued spectral distribution function (F_{mathbf{X}}). It is

demonstrated that the information dimension rate equals the average rank of the

derivative of (F_{mathbf{X}}). As a side result, it is shown that the scale

and translation invariance of information dimension carries over from random

variables to stochastic processes.

Interference Exploitation Precoding Made Practical: Closed-Form Solutions with Optimal Performance

Comments: submitted to IEEE transactions

**Subjects**

:

Information Theory (cs.IT)

In this paper, we propose closed-form precoding schemes with optimal

performance for constructive interference (CI) exploitation in the multiuser

multiple-input single-output (MU-MISO) downlink. We first consider an

optimization where we maximize the distance between the constructive region and

the detection thresholds. The cases of both strict and non-strict phase

rotation are considered and can further be formulated as convex optimization

problems. For optimization with strict phase rotation, we mathematically derive

the optimal beamforming structure with Lagrangian and Karush-Kuhn-Tucker (KKT)

conditions. By formulating its dual problem, the optimization problem is

further shown to be equivalent to a quadratic programming (QP) over a simplex,

which can be solved more efficiently. We then extend our analysis to the case

of non-strict phase rotation, where it is mathematically shown that a

K-dimensional optimization for non-strict phase rotation is equivalent to a

2K-dimensional optimization for strict phase rotation in terms of the problem

formulation. The connection with the conventional zero-forcing (ZF) precoding

is also discussed. Based on the above analysis, we further propose an iterative

closed-form scheme to obtain the optimal beamforming matrix, where within each

iteration a closed-form solution can be obtained. Numerical results validate

our analysis and the optimality of the proposed iterative scheme, and further

show that the proposed closed-form scheme is more efficient than the

conventional QP algorithms with interior-point methods, which motivates the use

of CI beamforming in practical wireless systems.

### Interference Steering to Manage Interference

Zhao Li , Fengjuan Guo , Kang G Shin , Yinghou Liu , Jia Liu **Subjects** : Information Theory (cs.IT)

To enable densely deployed base stations (BSs) or access points (APs) to

serve an increasing number of users and provide diverse mobile services, we

need to improve spectrum utilization in wireless communication networks.

Although spectral efficiency (SE) can be enhanced via smart and dynamic

resource allocation, interference has become a major impediment in improving

SE. There have been numerous interference management (IM) proposals at the

interfering transmitter or the victim transmitter/receiver separately or

cooperatively. Moreover, the existing IM schemes rely mainly on the use of

channel state information (CSI). However, in some communication scenarios, the

option to adjust the interferer is not available, and, in the case of downlink

transmission, it is always difficult or even impossible for the victim receiver

to acquire necessary information for IM. Based on the above observations, we

first propose a novel IM technique, called interference steering (IS). By

making use of both CSI w.r.t. and data carried in the interfering signal, IS

generates a signal to modify the spatial feature of the original interference,

so that the steered interference at the victim receiver is orthogonal to its

intended signal. We then apply IS to an infrastructurebased enterprise wireless

local area network (WLAN) in which the same frequency band is reused by

adjacent basic service sets (BSSs) with overlapping areas. With IS, multiple

nearby APs could simultaneously transmit data on the same channel to their

mobile stations (STAs), thus enhancing spectrum reuse. Our in-depth simulation

results show that IS significantly improves network SE over existing IM

schemes.

### Skew cyclic codes over (mathbb{F}_{p}+umathbb{F}_{p})

Reza Dastbasteh , Seyyed Hamed Mousavi , Taher Abualrub , Nuh Aydin , Javad Haghighat **Subjects** : Information Theory (cs.IT)

In this paper, we study skew cyclic codes with arbitrary length over the ring

(R=mathbb{F}_{p}+umathbb{F}_{p}) where (p) is an odd prime and (% u^{2}=0).

We characterize all skew cyclic codes of length (n) as left (% R[x; heta

])-submodules of (R_{n}=R[x; heta ]/langle x^{n}-1

angle ). We find all

generator polynomials for these codes and describe their minimal spanning sets.

Moreover, an encoding and decoding algorithm is presented for skew cyclic codes

over the ring (R). Finally, based on the theory we developed in this paper, we

provide examples of codes with good parameters over (F_{p}) with different odd

prime (p.) In fact, example 25 in our paper is a new ternary code in the class

of quasi-twisted codes. The other examples we provided are examples of optimal

codes.

Tan Zheng Hui Ernest , A S Madhukumar , Rajendra Prasad Sirigina , Anoop Kumar Krishna **Subjects** : Information Theory (cs.IT)

A hybrid-duplex aeronautical communication system (HBD-ACS) consisting of a

full-duplex (FD) enabled ground station (GS), and two half-duplex (HD)

air-stations (ASs) is proposed as a direct solution to the spectrum crunch

faced by the aviation industry. Closed-form outage probability and finite

signal-to-noise ratio (SNR) diversity gain expressions in aeronautical

communications over Rician fading channels are derived for a successive

interference cancellation (SIC) detector. Similar expressions are also

presented for an interference ignorant (II) detector and HD-equivalent modes at

GS and ASs. Through outage and finite SNR diversity gain analysis conducted at

the nodes, and system level, residual SI and inter-AS interference are found to

be the primary limiting factors in the proposed HBD-ACS. Additional analysis

also revealed that the II and SIC detectors in the proposed HBD-ACS are

suitable for weak and strong interference scenarios, respectively. When

compared to HD-ACS, the proposed HBD-ACS achieves lower outage probability and

higher diversity gains at higher multiplexing gains when operating at low SNRs.

Finite SNR analysis also showed the possibility of the proposed HBD-ACS being

able to attain interference-free diversity gains through proper management of

residual SI. Hence, the proposed HBD-ACS is more reliable and can provide

better throughput compared to existing HD-ACS at low-to-moderate SNRs.

### Error-Free Communication Over State-Dependent Channels with Variable-Length Feedback

Mladen Kovačević , Carol Wang , Vincent Y. F. Tan **Subjects** : Information Theory (cs.IT)

The zero-error capacity of state-dependent channels with noiseless feedback

is determined, under the assumption that the transmitter and the receiver are

allowed to use variable-length coding schemes. Various cases are analyzed, with

the employed coding schemes having either bounded or unbounded codeword lengths

and with state information revealed to the encoder and/or decoder in a strictly

causal, causal, or non-causal manner. In each of these settings, necessary and

sufficient conditions for positivity of the zero-error capacity are obtained

and it is shown that, whenever the zero-error capacity is positive, it equals

the conventional vanishing-error capacity. Moreover, it is shown that the

vanishing-error capacity of state-dependent channels is not increased by the

use of feedback and variable-length coding. A comparison of the results with

the recently solved fixed-length case is also given.

### A Unified Asymptotic Analysis of Area Spectral Efficiency in Ultradense Cellular Networks

Ahmad AlAmmouri , Jeffrey G. Andrews , Francois Baccelli **Subjects** : Information Theory (cs.IT)

This paper studies the asymptotic properties of area spectral efficiency

(ASE) of a downlink cellular network in the limit of very dense base station

(BS) and user densities. This asymptotic analysis relies on three assumptions:

(1) interference is treated as noise; (2) the BS locations are drawn from a

Poisson point process; (3) the path loss function is bounded above satisfying

mild regularity conditions. We consider three possible definitions of the ASE,

all of which give units of bits per second per unit area. When there is no

constraint on the minimum operational SINR and instantaneous full channel state

information is available at the transmitter, the ASE is proven to saturate to a

constant, which we derive in closed form. For the other two ASE definitions,

wherein either a minimum SINR is enforced or full CSI is not available, the ASE

is instead shown to collapse to zero at high BS density. We provide several

familiar case studies for the class of considered path loss models, and

demonstrate that our results cover most previous models and results on

ultradense networks as special cases.

### A Fast Algorithm for Separated Sparsity via Perturbed Lagrangians

Aleksander Mądry , Slobodan Mitrović , Ludwig Schmidt **Subjects** : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT)

Sparsity-based methods are widely used in machine learning, statistics, and

signal processing. There is now a rich class of structured sparsity approaches

that expand the modeling power of the sparsity paradigm and incorporate

constraints such as group sparsity, graph sparsity, or hierarchical sparsity.

While these sparsity models offer improved sample complexity and better

interpretability, the improvements come at a computational cost: it is often

challenging to optimize over the (non-convex) constraint sets that capture

various sparsity structures. In this paper, we make progress in this direction

in the context of separated sparsity — a fundamental sparsity notion that

captures exclusion constraints in linearly ordered data such as time series.

While prior algorithms for computing a projection onto this constraint set

required quadratic time, we provide a perturbed Lagrangian relaxation approach

that computes provably exact projection in only nearly-linear time. Although

the sparsity constraint is non-convex, our perturbed Lagrangian approach is

still guaranteed to find a globally optimal solution. In experiments, our new

algorithms offer a 10( imes) speed-up already on moderately-size inputs.

Shihui Fu , Xiutao Feng , Dongdai Lin , Qiang Wang **Subjects** : Number Theory (math.NT) ; Information Theory (cs.IT); Combinatorics (math.CO)

In this paper, we construct two classes of permutation polynomials over

(mathbb{F}_{q^2}) with odd characteristic from rational R'{e}dei functions. A

complete characterization of their compositional inverses is also given. These

permutation polynomials can be generated recursively. As a consequence, we can

generate recursively permutation polynomials with arbitrary number of terms.

More importantly, the conditions of these polynomials being permutations are

very easy to characterize. For wide applications in practice, several classes

of permutation binomials and trinomials are given. With the help of a computer,

we find that the number of permutation polynomials of these types is very

large.

Satoshi Takabe , Takafumi Nakano , Tadashi Wadayama

Comments: 5 pages

**Subjects**

:

Disordered Systems and Neural Networks (cond-mat.dis-nn)

; Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Social and Information Networks (cs.SI)

The fault tolerance of random graphs with unbounded degrees with respect to

connectivity is investigated. It is related to the reliability of wireless

sensor networks with unreliable relay nodes. The model evaluates the network

breakdown probability that a graph is disconnected after stochastic node

removal. To establish a mean-field approximation for the model, the cavity

method for finite systems is proposed. Then the asymptotic analysis is applied.

As a result, the former enables us to obtain an approximation formula for any

number of nodes and an arbitrary and degree distribution. In addition, the

latter reveals that the phase transition occurs on random graphs with

logarithmic average degrees. Those results, which are supported by numerical

simulations, coincide with the mathematical results, indicating successful

predictions by mean-field approximation for unbounded but not dense random

graphs.

### PKC-PC: A Variant of the McEliece Public Key Cryptosystem based on Polar Codes

Reza Hooshmand , Masoumeh Koochak Shooshtari , Mohammad Reza Aref **Subjects** : Cryptography and Security (cs.CR) ; Information Theory (cs.IT)

Polar codes are novel and efficient error correcting codes with low encoding

and decoding complexities. These codes have a channel dependent generator

matrix which is determined by the code dimension, code length and transmission

channel parameters. This paper studies a variant of the McEliece public key

cryptosystem based on polar codes, called “PKC-PC”. Due to the fact that the

structure of polar codes’ generator matrix depends on the parameters of

channel, we used an efficient approach to conceal their generator matrix. Then,

by the help of the characteristics of polar codes and also introducing an

efficient approach, we reduced the public and private key sizes of the PKC-PC

and increased its information rate compared to the McEliece cryptosystem. It

was shown that polar codes are able to yield an increased security level

against conventional attacks and possible vulnerabilities on the code-based

public key cryptosystems. Moreover, it is indicated that the security of the

PKC-PC is reduced to solve NP-complete problems. Compared to other post-quantum

public key schemes, we believe that the PKC-PC is a promising candidate for

NIST post-quantum crypto standardization.

欢迎加入我爱机器学习QQ14群：336582044

微信扫一扫，关注我爱机器学习公众号

微博：我爱机器学习