A Voting-Based System for Ethical Decision Making

Posted on 2018-04-19 | In machine learning

This paper introduces how swap-dominance efficient voting rules can be used to resolve dilemmas in ethical decision making.
What seems worth focusing on is how an ethical question gets modeled and decomposed into an AI problem; the mathematical language this translation introduces is also worth studying.

abstract:

We present a general approach to automating ethical decisions, drawing on machine learning and computational social choice. In a nutshell, we propose to learn a model of societal preferences, and, when faced with a specific ethical dilemma at runtime, efficiently aggregate those preferences to identify a desirable choice. We provide a concrete algorithm that instantiates our approach; some of its crucial steps are informed by a new theory of swap-dominance efficient voting rules. Finally, we implement and evaluate a system for ethical decision making in the autonomous vehicle domain, using preference data collected from 1.3 million people through the Moral Machine website.

First, the problem:

(The trolley problem: a speeding car's brakes suddenly fail, and it can only swerve left or right; swerving left kills three pedestrians, while swerving right kills an athlete and his dog.)
The core difficulty in the trolley problem is that there is no ground truth; what is right and what is wrong has been debated for thousands of years.

main idea:

We submit that decision making can, in fact, be automated, even in the absence of such ground-truth principles, by aggregating people’s opinions on ethical dilemmas.
In other words, with no standard of right and wrong, the idea is to aggregate people's opinions on the moral dilemma and then make the judgment automatically.
A prominent researcher previously approached this through computational social choice and provided some intuition; this paper takes that work a step further.

approach:

I Data collection:

Ask human voters to compare pairs of alternatives (say a few dozen per voter). In the autonomous vehicle domain, an alternative is determined by a vector of features such as the number of victims and their gender, age, health — even species!

Preference data was obtained from 1,303,778 voters through the Moral Machine website.

II Learning:

Use the pairwise comparisons to learn a model of the preferences of each voter over all possible alternatives.
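To make Step II concrete, here is a minimal sketch of learning one voter's utility model from pairwise comparisons. I use a Bradley-Terry-style logistic model over feature differences purely as an illustration; the paper itself works with Thurstone-Mosteller permutation processes, and the feature encoding below is a made-up toy.

import numpy as np

# Toy feature encoding of an alternative (an assumption, not the paper's):
# x = [num_victims, num_children, num_elderly, num_animals]
def fit_voter_model(comparisons, dim, lr=0.1, epochs=200):
    """Learn weights w so that alternative a is preferred to b when
    w @ (a - b) > 0, via gradient ascent on the logistic likelihood."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for a, b in comparisons:  # each pair: a was preferred to b
            d = a - b
            p = 1.0 / (1.0 + np.exp(-w @ d))  # P(a preferred over b)
            w += lr * (1.0 - p) * d           # log-likelihood gradient
    return w

# One voter who preferred the single-victim outcome:
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([3.0, 1.0, 0.0, 1.0])
w = fit_voter_model([(a, b)], dim=4)
print(w @ a > w @ b)  # True: the learned utilities reproduce the choice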

III Summarization:

Combine the individual models into a single model, which approximately captures the collective preferences of all voters over all possible alternatives.
Combining multiple models here feels a bit like the boosting covered in 10-701.

IV Aggregation:

At runtime, when encountering an ethical dilemma involving a specific subset of alternatives, use the summary model to deduce the preferences of all voters over this particular subset, and apply a voting rule to aggregate these preferences into a collective decision.
In the autonomous vehicle domain, the selected alternative is the outcome that society (as represented by the voters whose preferences were elicited in Step I) views as the least catastrophic among the grim options the vehicle currently faces. Note that this step is only applied when all other options have been exhausted, i.e., all technical ways of avoiding the dilemma in the first place have failed, and all legal constraints that may dictate what to do have also failed.

Steps I-IV build on the theory of random utility models (which generate rankings over the alternatives).
In this setting the authors need an infinite set of alternatives, of which only a finite subset is encountered at runtime; the resulting generalization of random utility models is called a permutation process.

  • central question:
    This means we can apply a voting rule in order to aggregate the preferences — but which voting rule should we apply?
    And how can we compute the outcome efficiently?

  • solution: a new pipeline:
    learning a permutation process for each voter (Step II); summarizing these individual processes into a single permutation process that satisfies the required swap-dominance property (Step III); and using any swap-dominance efficient voting rule, which is computationally efficient given the swap-dominance property (Step IV)

  • theory: the paper also develops a new theory of swap-dominance efficient voting rules (see the paper for details)
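As a toy illustration of Step IV (not the paper's swap-dominance machinery), this sketch aggregates per-voter utility models over a finite set of runtime alternatives with the Borda rule; the utility-weight representation is carried over from the learning sketch above and is an assumption.

import numpy as np

def borda_winner(voter_weights, alternatives):
    """Each voter ranks the k alternatives by utility; the Borda rule
    gives k-1 points to a voter's top choice, k-2 to the next, etc."""
    k = len(alternatives)
    scores = np.zeros(k)
    for w in voter_weights:
        utilities = [w @ x for x in alternatives]
        for points, idx in enumerate(np.argsort(utilities)):
            scores[idx] += points  # worst gets 0, best gets k-1
    return int(np.argmax(scores))

# Three voters, two grim options described by toy feature vectors:
voters = [np.array([-1.0, -2.0]), np.array([-1.0, -2.0]), np.array([-1.0, -0.5])]
options = [np.array([3.0, 0.0]), np.array([1.0, 2.0])]
print(borda_winner(voters, options))  # 0: the majority's preferred option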

answer_processing

Posted on 2018-04-19

This is for the NLP course project; I am responsible for answer processing. The rough pipeline: use the question as the query and do information retrieval over the input article, mainly with tf-idf.

  • tf-idf (term frequency–inverse document frequency)
    • tf is the term frequency: how many times the term appears in a given document
      • there are several ways to compute it; the simplest is (occurrences of the term in the document) / (total number of words in the document)
    • idf is the inverse document frequency: a measure of how common the term is across the whole corpus; very common words like "the" have a tiny idf precisely because they appear everywhere
      • computed as log(total number of documents in the corpus / (number of documents containing the term + 1)) (the + 1 guards against a divide-by-zero error)

tf-idf is used to find the sentence in the document most relevant to the question.
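A minimal sketch of the sentence scoring, matching the formulas above; treating each sentence of the article as its own "document" and using whitespace tokenization are my simplifying assumptions.

import math
from collections import Counter

def tfidf_scores(question, sentences):
    """Score each sentence against the question using tf-idf,
    treating each sentence as one document in the corpus."""
    docs = [s.lower().split() for s in sentences]
    n_docs = len(docs)
    # df: number of documents containing each term
    df = Counter()
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    # idf = log(total docs / (docs containing the term + 1))
    idf = {t: math.log(n_docs / (df[t] + 1)) for t in df}

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = sum((tf[t] / len(doc)) * idf.get(t, 0.0)
                    for t in question.lower().split())
        scores.append(score)
    return scores

sentences = ["Autonomous cars shift insurance liability toward manufacturers",
             "The weather was nice yesterday"]
print(tfidf_scores("who is liable for autonomous cars", sentences))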

Once that sentence is found, I need syntactic analysis to locate the answer.
The main tools are the dependency parser and the constituency parser.
First, a comparison of the two kinds of parse:

  • constituency tree:
    the familiar S -> NP VP style of phrase structure

  • dependency tree:
    every phrase has a head, and the other words depend on it; for example:
    I love you.
    the head is love, the subject is I, and the object is you

I mainly use the spaCy parser API:

import spacy

nlp = spacy.load('en')
doc = nlp(u'Autonomous cars shift insurance liability toward manufacturers')
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
          chunk.root.head.text)

Text: the original noun chunk text.
Root text: the original text of the word connecting the noun chunk to the rest of the parse.
Root dep: dependency relation connecting the root to its head.
Root head text: the text of the root token's head.

Output:
(u'Autonomous cars', u'cars', u'nsubj', u'shift')
(u'insurance liability', u'liability', u'dobj', u'shift')
(u'manufacturers', u'manufacturers', u'pobj', u'toward')
  • The question types to handle are who, what, when, where, yes/no, and why.

    • For when and where questions, I parse first, traverse the tree to collect all the PPs, and then run the NER tagger over them to check for TIME and LOCATION entities (see the sketch after this list).
    • Yes/no questions are simpler: run the dependency parse and look for a neg relation; if there is none, the answer is true.
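A minimal sketch of both checks with spaCy; the entity labels (DATE/TIME/GPE/LOC) and the shortcut of running NER on the whole sentence rather than on extracted PPs are my assumptions, not the project's actual code.

import spacy

nlp = spacy.load('en')

def answer_when_where(sentence):
    # Collect entities that look like times or locations.
    doc = nlp(sentence)
    return [ent.text for ent in doc.ents
            if ent.label_ in ('DATE', 'TIME', 'GPE', 'LOC')]

def answer_yes_no(sentence):
    # Answer yes unless a negation (neg) dependency is present.
    doc = nlp(sentence)
    return not any(tok.dep_ == 'neg' for tok in doc)

print(answer_when_where('The meeting was held in Pittsburgh on Monday.'))
print(answer_yes_no('The brakes did not fail.'))  # False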

question generation

Probabilistic graphical models

Posted on 2018-04-18

Calming down

Posted on 2018-04-18 | In daily life

Calm the mind; don't overthink, just summarize and organize.
There will always be fluctuations. Swimming and a sensible diet really do seem to be the keys to productivity, and a regular routine makes for an efficient day of work.

dilated convolution

Posted on 2018-04-17 | In deep learning

First up is U-Net. The paper describes its architecture (Figure 1 there) as follows:

The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels. Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”) that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.

The code looks roughly like this:

from keras.models import Input, Model
from keras.layers import Conv2D, Concatenate, MaxPooling2D, Reshape, ZeroPadding2D
from keras.layers import UpSampling2D, Activation, Permute

def level_block(m, dim, depth, factor, acti):
    # Recursively build one contracting/expansive level of the U.
    if depth > 0:
        n = Conv2D(dim, 3, activation=acti, padding='same')(m)
        n = Conv2D(dim, 3, activation=acti, padding='same')(n)
        # n = ZeroPadding2D()(n)
        # m = AtrousConvolution2D(dim, 3, 3, atrous_rate=(2, 2), activation=acti)(n)
        m = MaxPooling2D()(n)  # downsample; channels grow by `factor` below
        m = level_block(m, int(factor * dim), depth - 1, factor, acti)
        m = UpSampling2D()(m)
        m = Conv2D(dim, 2, activation=acti, padding='same')(m)
        # m = Concatenate(axis=3)([n, m])  # the U-Net skip connection
        m = Conv2D(dim, 3, activation=acti, padding='same')(m)
    return Conv2D(dim, 3, activation=acti, padding='same')(m)

def UNet(img_shape, n_out=1, dim=64, depth=4, factor=2, acti='elu', flatten=False):
    i = Input(shape=img_shape)
    o = level_block(i, dim, depth, factor, acti)
    o = Conv2D(n_out, (1, 1))(o)  # 1x1 conv maps features to n_out classes
    if flatten:
        o = Reshape((n_out, img_shape[0] * img_shape[1]))(o)  # Reshape takes a tuple
        o = Permute((2, 1))(o)
    o = Activation('sigmoid')(o)
    return Model(inputs=i, outputs=o)

(adapted from https://github.com/pietz/brats-segmentation)

Let's first get familiar with the relevant Keras interfaces:

Conv2D(dim, 3, activation=acti, padding='same')
Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

UpSampling2D(size=(2, 2), data_format=None)
For upsampling, the docs say UpSampling2D simply repeats the rows and columns of the input, size[0] and size[1] times respectively.

  • padding in Keras Conv2D comes in two modes (a shape check follows this list):
    • same padding
      as the name suggests, the output has the same spatial size as the input
    • valid padding
      no padding at all: the kernel stays inside the input, so the output shrinks (by kernel_size - 1 pixels per dimension at stride 1)
Now for dilated convolution (figure omitted): the kernel's taps are spread out with holes between them, so the per-kernel computation cost stays the same while the receptive field becomes larger.
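In Keras this is just the dilation_rate argument in the Conv2D signature quoted above. A minimal sketch: a 3x3 kernel with dilation_rate=(2, 2) still has 9 weights (same compute per output pixel) but covers a 5x5 window.

from keras import backend as K
from keras.layers import Input, Conv2D

i = Input(shape=(32, 32, 1))
# 9 weights, but taps spaced 2 apart: the receptive field is 5x5.
d = Conv2D(8, 3, dilation_rate=(2, 2), padding='same')(i)
print(K.int_shape(d))  # (None, 32, 32, 8)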

CRF

  • conditional random field

newstart

Posted on 2018-04-17

Go where few have gone, and leave your own footprints.
Recently, with graduation approaching, I have come to understand my own heart: I want to do a PhD. So I hurried to look for an advisor, but after a whole semester there has been nothing but rejection letters.
Meanwhile, I often read Tian Yuandong's answers on Zhihu. He says you have to calm down first; only with a quiet mind can you dig into a field and do serious work.
So I know I shouldn't panic because classmates have landed jobs at Google, Microsoft, and other big companies. I have my own road to walk. Maybe it is hard, maybe it even seems impossible, but I haven't truly failed yet and there is still some time, so keep pushing on.

So right now there are just two things: do the projects well and wrap up this semester (perhaps the last stretch of my student days), and keep reading papers and looking for professors, focusing first on professors at my own school, then at other schools. And stick with the field I started in, reinforcement learning, and go deeper; there is still plenty to do there. Keep going, young man!
