Merge pull request #7282 from Valoniakim/master

Translated 20180111 AI and machine learning bias has dangerous implications.md
Xingyu.Wang 2018-01-19 22:49:46 +08:00 committed by GitHub
commit eb44c31484
2 changed files with 81 additions and 82 deletions

@ -1,82 +0,0 @@
AI and machine learning bias has dangerous implications
======
translating
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/LAW_goodbadugly.png?itok=ZxaimUWU)
Image by: opensource.com
Algorithms are everywhere in our world, and so is bias. From social media news feeds to streaming service recommendations to online shopping, computer algorithms--specifically, machine learning algorithms--have permeated our day-to-day world. As for bias, we need only examine the 2016 American election to understand how deeply--both implicitly and explicitly--it permeates our society as well.
What's often overlooked, however, is the intersection between these two: bias in computer algorithms themselves.
Contrary to what many of us might think, technology is not objective. AI algorithms and their decision-making processes are directly shaped by those who build them--what code they write, what data they use to "[train][1]" the machine learning models, and how they [stress-test][2] the models after they're finished. This means that the programmers' values, biases, and human flaws are reflected in the software. If I fed an image-recognition algorithm the faces of only white researchers in my lab, for instance, it [wouldn't recognize non-white faces as human][3]. Such a conclusion isn't the result of a "stupid" or "unsophisticated" AI, but of a bias in training data: a lack of diverse faces. This has dangerous consequences.
There's no shortage of examples. [State court systems][4] across the country use "black box" algorithms to recommend prison sentences for convicts. [These algorithms are biased][5] against black individuals because of the data that trained them--so they recommend longer sentences as a result, thus perpetuating existing racial disparities in prisons. All this happens under the guise of objective, "scientific" decision-making.
The United States federal government uses machine-learning algorithms to calculate welfare payouts and other types of subsidies. But [information on these algorithms][6], such as their creators and their training data, is extremely difficult to find--which increases the risk of public officials operating under bias and meting out systematically unfair payments.
This list goes on. From Facebook news algorithms to medical care systems to police body cameras, we as a society are at great risk of inserting our biases--racism, sexism, xenophobia, socioeconomic discrimination, confirmation bias, and more--into machines that will be mass-produced and mass-distributed, operating under the veil of perceived technological objectivity.
This must stop.
While we should by no means halt research and development on artificial intelligence, we need to slow its development so that we tread carefully. The danger of algorithmic bias is already too great.
## How can we fight algorithmic bias?
One of the best ways to fight algorithmic bias is by vetting the training data fed into machine learning models themselves. As [researchers at Microsoft][2] point out, this can take many forms.
The data itself might have a skewed distribution--for instance, programmers may have more data about United States-born citizens than immigrants, and about rich men than poor women. Such imbalances will cause an AI to draw improper conclusions about how our society is in fact represented--i.e., that most Americans are wealthy white businessmen--simply because of the way machine-learning models make statistical correlations.
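One way to look for this kind of skew is to compare each group's share of the training set with its share of the population the model is meant to serve. The sketch below is only an illustration: the function name `representation_gaps`, the `birthplace` field, and the reference shares are all hypothetical, and real reference figures would have to come from a trusted source such as census data.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares):
    """Return (observed share - reference share) for each reference group.

    A positive gap means the group is over-represented in the records;
    a negative gap means it is under-represented.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to analyze")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical toy data: 90 US-born records and 10 immigrant records.
training_records = (
    [{"birthplace": "us_born"}] * 90 + [{"birthplace": "immigrant"}] * 10
)

# Made-up reference shares, used only to demonstrate the comparison.
reference = {"us_born": 0.75, "immigrant": 0.25}

gaps = representation_gaps(training_records, "birthplace", reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A check like this only catches imbalance in attributes that were recorded in the first place; it says nothing about how each group is portrayed within the data.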
It's also possible, even if men and women are equally represented in training data, that the representations themselves result in prejudiced understandings of humanity. For instance, if all the pictures of "male occupation" are of CEOs and all those of "female occupation" are of secretaries (even if more CEOs are in fact male than female), the AI could conclude that women are inherently not meant to be CEOs.
We can imagine similar issues, for example, with law enforcement AIs that examine representations of criminality in the media, which dozens of studies have shown to be [egregiously slanted][7] towards black and Latino citizens.
Bias in training data can take many other forms as well--unfortunately, more than can be adequately covered here. Nonetheless, training data is just one form of vetting; it's also important that AI models are "stress-tested" after they're completed to seek out prejudice.
If we show an Indian face to our camera, is it appropriately recognized? Is our AI less likely to recommend a job candidate from an inner city than a candidate from the suburbs, even if they're equally qualified? How does our terrorism algorithm respond to intelligence on a white domestic terrorist compared to an Iraqi? Can our ER camera pull up medical records of children?
These are obviously difficult issues to resolve in the data itself, but we can begin to identify and address them through comprehensive testing.
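A simple form of such testing is to measure a model's accuracy separately for each group and flag large differences. The following is a minimal sketch under stated assumptions: `toy_predict` is a hypothetical stand-in for a real model, the group labels and examples are invented, and accuracy is only one of many metrics a real audit might compare.

```python
from collections import defaultdict


def accuracy_by_group(examples, predict):
    """Map each group label to the model's accuracy on that group's examples.

    Each example is a (features, true_label, group) tuple.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for features, label, group in examples:
        seen[group] += 1
        if predict(features) == label:
            correct[group] += 1
    return {group: correct[group] / seen[group] for group in seen}


def largest_gap(per_group):
    """Difference between the best-served and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


def toy_predict(features):
    # Hypothetical stand-in for a trained model: a fixed score threshold.
    return features["score"] >= 0.5


# Invented test set; a real one should cover every group the system will meet.
examples = [
    ({"score": 0.9}, True, "group_a"),
    ({"score": 0.2}, False, "group_a"),
    ({"score": 0.4}, True, "group_b"),
    ({"score": 0.7}, True, "group_b"),
]

per_group = accuracy_by_group(examples, toy_predict)
gap = largest_gap(per_group)
print(per_group, gap)
```

How large a gap is acceptable is a policy decision rather than a technical one, so a test like this is best treated as a prompt for human review.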
## Why is open source well-suited for this task?
Both open source technology and open source methodologies have extreme potential to help in this fight against algorithmic bias.
Modern artificial intelligence is dominated by open source software, from TensorFlow to IBM Watson to packages like [scikit-learn][8]. The open source community has already proven extremely effective in developing robust and rigorously tested machine-learning tools, so it follows that the same community could effectively build anti-bias tests into that same software.
Debugging tools like [DeepXplore][9], out of Columbia and Lehigh Universities, for example, make the AI stress-testing process extensive yet also easy to navigate. This and other projects, such as work being done at [MIT's Computer Science and Artificial Intelligence Lab][10], develop the agile and rapid prototyping the open source community should adopt.
Open source technology has also proven to be extremely effective for vetting and sorting large sets of data. Nothing should make this more obvious than the domination of open source tools in the data analysis market (Weka, Rapid Miner, etc.). Tools for identifying data bias should be designed by the open source community, and those techniques should also be applied to the plethora of open training data sets already published on sites like [Kaggle][11].
The open source methodology itself is also well-suited for designing processes to fight bias. Making conversations about software open, democratized, and in tune with social good is pivotal to combating an issue that is partly caused by the very opposite--closed conversations, private software development, and undemocratized decision-making. If online communities, corporations, and academics can adopt these open source characteristics when approaching machine learning, fighting algorithmic bias should become easier.
## How can we all get involved?
Education is extremely important. We all know people who may be unaware of algorithmic bias but who care about its implications--for law, social justice, public policy, and more. It's critical to talk to those people and explain both how the bias is formed and why it matters because the only way to get these conversations started is to start them ourselves.
For those of us who work with artificial intelligence in some capacity--as developers, on the policy side, through academic research, or in other capacities--these conversations are even more important. Those who are designing the artificial intelligence of tomorrow need to understand the extreme dangers that bias presents today; clearly, integrating anti-bias processes into software design depends on this very awareness.
Finally, we should all build and strengthen an open source community around ethical AI. Whether that means contributing to software tools, stress-testing machine learning models, or sifting through gigabytes of training data, it's time we leverage the power of open source methodology to combat one of the greatest threats of our digital age.
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/1/how-open-source-can-fight-algorithmic-bias
Author: [Justin Sherman][a]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]:https://opensource.com/users/justinsherman
[1]:https://www.crowdflower.com/what-is-training-data/
[2]:https://medium.com/microsoft-design/how-to-recognize-exclusion-in-ai-ec2d6d89f850
[3]:https://www.ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms
[4]:https://www.wired.com/2017/04/courts-using-ai-sentence-criminals-must-stop-now/
[5]:https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[6]:https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3012499
[7]:https://www.hivlawandpolicy.org/sites/default/files/Race%20and%20Punishment-%20Racial%20Perceptions%20of%20Crime%20and%20Support%20for%20Punitive%20Policies%20%282014%29.pdf
[8]:http://scikit-learn.org/stable/
[9]:https://arxiv.org/pdf/1705.06640.pdf
[10]:https://www.csail.mit.edu/research/understandable-deep-networks
[11]:https://www.kaggle.com/datasets

@ -0,0 +1,81 @@
How algorithmic bias hidden in AI and machines takes shape, and what we can do about it through the open source community
======
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/LAW_goodbadugly.png?itok=ZxaimUWU)
Image by: opensource.com
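Algorithms are everywhere in our world, and so is bias. From social media news feeds to streaming service recommendations to online shopping, computer algorithms, and machine learning algorithms in particular, have reached every corner of our daily lives. As for bias, we need only look at the 2016 US election to see how it shapes our society, both openly and beneath the surface.
What we often overlook, however, is where the two intersect: bias in computer algorithms themselves.
Contrary to what most of us assume, technology is not objective. AI algorithms and their decision-making processes are shaped by the people who develop them: the code they write, the data they use to "[train][1]" the models, and the way they [stress-test][2] the algorithms all influence the choices those algorithms later make. This means the developers' values, biases, and human flaws are reflected in the software. If I gave the face-recognition algorithm in my lab only photos of white people, then when it encountered a photo of someone who is not white it [would not recognize the person in it as human][3]. That outcome does not mean the AI is "stupid" or "naive"; it reveals a skew in the training data: a lack of diverse faces. The consequences can be very serious.
There is no shortage of examples. [State court systems][4] across the United States use "black box" algorithms to recommend sentences for convicted defendants. Because of the data that trained them, [these algorithms are biased against black people][5]: they recommend longer sentences for black defendants, which perpetuates the racial disparities in prisons. And all of this happens under the guise of technological objectivity, as a supposedly "scientific" choice.
The US federal government uses machine learning algorithms to calculate welfare payouts and various government subsidies. [But information about these algorithms][6], such as who created them and what data trained them, is very hard to find. That raises the risk that officials will distribute payments unfairly.
The list does not end there. From Facebook's news algorithms to medical systems to police body cameras, we as a society are very likely to feed all kinds of bias into these algorithms: racism, sexism, xenophobia, socioeconomic discrimination, confirmation bias, and more. Machines carrying those biases will then be mass-produced and mass-distributed, hiding society's prejudices behind a veil of technological objectivity.
This cannot be allowed to continue.
While we keep researching and developing artificial intelligence, we need to slow the pace of that development and proceed with care. The harm caused by algorithmic bias is already great enough.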
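## How can we reduce algorithmic bias?
One of the best ways is to start by vetting the data used to train the algorithms. As [researchers at Microsoft][2] point out, this vetting can take many forms.
The distribution of the data may itself be skewed. It is quite possible that the data programmers hold on US citizens is unbalanced, with more data on native-born residents than on immigrants and more on the rich than on the poor. Such imbalances lead an AI to draw the wrong conclusions about the makeup of our society. A machine learning algorithm could, for example, conclude that "most Americans are wealthy white people" purely from statistical analysis.
Even when men and women are equally represented in the training data, the results can still be biased. If every man in the training data is a CEO and every woman is a secretary, the AI could conclude that women are inherently unsuited to being CEOs (even if, in reality, male CEOs do outnumber female ones).
Similar problems can arise with law enforcement AIs that examine portrayals of crime in the media, portrayals that numerous studies have shown to be [egregiously slanted][7] toward black and Latino residents.
Bias in training data takes many other forms as well, unfortunately far more than can be covered here. But vetting training data is only one kind of check; it is just as important to "stress-test" AI models after they are built to uncover prejudice.
If we show our camera a photo of an Indian person, does it recognize the face? Given two equally qualified job candidates, is our AI less likely to recommend the one from the inner city than the one from the suburbs? How does our counterterrorism algorithm treat intelligence about a white domestic terrorist compared with an Iraqi one? Can the camera in our emergency room pull up children's medical records?
These are clearly difficult issues to resolve in the data itself, but we can begin to identify and address them through comprehensive testing.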
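## Why is open source well suited to this task?
Both open source methods and open source technology have great potential to help fight algorithmic bias.
Modern artificial intelligence is dominated by open source software, from TensorFlow to IBM Watson to packages like [scikit-learn][8]. The open source community has already shown it can develop robust, rigorously tested machine learning tools. In the same way, I believe, it can build anti-bias tests and apply them to that software.
Debugging tools such as [DeepXplore][9], from Columbia University and Lehigh University, make AI stress-testing more thorough while also making it easier to manage. Projects like this one, along with work done at [MIT's Computer Science and Artificial Intelligence Lab][10], develop the kind of agile, rapid prototyping that the open source community should adopt.
Open source technology has also proven itself at vetting and sorting large data sets. This is most evident in the share that open source tools hold in the data analysis market (Weka, Rapid Miner, and so on). Tools for identifying data bias should be designed by the open source community, and the same techniques should be applied to the many training data sets already published online, for example on [Kaggle][11].
The open source method itself is well suited to designing processes that fight bias. Closed conversations, private software development, and undemocratic decision-making are part of what causes the problem. Open source communities can hold conversations about software that are open, democratized, and attuned to the public good, which is essential to tackling it. If online communities, corporations, and academic institutions adopt these open source traits when they approach machine learning, fighting algorithmic bias should become much easier.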
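## How can we all get involved?
Education is an important part of this. Many people around us are not yet aware of algorithmic bias, yet its effects on law, social justice, public policy, and other areas matter to them directly. It is important to explain to them how algorithmic bias forms and why it matters, because the only way to get these conversations started is to start them ourselves.
For those of us who work with artificial intelligence, these conversations matter even more. Whether they are AI developers, policy professionals, or academic researchers, the people designing tomorrow's artificial intelligence need to be acutely aware of the dangers bias poses today. Clearly, eliminating bias in AI begins with recognizing that it exists.
Finally, we need to build and strengthen an open source community around ethical AI. Whether that means stress-testing machine learning models, contributing to software tools, or sifting through gigabytes of training data, the time has come for us to use open source methods to confront one of the greatest threats of the digital age.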
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/1/how-open-source-can-fight-algorithmic-bias
Author: [Justin Sherman][a]
Translator: [Valoniakim](https://github.com/Valoniakim)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]:https://opensource.com/users/justinsherman
[1]:https://www.crowdflower.com/what-is-training-data/
[2]:https://medium.com/microsoft-design/how-to-recognize-exclusion-in-ai-ec2d6d89f850
[3]:https://www.ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms
[4]:https://www.wired.com/2017/04/courts-using-ai-sentence-criminals-must-stop-now/
[5]:https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[6]:https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3012499
[7]:https://www.hivlawandpolicy.org/sites/default/files/Race%20and%20Punishment-%20Racial%20Perceptions%20of%20Crime%20and%20Support%20for%20Punitive%20Policies%20%282014%29.pdf
[8]:http://scikit-learn.org/stable/
[9]:https://arxiv.org/pdf/1705.06640.pdf
[10]:https://www.csail.mit.edu/research/understandable-deep-networks
[11]:https://www.kaggle.com/datasets