Merge pull request #20780 from sthwhl/talk-20191010-01

Submit translation
Xingyu.Wang 2021-01-19 23:06:16 +08:00 committed by GitHub
commit 4a73796a91
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 40 additions and 65 deletions


@ -1,65 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (sthwhl)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (The biggest risk to uptime? Your staff)
[#]: via: (https://www.networkworld.com/article/3444762/the-biggest-risk-to-uptime-your-staff.html)
[#]: author: (Andy Patrizio https://www.networkworld.com/author/Andy-Patrizio/)
The biggest risk to uptime? Your staff
======
Human error is the chief cause of downtime, a new study finds. Imagine that.
There was an old joke: "To err is human, but to really foul up you need a computer." Now it seems the reverse is true. The reliability of [data center][1] equipment has vastly improved, but the humans running it have not kept up, and that is a threat to uptime.
The Uptime Institute has surveyed thousands of IT professionals throughout the year on outages and said the vast majority of data center failures are caused by human error, from 70 percent to 75 percent.
And some of them are severe. It found more than 30 percent of IT service and data center operators experienced downtime that they called a “severe degradation of service” over the last year, with 10 percent of the 2019 respondents reporting that their most recent incident cost more than $1 million.
In Uptime's April 2019 survey, 60 percent of respondents believed that their most recent significant downtime incident could have been prevented with better management/processes or configuration. For outages that cost greater than $1 million, this figure jumped to 74 percent.
However, the end fault is not necessarily with the staff, Uptime argues, but with management that has failed them.
"Perhaps there is simply a limit to what can be achieved in an industry that still relies heavily on people to perform many of the most basic and critical tasks and thus is subject to human error, which can never be completely eliminated," wrote Kevin Heslin, chief editor of the Uptime Institute Journal in a [blog post][4].
"However, a quick survey of the issues suggests that management failure — not human error — is the main reason that outages persist. By under-investing in training, failing to enforce policies, allowing procedures to grow outdated, and underestimating the importance of qualified staff, management sets the stage for a cascade of circumstances that leads to downtime," Heslin went on to say.
Uptime noted that the complexity of a company's infrastructure, especially its distributed nature, can increase the risk that simple errors will cascade into a service outage, and said companies need to be aware of the greater risk that comes with greater complexity.
On the staffing side, it cautioned against expanding critical IT capacity faster than the company can attract and apply the resources to manage that infrastructure, and advised companies to be aware of any staffing and skills shortages before they start to impair mission-critical operations.
Join the Network World communities on [Facebook][5] and [LinkedIn][6] to comment on topics that are top of mind.
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3444762/the-biggest-risk-to-uptime-your-staff.html
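Author: [Andy Patrizio][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)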
[a]: https://www.networkworld.com/author/Andy-Patrizio/
[b]: https://github.com/lujun9972
[1]: https://www.networkworld.com/article/3223692/what-is-a-data-centerhow-its-changed-and-what-you-need-to-know.html
[2]: https://www.networkworld.com/newsletters/signup.html
[3]: https://www.networkworld.com/article/3440100/take-the-intelligent-route-with-consumption-based-storage.html?utm_source=IDG&utm_medium=promotions&utm_campaign=HPE20773&utm_content=sidebar ( Take the Intelligent Route with Consumption-Based Storage)
[4]: https://journal.uptimeinstitute.com/how-to-avoid-outages-try-harder/
[5]: https://www.facebook.com/NetworkWorld/
[6]: https://www.linkedin.com/company/network-world


@ -0,0 +1,40 @@
[#]: collector: "lujun9972"
[#]: translator: "sthwhl"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
[#]: subject: "The biggest risk to uptime? Your staff"
[#]: via: "https://www.networkworld.com/article/3444762/the-biggest-risk-to-uptime-your-staff.html"
[#]: author: "Andy Patrizio https://www.networkworld.com/author/Andy-Patrizio/"
What is the biggest threat to uptime? Your staff
======
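Human error is the chief cause of downtime, a new study finds. Imagine that. There was an old joke: "To err is human, but to really foul up you need a computer." Now the reverse seems to be true: the reliability of [data center][1] equipment has improved vastly, but the people running it have not kept up, and that is a serious threat to uptime.
The Uptime Institute surveyed thousands of IT professionals throughout the year about outages and concluded that the vast majority of data center failures, 70 to 75 percent, are caused by human error, and some of them are severe. The survey found that more than 30 percent of IT service and data center operators experienced downtime that they called a "severe degradation of service." Ten percent of the 2019 respondents reported that their most recent incident cost more than $1 million. In Uptime's April 2019 survey, 60 percent of respondents believed that their most recent significant downtime incident could have been prevented with better management/processes or configuration. For outages that cost more than $1 million, this figure jumped to 74 percent.
Uptime argues that the ultimate fault lies not necessarily with the staff but with management that has failed them. "Perhaps there is simply a limit to what can be achieved in an industry that still relies heavily on people to perform many of the most basic and critical tasks and thus is subject to human error, which can never be completely eliminated," wrote Kevin Heslin, chief editor of the Uptime Institute Journal, in a [blog post][4]. "However, a quick survey of the issues suggests that management failure, not human error, is the main reason that outages persist. By under-investing in training, failing to enforce policies, allowing procedures to grow outdated, and underestimating the importance of qualified staff, management sets the stage for a cascade of circumstances that leads to downtime," Heslin went on to say.
Uptime noted that the complexity of a company's IT infrastructure, especially its distributed nature, can increase the risk that simple errors will cascade into a service outage, and said companies need to be aware that greater complexity brings greater risk. On the staffing side, it cautioned against expanding critical IT capacity faster than the company can attract and apply the resources to manage that infrastructure, and advised companies to be aware of any staffing and skills shortages before they start to impair mission-critical operations.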
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3444762/the-biggest-risk-to-uptime-your-staff.html
Author: [Andy Patrizio][a]
Topic selection: [lujun9972][b]
Translator: [sthwhl](https://github.com/sthwhl)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://www.networkworld.com/author/Andy-Patrizio/
[b]: https://github.com/lujun9972
[1]: https://www.networkworld.com/article/3223692/what-is-a-data-centerhow-its-changed-and-what-you-need-to-know.html
[2]: https://www.networkworld.com/newsletters/signup.html
[3]: https://www.networkworld.com/article/3440100/take-the-intelligent-route-with-consumption-based-storage.html?utm_source=IDG&utm_medium=promotions&utm_campaign=HPE20773&utm_content=sidebar " Take the Intelligent Route with Consumption-Based Storage"
[4]: https://journal.uptimeinstitute.com/how-to-avoid-outages-try-harder/
[5]: https://www.facebook.com/NetworkWorld/
[6]: https://www.linkedin.com/company/network-world