Merge pull request #6558 from haoqixu/master

【翻译完成】Sysadmin 101: Patch Management
This commit is contained in:
Xingyu.Wang 2017-12-10 09:07:10 +08:00 committed by GitHub
commit c1e2a7cc71
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 54 additions and 61 deletions

View File

@ -1,61 +0,0 @@
【翻译中 @haoqixu】Sysadmin 101: Patch Management
============================================================
* [HOW-TOs][1]
* [Servers][2]
* [SysAdmin][3]
A few articles ago, I started a Sysadmin 101 series to pass down some fundamental knowledge about systems administration that the current generation of junior sysadmins, DevOps engineers or "full stack" developers might not learn otherwise. I had thought that I was done with the series, but then the WannaCry malware came out and exposed some of the poor patch management practices still in place in Windows networks. I imagine some readers that are still stuck in the Linux versus Windows wars of the 2000s might have even smiled with a sense of superiority when they heard about this outbreak.
The reason I decided to revive my Sysadmin 101 series so soon is I realized that most Linux system administrators are no different from Windows sysadmins when it comes to patch management. Honestly, in some areas (in particular, uptime pride), some Linux sysadmins are even worse than Windows sysadmins regarding patch management. So in this article, I cover some of the fundamentals of patch management under Linux, including what a good patch management system looks like, the tools you will want to put in place and how the overall patching process should work.
### What Is Patch Management?
When I say patch management, I'm referring to the systems you have in place to update software already on a server. I'm not just talking about keeping up with the latest-and-greatest bleeding-edge version of a piece of software. Even more conservative distributions like Debian that stick with a particular version of software for its "stable" release still release frequent updates that patch bugs or security holes.
Of course, if your organization decided to roll its own version of a particular piece of software, either because developers demanded the latest and greatest, you needed to fork the software to apply a custom change, or you just like giving yourself extra work, you now have a problem. Ideally you have put in a system that automatically packages up the custom version of the software for you in the same continuous integration system you use to build and package any other software, but many sysadmins still rely on the outdated method of packaging the software on their local machine based on (hopefully up to date) documentation on their wiki. In either case, you will need to confirm that your particular version has the security flaw, and if so, make sure that the new patch applies cleanly to your custom version.
### What Good Patch Management Looks Like
Patch management starts with knowing that there is a software update to begin with. First, for your core software, you should be subscribed to your Linux distribution's security mailing list, so you're notified immediately when there are security patches. If there you use any software that doesn't come from your distribution, you must find out how to be kept up to date on security patches for that software as well. When new security notifications come in, you should review the details so you understand how severe the security flaw is, whether you are affected and gauge a sense of how urgent the patch is.
Some organizations have a purely manual patch management system. With such a system, when a security patch comes along, the sysadmin figures out which servers are running the software, generally by relying on memory and by logging in to servers and checking. Then the sysadmin uses the server's built-in package management tool to update the software with the latest from the distribution. Then the sysadmin moves on to the next server, and the next, until all of the servers are patched.
There are many problems with manual patch management. First is the fact that it makes patching a laborious chore. The more work patching is, the more likely a sysadmin will put it off or skip doing it entirely. The second problem is that manual patch management relies too much on the sysadmin's ability to remember and recall all of the servers he or she is responsible for and keep track of which are patched and which aren't. This makes it easy for servers to be forgotten and sit unpatched.
The faster and easier patch management is, the more likely you are to do it. You should have a system in place that quickly can tell you which servers are running a particular piece of software at which version. Ideally, that system also can push out updates. Personally, I prefer orchestration tools like MCollective for this task, but Red Hat provides Satellite, and Canonical provides Landscape as central tools that let you view software versions across your fleet of servers and apply patches all from a central place.
Patching should be fault-tolerant as well. You should be able to patch a service and restart it without any overall down time. The same idea goes for kernel patches that require a reboot. My approach is to divide my servers into different high availability groups so that lb1, app1, rabbitmq1 and db1 would all be in one group, and lb2, app2, rabbitmq2 and db2 are in another. Then, I know I can patch one group at a time without it causing downtime anywhere else.
So, how fast is fast? Your system should be able to roll out a patch to a minor piece of software that doesn't have an accompanying service (such as bash in the case of the ShellShock vulnerability) within a few minutes to an hour at most. For something like OpenSSL that requires you to restart services, the careful process of patching and restarting services in a fault-tolerant way probably will take more time, but this is where orchestration tools come in handy. I gave examples of how to use MCollective to accomplish this in my recent MCollective articles (see the December 2016 and January 2017 issues), but ideally, you should put a system in place that makes it easy to patch and restart services in a fault-tolerant and automated way.
When patching requires a reboot, such as in the case of kernel patches, it might take a bit more time, but again, automation and orchestration tools can make this go much faster than you might imagine. I can patch and reboot the servers in an environment in a fault-tolerant way within an hour or two, and it would be much faster than that if I didn't need to wait for clusters to sync back up in between reboots.
Unfortunately, many sysadmins still hold on to the outdated notion that uptime is a badge of pride—given that serious kernel patches tend to come out at least once a year if not more often, to me, it's proof you don't take security seriously.
Many organizations also still have that single point of failure server that can never go down, and as a result, it never gets patched or rebooted. If you want to be secure, you need to remove these outdated liabilities and create systems that at least can be rebooted during a late-night maintenance window.
Ultimately, fast and easy patch management is a sign of a mature and professional sysadmin team. Updating software is something all sysadmins have to do as part of their jobs, and investing time into systems that make that process easy and fast pays dividends far beyond security. For one, it helps identify bad architecture decisions that cause single points of failure. For another, it helps identify stagnant, out-of-date legacy systems in an environment and provides you with an incentive to replace them. Finally, when patching is managed well, it frees up sysadmins' time and turns their attention to the things that truly require their expertise.
______________________
Kyle Rankin is senior security and infrastructure architect, the author of many books including Linux Hardening in Hostile Networks, DevOps Troubleshooting and The Official Ubuntu Server Book, and a columnist for Linux Journal. Follow him @kylerankin
--------------------------------------------------------------------------------
via: https://www.linuxjournal.com/content/sysadmin-101-patch-management
作者:[Kyle Rankin ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linuxjournal.com/users/kyle-rankin
[1]:https://www.linuxjournal.com/tag/how-tos
[2]:https://www.linuxjournal.com/tag/servers
[3]:https://www.linuxjournal.com/tag/sysadmin
[4]:https://www.linuxjournal.com/users/kyle-rankin

View File

@ -0,0 +1,54 @@
系统管理 101补丁管理
============================================================
就在之前几篇文章,我开始了“系统管理 101”系列文章用来记录现今许多初级系统管理员DevOps 工程师或者“全栈”开发者可能不曾接触过的一些系统管理方面的基本知识。按照我原本的设想,该系列文章已经是完结了的。然而后来 WannaCry 恶意软件出现并在补丁管理不善的 Windows 主机网络间爆发。我能想象到那些仍然深陷 2000 年代 Linux 与 Windows 争论的读者听到这个消息可能已经面露优越的微笑。
我之所以这么快就决定再次继续“系统管理 101”文章系列是因为我意识到在补丁管理方面一些 Linux 系统管理员和 Windows 系统管理员没有差别。实话说,在一些方面甚至做的更差(特别是以运行时间为豪)。所以,这篇文章会涉及 Linux 下补丁管理的基础概念,包括良好的补丁管理该是怎样的,你可能会用到的一些相关工具,以及整个补丁安装过程是如何进行的。
### 什么是补丁管理?
我所说的补丁管理,是指你部署用于升级服务器上软件的系统,不仅仅是把软件更新到最新最好的前沿版本。即使是像 Debian 这样为了“稳定性”持续保持某一特定版本软件的保守派发行版,也会时常发布升级补丁用于修补错误和安全漏洞。
当然,因为开发者对最新最好版本的需求,你需要派生软件源码并做出修改,或者因为你喜欢给自己额外的工作量,你的组织可能会决定自己维护特定软件的版本,这时你就会遇到问题。理想情况下,你应该已经配置好你的系统,让它在自动构建和打包定制版本软件时使用其它软件所用的同一套持续集成系统。然而,许多系统管理员仍旧在自己的本地主机上按照维基上的文档(但愿是最新的文档)使用过时的方法打包软件。不论使用哪种方法,你都需要明确你所使用的版本有没有安全缺陷,如果有,那必须确保新补丁安装到你定制版本的软件上了。
### 良好的补丁管理是怎样的
补丁管理首先要做的是检查软件的升级。首先,对于核心软件,你应该订阅相应 Linux 发行版的安全邮件列表,这样才能第一时间得知软件的安全升级情况。如果你使用的软件有些不是来自发行版的仓库,那么你也必须设法跟踪它们的安全更新。一旦接收到新的安全通知,你必须查阅通知细节,以此明确安全漏洞的严重程度,确定你的系统是否受影响,以及安全补丁的紧急性。
一些组织仍在使用手动方式管理补丁。在这种方式下,当出现一个安全补丁,系统管理员就要凭借记忆,登录到各个服务器上进行检查。在确定了哪些服务器需要升级后,再使用服务器内建的包管理工具从发行版仓库升级这些软件。最后以相同的方式升级剩余的所有服务器。
手动管理补丁的方式存在很多问题。首先,这么做会使补丁安装成为一个苦力活,安装补丁需要越多人力成本,系统管理员就越可能推迟甚至完全忽略它。其次,手动管理方式依赖系统管理员凭借记忆去跟踪他或她所负责的服务器的升级情况。这非常容易导致有些服务器被遗漏而未能及时升级。
补丁管理越快速简便,你就越可能把它做好。你应该构建一个系统,用来快速查询哪些服务器运行着特定的软件,以及这些软件的版本号,而且它最好还能够推送各种升级补丁。就个人而言,我倾向于使用 MCollective 这样的编排工具来完成这个任务,但是红帽提供的 Satellite 以及 Canonical 提供的 Landscape 也可以让你在统一的管理接口查看服务器上软件的版本信息,并且安装补丁。
补丁安装还应该具有容错能力。你应该具备在不下线的情况下为服务安装补丁的能力。这同样适用于需要重启系统的内核补丁。我采用的方法是把我的服务器划分为不同的高可用组lb1app1,rabbitmq1 和 db1 在一个组而lb2app2,rabbitmq2 和 db2 在另一个组。这样,我就能一次升级一个组,而无须下线服务。
所以,多快才能算快呢?对于少数没有附带服务的软件,你的系统最快应该能够在几分钟到一小时内安装好补丁(例如 bash 的 ShellShock 漏洞)。对于像 OpenSSL 这样需要重启服务的软件,以容错的方式安装补丁并重启服务的过程可能会花费稍多的时间,但这就是编排工具派上用场的时候。我在最近的关于 MCollective 的文章中(查看 2016 年 12 月和 2017 年 1 月的工单)给了几个使用 MCollective 实现补丁管理的例子。你最好能够部署一个系统,以具备容错性的自动化方式简化补丁安装和服务重启的过程。
如果补丁要求重启系统,像内核补丁,那它会花费更多的时间。再次强调,自动化和编排工具能够让这个过程比你想象的还要快。我能够在一到两个小时内在生产环境中以容错方式升级并重启服务器,如果重启之间无须等待集群同步备份,这个过程还能更快。
不幸的是,许多系统管理员仍坚信过时的观点,把运行时间作为一种骄傲的象征——鉴于紧急内核补丁大约每年一次。对于我来说,这只能说明你没有认真对待系统的安全性。
很多组织仍然使用无法暂时下线的单点故障的服务器,也因为这个原因,它无法升级或者重启。如果你想让系统更加安全,你需要去除过时的包袱,搭建一个至少能在深夜维护时段重启的系统。
基本上,快速便捷的补丁管理也是一个成熟专业的系统管理团队所具备的标志。升级软件是所有系统管理员的必要工作之一,花费时间去让这个过程简洁快速,带来的好处远远不止是系统安全性。例如,它能帮助我们找到架构设计中的单点故障。另外,它还帮助鉴定出环境中过时的系统,给我们替换这些部分提供了动机。最后,当补丁管理做得足够好,它会节省系统管理员的时间,让他们把精力放在真正需要专业知识的地方。
______________________
Kyle Rankin 是高级安全与基础设施架构师,其著作包括: Linux Hardening in Hostile NetworksDevOps Troubleshooting 以及 The Official Ubuntu Server Book。同时他还是 Linux Journal 的专栏作家。
--------------------------------------------------------------------------------
via: https://www.linuxjournal.com/content/sysadmin-101-patch-management
作者:[Kyle Rankin ][a]
译者:[haoqixu](https://github.com/haoqixu)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linuxjournal.com/users/kyle-rankin
[1]:https://www.linuxjournal.com/tag/how-tos
[2]:https://www.linuxjournal.com/tag/servers
[3]:https://www.linuxjournal.com/tag/sysadmin
[4]:https://www.linuxjournal.com/users/kyle-rankin