[Translated] Deciphering Top

This commit is contained in:
geekpi 2013-11-18 03:53:49 +00:00
parent 953d6ae16c
commit fde660d09b
2 changed files with 32 additions and 34 deletions

@ -1,34 +0,0 @@
Translating-------------geekpi
Deciphering Top
================================================================================
When I'm curious about the performance of a server, one of the first places I stop is "top". Top is not perfect, not by a long shot, but it does provide a decent point-in-time snapshot of the server, and attempts to answer the question "what is going on right now?". Unfortunately, the output of top can easily be misinterpreted if you do not have a good understanding of the different fields of data presented.
I'm not going to go through the [man page][2] for top; when you have the time and inclination, it is always there waiting for you. What I would like to do is point out a few highlights of how I use it to get a quick overview of the system and, hopefully, a direction for where to go next. Top is often my first stop in troubleshooting, but it is rarely my only stop.
[![](http://farm4.staticflickr.com/3827/10847969205_c1b75f9fa2_m.jpg)][1]
The very first thing I look at in top is the load average, in the top right-hand corner of the screen. The load average is computed from a number of gathered statistics, but can generally be thought of as the amount of work the CPU is being asked to do. If your machine has a single CPU core, then a load average of one means that the machine was perfectly loaded and had sufficient power to accomplish all tasks during the time it was sampled. Likewise, if the load average is two, the single-CPU machine was overloaded, and would have needed two available cores to accomplish the work it was being asked to do in the same amount of time. With today's 8, 16, and 32 core servers shipping, I need to think twice when considering the load average. If I need to check, I press "1" in top, which expands the header into a list of all CPU cores so I can get a quick count for comparison.
The second item I check is the first process listed, and the ninth column over, labeled "%CPU". The explanation for this column is novel:
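
To make the "think twice" step concrete, here is a minimal Python sketch that divides the load average by the core count. The helper name `load_per_core` is made up for illustration, and the rule of thumb that a value above 1.0 suggests more work than cores is a simplification of the reasoning above, not a hard limit.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average normalized by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    ratio = load_per_core(one_min, cores)
    print(f"1-minute load {one_min:.2f} on {cores} core(s): {ratio:.2f} per core")
```

By this measure, a load average of 2 on a single core gives 2.0 per core (overloaded), while a load average of 8 on a 16 core server gives 0.5 per core.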
> The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. In a true SMP environment, if 'Irix mode' is Off, top will operate in 'Solaris mode' where a task's cpu usage will be divided by the total number of CPUs. You toggle 'Irix/Solaris' modes with the 'I' interactive command.
Clear as mud, right? The main idea to keep in mind is that if a single process has gone berserk for one reason or another, it will probably show up listed first in top, with a rather extreme number for %CPU.
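
The man page excerpt boils down to one division. The sketch below assumes, as the excerpt implies, that a figure shown in Irix mode is relative to a single CPU and that Solaris mode divides it by the number of CPUs; the function name is hypothetical.

```python
def solaris_mode_percent(irix_percent: float, cpu_count: int) -> float:
    """Convert an Irix-mode %CPU figure to its Solaris-mode equivalent."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return irix_percent / cpu_count


# A process keeping four cores busy on an 8 core machine:
# 400% in Irix mode corresponds to 50% in Solaris mode.
print(solaris_mode_percent(400.0, 8))
```

This is why a %CPU above 100 is not necessarily an error on a multi-core server: it depends on which mode top is in.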
The next area I glance at is the "Cpu(s):" line, in the center of the header block. Specifically, I'm interested in %us, which is user processes; %sy, for system processes; %id, which is idle time; and %wa, which is the percentage of time the CPU had processes waiting on a response from an I/O stream before they could execute. The %wa figure should always be close to zero, and anything higher than 5% deserves a closer look.
Lastly, I like to check the system uptime, shown in the top left-hand corner. If I'm having problems with a server, and the server was recently rebooted, there may be a correlation there, perhaps a daemon that didn't start.
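
As a rough illustration of the 5% rule, the following sketch parses a "Cpu(s):" line and flags high I/O wait. The sample line is invented, and the `12.3%us` field layout is an assumption based on the field names used above; other versions of top may print the line differently, in which case the regular expression would need adjusting.

```python
import re

# Invented sample line in the "value%name" layout described in the text.
SAMPLE = "Cpu(s):  2.0%us,  1.0%sy,  0.0%ni, 90.5%id,  6.5%wa,  0.0%hi,  0.0%si,  0.0%st"


def parse_cpu_line(line: str) -> dict[str, float]:
    """Map each field name (us, sy, id, wa, ...) to its percentage."""
    pairs = re.findall(r"(\d+(?:\.\d+)?)%(\w+)", line)
    return {name: float(value) for value, name in pairs}


def iowait_needs_attention(fields: dict[str, float], threshold: float = 5.0) -> bool:
    """Return True when %wa exceeds the threshold (5% by default)."""
    return fields.get("wa", 0.0) > threshold


fields = parse_cpu_line(SAMPLE)
print(fields["wa"], iowait_needs_attention(fields))
```

The threshold is a parameter because 5% is the author's rule of thumb rather than a fixed limit.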
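
The same uptime check can be scripted. This sketch assumes a Linux system, where the first field of `/proc/uptime` is the number of seconds since boot; the 24 hour window for "recently rebooted" is an arbitrary choice for illustration.

```python
from datetime import timedelta


def parse_proc_uptime(text: str) -> timedelta:
    """Parse the contents of /proc/uptime; the first field is seconds since boot."""
    return timedelta(seconds=float(text.split()[0]))


def recently_rebooted(uptime: timedelta, window: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the machine has been up for less than the given window."""
    return uptime < window


if __name__ == "__main__":
    with open("/proc/uptime") as handle:  # Linux only
        uptime = parse_proc_uptime(handle.read())
    print(f"up for {uptime}, recently rebooted: {recently_rebooted(uptime)}")
```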
All of these checks take only a few seconds. I may leave top running for a few minutes and watch the processes, CPU, and load if I'm just observing, but normally I'm in and out of top fairly quickly. Top is one of those fantastic sysadmin tools that is built to give you a quick overview of the health of your system, and allow you to quickly diagnose potential problems.
--------------------------------------------------------------------------------
via: http://ostatic.com/blog/deciphering-top
Translator: [translator ID](https://github.com/译者ID) Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)
[1]:http://www.flickr.com/photos/51724787@N06/10847969205/
[2]:http://www.linuxmanpages.com/man1/top.1.php

32
translated/Deciphering Top.md Executable file

@ -0,0 +1,32 @@
Deciphering "top"
================================================================================
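When I am curious about a server's performance, the first thing I reach for is the "top" command. Top is far from perfect, but it provides a decent point-in-time snapshot of the server and tries to answer the question "what is happening right now?". Unfortunately, the output of top is easy to misread if you do not have a good grasp of the different fields it displays.
I won't walk through the whole [man page][2] for top; it is always there waiting for you when you have the time and the inclination. What I want to do is point out a few highlights of how I use it to get a quick overview of the system and, hopefully, a sense of what to do next. Top is my first stop when troubleshooting, but it is rarely my only one.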
[![](http://farm4.staticflickr.com/3827/10847969205_c1b75f9fa2_m.jpg)][1]
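The first thing I look at in top is the load average, in the top right-hand corner of the screen. The load average is computed from a number of gathered statistics, but can generally be thought of as the amount of work the CPU is being asked to do. If your machine has a single CPU core, a load average of 1 means the machine was fully loaded and had enough capacity to complete its tasks during the sampling period. Likewise, if the load average is 2, the single-core CPU was overloaded and would have needed 2 available cores to finish the requested work in the same period. With 8, 16, and 32 core servers now shipping, I think twice when considering the load average. If I need to check, I press "1" in top, which lists all CPU cores so I can get a quick count for comparison.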
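The second item I check is the first process listed, and the ninth column, labeled "%CPU". The explanation for this column is novel:
> The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. In a true SMP environment, if 'Irix mode' is Off, top will operate in 'Solaris mode' where a task's cpu usage will be divided by the total number of CPUs. You toggle 'Irix/Solaris' modes with the 'I' interactive command.
Not clear at all, is it? The main point to remember is that if a single process has run away for one reason or another, it will most likely appear on the first line of top with a very high %CPU figure.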
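The next area I look at is the "Cpu(s):" line, in the middle of the header. In particular, I am interested in %us, %sy, %id, and %wa, which are respectively user processes, system processes, idle time, and the share of time the CPU spent waiting on I/O. The %wa percentage should be close to 0, and anything above 5% needs close attention.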
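Finally, I like to check the system uptime, shown in the top left-hand corner. If I have concerns about a server and it was recently rebooted, there may be a connection, perhaps a daemon that did not start.
These checks take only a few seconds. If I am just observing, I may leave top running for a few minutes and watch the processes, CPU, and load, but normally I am in and out of top quickly. Top is one of those wonderful sysadmin tools that give you an overview of your system's health and let you diagnose potential problems quickly.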
--------------------------------------------------------------------------------
via: http://ostatic.com/blog/deciphering-top
Translator: [geekpi](https://github.com/geekpi) Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)
[1]:http://www.flickr.com/photos/51724787@N06/10847969205/
[2]:http://www.linuxmanpages.com/man1/top.1.php