### System Resource Monitoring ###



From the [Collectl website][1]:

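> Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.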


    $ collectl
    waiting for 1 second sample...
    #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
       0   0   864   1772      0      0      0      0      0      1      0       0 
       5   2  1338   2734      0      0      8      2      0      0      0       1 
       1   0  1222   2647      0      0     92      3      0      2      0       1 
       1   0   763   1722      0      0     80      3      0      1      0       2


### Installing collectl ###


    $ sudo apt-get install collectl


    $ sudo yum install collectl

### Usage ###

#### Background: collectl subsystems ####




    b - buddy info (memory fragmentation)
    c - CPU
    d - Disk
    f - NFS V3 Data
    i - Inode and File System
    j - Interrupts
    l - Lustre
    m - Memory
    n - Networks
    s - Sockets
    t - TCP
    x - Interconnect
    y - Slabs (system object caches)

    C - CPU
    D - Disk
    E - Environmental data (fan, power, temp),  via ipmitool
    F - NFS Data
    J - Interrupts
    L - Lustre OST detail OR client Filesystem detail
    M - Memory node data, which is also known as numa data
    N - Networks
    T - 65 TCP counters only available in plot format
    X - Interconnect
    Y - Slabs (system object caches)
    Z - Processes
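
The letters above are combined into a single `-s` argument, as in the `collectl -scmd` example in this article. The following is a minimal Python sketch of that combination, assuming only a handful of the lowercase letters from the list. The names `SUMMARY_SUBSYSTEMS` and `subsystem_flag` are hypothetical helpers for illustration and are not part of collectl.

```python
# Hypothetical helper: map readable names to a few of the letters listed above.
SUMMARY_SUBSYSTEMS = {
    "cpu": "c",
    "disk": "d",
    "memory": "m",
    "network": "n",
    "sockets": "s",
    "tcp": "t",
}


def subsystem_flag(names):
    """Build a collectl -s argument such as '-scmd' from readable names."""
    letters = []
    for name in names:
        letter = SUMMARY_SUBSYSTEMS[name]  # raises KeyError on an unknown name
        if letter not in letters:          # keep the first occurrence only
            letters.append(letter)
    return "-s" + "".join(letters)


print(subsystem_flag(["cpu", "memory", "disk"]))
```

The order of the names is preserved, so the result can be compared directly with the commands shown in the sections below.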


#### 1. Monitor CPU usage ####


    $ collectl -sc
    waiting for 1 second sample...
    #cpu sys inter  ctxsw 
       3   0  1800   3729 
       3   0  1767   3599


    $ collectl -sC
    waiting for 1 second sample...

    #   Cpu  User Nice  Sys Wait IRQ  Soft Steal Idle
          0     3    0    0    0    0    0     0   96
          1     3    0    0    0    0    0     0   96
          2     2    0    0    0    0    0     0   97
          3     1    0    0    0    0    0     0   98
          0     2    0    0    0    0    0     0   97
          1     2    0    2    0    0    0     0   95
          2     1    0    0    0    0    0     0   98
          3     4    0    1    0    0    0     0   95


#### 2. Monitor memory ####


    $ collectl -sm
    waiting for 1 second sample...
    #Free Buff Cach Inac Slab  Map 
       2G 220M   1G   1G 210M   3G 
       2G 220M   1G   1G 210M   3G 
       2G 220M   1G   1G 210M   3G


    $ collectl -sM
    waiting for 1 second sample...


    # Node    Total     Used     Free     Slab   Mapped     Anon   Locked    Inact Hit%
         0    7975M    5939M    2036M  215720K  372184K        0    6652K    1434M    0
         0    7975M    5939M    2036M  215720K  372072K        0    6652K    1433M    0
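
The two memory views print sizes with `K`, `M`, and `G` suffixes, which makes them awkward to compare directly. The sketch below converts such strings to bytes. It assumes the suffixes are 1024-based multiples, which should be checked against your collectl version; `parse_size` is a hypothetical name, and a bare number is returned unchanged because the sketch does not guess its unit.

```python
# Assumption: K, M and G are 1024-based multiples (not confirmed by the article).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a size such as '220M' or '215720K' to bytes."""
    value = value.strip()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # no suffix: returned as-is, unit not interpreted


# Slab from the -sM output, expressed in whole 1024-based megabytes.
print(parse_size("215720K") // 1024 ** 2)
```

Under that assumption, the `215720K` slab figure from `-sM` works out to 210 whole megabytes, which lines up with the `210M` shown by `-sm`.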


#### 3. Check disk usage ####


    $ collectl -sd
    waiting for 1 second sample...
    #KBRead  Reads KBWrit Writes 
          4      1    136     24 
          0      0     80     13

    $ collectl -sD
    waiting for 1 second sample...

    # DISK STATISTICS (/sec)
    #          <---------reads---------><---------writes---------><--------averages--------> Pct
    #Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
    sda              0      0    0    0       0      0    0    0       0     0     0      0    0
    sda              0      0    0    0       0      0    0    0       0     0     0      0    0
    sda              1      0    2    1      17      1    5    3       2     2     6      2    1
    sda              0      0    0    0      92     11    5   18      18     1    12     12    5


    $ collectl -sd --verbose

#### 4. Report on multiple subsystems at once ####


    $ collectl -scmd
    waiting for 1 second sample...
    #cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes 
       4   0  2187   4334   1G 221M   1G   1G 210M   3G      0      0      0      0 
       3   0  1896   4065   1G 221M   1G   1G 210M   3G      0      0     20      5
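
The brief format is a `#`-prefixed header line followed by whitespace-separated rows, so it can be turned into records with very little code. The following Python sketch parses the sample above; `parse_brief` is a hypothetical helper, and it assumes every data row has the same number of fields as the header.

```python
SAMPLE = """\
waiting for 1 second sample...
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes
   4   0  2187   4334   1G 221M   1G   1G 210M   3G      0      0      0      0
   3   0  1896   4065   1G 221M   1G   1G 210M   3G      0      0     20      5
"""


def parse_brief(text):
    """Turn brief-format output into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The last '#' line before the data is taken as the column header.
            header = line.lstrip("#").split()
            continue
        fields = line.split()
        # Skip anything that does not match the header, e.g. the "waiting" line.
        if header is not None and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


for row in parse_brief(SAMPLE):
    print(row["cpu"], row["Free"], row["KBWrit"])
```

Values are kept as strings, since columns such as `Free` carry unit suffixes.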

#### 5. Show timestamps ####


    $ collectl -scmd -oT
    waiting for 1 second sample...
    #         <--------CPU--------><-----------Memory-----------><----------Disks----------->
    #Time     cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes 
    12:03:05    3   0  1961   4013   1G 225M   1G   1G 212M   3G      0      0      0      0 
    12:03:06    3   0  1884   3810   1G 225M   1G   1G 212M   3G      0      0      0      0 
    12:03:07    3   0  2011   4060   1G 225M   1G   1G 212M   3G      0      0      0      0


#### 6. Change the sample count and interval ####


    $ collectl -c1 -sm
    waiting for 1 second sample...
    #Free Buff Cach Inac Slab  Map 
       1G 261M   1G   1G 228M   3G


    $ collectl -sm -i2
    waiting for 2 second sample...
    #Free Buff Cach Inac Slab  Map 
       1G 261M   1G   1G 229M   3G
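
When collectl is driven from a script, the `-s`, `-c`, and `-i` options shown above can be assembled programmatically. The sketch below only builds the argument list and does not run anything; `collectl_args` is a hypothetical helper, and it uses no options beyond the three that appear in this article.

```python
def collectl_args(subsystems, count=None, interval=None):
    """Build an argument list such as ['collectl', '-sm', '-c1']."""
    args = ["collectl", "-s" + subsystems]
    if count is not None:
        args.append("-c%d" % count)       # number of samples to take
    if interval is not None:
        args.append("-i%d" % interval)    # seconds between samples
    return args


print(" ".join(collectl_args("m", count=1)))
print(" ".join(collectl_args("m", interval=2)))
```

The returned list could be handed to `subprocess.run` on a machine where collectl is installed.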


#### 7. Use collectl like iotop ####


    $ collectl --top iokb


    # TOP PROCESSES sorted by iokb (counters are /sec) 09:44:57
    # PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
     3104  enlighte 20  2683    3 S  938M   33M  0  0.00  0.00   0  00:09.16    0    4    0    0 /usr/bin/ktorrent 
        1  root     20     0    0 S   26M    3M  2  0.00  0.00   0  00:01.30    0    0    0    0 /sbin/init 
        2  root     20     0    0 S     0     0  3  0.00  0.00   0  00:00.00    0    0    0    0 kthreadd 
        3  root     20     2    0 S     0     0  0  0.00  0.00   0  00:00.02    0    0    0    0 ksoftirqd/0 
        4  root     20     2    0 S     0     0  0  0.00  0.00   0  00:00.00    0    0    0    0 kworker/0:0 
        5  root      0     2    0 S     0     0  0  0.00  0.00   0  00:00.00    0    0    0    0 kworker/0:0H 
        7  root     RT     2    0 S     0     0  0  0.00  0.00   0  00:00.08    0    0    0    0 migration/0 
        8  root     20     2    0 S     0     0  2  0.00  0.00   0  00:00.00    0    0    0    0 rcu_bh 
        9  root     20     2    0 S     0     0  0  0.00  0.00   0  00:00.00    0    0    0    0 rcuob/0
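
The process listing above has a fixed set of columns with the command last, so it can be filtered for processes that are actually doing I/O. The Python sketch below works on a shortened copy of that listing. `parse_top` is a hypothetical helper, and it assumes the column header is the `#` line containing `PID`.

```python
SAMPLE = """\
# TOP PROCESSES sorted by iokb (counters are /sec) 09:44:57
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 3104  enlighte 20  2683    3 S  938M   33M  0  0.00  0.00   0  00:09.16    0    4    0    0 /usr/bin/ktorrent
    1  root     20     0    0 S   26M    3M  2  0.00  0.00   0  00:01.30    0    0    0    0 /sbin/init
    7  root     RT     2    0 S     0     0  0  0.00  0.00   0  00:00.08    0    0    0    0 migration/0
"""


def parse_top(text):
    """Parse a --top process listing into a list of {column: value} dicts."""
    header = None
    procs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            names = line.lstrip("# ").split()
            if "PID" in names:            # ignore the "TOP PROCESSES" banner
                header = names
            continue
        if header is None:
            continue
        # Limit the split so a command containing spaces stays in one field.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header):
            procs.append(dict(zip(header, fields)))
    return procs


procs = parse_top(SAMPLE)
busy = [p for p in procs if int(p["RKB"]) + int(p["WKB"]) > 0]
for p in busy:
    print(p["PID"], p["WKB"], p["Command"])
```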



    $ collectl --top iokb,5


    $ collectl --showtopopts


      vsz    virtual memory
      rss    resident (physical) memory
      syst   system time
      usrt   user time
      time   total time
      accum  accumulated time
      rkb    KB read
      wkb    KB written
      iokb   total I/O KB
      rkbc   KB read from pagecache
      wkbc   KB written to pagecache
      iokbc  total pagecacge I/O
      ioall  total I/O KB (iokb+iokbc)
      rsys   read system calls
      wsys   write system calls
      iosys  total system calls
      iocncl Cancelled write bytes
    Page Faults
      majf   major page faults
      minf   minor page faults
      flt    total page faults
    Context Switches
      vctx   volunary context switches
      nctx   non-voluntary context switches
    Miscellaneous (best when used with --procfilt)
      cpu    cpu number
      pid    process pid
      thread total process threads (not counting main)
      numobj    total number of slab objects
      actobj    active slab objects
      objsize   sizes of slab objects
      numslab   number of slabs
      objslab   number of objects in a slab
      totsize   total memory sizes taken by slabs
      totchg    change in memory sizes
      totpct    percent change in memory sizes
      name      slab names

#### 8. Use collectl like top ####


    $ collectl --top


    # TOP PROCESSES sorted by time (counters are /sec) 14:08:46
    # PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
     9471  enlighte 20  9102    0 R   63M   22M  3  0.03  0.10  13  00:00.81    0    0    0    3 /usr/bin/perl 
     3076  enlighte 20  2683    2 S  521M   40M  2  0.00  0.03   3  00:55.14    0    0    0    2 /usr/bin/yakuake 
     3877  enlighte 20  3356   41 S    1G  218M  1  0.00  0.03   3  10:10.50    0    0    0    0 /opt/google/chrome/chrome 
     4625  enlighte 20  2895   36 S    1G  241M  2  0.00  0.02   2  08:24.39    0    0    0   12 /usr/lib/firefox/firefox 
     5638  enlighte 20  3356    3 S    1G  265M  1  0.00  0.02   2  09:55.04    0    0    0    2 /opt/google/chrome/chrome 
     1186  root     20  1152    4 S  502M   76M  0  0.00  0.01   1  03:02.96    0    0    0    0 /usr/bin/X 
     1334  www-data 20  1329    0 S   87M    1M  2  0.00  0.01   1  00:00.85    0    0    0    0 nginx:


    $ collectl --top -scm

#### 9. List processes like ps ####

    $ collectl -c1 -sZ -i:1

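The command above lists all processes, much like `ps -e`. `--procfilt` filters the list down to specific processes, and `--procopts` sets further options that fine-tune how the process list is displayed.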

#### 10. Use collectl like vmstat ####


    $ collectl --vmstat
    waiting for 1 second sample...
    #procs ---------------memory (KB)--------------- --swaps-- -----io---- --system-- ----cpu-----
    # r  b   swpd   free   buff  cache  inact active   si   so    bi    bo   in    cs us sy  id wa
      1  0      0  1733M   242M  1922M  1137M   710M    0    0     0   108 1982  3918  2  0  95  1
      1  0      0  1733M   242M  1922M  1137M   710M    0    0     0     0 1906  3886  1  0  98  0
      1  0      0  1733M   242M  1922M  1137M   710M    0    0     0     0 1739  3480  3  0  96  0

#### 11. Detailed subsystem information ####


    $ collectl -sc -c5 -i1 --verbose -oT
    waiting for 1 second sample...

    #Time      User  Nice   Sys  Wait   IRQ  Soft Steal  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15 RunT BlkT
    14:22:10     11     0     0     0     0     0     0    87     4  1312   2691     0   866     1   0.78  0.86  0.78    1    0
    14:22:11     15     0     0     0     0     0     0    84     4  1283   2496     0   866     1   0.78  0.86  0.78    1    0
    14:22:12     17     0     0     0     0     0     0    82     4  1342   2658     0   866     0   0.78  0.86  0.78    0    0
    14:22:13     15     0     0     0     0     0     0    84     4  1241   2429     0   866     1   0.78  0.86  0.78    1    0
    14:22:14     11     0     0     0     0     0     0    88     4  1270   2488     0   866     0   0.80  0.87  0.78    0    0
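
With a fixed sample count, the verbose output is a small table that is easy to summarize. The following Python sketch averages one column over the five samples shown above. `column_mean` is a hypothetical helper, and it assumes the header is the last `#` line before the data and that the chosen column is numeric.

```python
SAMPLE = """\
#Time      User  Nice   Sys  Wait   IRQ  Soft Steal  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15 RunT BlkT
14:22:10     11     0     0     0     0     0     0    87     4  1312   2691     0   866     1   0.78  0.86  0.78    1    0
14:22:11     15     0     0     0     0     0     0    84     4  1283   2496     0   866     1   0.78  0.86  0.78    1    0
14:22:12     17     0     0     0     0     0     0    82     4  1342   2658     0   866     0   0.78  0.86  0.78    0    0
14:22:13     15     0     0     0     0     0     0    84     4  1241   2429     0   866     1   0.78  0.86  0.78    1    0
14:22:14     11     0     0     0     0     0     0    88     4  1270   2488     0   866     0   0.80  0.87  0.78    0    0
"""


def column_mean(text, column):
    """Return the mean of one numeric column across all sample rows."""
    header = None
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            header = line.lstrip("#").split()
        elif line and header:
            fields = line.split()
            if len(fields) == len(header):
                values.append(float(fields[header.index(column)]))
    if not values:
        raise ValueError("no data rows found")
    return sum(values) / len(values)


print(column_mean(SAMPLE, "Idle"))
print(column_mean(SAMPLE, "User"))
```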


### Summary ###



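Collectl is complemented by a set of companion tools, the [Collectl Utilities][2] (colmux, colgui, colplot), which process and analyze the data it collects. If the opportunity arises, we will cover them in a later article.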




Translator: icybreaker; proofreader: wxy

This article was translated by LCTT and is proudly presented by Linux中国 (Linux China).
