Monday, March 12, 2012

Tools & commands for analyzing server performance.

To check for disk space.

[root@london ~]# df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              26G   16G  8.3G  66% /
/dev/sda1              99M   12M   83M  12% /boot
tmpfs                 742M     0  742M   0% /dev/shm
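
To pick out only the filesystems that are filling up, the df output can be filtered with awk. This is a minimal sketch: full_filesystems is a hypothetical helper name, the default threshold of 80% is an arbitrary assumption, and it expects the six-column layout shown above (df -P keeps each filesystem on one line).

```shell
# Print mount point and Use% for every filesystem at or above a threshold.
# Reads df output on stdin; $1 is the threshold in percent (default 80).
full_filesystems() {
    awk -v limit="${1:-80}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the trailing % sign
        if (use + 0 >= limit) print $6, $5
    }'
}

# Usage: df -P | full_filesystems 60
```

With the sample output above and a threshold of 60, only the root filesystem would be reported.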



To check for space consumed by a folder.

[root@canada ~]# du -s -h /u01/
5.9G    /u01/
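
To see which subdirectories account for that total, something along these lines may help. It is only a sketch: largest_dirs is a hypothetical helper name, and sizes are printed in kilobytes (du -k) so that a plain numeric sort works.

```shell
# List the largest entries directly under a directory, biggest first.
# $1 = directory to inspect, $2 = number of entries to show (default 10).
largest_dirs() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage: largest_dirs /u01 5
```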


To check for system bottlenecks use vmstat.

Usage:

vmstat 5

displays system resource usage every 5 seconds. Use ctrl + c to exit.

vmstat 5 10

displays system resource usage every 5 seconds, for up to 10 reports.


[root@london ~]# vmstat 5 10

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1003736  32756 369368    0    0   647    50  920  225  4 12 79  5  0
 0  0      0 1003736  32756 369368    0    0     0     0  892  132  0  0 99  0  0
 0  0      0 1003736  32764 369368    0    0     0     5  887  130  0  0 98  1  0
 0  0      0 1003736  32764 369368    0    0     0     0  884  128  0  0 99  0  0
 0  0      0 1003736  32764 369368    0    0     0     0  886  137  0  3 97  0  0
 0  0      0 1003736  32772 369368    0    0     0    13  894  131  0  0 100  0  0
 0  0      0 1003736  32772 369368    0    0     0     0  887  133  0  0 99  0  0
 0  0      0 1003736  32772 369368    0    0     0     0  888  141  0  0 100  0  0
 0  0      0 1003736  32772 369368    0    0     0     0  882  133  0  1 99  0  0
 0  0      0 1003736  32772 369368    0    0     0    12  892  133  0  0 100  0  0


vmstat column descriptions.

r  number of runnable processes (running or waiting for CPU time).
   If this number consistently exceeds the number of CPUs on the server,
   there is a CPU bottleneck.

b  The number of processes in uninterruptible sleep. (b=blocked queue, waiting
   for resource (e.g. filesystem I/O blocked, inode lock))

swpd  amount of virtual memory used (memory swapped out to disk).
free  amount of idle memory.
buff  amount of buffer memory.
cache  amount of cache memory.

si  amount of memory swapped in from disk per second.
    A swap-in operation occurs when the server is short of RAM.
    A sustained high value indicates a shortage of RAM.

so  amount of memory swapped out to disk per second.

bi  blocks read per second from disk.
bo  blocks written per second to disk.

in  number of interrupts per second.
cs  number of context switches per second.

us  CPU time running non-kernel code (user tasks).
sy  CPU time running kernel code (system tasks).
id  CPU time idle.
wa  CPU time waiting for I/O, e.g. disk I/O.
st  CPU time stolen from this virtual machine by the hypervisor.

Interpret CPU utilization using vmstat.

1) us+sy is greater than or equal to 80% => CPU is about to reach its load capacity.
2) us+sy is 100% => there is a CPU bottleneck.
3) sy is high => the application is making a lot of system calls to the kernel.
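
The first two rules can be applied to live vmstat output with a small awk filter. This is a minimal sketch: cpu_verdict is a hypothetical helper name, and it assumes the 17-column layout shown above, where us is column 13 and sy is column 14.

```shell
# Classify each vmstat sample by us+sy, using the 80% and 100% thresholds.
# Reads vmstat output on stdin; header lines are skipped because their
# first field is not a number.
cpu_verdict() {
    awk '$1 ~ /^[0-9]+$/ {
        busy = $13 + $14
        if (busy >= 100)     verdict = "bottleneck"
        else if (busy >= 80) verdict = "near capacity"
        else                 verdict = "ok"
        printf "us+sy=%d%% %s\n", busy, verdict
    }'
}

# Usage: vmstat 5 10 | cpu_verdict
```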


Identify top processes that are consuming server resources.


[root@london ~]# top

top - 19:46:42 up  2:44,  2 users,  load average: 1.98, 0.88, 0.40
Tasks: 147 total,   6 running, 141 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.5%us, 64.2%sy, 16.1%ni,  0.0%id, 10.9%wa,  0.9%hi,  2.4%si,  0.0%st
Mem:   1518744k total,  1314932k used,   203812k free,   128716k buffers
Swap:  3068404k total,        0k used,  3068404k free,   957496k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                  
 5374 root      34  19  6272 3772  628 R 30.0  0.2   0:43.26 prelink                                  
 3880 root      15   0 37372  11m 5208 R  5.3  0.8   0:21.19 Xorg                                     
 4218 root      15   0 40716  13m 9332 R  1.3  0.9   0:02.32 gnome-terminal                           
  520 root      10  -5     0    0    0 D  0.7  0.0   0:01.23 kjournald                                
  244 root      15   0     0    0    0 S  0.2  0.0   0:00.23 pdflush                                  
 2968 root      15   0  5288 2608 2164 S  0.2  0.2   0:00.83 vmtoolsd                                 
 3478 haldaemo  15   0  6368 4444 1688 S  0.2  0.3   0:04.36 hald                                     
 4085 root      16   0  110m  15m  12m S  0.2  1.1   0:01.11 nautilus                                 
 9134 root      15   0  2332 1060  804 R  0.2  0.1   0:00.03 top                                      
    1 root      18   0  2072  656  560 S  0.0  0.0   0:01.61 init                                     
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0                              
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0                              
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0                               
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 events/0                                 
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper                                  
    7 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread                                  
   10 root      10  -5     0    0    0 S  0.0  0.0   0:00.46 kblockd/0 


The first row shows the CPU load average over the last 1, 5 and 15 minutes.
The 5 and 15 minute averages matter most, because the 1 minute figure reacts to
short spikes. On a single-core system, if the 5 and especially the 15 minute load
average stays above 1.00, the load is reaching undesirable levels and it is better
to eliminate unwanted processes.
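
The same check can be scripted. The sketch below uses a hypothetical helper, load_status, which compares one load average against a CPU count. Scaling the 1.00 mark by the number of CPUs on multi-core machines is an assumption added here, not something the rule above states.

```shell
# Report whether a load average exceeds the number of CPUs.
# $1 = load average (e.g. the 15 minute value), $2 = number of CPUs.
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load > cpus) print "overloaded"; else print "ok"
    }'
}

# Usage on a live system (/proc/loadavg starts with the 1, 5 and 15 minute values):
#   read one five fifteen rest < /proc/loadavg
#   load_status "$fifteen" "$(grep -c ^processor /proc/cpuinfo)"
```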
 
Column description for "top".

PID  Unique process identifier
USER  OS username that is running the process.
PR  Priority of the process.
NI  Nice value (negative = higher priority, positive = lower priority)
VIRT  Total virtual memory used by the process.
RES  Non swapped physical memory used.
SHR  shared memory used by process.
S  Process status
%CPU  Percent of CPU consumption since last screen refresh.
%MEM  Percent of physical memory consumed by process.
TIME+  Total CPU time, showing hundredths of seconds.



Identify CPU and Memory consuming resources.

Top processes consuming CPU.

[root@london ~]#  ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head 

21.0  5027 oracle   ?        ora_m000_dup
 5.5 11313 root     ?        /bin/bash /usr/sbin/makewhatis -w
 0.3  8849 oracle   ?        ora_mman_dup
 0.2  3880 root     tty7     /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
 0.1  9023 oracle   ?        ora_cjq0_dup
 0.1  8861 oracle   ?        ora_mmon_dup
 0.1  8857 oracle   ?        ora_smon_dup
%CPU   PID USER     TT       COMMAND
 0.0  9037 oracle   ?        ora_q001_dup
 0.0  9035 oracle   ?        ora_q000_dup
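
In the output above the %CPU header line ends up in the middle of the listing, because sort treats it like any other row. One possible workaround is to read the header off first and sort only the remaining rows. This is a sketch, and top_by_first_column is a hypothetical helper name.

```shell
# Sort ps-style output numerically on column 1, highest first,
# while keeping the header line at the top.
# $1 = number of rows to show after the header (default 10).
top_by_first_column() {
    IFS= read -r header            # consume only the first line
    printf '%s\n' "$header"
    sort -n -k 1 -r | head -n "${1:-10}"
}

# Usage: ps -e -o pcpu,pid,user,tty,args | top_by_first_column 10
```

The same helper works for the memory listing by passing pmem instead of pcpu to ps.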


Top processes consuming memory.

[root@london ~]# ps -e -o pmem,pid,user,tty,args | sort -n -k 1 -r | head 

 4.7  8857 oracle   ?        ora_smon_dup
 3.6  8861 oracle   ?        ora_mmon_dup
 3.2  9023 oracle   ?        ora_cjq0_dup
 2.1  8853 oracle   ?        ora_lgwr_dup
 1.9  8949 oracle   ?        ora_arc3_dup
 1.9  8947 oracle   ?        ora_arc2_dup
 1.9  8945 oracle   ?        ora_arc1_dup
 1.9  8936 oracle   ?        ora_arc0_dup
 1.8  9035 oracle   ?        ora_q000_dup
 1.7 12644 oracle   ?        ora_w000_dup

Shortcut trick: create aliases for the above commands and then use them.
                This is similar to synonyms in a database.


alias topc='ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head' 
alias topm='ps -e -o pmem,pid,user,tty,args | sort -n -k 1 -r | head'

Next time instead of typing the full command just use the alias.

Example:

[root@london ~]# topc

 5.7 11313 root     ?        /bin/bash /usr/sbin/makewhatis -w
 0.2  8849 oracle   ?        ora_mman_dup
 0.2  3880 root     tty7     /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
 0.1  8861 oracle   ?        ora_mmon_dup
 0.1  8857 oracle   ?        ora_smon_dup
%CPU   PID USER     TT       COMMAND
 0.0  9037 oracle   ?        ora_q001_dup
 0.0  9035 oracle   ?        ora_q000_dup
 0.0  9023 oracle   ?        ora_cjq0_dup
 0.0  8988 oracle   ?        ora_qmnc_dup


[root@london ~]# topm

 4.8  8857 oracle   ?        ora_smon_dup
 3.6  8861 oracle   ?        ora_mmon_dup
 3.2  9023 oracle   ?        ora_cjq0_dup
 2.1  8853 oracle   ?        ora_lgwr_dup
 1.9  8949 oracle   ?        ora_arc3_dup
 1.9  8947 oracle   ?        ora_arc2_dup
 1.9  8945 oracle   ?        ora_arc1_dup
 1.9  8936 oracle   ?        ora_arc0_dup
 1.8  9035 oracle   ?        ora_q000_dup
 1.7  8851 oracle   ?        ora_dbw0_dup


Identify I/O problems.

iostat  10

--The above command shows device statistics every 10 seconds.

[root@london ~]# iostat 10

Linux 2.6.18-164.el5 (london)   03/11/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.96    1.42   11.95    4.54    0.00   81.13

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              11.36       279.51       213.89    2322996    1777614
sda1              0.02         0.24         0.00       1992         22
sda2             11.31       279.03       213.88    2319010    1777592
sda3              0.02         0.19         0.00       1618          0

Column description.

Device  device or partition name.
tps  I/O transfers per second to the device.
Blk_read/s Blocks read per second from the device.
Blk_wrtn/s Blocks written per second to the device.
Blk_read Number of blocks read.
Blk_wrtn Number of blocks written.

iostat with extended statistics.

iostat -xd 5

--x is for extended statistics, d is for disk-only statistics.


Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              43.37  4462.65 65.46 89.56 17712.45 36642.57   350.63     3.25   20.98   5.14  79.74
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2             39.96  4462.65 63.86 89.56 17672.29 36642.57   354.04     3.24   21.13   5.20  79.74
sda3              3.41     0.00  1.61  0.00    40.16     0.00    25.00     0.01    6.12   3.62   0.58
hdc               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00


rrqm/s and wrqm/s The number of merged read and write requests queued per second. 
“Merged” means the operating system took multiple logical requests and grouped them
into a single request to the actual device.

r/s and w/s The number of read and write requests sent to the device per second.

rsec/s and wsec/s The number of sectors read and written per second. Some systems
also output rkB/s and wkB/s, the number of kilobytes read and written per second.

avgrq-sz The average request size in sectors.

avgqu-sz The average number of requests waiting in the device’s queue.

await The number of milliseconds required to respond to requests, including queue time 
and service time. Unfortunately, iostat doesn’t show separate service time statistics 
for read and write requests, which are so different that they really shouldn’t be averaged 
together. However, you can probably chalk up high I/O waits to reads, because writes can 
often be buffered but reads usually have to be served directly from the spindles.

svctm The average number of milliseconds the device itself takes to service a request, 
excluding time spent in the queue. In the sample above, sda has an await of 20.98 ms but 
an svctm of 5.14 ms, so most of the wait is queue time.

%util The percentage of elapsed time during which I/O requests were issued to the device. 
This shows the device utilization, as the name implies: when the value approaches 100%, the device is saturated.



Of these, the following columns are the most important:

"await","%util", "avgqu-sz"

In the above example sda2 has the highest I/O utilization of 79.74%, with an average wait
(await) of 21.13 ms and an average of 3.24 requests waiting in the device queue.
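
To spot busy devices without reading the whole table, the extended output can be filtered on %util. This is a minimal sketch: busy_devices is a hypothetical helper name and the default threshold of 75% is an arbitrary assumption. Column positions are looked up from the header line, since the extended layout differs between sysstat versions.

```shell
# Print device name and %util for devices at or above a threshold.
# Reads "iostat -xd" output on stdin; $1 is the threshold (default 75).
busy_devices() {
    awk -v limit="${1:-75}" '
        # Remember where each column is, based on the header line.
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("%util" in col) && NF >= col["%util"] && $(col["%util"]) + 0 >= limit {
            print $1, $(col["%util"])
        }'
}

# Usage: iostat -xd 5 | busy_devices 75
```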


To identify the number of processors in Linux.

[root@canada ~]# grep -c ^processor /proc/cpuinfo

1

Check Memory Usage.

[root@canada sa]# free -m
             total       used       free     shared    buffers     cached
Mem:          1010        749        260          0        121        488
-/+ buffers/cache:        140        870
Swap:         2047          0       2047


Here the 2nd row (-/+ buffers/cache) gives the accurate picture of actual memory
utilization: it excludes buffers and cache, which the kernel releases when
applications need the memory.

The 3rd row shows swap utilization.
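
The 2nd row can be reproduced from the Mem: row, which shows where its numbers come from. This is a sketch under assumptions: app_memory is a hypothetical helper name, and it expects the older free layout shown above, with separate buffers and cached columns (newer versions of free use a different layout).

```shell
# Recompute the "-/+ buffers/cache" figures from the Mem: row of free -m.
# Columns: $3 = used, $4 = free, $6 = buffers, $7 = cached.
app_memory() {
    awk '$1 == "Mem:" {
        printf "used by applications: %d MB\n", $3 - $6 - $7
        printf "free for applications: %d MB\n", $4 + $6 + $7
    }'
}

# Usage: free -m | app_memory
```

For the sample above this gives 749 - 121 - 488 = 140 MB used and 260 + 121 + 488 = 869 MB free. free itself prints 870 because it rounds each figure to whole megabytes separately.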

Command or utility to print directory structure.

Using ls -R

[root@canada /]# ls -R /u02 | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
   /u02
   |-app
   |---oracle
   |-----product
   |-------10.2.0
   |---------db_1

Using the tree command

[root@canada /]# tree /u02
/u02
`-- app
    `-- oracle
        `-- product
            `-- 10.2.0
                `-- db_1

5 directories, 0 files

