Network Performance Tuning

                                 1992.12.22

SUBJECT: Network Performance Tuning

STATUS OF THIS MEMO:
This memo summarizes the utilities used to measure network performance as
part of system performance tuning.

CONTENTS:
1. UNIX Networking
2. Network Performance Issues
   1) Data Corruption on the network
   2) Gathering Network Integrity data from NFS
   3) Network and CPU Load
   4) Reducing the NFS Workload
   5) Timeout
   6) NFS Workload and Kernel Table Size
3. RFS: System V Remote File Sharing

DESCRIPTION:

1. UNIX Networking

   - TCP/IP networking was ported to BSD UNIX in Release 4.1c, and Release
     4.2 added network tools such as rlogin, rsh, and rcp.

   - Berkeley's TCP/IP has been ported to nearly every System V
     implementation as an option.

   - Sun's NFS has also been ported as an option.

   - NFS and TCP/IP were fully adopted in System V Release 4.

   - AT&T's RFS (Remote File Sharing) facility is also available in
     System V.3 and in recent SunOS releases.

   - System V.4 includes RFS along with NFS.

   - The TCP/IP package is built on the socket mechanism in BSD and on the
     STREAMS facility in System V.

   ☞ Network performance has a major effect on the system, so wherever
      possible every workstation should have a local disk.

      If diskless workstations are unavoidable, configure no more than 4-5
      of them per file server.

      When attaching superintelligent terminals that can run simple
      utilities such as ls, cat, and an editor, about 8 per file server is
      appropriate.

      Otherwise, 15-20 workstations per file server is appropriate.

2. Network Performance Issues

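   To optimize network performance, the network must meet the following
   three conditions:

   - Data must be transmitted accurately.

   - The network must provide enough bandwidth to meet the demands of its
     users. If bandwidth is insufficient, transfers between two points take
     a very long time.

   - Each system on the network must be fast enough to handle the network
     traffic.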

   1) Data Corruption on the network

      - netstat -i is a simple tool for diagnosing network problems.
    
        % netstat -i 

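        . It reports, among other things, the number of input and output
          packets handled since the system was booted.

        . Input errors and output errors should stay at or below 0.025% of
          the packets, and a collision rate approaching 10% means the
          network is overloaded.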

        . The following sample script runs netstat -i at a fixed interval
          and reports the change in each counter since the previous report.
 
          #!/bin/sh
          # get a series of netstat reports & normalize
          # invoked as: program-name interval interface-name
          # Field numbers assume the BSD/SunOS netstat -i column layout:
          #   Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis

          ( while true  # simulate vmstat behavior: one report every
            do          # interval seconds
                  sleep "$1"
                  netstat -i
            done ) | awk -v iface="$2" '
             BEGIN { printf "%12s%12s%12s%12s%12s\n", "New Ipkts",
                     "New Ierrs", "New Opkts", "New Oerrs", "Collis"
                     pipkts=0; pierrs=0; popkts=0; poerrs=0; pcollis=0
                   }
             # find the line describing the interface we care about;
             # the first report shows the totals since boot
             $1 == iface {
                     ipkts=$5 - pipkts; ierrs=$6 - pierrs; opkts=$7 - popkts
                     oerrs=$8 - poerrs; collis=$9 - pcollis
                     printf "%12d%12d%12d%12d%12d\n", ipkts, ierrs, opkts,
                             oerrs, collis
                     pipkts=$5; pierrs=$6; popkts=$7; poerrs=$8
                     pcollis=$9
                   }'
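
          The thresholds quoted above (errors at or below 0.025%, collisions
          near 10%) can be checked mechanically. The following is a minimal
          sketch, not part of the original memo. The function name
          netstat_rates is hypothetical, the field numbers assume the
          BSD/SunOS netstat -i column layout, and measuring collisions
          against output packets is an assumption.

```shell
#!/bin/sh
# Sketch: turn netstat -i counters into error and collision percentages.
# Assumed column layout (check your system):
#   Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis
# Thresholds from the memo: errors above 0.025%, collisions near 10%.

netstat_rates() {   # usage: netstat -i | netstat_rates interface-name
    awk -v iface="$1" '
        $1 == iface && $5 > 0 && $7 > 0 {
            ierr = 100 * $6 / $5      # input errors as % of input packets
            oerr = 100 * $8 / $7      # output errors as % of output packets
            coll = 100 * $9 / $7      # collisions as % of output packets
            printf "%s ierr=%.3f%% oerr=%.3f%% coll=%.1f%%", iface, ierr, oerr, coll
            if (ierr > 0.025 || oerr > 0.025) { printf " ERRORS-HIGH" }
            if (coll >= 10) { printf " COLLISIONS-HIGH" }
            printf "\n"
        }'
}
```

          A typical call would be "netstat -i | netstat_rates le0", where
          le0 stands for whatever interface name the local system reports.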


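      - To track down the source of errors occurring at a gateway, use
        "netstat -s", which reports the amount of data transferred and the
        number of errors for each protocol (ip, icmp, tcp, udp).
 
        . The following shell script checks whether bad checksum errors
          appear when data crosses a gateway.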

          #!/bin/sh
          # LOOK FOR ERRORS WHILE CROSSING GATEWAYS
          # invoke as program-name host1 host2 ... hostn
          bigfile=/vmunix       # pick any large file you want,
                                # the bigger the better
                                # or even better, create a file
                                # that's even larger
          myself=`hostname`                 # name of the local host
          remote=/tmp/`basename $bigfile`   # where rcp leaves the copy
          netstat -s | grep "checksum"      # get initial report
          for host in "$@"
          do
              echo  "Testing copies between $myself and $host"
              rcp $bigfile $host:/tmp
              rcp $host:$remote /dev/null
              netstat -s | grep "checksum"  # has anything changed?
              rsh $host "rm $remote"
          done
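
          Comparing the grep output by eye works for a few hosts; for many
          hosts it may help to reduce each report to a single number. The
          sketch below is an illustration only. The function names are
          hypothetical, and it assumes that every counter line in the
          netstat -s output starts with the count, which varies by system.

```shell
#!/bin/sh
# Sketch: total the checksum error counters so that two netstat -s
# reports can be compared numerically.
# Assumption: counter lines look like "        3 bad header checksums".

checksum_total() {   # usage: netstat -s | checksum_total
    awk '/checksum/ { total += $1 } END { print total + 0 }'
}

checksum_delta() {   # usage: checksum_delta before-total after-total
    echo $(( $2 - $1 ))
}
```

          Taking one total before the rcp pair and one after, a nonzero
          checksum_delta for a given host suggests that the path to that
          host is corrupting data.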

   2) Gathering Network Integrity data from NFS

      The nfsstat -c command reports the system's client-side NFS
      statistics.

      % nfsstat -c

      Client rpc:
      calls    badcalls retrans  badxid   timeout  wait     newcred  timers
      15       0        0        0        0        0        0        0        
 
      Client nfs:
      calls      badcalls   nclget     nclsleep
      15         0          15         0          
      null       getattr    setattr    root       lookup     readlink   read    
      0  0%      4 26%      0  0%      0  0%      8 53%      0  0%      2 13%   
      wrcache    write      create     remove     rename     link       symlink 
      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%   
      mkdir      rmdir      readdir    fsstat     
      0  0%      0  0%      0  0%      1  6%   
   
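      - The retrans field is the number of packets this host, acting as an
        RPC client, had to retransmit. Retransmissions occur while reading
        or writing NFS files. If retrans exceeds 5% of the total number of
        client NFS calls, there is a serious problem.

      - Compare the badxid field with the retrans field. If the two are
        roughly equal, an NFS server on the network is having trouble
        keeping up with client requests.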
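
      The two checks above can be applied to the nfsstat -c output
      directly. This is a rough sketch under stated assumptions: the
      function name nfs_client_check is hypothetical, the parser expects
      the header layout shown in the sample output, and "roughly equal" is
      interpreted as badxid within 20% of retrans, which is a choice made
      here and not a figure from the memo.

```shell
#!/bin/sh
# Sketch: apply the retrans and badxid rules to nfsstat -c output.
# Assumes the "Client rpc:" and "Client nfs:" header lines shown above.

nfs_client_check() {   # usage: nfsstat -c | nfs_client_check
    awk '
        # Each header line is followed by one line of values.
        $1 == "calls" && $3 == "retrans" { sect = "rpc"; next }
        $1 == "calls" && $3 == "nclget"  { sect = "nfs"; next }
        sect == "rpc" { retrans = $3; badxid = $4; sect = ""; next }
        sect == "nfs" { calls = $1; sect = ""; next }
        END {
            pct = (calls > 0) ? 100 * retrans / calls : 0
            printf "nfs calls=%d retrans=%d (%.1f%%) badxid=%d\n", calls, retrans, pct, badxid
            if (pct > 5) { print "retrans exceeds 5% of calls: serious problem" }
            # "roughly equal" is taken as within 20% of retrans (an assumption)
            if (retrans > 0 && badxid >= 0.8 * retrans && badxid <= 1.2 * retrans) {
                print "badxid is close to retrans: server cannot keep up"
            }
        }'
}
```

      With the sample output shown earlier (15 calls, no retransmissions)
      the function prints only the summary line and no warnings.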

   3) Network and CPU Load

      A heavily loaded CPU degrades network performance. The spray utility
      can be used to check whether a system's CPU is keeping up.

      % /etc/spray otherhost
       sending 1162 packets of lnth 86 to hyundai6 ...
               in 1.2 seconds elapsed time,
               53 packets (4.56%) dropped by hyundai6
       Sent:   942 packets/sec, 79.2K bytes/sec
       Rcvd:   899 packets/sec, 75.6K bytes/sec

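       The important figure here is the number of dropped packets. A small
       number, 5% or less, is not a problem. A large number means that the
       sending host generates packets faster than otherhost can receive
       them: otherhost is not fast enough to keep up with the network, and
       its CPU is heavily loaded.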
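
       The 5% rule can be applied to the spray report automatically. The
       following is a small sketch, assuming the report contains a line of
       the form "N packets (P%) dropped by host" as in the sample above;
       the function name spray_check is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the drop percentage from a spray report and compare
# it with the 5% guideline from the memo.
# Assumes a line such as "53 packets (4.56%) dropped by hyundai6".

spray_check() {   # usage: spray otherhost | spray_check
    awk '
        $4 == "dropped" && $3 ~ /^\([0-9.]+%\)$/ {
            pct = $3
            gsub(/[()%]/, "", pct)   # "(4.56%)" becomes "4.56"
            host = $NF               # receiving host is the last field
            if (pct + 0 > 5) {
                printf "%s dropped %s%%: receiver looks overloaded\n", host, pct
            } else {
                printf "%s dropped %s%%: acceptable\n", host, pct
            }
        }'
}
```

       Reports that contain no such line produce no output, so an empty
       result should not be read as a pass or a failure.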

   4) Reducing the NFS Workload

      To reduce the workload on an NFS server, edit the /etc/fstab file on
      the client systems and increase the read and write buffer sizes.
      For example, if both systems have a page size of 4096 bytes:
    
      server:/remfs/dataspace /space nfs rw,hard,wsize=4096,rsize=4096 0 0

      The system page size can be checked with the "pagesize" command.
      rsize and wsize apply only to remote filesystems and must not be
      used on local filesystems.
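
      As an illustration, the entry can be generated from the page size
      instead of typed by hand. This is only a sketch: the function name
      nfs_fstab_entry is hypothetical, and the option list simply mirrors
      the example entry above.

```shell
#!/bin/sh
# Sketch: build an NFS fstab entry whose rsize and wsize match the
# page size, following the example entry in the memo.

nfs_fstab_entry() {   # usage: nfs_fstab_entry remote-fs mount-point page-size
    printf '%s %s nfs rw,hard,wsize=%s,rsize=%s 0 0\n' "$1" "$2" "$3" "$3"
}
```

      On a system that has the pagesize command, the third argument could
      be supplied as `pagesize`; the output is meant to be reviewed before
      it is added to /etc/fstab.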

   5) Timeout

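      If an NFS client does not receive a response to an NFS request within
      a given time, a timeout occurs. This means the NFS server is so
      heavily loaded that it cannot process NFS requests quickly enough.

      In this case, increasing the timeout period prevents the timeouts and
      lets the response arrive. The timeout period can be defined in
      /etc/fstab: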

      server:/mf /mf nfs noquota,hard,bg,intr,timeo=15 0 0
             (this sets the timeout period to 1.5 seconds)

      The "nfsstat -c" command shows the number of timeouts. If timeouts
      exceed 5% of the number of calls, there is a problem.
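
      Because timeo is given in tenths of a second, it is easy to misread.
      The sketch below lists the timeout in seconds for each NFS entry in
      an fstab-style file. The function name timeo_seconds is hypothetical,
      and the parser assumes the six-field layout used in the example
      entry above.

```shell
#!/bin/sh
# Sketch: print mount point and timeout in seconds for each NFS entry.
# timeo is expressed in tenths of a second, so timeo=15 means 1.5 s.

timeo_seconds() {   # usage: timeo_seconds < /etc/fstab
    awk '$3 == "nfs" {
        n = split($4, opt, ",")          # split the mount option list
        for (i = 1; i <= n; i++) {
            if (opt[i] ~ /^timeo=/) {
                sub(/^timeo=/, "", opt[i])
                printf "%s %.1f\n", $2, opt[i] / 10
            }
        }
    }'
}
```

      Entries without a timeo option are skipped, since their timeout is
      whatever default the local NFS implementation uses.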

   6) NFS Workload and Kernel Table Size

      One more NFS-related factor affects CPU performance. Files on an NFS
      server are accessed by users on many other systems, so the server
      needs larger kernel inode and file tables than a non-NFS system. If
      the tables are too small, the system suffers serious overhead.
      
      On BSD UNIX, raising the MAXUSERS configuration constant increases
      the size of the important tables, including the inode table.

      On System V, increase the setting of the ninodes or inodes parameter
      to enlarge the kernel table size.

3. RFS: System V Remote File Sharing

   - RFS is comparable to NFS. It is an option on most System V
     implementations and is standard in System V.4.

   - Like NFS, it lets individual filesystems be mounted over the network
     so that users can access them as if they were local.

   % sar -Du    reports the percentage of CPU time used to handle RFS
                operations

   % sar -Dc    reports how many RFS requests the system has handled

   % fusage     a tool that helps check the balance of the network
                structure; for each filesystem it summarizes the amount of
                data accessed by local and by remote users

   % sar -Db    reports RFS buffer usage

   % sar -S     reports the number of available server processes;
                server processes play the same role as NFS daemons, and
                their number is controlled by the MINSERVE and MAXSERVE
                configuration parameters

   % netstat -m Special Considerations for STREAMS



Revision History
Created on Dec 22, 1992